- ``SENSORS`` - List of sensors to include in the pipeline. Each name must match an existing table in your AWARE_ database. See the SENSORS_ variable in the ``config`` file.
- ``FITBIT_TABLE`` - The name of the table in your database that contains Fitbit data. Its ``fitbit_data`` field should contain the data coming from the Fitbit API in JSON format.
- ``PID`` - The list of participant ids to be included in the analysis. These should match the names of the files created in the ``data/external`` directory (:ref:`see more details<db-configuration>`).
- ``DAY_SEGMENTS`` - The list of day epochs that metrics can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternoon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per sensor basis. See DAY_SEGMENTS_ in the ``config`` file.
- ``TIMEZONE`` - The time zone where data was collected. Use the timezone names from this `List of Timezones`_. Double check that your chosen name is correct; for example, US Eastern Time is named ``America/New_York``, not ``EST``.
- ``DATABASE_GROUP`` - The name of your database credentials group. It should match the one in ``.env`` (:ref:`see the database configuration<db-configuration>`).
Contains three attributes: ``BIN_SIZE``, ``MIN_VALID_HOURS``, ``MIN_BINS_PER_HOUR``.
On any given day, Aware could have sensed data only for a few minutes or for 24 hours. Daily estimates of metrics should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement than 10 calls on a day when 23 hours of data were recorded).
Therefore, we define a valid hour as one that contains at least a certain number of valid bins. In turn, a valid bin is one that contains at least one row of data from any sensor logged within that period. We divide an hour into N bins of size ``BIN_SIZE`` (in minutes) and mark an hour as valid if it contains at least ``MIN_BINS_PER_HOUR`` valid bins (out of the total number of bins in an hour, i.e. 60min/``BIN_SIZE``). Days with fewer valid sensed hours than ``MIN_VALID_HOURS`` will be excluded from the output of this file. See PHONE_VALID_SENSED_DAYS_ in ``config.yaml``.
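The valid-day rule described above can be sketched in a few lines of Python. This is only an illustration, not the RAPIDS implementation: the function names ``count_valid_hours`` and ``is_valid_day`` and the input format (a list of minute-of-day offsets at which any sensor logged a row) are assumptions made for this example.

```python
def count_valid_hours(row_minutes, bin_size, min_bins_per_hour):
    """Count the valid hours in one day.

    row_minutes: minute-of-day offsets (0-1439) at which any sensor logged a row.
    bin_size: bin length in minutes (assumed to divide 60 evenly).
    min_bins_per_hour: number of valid bins required for an hour to be valid.
    """
    # A bin is valid if at least one row falls inside it.
    valid_bins = {int(minute // bin_size) for minute in row_minutes}
    bins_per_hour = 60 // bin_size
    valid_hours = 0
    for hour in range(24):
        first_bin = hour * bins_per_hour
        n_valid = sum(
            1 for b in range(first_bin, first_bin + bins_per_hour) if b in valid_bins
        )
        if n_valid >= min_bins_per_hour:
            valid_hours += 1
    return valid_hours


def is_valid_day(row_minutes, bin_size, min_valid_hours, min_bins_per_hour):
    # Days with fewer valid hours than min_valid_hours are excluded.
    n_hours = count_valid_hours(row_minutes, bin_size, min_bins_per_hour)
    return n_hours >= min_valid_hours


# Example: rows logged every 5 minutes between 08:00 and 09:59 only.
rows = list(range(8 * 60, 10 * 60, 5))
print(count_valid_hours(rows, bin_size=5, min_bins_per_hour=8))  # 2
print(is_valid_day(rows, bin_size=5, min_valid_hours=16, min_bins_per_hour=8))  # False
```

With these hypothetical settings the day has only two valid hours, so it would be dropped for any ``MIN_VALID_HOURS`` above 2.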
In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of metrics that are ratios over time periods like ``episodepersensedminutes`` of :ref:`Screen<screen-sensor-doc>`.
sms_type The particular ``sms_type`` that will be analyzed. The options for this parameter are ``received`` or ``sent``.
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the dataset. These metrics are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Metrics <sms-available-metrics>` Table below
count SMS A count of the number of times that particular ``sms_type`` occurred for a particular ``day_segment``.
distinctcontacts contacts A count of distinct contacts that were communicated for a particular ``sms_type`` for a particular ``day_segment``.
timefirstsms minutes The time in minutes from 12:00am (Midnight) that the first of a particular ``sms_type`` occurred.
timelastsms minutes The time in minutes from 12:00am (Midnight) that the last of a particular ``sms_type`` occurred.
countmostfrequentcontact SMS The count of the number of SMS messages of a particular ``sms_type`` for the most contacted contact for a particular ``day_segment``.
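As a rough illustration of the SMS metrics listed above, the sketch below computes them from a list of messages that has already been filtered to one ``sms_type`` and one ``day_segment``. The function name ``sms_metrics`` and the ``(timestamp, contact_id)`` input format are assumptions of this example, and the most frequent contact is taken within the filtered list, which may differ from how the pipeline selects it.

```python
from collections import Counter
from datetime import datetime


def sms_metrics(messages):
    """messages: list of (timestamp, contact_id) tuples, already filtered."""
    if not messages:
        return None
    times = [ts for ts, _ in messages]
    per_contact = Counter(contact for _, contact in messages)

    def minutes_from_midnight(ts):
        return ts.hour * 60 + ts.minute

    return {
        "count": len(messages),
        "distinctcontacts": len(per_contact),
        "timefirstsms": minutes_from_midnight(min(times)),
        "timelastsms": minutes_from_midnight(max(times)),
        # Number of messages exchanged with the most frequent contact.
        "countmostfrequentcontact": per_contact.most_common(1)[0][1],
    }


received = [
    (datetime(2020, 3, 1, 9, 15), "alice"),
    (datetime(2020, 3, 1, 13, 40), "bob"),
    (datetime(2020, 3, 1, 21, 5), "alice"),
]
print(sms_metrics(received))
```

For this made-up day, ``timefirstsms`` is 555 (09:15) and ``timelastsms`` is 1265 (21:05).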
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the calls dataset. Note that the same metrics are available for both ``incoming`` and ``outgoing`` calls, while ``missed`` calls has its own set of metrics. See :ref:`Available Incoming and Outgoing Call Metrics <available-in-and-out-call-metrics>` Table and :ref:`Available Missed Call Metrics <available-missed-call-metrics>` Table below.
hubermduration The generalized Huber M-estimator of location of the MAD for the durations of all the calls for a particular ``call_type`` and ``day_segment``.
varqnduration The Location-Free Scale Estimator Qn of the durations of all the calls for a particular ``call_type`` and ``day_segment``.
entropyduration The estimate of the Shannon entropy H of the durations of all the calls for a particular ``call_type`` and ``day_segment``.
timefirstcall minutes The time in minutes from 12:00am (Midnight) that the first of ``call_type`` occurred.
timelastcall minutes The time in minutes from 12:00am (Midnight) that the last of ``call_type`` occurred.
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the Bluetooth dataset. See :ref:`Available Bluetooth Metrics <bluetooth-available-metrics>` Table below
- **Rule:** ``rules/preprocessing.snakefile/download_dataset`` - See the download_dataset_ rule.
- **Script:** ``src/data/download_dataset.R`` - See the download_dataset.R_ script.
- **Rule:** ``rules/preprocessing.snakefile/readable_datetime`` - See the readable_datetime_ rule.
- **Script:** ``src/data/readable_datetime.R`` - See the readable_datetime.R_ script.
- **Rule:** ``rules/features.snakefile/accelerometer_metrics`` - See the accelerometer_metrics_ rule.
- **Script:** ``src/features/accelerometer_metrics.py`` - See the accelerometer_metrics.py_ script.
.. _Accelerometer-parameters:
**Accelerometer Rule Parameters:**
============ ===================
Name Description
============ ===================
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the dataset. See :ref:`Available Accelerometer Metrics <accelerometer-available-metrics>` Table below
============ ===================
.. _accelerometer-available-metrics:
**Available Accelerometer Metrics**
The following table shows a list of the available metrics for the accelerometer sensor data for a particular ``day_segment``.
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
single_categories A single category of apps that will be included for the data collection. The available categories can be defined in the ``APPLICATION_GENRES`` in the ``config`` file. See :ref:`Assumptions and Observations <applications-foreground-observations>`.
multiple_categories Categories of apps that will be included for the data collection. The available categories can be defined in the ``APPLICATION_GENRES`` in the ``config`` file. See :ref:`Assumptions and Observations <applications-foreground-observations>`.
single_apps Any Android app can be included in the list of apps used to collect data by adding the package name to this list. (E.g. YouTube)
excluded_categories Categories of apps that will be excluded for the data collection. The available categories can be defined in the ``APPLICATION_GENRES`` in the ``config`` file. See :ref:`Assumptions and Observations <applications-foreground-observations>`.
excluded_apps Any Android app can be excluded from the list of apps used to collect data by adding the package name to this list.
metrics The different measures that can be retrieved from the dataset. See :ref:`Available Applications Foreground Metrics <applications-foreground-available-metrics>` Table below
count apps A count of the number of times ``all_apps``, ``single_app``, ``single_category`` apps or ``multiple_category`` apps were used.
timeoffirstuse minutes The time in minutes from 12:00am (Midnight) to the first use of any app (i.e. ``all_apps``), ``single_app``, ``single_category`` apps or ``multiple_category`` apps.
timeoflastuse minutes The time in minutes from 12:00am (Midnight) to the last use of any app (i.e. ``all_apps``), ``single_app``, ``single_category`` apps or ``multiple_category`` apps.
frequencyentropy shannons The entropy of the apps frequency for ``all_apps``, ``single_category`` apps or ``multiple_category`` apps. There is no entropy for ``single_app`` apps.
================== ========= =============
.. _applications-foreground-observations:
**Assumptions/Observations:**
The ``APPLICATION_GENRES`` configuration setting (see `Application Genres Config`_) defines the catalogue of app categories that are available to the pipeline. ``CATALOGUE_SOURCE`` defines the source of the catalogue, which can be ``FILE`` (a custom file like the one provided with this project, see `Custom Catalogue File`_) or ``GOOGLE`` (the category classifications provided by Google). ``CATALOGUE_FILE`` defines the path to the custom file that contains the custom app catalogue. If ``CATALOGUE_SOURCE`` is ``FILE``, the ``UPDATE_CATALOGUE_FILE`` variable (``TRUE`` or ``FALSE``) specifies whether or not to update ``CATALOGUE_FILE``; if ``CATALOGUE_SOURCE`` is ``GOOGLE``, all scraped genres will be saved to ``CATALOGUE_FILE``. ``SCRAPE_MISSING_GENRES`` (``TRUE`` or ``FALSE``) specifies whether or not to scrape missing genres and is only effective if ``CATALOGUE_SOURCE`` is ``FILE``; if ``CATALOGUE_SOURCE`` is ``GOOGLE``, all genres are scraped anyway. Note that the ``top1global`` option finds and uses the app most used by that participant during the study.
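To make the ``frequencyentropy`` metric from the table above more concrete, here is a minimal sketch of a Shannon entropy computed over app usage counts. The function name ``frequency_entropy`` and the list-of-package-names input are hypothetical, and the base-2 logarithm is an assumption based on the ``shannons`` unit listed in the table; the pipeline's script may use a different base or estimator.

```python
import math
from collections import Counter


def frequency_entropy(app_events):
    """Shannon entropy (base 2) of how often each app appears in app_events."""
    counts = Counter(app_events)
    total = sum(counts.values())
    # Sum of -p * log2(p) over the relative frequency p of each app.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical foreground events for one participant and one day segment.
events = ["com.mail", "com.chat", "com.mail", "com.chat"]
print(frequency_entropy(events))  # 1.0
```

Usage spread evenly over two apps gives one bit of entropy, while a single app gives zero, which is consistent with the note that ``single_app`` has no entropy.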
- **Rule:** ``rules/preprocessing.snakefile/readable_datetime`` - See the readable_datetime_ rule.
- **Script:** ``src/data/readable_datetime.R`` - See the readable_datetime.R_ script.
- **Rule:** ``rules/features.snakefile/battery_deltas`` - See the battery_deltas_ rule.
- **Script:** ``src/features/battery_deltas.R`` - See the battery_deltas.R_ script.
- **Rule:** ``rules/features.snakefile/battery_metrics`` - See the battery_metrics_ rule.
- **Script:** ``src/features/battery_metrics.py`` - See the battery_metrics.py_ script.
.. _battery-parameters:
**Battery Rule Parameters:**
============ ===================
Name Description
============ ===================
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the Battery dataset. See :ref:`Available Battery Metrics <battery-available-metrics>` Table below
============ ===================
.. _battery-available-metrics:
**Available Battery Metrics**
The following table shows a list of the available metrics for Battery data.
.. - Download raw Google Activity Recognition dataset: ``expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SENSORS"]),``
.. - Apply readable datetime to Google Activity Recognition dataset: ``expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SENSORS"]),``
.. - Extract the deltas in Google Activity Recognition dataset: ``expand("data/processed/{pid}/plugin_google_activity_recognition_deltas.csv", pid=config["PIDS"]),``
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the Google Activity Recognition dataset. See :ref:`Available Google Activity Recognition Metrics <google-activity-recognition-available-metrics>` Table below
- **Rule:** ``rules/preprocessing.snakefile/download_dataset`` - See the download_dataset_ rule.
- **Script:** ``src/data/download_dataset.R`` - See the download_dataset.R_ script.
- **Rule:** ``rules/preprocessing.snakefile/readable_datetime`` - See the readable_datetime_ rule.
- **Script:** ``src/data/readable_datetime.R`` - See the readable_datetime.R_ script.
- **Rule:** ``rules/features.snakefile/light_metrics`` - See the light_metrics_ rule.
- **Script:** ``src/features/light_metrics.py`` - See the light_metrics.py_ script.
.. _light-parameters:
**Light Rule Parameters:**
============ ===================
Name Description
============ ===================
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Metrics <light-available-metrics>` Table below
- **Rule:** ``rules/preprocessing.snakefile/phone_sensed_bins`` - See the phone_sensed_bins_ rule.
- **Script:** ``src/data/phone_sensed_bins.R`` - See the phone_sensed_bins.R_ script.
- **Rule:** ``rules/preprocessing.snakefile/resample_fused_location`` - See the resample_fused_location_ rule.
- **Script:** ``src/data/resample_fused_location.R`` - See the resample_fused_location.R_ script.
- **Rule:** ``rules/features.snakefile/location_barnett_metrics`` - See the location_barnett_metrics_ rule.
- **Script:** ``src/features/location_barnett_metrics.R`` - See the location_barnett_metrics.R_ script.
.. _location-parameters:
**Location Rule Parameters:**
================= ===================
Name Description
================= ===================
location_to_use Specifies which location data will be used in the analysis. Possible options are ``ALL``, ``ALL_EXCEPT_FUSED`` or ``RESAMPLE_FUSED``.
accuracy_limit This is in meters. The sensor drops location coordinates with an accuracy value higher than this. This number means there's a 68% probability the true location is within the specified radius.
timezone The timezone used to calculate location.
metrics The different measures that can be retrieved from the Location dataset. See :ref:`Available Location Metrics <location-available-metrics>` Table below
================= ===================
.. _location-available-metrics:
**Available Location Metrics**
The following table shows a list of the available metrics for the Location dataset.
================ ========= =============
Name Units Description
================ ========= =============
hometime minutes Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am including any pauses within a 200-meter radius.
disttravelled meters Distance travelled. This is total distance travelled over a day.
rog meters The Radius of Gyration (RoG). It is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day and a weighted distance between all the places and the centroid is computed. The weights are proportional to the time spent in each place.
maxdiam meters The Maximum diameter. The largest distance in meters between any two pauses.
maxhomedist meters Max home distance. The maximum distance from home in meters.
siglocsvisited locations Significant locations. The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found iterating from 1 to 200 stopping until the centroids of two significant locations are within 400 meters of one another.
avgflightlen meters Avg flight length. Mean length of all flights.
stdflightlen meters Std flight length. The standard deviation of the length of all flights.
avgflightdur seconds Avg flight duration. Mean duration of all flights.
stdflightdur seconds Std flight duration. The standard deviation of the duration of all flights.
probpause Pause probability. The fraction of a day spent in a pause (as opposed to a flight).
siglocentropy Significant location entropy. Entropy measurement based on the proportion of time spent at each significant location visited during a day.
minsmissing
circdnrtn Circadian routine. A continuous metric that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day.
wkenddayrtn Weekend circadian routine. Same as Circadian routine but computed separately for weekends and weekdays.
================ ========= =============
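The ``rog`` description in the table above (a time-weighted centroid and a weighted distance to it) can be illustrated with the sketch below. It is not the code used by the ``location_barnett_metrics.R`` script: the function names, the ``(lat, lon, seconds_spent)`` input format and the choice of a weighted root-mean-square distance are all assumptions made for this example.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def radius_of_gyration(pauses):
    """pauses: list of (lat, lon, seconds_spent) tuples for one day."""
    total = sum(t for _, _, t in pauses)
    # Time-weighted centroid. Averaging raw coordinates is only reasonable
    # for places that are close together (an assumption of this sketch).
    c_lat = sum(lat * t for lat, _, t in pauses) / total
    c_lon = sum(lon * t for _, lon, t in pauses) / total
    # Time-weighted root-mean-square distance from each pause to the centroid.
    weighted_sq = sum(
        t * haversine_m(lat, lon, c_lat, c_lon) ** 2 for lat, lon, t in pauses
    )
    return math.sqrt(weighted_sq / total)


# Two hypothetical pauses of one hour each, 0.01 degrees of longitude apart.
day = [(0.0, 0.0, 3600), (0.0, 0.01, 3600)]
print(round(radius_of_gyration(day), 1))
```

Because both pauses have the same weight, the centroid sits halfway between them and the result equals half the distance between the two places.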
**Assumptions/Observations:**
*Significant Locations Identified*
(i.e. The clustering method used)
Significant locations are determined using K-means clustering on the locations that a patient visits over the course of the data collection period. K is incremented (K=K+1) and the clustering repeated until two significant locations are within 100 meters of one another; the results from the previous step (K-1) are then used as the total number of significant locations. See `Beiwe Summary Statistics`_.
*Definition of Stationarity*
(i.e., the length of time a person has to be not moving to qualify)
This is based on a Pause-Flight model. The parameters used are a minimum pause duration of 300sec and a minimum pause distance of 60m. See the `Pause-Flight Model`_.
*The Circadian Calculation*
For a detailed description of how this measure is calculated, see Canzian and Musolesi's 2015 paper in the Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, titled "Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis." Their procedure was followed using 30-min increments as a bin size. See `Beiwe Summary Statistics`_.
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics_events The different measures that can be retrieved from the events in the Screen dataset. See :ref:`Available Screen Events Metrics <screen-events-available-metrics>` Table below
metrics_deltas The different measures that can be retrieved from the episodes extracted from the Screen dataset. See :ref:`Available Screen Episodes Metrics <screen-episodes-available-metrics>` Table below
The following table shows a list of the available metrics for Screen Events.
================= ============== =============
Name Units Description
================= ============== =============
counton `ON` events Count on: A count of screen `ON` events (only available for Android)
countunlock Unlock events Count unlock: A count of screen unlock events.
unlocksperminute Unlock events Unlock events per minute: The average number of unlock events that occur in a minute.
================= ============== =============
.. _screen-episodes-available-metrics:
**Available Screen Episodes Metrics**
The following table shows a list of the available metrics for Screen Episodes.
============= ========= =============
Name Units Description
============= ========= =============
sumduration seconds Sum duration unlock: The sum duration of unlock episodes
maxduration seconds Max duration unlock: The maximum duration of unlock episodes
minduration seconds Min duration unlock: The minimum duration of unlock episodes
avgduration seconds Average duration unlock: The average duration of unlock episodes
stdduration seconds Std duration unlock: The standard deviation of the duration of unlock episodes
============= ========= =============
**Assumptions/Observations:**
An ``unlock`` episode is considered as the time between an ``unlock`` event and a ``lock`` event. iOS records these episodes reliably (albeit with some duplicated ``lock`` events within milliseconds of each other). However, on Android there are some events unrelated to the screen state, which show up as multiple consecutive ``unlock``/``lock`` events, so we keep the closest pair. In the experiments these are less than 10% of the screen events collected. This happens because ``ACTION_SCREEN_OFF`` and ``ON`` are "sent when the device becomes non-interactive which may have nothing to do with the screen turning off". Additionally, on Android it is possible to measure the time spent on the ``lock`` screen up to the ``unlock`` event and the total screen time (i.e. ``ON`` to ``OFF``), but we only keep ``unlock`` episodes (``unlock`` to ``OFF``) to be consistent with iOS.
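One way to read the "keep the closest pair" rule above is sketched below. The function name ``unlock_episodes``, the ``(timestamp_seconds, state)`` input format and the ``"unlock"``/``"lock"`` labels are assumptions of this example; the actual screen data uses its own status codes and the pipeline's script may resolve duplicates differently.

```python
def unlock_episodes(events):
    """events: list of (timestamp_seconds, state), state is "unlock" or "lock"."""
    episodes = []
    start = None
    for ts, state in sorted(events):
        if state == "unlock":
            # A repeated unlock overwrites the earlier one, so the unlock
            # closest to the following lock is the one that is kept.
            start = ts
        elif state == "lock" and start is not None:
            episodes.append((start, ts))
            # Duplicated locks are ignored until the next unlock arrives.
            start = None
    return episodes


# Hypothetical events with a duplicated unlock and a duplicated lock.
events = [
    (0, "unlock"), (5, "unlock"), (20, "lock"), (20.001, "lock"),
    (100, "unlock"), (160, "lock"),
]
durations = [end - begin for begin, end in unlock_episodes(events)]
print(durations)  # [15, 60]
print(sum(durations), max(durations), min(durations))  # 75 60 15
```

The episode durations are then the input for metrics such as ``sumduration``, ``maxduration`` and ``minduration``.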
- **Rule:** ``rules/preprocessing.snakefile/download_dataset`` - See the download_dataset_ rule.
- **Script:** ``src/data/download_dataset.R`` - See the download_dataset.R_ script.
- **Rule:** ``rules/preprocessing.snakefile/fitbit_with_datetime`` - See the fitbit_with_datetime_ rule.
- **Script:** ``src/data/fitbit_readable_datetime.py`` - See the fitbit_readable_datetime.py_ script.
- **Rule:** ``rules/features.snakefile/fitbit_heartrate_metrics`` - See the fitbit_heartrate_metrics_ rule.
- **Script:** ``src/features/fitbit_heartrate_metrics.py`` - See the fitbit_heartrate_metrics.py_ script.
.. _fitbit-heart-rate-parameters:
**Fitbit: Heart Rate Rule Parameters:**
============ ===================
Name Description
============ ===================
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the Fitbit: Heart Rate dataset.
See :ref:`Available Fitbit: Heart Rate Metrics <fitbit-heart-rate-available-metrics>` Table below
============ ===================
.. _fitbit-heart-rate-available-metrics:
**Available Fitbit: Heart Rate Metrics**
The following table shows a list of the available metrics for the Fitbit: Heart Rate dataset.
================== =========== =============
Name Units Description
================== =========== =============
maxhr beats/mins The maximum heart rate.
minhr beats/mins The minimum heart rate.
avghr beats/mins The average heart rate.
medianhr beats/mins The median heart rate.
modehr beats/mins The mode heart rate.
stdhr beats/mins The standard deviation of heart rate.
diffmaxmodehr beats/mins Diff max mode heart rate: The maximum heart rate minus mode heart rate.
diffminmodehr beats/mins Diff min mode heart rate: The mode heart rate minus minimum heart rate.
entropyhr Entropy heart rate: The entropy of heart rate.
lengthoutofrange minutes Length out of range: The duration of time the heart rate is in the ``out_of_range`` zone in minutes.
lengthfatburn minutes Length fat burn: The duration of time the heart rate is in the ``fat_burn`` zone in minutes.
lengthcardio minutes Length cardio: The duration of time the heart rate is in the ``cardio`` zone in minutes.
lengthpeak minutes Length peak: The duration of time the heart rate is in the ``peak`` zone in minutes.
================== =========== =============
**Assumptions/Observations:** There are four heart rate zones: ``out_of_range``, ``fat_burn``, ``cardio``, and ``peak``. Please refer to the `Fitbit documentation`_ for detailed information on how those zones are defined.
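The heart rate metrics in the table above could be computed along the following lines. This is a sketch under stated assumptions, not the ``fitbit_heartrate_metrics.py`` script: the function name ``heartrate_metrics`` is hypothetical, and the input is assumed to hold one ``(bpm, zone)`` sample per minute so that counting samples per zone gives minutes.

```python
import statistics
from collections import Counter


def heartrate_metrics(samples):
    """samples: list of (bpm, zone) tuples, assumed to be one per minute."""
    bpm = [b for b, _ in samples]
    mode = statistics.mode(bpm)
    # With one sample per minute, the sample count per zone is in minutes.
    zone_minutes = Counter(zone for _, zone in samples)
    return {
        "maxhr": max(bpm),
        "minhr": min(bpm),
        "avghr": statistics.mean(bpm),
        "medianhr": statistics.median(bpm),
        "modehr": mode,
        "diffmaxmodehr": max(bpm) - mode,
        "diffminmodehr": mode - min(bpm),
        "lengthoutofrange": zone_minutes["out_of_range"],
        "lengthfatburn": zone_minutes["fat_burn"],
        "lengthcardio": zone_minutes["cardio"],
        "lengthpeak": zone_minutes["peak"],
    }


# Five hypothetical minutes of data.
samples = [
    (60, "out_of_range"), (62, "out_of_range"), (62, "out_of_range"),
    (110, "fat_burn"), (140, "cardio"),
]
print(heartrate_metrics(samples))
```

For this made-up data the mode is 62, so ``diffmaxmodehr`` is 78 and ``diffminmodehr`` is 2.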
.. _fitbit-steps-sensor-doc:
Fitbit: Steps
"""""""""""""""
See `Fitbit: Steps Config Code`_
**Available Epochs:**
- daily
- morning
- afternoon
- evening
- night
**Available Platforms:**
- Fitbit
**Snakefile entry:**
.. - Download raw Fitbit: Steps dataset: ``expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["FITBIT_TABLE"]),``
..
- Apply readable datetime to Fitbit: Steps dataset:
- **Rule:** ``rules/preprocessing.snakefile/download_dataset`` - See the download_dataset_ rule.
- **Script:** ``src/data/download_dataset.R`` - See the download_dataset.R_ script.
- **Rule:** ``rules/preprocessing.snakefile/fitbit_with_datetime`` - See the fitbit_with_datetime_ rule.
- **Script:** ``src/data/fitbit_readable_datetime.py`` - See the fitbit_readable_datetime.py_ script.
- **Rule:** ``rules/features.snakefile/fitbit_step_metrics`` - See the fitbit_step_metrics.py_ rule.
- **Script:** ``src/features/fitbit_step_metrics.py`` - See the fitbit_step_metrics.py_ script.
.. _fitbit-steps-parameters:
**Fitbit: Steps Rule Parameters:**
======================= ===================
Name Description
======================= ===================
day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Metrics <fitbit-steps-available-metrics>` Table below
threshold_active_bout The maximum number of steps per minute necessary for a bout to be ``sedentary``. That is, if the step count per minute is greater than this value the bout has a status of ``active``.
======================= ===================
.. _fitbit-steps-available-metrics:
**Available Fitbit: Steps Metrics**
The following table shows a list of the available metrics for the Fitbit: Steps dataset.
========================= ========= =============
Name Units Description
========================= ========= =============
sumallsteps steps Sum all steps: The total step count.
maxallsteps steps Max all steps: The maximum step count
minallsteps steps Min all steps: The minimum step count
avgallsteps steps Avg all steps: The average step count
stdallsteps steps Std all steps: The standard deviation of step count
countsedentarybout bouts Count sedentary bout: A count of sedentary bouts
maxdurationsedentarybout minutes Max duration sedentary bout: The maximum duration of sedentary bouts
mindurationsedentarybout minutes Min duration sedentary bout: The minimum duration of sedentary bouts
avgdurationsedentarybout minutes Avg duration sedentary bout: The average duration of sedentary bouts
stddurationsedentarybout minutes Std duration sedentary bout: The standard deviation of the duration of sedentary bouts
countactivebout bouts Count active bout: A count of active bouts
maxdurationactivebout minutes Max duration active bout: The maximum duration of active bouts
mindurationactivebout minutes Min duration active bout: The minimum duration of active bouts
avgdurationactivebout minutes Avg duration active bout: The average duration of active bouts
stddurationactivebout minutes Std duration active bout: The standard deviation of the duration of active bouts
**Assumptions/Observations:** If the step count per minute is smaller than ``THRESHOLD_ACTIVE_BOUT`` (default value is 10), the minute is defined as sedentary. Otherwise, it is defined as active. An active/sedentary bout is a period during which the user is under ``active``/``sedentary`` status.
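The bout definition above can be sketched as a grouping of consecutive minutes that share the same status. The function name ``bouts`` and the plain list of per-minute step counts are assumptions of this example rather than the interface of ``fitbit_step_metrics.py``.

```python
from itertools import groupby


def bouts(steps_per_minute, threshold_active_bout=10):
    """Return a list of (status, duration_minutes) bouts.

    steps_per_minute: step counts for consecutive minutes.
    """
    # Below the threshold a minute is sedentary, otherwise it is active.
    statuses = (
        "sedentary" if steps < threshold_active_bout else "active"
        for steps in steps_per_minute
    )
    # Consecutive minutes with the same status form one bout.
    return [(status, sum(1 for _ in group)) for status, group in groupby(statuses)]


# Eight hypothetical minutes of step counts.
result = bouts([0, 3, 25, 40, 12, 0, 0, 0])
print(result)  # [('sedentary', 2), ('active', 3), ('sedentary', 3)]

active = [minutes for status, minutes in result if status == "active"]
print(len(active), max(active))  # 1 3
```

The per-status durations feed metrics such as ``countactivebout`` and ``maxdurationactivebout``.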