RAPIDS is a community effort and as such we want to continue recognizing the contributions from other researchers. Besides citing RAPIDS, we ask you to cite any of the authors listed below if you used those sensor providers in your analysis, thank you!
Vega J, Li M, Aguillera K, Goel N, Joshi E, Durica KC, Kunta AR, Low CA
+RAPIDS: Reproducible Analysis Pipeline for Data Streams Collected with Mobile Devices
+JMIR Preprints. 18/08/2020:23246
+DOI: 10.2196/preprints.23246
+URL: https://preprints.jmir.org/preprint/23246
If you computed accelerometer features using the provider [PHONE_ACCLEROMETER][PANDA] cite this paper in addition to RAPIDS.
+
+
Panda et al. citation
+
Panda N, Solsky I, Huang EJ, Lipsitz S, Pradarelli JC, Delisle M, Cusack JC, Gadd MA, Lubitz CC, Mullen JT, Qadan M, Smith BL, Specht M, Stephen AE, Tanabe KK, Gawande AA, Onnela JP, Haynes AB. Using Smartphones to Capture Novel Recovery Metrics After Cancer Surgery. JAMA Surg. 2020 Feb 1;155(2):123-129. doi: 10.1001/jamasurg.2019.4702. PMID: 31657854; PMCID: PMC6820047.
If you computed applications foreground features using the app category (genre) catalogue in [PHONE_APPLICATIONS_FOREGROUND][RAPIDS] cite this paper in addition to RAPIDS.
+
+
Stachl et al. citation
+
Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres Völkel, Tobias Schuwerk, Michelle Oldemeier, Theresa Ullmann, Heinrich Hussmann, Bernd Bischl, Markus Bühner. Proceedings of the National Academy of Sciences Jul 2020, 117 (30) 17680-17687; DOI: 10.1073/pnas.1920484117
If you computed bluetooth features using the provider [PHONE_BLUETOOTH][DORYAB] cite this paper in addition to RAPIDS.
+
+
Doryab et al. citation
+
Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
If you computed locations features using the provider [PHONE_LOCATIONS][BARNETT] cite this paper and this paper in addition to RAPIDS.
+
+
Barnett et al. citation
+
Ian Barnett, Jukka-Pekka Onnela, Inferring mobility measures from GPS traces with missing data, Biostatistics, Volume 21, Issue 2, April 2020, Pages e98–e112, https://doi.org/10.1093/biostatistics/kxy059
+
+
+
Canzian et al. citation
+
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ‘15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI:https://doi.org/10.1145/2750858.2805845
If you computed locations features using the provider [PHONE_LOCATIONS][DORYAB] cite this paper and this paper in addition to RAPIDS.
+
+
Doryab et al. citation
+
Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
+
+
+
Canzian et al. citation
+
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ‘15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI:https://doi.org/10.1145/2750858.2805845
We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+moshi@pitt.edu.
+All complaints will be reviewed and investigated promptly and fairly.
+
All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
Community Impact: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
Consequence: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
Community Impact: A violation through a single incident or series
+of actions.
+
Consequence: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
Community Impact: A serious violation of community standards, including
+sustained inappropriate behavior.
+
Consequence: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
Community Impact: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
Consequence: A permanent ban from any sort of public interaction within
+the community.
We use mkdocs with the material theme to write these docs. Whenever you make any changes, just push them back to the repo and the documentation will be deployed automatically.
The documentation config file is /mkdocs.yml, if you are adding new .md files to the docs modify the nav attribute at the bottom of that file. You can use the hierarchy there to find all the files that appear in the documentation.
Check this page to get familiar with the different visual elements we can use in the docs (admonitions, code blocks, tables, etc.) You can also refer to /docs/setup/installation.md and /docs/setup/configuration.md to see practical examples of these elements.
+
+
Hint
+
Any links to internal pages should be relative to the current page. For example, any link from this page (documentation) which is inside ./developers should begin with ../ to go one folder level up like:
+
Stage and commit your changes using VS Code git GUI or the following commands
+
git add modified-file1 modified-file2
+git commit -m "Add my new feature"# use a concise description
+
+
+
Integrate your new feature to develop
+
+
You are an internal developer if you have writing permissions to the repository.
+
Most feature branches are never pushed to the repo, only do so if you expect that its development will take days (to avoid losing your work if you computer is damaged). Otherwise follow the following instructions to locally rebase your feature branch into develop and push those rebased changes online.
Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a Work In Progress this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed a description of the data used for testing (test cases) are outline. Currently for all intent and testing purposes the tests/data/raw/test01/ contains all the test data files for testing android data formats and tests/data/raw/test02/ contains all the test data files for testing iOS data formats. It follows that the expected (verified output) are contained in the tests/data/processed/test01/ and tests/data/processed/test02/ for Android and iOS respectively. tests/data/raw/test03/ and tests/data/raw/test04/ contain data files for testing empty raw data files for android and iOS respectively.
+
The following is a list of the sensors that testing is currently available.
The raw message data file contains data for 2 separate days.
+
The data for the first day contains records 5 records for every
+ epoch.
+
The second day's data contains 6 records for each of only 2
+ epoch (currently morning and evening)
+
The raw message data contains records for both message_types
+ (i.e. recieved and sent) in both days in all epochs. The
+ number records with each message_types per epoch is randomly
+ distributed There is at least one records with each
+ message_types per epoch.
+
There is one raw message data file each, as described above, for
+ testing both iOS and Android data.
+
There is also an additional empty data file for both android and
+ iOS for testing empty data files
Due to the difference in the format of the raw call data for iOS and Android the following is the expected results the calls_with_datetime_unified.csv. This would give a better idea of the use cases being tested since the calls_with_datetime_unified.csv would make both the iOS and Android data comparable.
+
+
The call data would contain data for 2 days.
+
The data for the first day contains 6 records for every epoch.
+
The second day's data contains 6 records for each of only 2
+ epoch (currently morning and evening)
+
The call data contains records for all call_types (i.e.
+ incoming, outgoing and missed) in both days in all epochs.
+ The number records with each of the call_types per epoch is
+ randomly distributed. There is at least one records with each
+ call_types per epoch.
+
There is one call data file each, as described above, for testing
+ both iOS and Android data.
+
There is also an additional empty data file for both android and
+ iOS for testing empty data files
Due to the difference in the format of the raw screen data for iOS and Android the following is the expected results the screen_deltas.csv. This would give a better idea of the use cases being tested since the screen_eltas.csv would make both the iOS and Android data comparable These files are used to calculate the features for the screen sensor
+
+
The screen delta data file contains data for 1 day.
+
The screen delta data contains 1 record to represent an unlock
+ episode that falls within an epoch for every epoch.
+
The screen delta data contains 1 record to represent an unlock
+ episode that falls across the boundary of 2 epochs. Namely the
+ unlock episode starts in one epoch and ends in the next, thus
+ there is a record for unlock episodes that fall across night
+ to morning, morning to afternoon and finally afternoon to
+ night
+
The testing is done for unlock episode_type.
+
There is one screen data file each for testing both iOS and
+ Android data formats.
+
There is also an additional empty data file for both android and
+ iOS for testing empty data files
Due to the difference in the format of the raw battery data for iOS and Android as well as versions of iOS the following is the expected results the battery_deltas.csv. This would give a better idea of the use cases being tested since the battery_deltas.csv would make both the iOS and Android data comparable. These files are used to calculate the features for the battery sensor.
+
+
The battery delta data file contains data for 1 day.
+
The battery delta data contains 1 record each for a charging and
+ discharging episode that falls within an epoch for every
+ epoch. Thus, for the daily epoch there would be multiple
+ charging and discharging episodes
+
Since either a charging episode or a discharging episode and
+ not both can occur across epochs, in order to test episodes that
+ occur across epochs alternating episodes of charging and
+ discharging episodes that fall across night to morning,
+ morning to afternoon and finally afternoon to night are
+ present in the battery delta data. This starts with a
+ discharging episode that begins in night and end in morning.
+
There is one battery data file each, for testing both iOS and
+ Android data formats.
+
There is also an additional empty data file for both android and
+ iOS for testing empty data files
The raw Bluetooth data file contains data for 1 day.
+
The raw Bluetooth data contains at least 2 records for each
+ epoch. Each epoch has a record with a timestamp for the
+ beginning boundary for that epoch and a record with a
+ timestamp for the ending boundary for that epoch. (e.g. For
+ the morning epoch there is a record with a timestamp for
+ 6:00AM and another record with a timestamp for 11:59:59AM.
+ These are to test edge cases)
+
An option of 5 Bluetooth devices are randomly distributed
+ throughout the data records.
+
There is one raw Bluetooth data file each, for testing both iOS
+ and Android data formats.
+
There is also an additional empty data file for both android and
+ iOS for testing empty data files.
There are 2 data files (wifi_raw.csv and sensor_wifi_raw.csv)
+ for each fake participant for each phone platform.
+
The raw WIFI data files contain data for 1 day.
+
The sensor_wifi_raw.csv data contains at least 2 records for
+ each epoch. Each epoch has a record with a timestamp for the
+ beginning boundary for that epoch and a record with a
+ timestamp for the ending boundary for that epoch. (e.g. For
+ the morning epoch there is a record with a timestamp for
+ 6:00AM and another record with a timestamp for 11:59:59AM.
+ These are to test edge cases)
+
The wifi_raw.csv data contains 3 records with random timestamps
+ for each epoch to represent visible broadcasting WIFI network.
+ This file is empty for the iOS phone testing data.
+
An option of 10 access point devices is randomly distributed
+ throughout the data records. 5 each for sensor_wifi_raw.csv and
+ wifi_raw.csv.
+
There data files for testing both iOS and Android data formats.
+
There are also additional empty data files for both android and
+ iOS for testing empty data files.
The raw light data contains 3 or 4 rows of data for each epoch
+ except night. The single row of data for night is for testing
+ features for single values inputs. (Example testing the standard
+ deviation of one input value)
+
Since light is only available for Android there is only one file
+ that contains data for Android. All other files (i.e. for iPhone)
+ are empty data files.
The raw application foreground data file contains data for 1 day.
+
The raw application foreground data contains 7 - 9 rows of data
+ for each epoch. The records for each epoch contains apps that
+ are randomly selected from a list of apps that are from the
+ MULTIPLE_CATEGORIES and SINGLE_CATEGORIES (See
+ testing_config.yaml). There are also records in each epoch
+ that have apps randomly selected from a list of apps that are from
+ the EXCLUDED_CATEGORIES and EXCLUDED_APPS. This is to test
+ that these apps are actually being excluded from the calculations
+ of features. There are also records to test SINGLE_APPS
+ calculations.
+
Since application foreground is only available for Android there
+ is only one file that contains data for Android. All other files
+ (i.e. for iPhone) are empty data files.
The raw Activity Recognition data file contains data for 1 day.
+
The raw Activity Recognition data each epoch period contains
+ rows that records 2 - 5 different activity_types. The is such
+ that durations of activities can be tested. Additionally, there
+ are records that mimic the duration of an activity over the time
+ boundary of neighboring epochs. (For example, there a set of
+ records that mimic the participant in_vehicle from afternoon
+ into evening)
+
There is one file each with raw Activity Recognition data for
+ testing both iOS and Android data formats.
+ (plugin_google_activity_recognition_raw.csv for android and
+ plugin_ios_activity_recognition_raw.csv for iOS)
+
There is also an additional empty data file for both android and
+ iOS for testing empty data files.
The raw conversation data file contains data for 2 day.
+
The raw conversation data contains records with a sample of both
+ datatypes (i.e. voice/noise = 0, and conversation = 2 )
+ as well as rows with for samples of each of the inference values
+ (i.e. silence = 0, noise = 1, voice = 2, and unknown
+ = 3) for each epoch. The different datatype and inference
+ records are randomly distributed throughout the epoch.
+
Additionally there are 2 - 5 records for conversations (datatype
+ = 2, and inference = -1) in each epoch and for each epoch
+ except night, there is a conversation record that has a
+ double_convo_starttimestamp that is from the previous
+ epoch. This is to test the calculations of features across
+ epochs.
+
There is a raw conversation data file for both android and iOS
+ platforms (plugin_studentlife_audio_android_raw.csv and
+ plugin_studentlife_audio_raw.csv respectively).
+
Finally, there are also additional empty data files for both
+ android and iOS for testing empty data files
To begin testing RAPIDS place the fake raw input data csv files of each fake participant in
+ tests/data/raw/. The fake participant files should be placed in
+ tests/data/external/participant_files. The expected output files of RAPIDS after
+ processing the input data should be placed in tests/data/processesd/frequency and tests/data/processesd/periodic for frequency and periodic respectively.
+
Edit tests/settings/frequency/config.yaml and tests/settings/periodic/config.yaml to add and/or remove the rules
+ to be run for testing from the forcerun list.
+
Edit tests/settings/frequency/testing_config.yaml and tests/settings/frequency/testing_config.yaml to configure the settings and enable/disable sensors to be tested.
+ [-l] will delete all the existing files in /data before running tests.
+ [all | periodic | frequency] will generate feature data for all or specific type of features and save in data/processed.
+ [test] will compare the features generated with the precomputed and verified features in /tests/data/processed.
+
+
The following is a snippet of the output you should see after running your test.
The results above show that the for periodic both test_sensors_files_exist and test_sensors_features_calculations passed while for frequency first test test_sensors_files_exist passed while test_sensors_features_calculations failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see Unittest Documentation
+
Testing of the RAPIDS sensors and features is a work-in-progress. Please see test-cases for a list of sensors and features that have testing currently available.
+
Currently the repository is set up to test a number of sensors out of the box by simply running the tests/scripts/run_tests.sh command once the RAPIDS python environment is active.
Try to install any new package using conda install -c CHANNEL PACKAGE_NAME (you can use pip if the package is only available there). Make sure your Python virtual environment is active (conda activate YOUR_ENV).
After installing or removing a package you can use the following command in your terminal to update your environment.yaml before publishing your pipeline. Note that we ignore the package version for libfortran and mkl to keep compatibility with Linux:
+
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/'| sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
+
**Error in .local(drv, \...) :** **Failed to connect to database: Error:
+Can\'t initialize character set unknown (path: compiled\_in)** :
+
+Calls: dbConnect -> dbConnect -> .local -> .Call
+Execution halted
+[Tue Mar 1019:40:15 2020]
+Error in rule download_dataset:
+ jobid: 531
+ output: data/raw/p60/locations_raw.csv
+
+RuleException:
+CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
+Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
+File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
+File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
+Shutting down, this might take some time.
+Exiting because a job execution failed. Look above for error message
+
+
+Solution
Please make sure the DATABASE_GROUP in config.yaml matches your DB credentials group in .env.
+
+
+
Cannot start mysql in linux via brew services start mysql¶
+Problem
Cannot start mysql in linux via brew services start mysql
+
+Solution
Use mysql.server start
+
+
+
Every time I run force the download_dataset rule all rules are executed¶
+Problem
When running snakemake -j1 -R download_phone_data or ./rapids -j1 -R download_phone_data all the rules and files are re-computed
+
+Solution
This is expected behavior. The advantage of using snakemake under the hood is that every time a file containing data is modified every rule that depends on that file will be re-executed to update their results. In this case, since download_dataset updates all the raw data, and you are forcing the rule with the flag -R every single rule that depends on those raw files will be executed.
+
+
+
Error Table XXX doesn't exist while running the download_phone_data or download_fitbit_data rule.¶
+Problem
Error in .local(conn, statement, ...) :
+ could not run statement: Table 'db_name.table_name' doesn't exist
+Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
+Execution halted
+
+
+Solution
Please make sure the sensors listed in [PHONE_VALID_SENSED_BINS][PHONE_SENSORS] and the [TABLE] of each sensor you activated in config.yaml match your database tables.
This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option --ssl-mode=DISABLED but we can't do this from the R connector.
+
If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace mysql-client and libmysqlclient-dev with mariadb-client and libmariadbclient-dev and reinstall renv. More info about this issue here
If you get the following error KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS', it means that the indentation of the key [PHONE_SENSORS] is not matching the other child elements of PHONE_VALID_SENSED_BINS
+
+Solution
You need to add or remove any leading whitespaces as needed on that line.
+
PHONE_VALID_SENSED_BINS:
+ COMPUTE:False# This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
+ BIN_SIZE:&bin_size5# (in minutes)
+ PHONE_SENSORS:[]
+
+
+
+
Error while updating your conda environment in Ubuntu¶
+Problem
You get the following error:
+
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
+ appears to be corrupted. The path 'include/mysqlStubs.h'
+ specified in the package manifest cannot be found.
+ClobberError: This transaction has incompatible packages due to a shared path.
+ packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
+ path: 'lib/libiomp5.so'
+
You get the following error when downloading sensor data:
+
Error in result_fetch(res@ptr, n= n) :
+ embedded nul in string:
+
+
+Solution
This problem is due to the way RMariaDB handles a mismatch between data types in R and MySQL (see this issue). Since it seems this problem won’t be handled by RMariaDB, you have two options:
+
+
Remove the the null character from the conflictive table cell(s). You can adapt the following query on a MySQL server 8.0 or older
+
If it’s not feasible to modify your data you can try swapping RMariaDB with RMySQL. Just have in mind you might have problems connecting to modern MySQL servers running in Linux:
+
Add RMySQL to the renv environment by running the following command in a terminal open on RAPIDS root folder
+
R -e 'renv::install("RMySQL")'
+
+
Go to src/data/download_phone_data.R or src/data/download_fitbit_data.R and replace library(RMariaDB) with library(RMySQL)
+
In the same file(s) replace dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group) with dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)
You get the following error when executing RAPIDS:
+
Error in library(RMariaDB) : there is no package called 'RMariaDB'
+Execution halted
+
+
+Solution
In RAPIDS v0.1.0 we replaced RMySQL R package with RMariaDB, this error means your R virtual environment is out of date, to update it run snakemake -j1 renv_restore
You won’t have to deal with time zones, dates, times, data cleaning or preprocessing. The data that RAPIDS pipes to your feature extraction code is ready to process.
As a tutorial, we will add a new provider for PHONE_ACCELEROMETER called VEGA that extracts feature1, feature2, feature3 in Python and that it requires a parameter from the user called MY_PARAMETER.
+Existing Sensors
An existing sensor is any of the phone or Fitbit sensors with a configuration entry in config.yaml:
In this step you need to add your provider configuration section under the relevant sensor in config.yaml. See our example for our tutorial’s VEGA provider for PHONE_ACCELEROMETER:
+Example configuration for a new accelerometer provider VEGA
In this step you need to add a folder, script and function for your provider.
+
+
Create your provider folder under src/feature/DEVICE_SENSOR/YOUR_PROVIDER, in our example src/feature/phone_accelerometer/vega (same as [SRC_FOLDER] in the step above).
+
Create your provider script inside your provider folder, it can be a Python file called main.py or an R file called main.R.
+
+
Add your provider function in your provider script. The name of such function should be [providername]_features, in our example vega_features
The provider function that you created in the step above will receive the following parameters:
+
+
+
+
Parameter
+
Description
+
+
+
+
+
sensor_data_files
+
Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the [PIDS] array in config.yaml)
+
+
+
time_segment
+
The label of the time segment that should be processed.
+
+
+
provider
+
The parameters you configured for your provider in config.yaml will be available in this variable as a dictionary in Python or a list in R. In our example this dictionary contains {MY_PARAMETER:"a_string"}
+
+
+
filter_data_by_segment
+
Python only. A function that you will use to filter your data. In R this function is already available in the environment.
+
+
+
*args
+
Python only. Not used for now
+
+
+
**kwargs
+
Python only. Not used for now
+
+
+
+
The code to extract your behavioral features should be implemented in your provider function and in general terms it will have three stages:
+1. Read a participant’s data by loading the CSV data stored in the file pointed by sensor_data_files
Note that phone’s battery, screen, and activity recognition data is given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on)
+
+2. Filter your data to process only those rows that belong to time_segment
This step is only one line of code, but to undersand why we need it, keep reading.
+
Let’s understand the filter_data_by_segment() function with an example. A RAPIDS user can extract features on any arbitrary time segment. A time segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for p01. The labels are arbritrary and the instances depend on the days a participant was monitored for:
+
+
the daily segment could be named my_days and if p01 was monitored for 14 days, it would have 14 instances
+
the weekly segment could be named my_weeks and if p01 was monitored for 14 days, it would have 2 instances.
+
the weekend segment could be named my_weekends and if p01 was monitored for 14 days, it would have 2 instances.
+
+
For this example, RAPIDS will call your provider function three times for p01, once where time_segment is my_days, once where time_segment is my_weeks and once where time_segment is my_weekends. In this example not every row in p01‘s data needs to take part in the feature computation for either segment and the rows need to be grouped differently.
+
Thus filter_data_by_segment() comes in handy, it will return a data frame that contains the rows that were logged during a time segment plus an extra column called local_segment. This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our p01‘s my_days, my_weeks, and my_weekends examples). After filtering, you should group the data frame by this column and compute any desired features, for example:
The reason RAPIDS does not filter the participant’s data set for you is because your code might need to compute something based on a participant’s complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from this number.
+
+3. Return a data frame with your features
After filtering, grouping your data, and computing your features, your provider function should return a data frame that has:
+
+
One row per time segment instance (e.g. 14 our p01‘s my_days example)
+
The local_segment column added by filter_data_by_segment()
+
One column per feature. By convention the name of your features should only contain letters or numbers (feature1). RAPIDS will automatically add the right sensor and provider prefix (phone_accelerometr_vega_)
+
+
+PHONE_ACCELEROMETER Provider Example
For your reference, this a short example of our own provider (RAPIDS) for PHONE_ACCELEROMETER that computes five acceleration features
+
defrapids_features(sensor_data_files,time_segment,provider,filter_data_by_segment,*args,**kwargs):
+
+ acc_data=pd.read_csv(sensor_data_files["sensor_data"])
+ requested_features=provider["FEATURES"]
+ # name of the features this function can compute
+ base_features_names=["maxmagnitude","minmagnitude","avgmagnitude","medianmagnitude","stdmagnitude"]
+ # the subset of requested features this function can compute
+ features_to_compute=list(set(requested_features)&set(base_features_names))
+
+ acc_features=pd.DataFrame(columns=["local_segment"]+features_to_compute)
+ ifnotacc_data.empty:
+ acc_data=filter_data_by_segment(acc_data,time_segment)
+
+ ifnotacc_data.empty:
+ acc_features=pd.DataFrame()
+ # get magnitude related features: magnitude = sqrt(x^2+y^2+z^2)
+ magnitude=acc_data.apply(lambdarow:np.sqrt(row["double_values_0"]**2+row["double_values_1"]**2+row["double_values_2"]**2),axis=1)
+ acc_data=acc_data.assign(magnitude=magnitude.values)
+
+ if"maxmagnitude"infeatures_to_compute:
+ acc_features["maxmagnitude"]=acc_data.groupby(["local_segment"])["magnitude"].max()
+ if"minmagnitude"infeatures_to_compute:
+ acc_features["minmagnitude"]=acc_data.groupby(["local_segment"])["magnitude"].min()
+ if"avgmagnitude"infeatures_to_compute:
+ acc_features["avgmagnitude"]=acc_data.groupby(["local_segment"])["magnitude"].mean()
+ if"medianmagnitude"infeatures_to_compute:
+ acc_features["medianmagnitude"]=acc_data.groupby(["local_segment"])["magnitude"].median()
+ if"stdmagnitude"infeatures_to_compute:
+ acc_features["stdmagnitude"]=acc_data.groupby(["local_segment"])["magnitude"].std()
+
+ acc_features=acc_features.reset_index()
+
+ returnacc_features
+
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the "Existing Sensors" list above), contact us or request it on Slack and we can add the necessary code so you can follow the instructions above.
Every phone or Fitbit sensor has a corresponding config section in config.yaml, these sections follow a similar structure and we’ll use PHONE_ACCELEROMETER as an example to explain this structure.
+
+
Hint
+
+
We recommend reading this page if you are using RAPIDS for the first time
+
All computed sensor features are stored under /data/processed/features on files per sensor, per participant and per study (all participants).
+
Every time you change any sensor parameters, provider parameters or provider features, all the necessary files will be updated as soon as you execute RAPIDS.
+
+
+
+
Config section example for PHONE_ACCELEROMETER
+
# 1) Config section
+PHONE_ACCELEROMETER:
+ # 2) Parameters for PHONE_ACCELEROMETER
+ TABLE:accelerometer
+
+ # 3) Providers for PHONE_ACCELEROMETER
+ PROVIDERS:
+ # 4) RAPIDS provider
+ RAPIDS:
+ # 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER
+ COMPUTE:False
+ # 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER
+ FEATURES:["maxmagnitude","minmagnitude","avgmagnitude","medianmagnitude","stdmagnitude"]
+ SRC_FOLDER:"rapids"# inside src/features/phone_accelerometer
+ SRC_LANGUAGE:"python"
+
+ # 5) PANDA provider
+ PANDA:
+ # 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER
+ COMPUTE:False
+ VALID_SENSED_MINUTES:False
+ # 5.2) Features of PANDA provider of PHONE_ACCELEROMETER
+ FEATURES:
+ exertional_activity_episode:["sumduration","maxduration","minduration","avgduration","medianduration","stdduration"]
+ nonexertional_activity_episode:["sumduration","maxduration","minduration","avgduration","medianduration","stdduration"]
+ SRC_FOLDER:"panda"# inside src/features/phone_accelerometer
+ SRC_LANGUAGE:"python"
+
Each sensor configuration section has a “parameters” subsection (see #2 in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The TABLE parameter exists for every sensor, but some sensors will have extra parameters like [PHONE_LOCATIONS]. We explain these parameters in a table at the top of each sensor documentation page.
Each sensor configuration section can have zero, one or more behavioral feature providers (see #3 in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see #4) and PANDA (see #5).
Each provider has parameters that affect the computation of the behavioral features it offers (see #4.1 or #5.1 in the example). These parameters will include at least a [COMPUTE] flag that you switch to True to extract a provider’s behavioral features.
+
We explain every provider’s parameter in a table under the Parameters description heading on each provider documentation page.
Each provider offers a set of behavioral features (see #4.2 or #5.2 in the example). For some providers these features are grouped in an array (like those for RAPIDS provider in #4.2) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for PANDAS provider in #5.2). In either case, you can delete the features you are not interested in and they will not be included in the sensor’s output feature file.
+
We explain each behavioral feature in a table under the Features description heading on each provider documentation page.
Sensor parameters description for [FITBIT_HEARTRATE_INTRADAY]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table name or file path where the heart rate intraday data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file.
+
+
+
+
The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT. The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don’t have access to your participants’ Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together.
+
We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity).
+Example of the structure of source data
+
+
+
+
device_id
+
fitbit_data
+
+
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}}
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}}
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}}
+
+
+
+
+
+
All columns are mandatory, however, all except device_id and local_date_time can be empty if you don’t have that data. Just have in mind that some features will be empty if some of these columns are empty.
Parameters description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract FITBIT_HEARTRATE_INTRADAY features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed from heart rate intraday data, see table below
+
+
+
+
Features description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
maxhr
+
beats/mins
+
The maximum heart rate during a time segment.
+
+
+
minhr
+
beats/mins
+
The minimum heart rate during a time segment.
+
+
+
avghr
+
beats/mins
+
The average heart rate during a time segment.
+
+
+
medianhr
+
beats/mins
+
The median of heart rate during a time segment.
+
+
+
modehr
+
beats/mins
+
The mode of heart rate during a time segment.
+
+
+
stdhr
+
beats/mins
+
The standard deviation of heart rate during a time segment.
+
+
+
diffmaxmodehr
+
beats/mins
+
The difference between the maximum and mode heart rate during a time segment.
+
+
+
diffminmodehr
+
beats/mins
+
The difference between the mode and minimum heart rate during a time segment.
+
+
+
entropyhr
+
nats
+
Shannon’s entropy measurement based on heart rate during a time segment.
+
+
+
minutesonZONE
+
minutes
+
Number of minutes the user’s heart rate fell within each heartrate_zone during a time segment.
+
+
+
+
+
Assumptions/Observations
+
+
There are four heart rate zones (ZONE): outofrange, fatburn, cardio, and peak. Please refer to Fitbit documentation for more information about the way they are computed.
Sensor parameters description for [FITBIT_HEARTRATE_SUMMARY]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table name or file path where the heart rate summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file.
+
+
+
+
The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT. The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don’t have access to your participants’ Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together.
+
We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity).
+Example of the structure of source data
+
+
+
+
device_id
+
fitbit_data
+
+
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}}
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}}
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}}
+
+
+
+
+
+
All columns are mandatory, however, all except device_id and local_date_time can be empty if you don’t have that data. Just have in mind that some features will be empty if some of these columns are empty.
Parameters description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract FITBIT_HEARTRATE_SUMMARY features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed from heart rate summary data, see table below
+
+
+
+
Features description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
maxrestinghr
+
beats/mins
+
The maximum daily resting heart rate during a time segment.
+
+
+
minrestinghr
+
beats/mins
+
The minimum daily resting heart rate during a time segment.
+
+
+
avgrestinghr
+
beats/mins
+
The average daily resting heart rate during a time segment.
+
+
+
medianrestinghr
+
beats/mins
+
The median of daily resting heart rate during a time segment.
+
+
+
moderestinghr
+
beats/mins
+
The mode of daily resting heart rate during a time segment.
+
+
+
stdrestinghr
+
beats/mins
+
The standard deviation of daily resting heart rate during a time segment.
+
+
+
diffmaxmoderestinghr
+
beats/mins
+
The difference between the maximum and mode daily resting heart rate during a time segment.
+
+
+
diffminmoderestinghr
+
beats/mins
+
The difference between the mode and minimum daily resting heart rate during a time segment.
+
+
+
entropyrestinghr
+
nats
+
Shannon’s entropy measurement based on daily resting heart rate during a time segment.
+
+
+
sumcaloriesZONE
+
cals
+
The total daily calories burned within heartrate_zone during a time segment.
+
+
+
maxcaloriesZONE
+
cals
+
The maximum daily calories burned within heartrate_zone during a time segment.
+
+
+
mincaloriesZONE
+
cals
+
The minimum daily calories burned within heartrate_zone during a time segment.
+
+
+
avgcaloriesZONE
+
cals
+
The average daily calories burned within heartrate_zone during a time segment.
+
+
+
mediancaloriesZONE
+
cals
+
The median of daily calories burned within heartrate_zone during a time segment.
+
+
+
stdcaloriesZONE
+
cals
+
The standard deviation of daily calories burned within heartrate_zone during a time segment.
+
+
+
entropycaloriesZONE
+
nats
+
Shannon’s entropy measurement based on daily calories burned within heartrate_zone during a time segment.
+
+
+
+
+
Assumptions/Observations
+
+
+
There are four heart rate zones (ZONE): outofrange, fatburn, cardio, and peak. Please refer to Fitbit documentation for more information about the way they are computed.
+
+
+
Calories’ accuracy depends on the users’ Fitbit profile (weight, height, etc.).
Sensor parameters description for [FITBIT_SLEEP_SUMMARY]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table name or file path where the sleep summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file.
+
+
+
+
The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT. The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don’t have access to your participants’ Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together.
+
We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity).
+Example of the structure of source data with Fitbit’s sleep API Version 1
All columns are mandatory, however, all except device_id and local_date_time can be empty if you don’t have that data. Just have in mind that some features will be empty if some of these columns are empty.
+
+
+
+
device_id
+
local_start_date_time
+
local_end_date_time
+
efficiency
+
minutes_after_wakeup
+
minutes_asleep
+
minutes_awake
+
minutes_to_fall_asleep
+
minutes_in_bed
+
is_main_sleep
+
type
+
count_awake
+
duration_awake
+
count_awakenings
+
count_restless
+
duration_restless
+
+
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
2020-10-07 15:55:00
+
2020-10-07 18:10:00
+
91
+
0
+
123
+
12
+
0
+
135
+
1
+
classic
+
2
+
3
+
10
+
8
+
9
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
2020-10-07 09:49:00
+
2020-10-07 10:52:30
+
100
+
1
+
62
+
0
+
0
+
63
+
0
+
classic
+
0
+
0
+
1
+
1
+
1
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
2020-10-08 00:40:00
+
2020-10-08 06:01:30
+
89
+
0
+
275
+
33
+
0
+
321
+
1
+
classic
+
3
+
21
+
16
+
13
+
25
+
+
+
a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
2020-10-09 00:35:30
+
2020-10-09 05:57:30
+
96
+
0
+
309
+
13
+
0
+
322
+
1
+
classic
+
1
+
3
+
8
+
7
+
10
+
+
+
+
+
+
+Example of the structure of source data with Fitbit’s sleep API Version 1.2
All columns are mandatory, however, all except device_id and local_date_time can be empty if you don’t have that data. Just have in mind that some features will be empty if some of these columns are empty.
Parameters description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract FITBIT_SLEEP_SUMMARY features from the RAPIDS provider
+
+
+
[SLEEP_TYPES]
+
Types of sleep to be included in the feature extraction computation. Fitbit provides 3 types of sleep: main, nap, all.
+
+
+
[FEATURES]
+
Features to be computed from sleep summary data, see table below
+
+
+
+
Features description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
countepisodeTYPE
+
episodes
+
Number of sleep episodes for a certain sleep type during a time segment.
+
+
+
avgefficiencyTYPE
+
scores
+
Average sleep efficiency for a certain sleep type during a time segment.
+
+
+
sumdurationafterwakeupTYPE
+
minutes
+
Total duration the user stayed in bed after waking up for a certain sleep type during a time segment.
+
+
+
sumdurationasleepTYPE
+
minutes
+
Total sleep duration for a certain sleep type during a time segment.
+
+
+
sumdurationawakeTYPE
+
minutes
+
Total duration the user stayed awake but still in bed for a certain sleep type during a time segment.
+
+
+
sumdurationtofallasleepTYPE
+
minutes
+
Total duration the user spent to fall asleep for a certain sleep type during a time segment.
+
+
+
sumdurationinbedTYPE
+
minutes
+
Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
+
+
+
avgdurationafterwakeupTYPE
+
minutes
+
Average duration the user stayed in bed after waking up for a certain sleep type during a time segment.
+
+
+
avgdurationasleepTYPE
+
minutes
+
Average sleep duration for a certain sleep type during a time segment.
+
+
+
avgdurationawakeTYPE
+
minutes
+
Average duration the user stayed awake but still in bed for a certain sleep type during a time segment.
+
+
+
avgdurationtofallasleepTYPE
+
minutes
+
Average duration the user spent to fall asleep for a certain sleep type during a time segment.
+
+
+
avgdurationinbedTYPE
+
minutes
+
Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
+
+
+
+
+
Assumptions/Observations
+
+
+
There are three sleep types (TYPE): main, nap, all. The all type contains both main sleep and naps.
+
+
+
There are two versions of Fitbit’s sleep API (version 1 and version 1.2), and each provides raw sleep data in a different format:
+
+
Count & duration summaries. v1 contains count_awake, duration_awake, count_awakenings, count_restless, and duration_restless fields for every sleep record but v1.2 does not.
+
+
+
+
API columns. Features are computed based on the values provided by Fitbit’s API: efficiency, minutes_after_wakeup, minutes_asleep, minutes_awake, minutes_to_fall_asleep, minutes_in_bed, is_main_sleep and type.
Sensor parameters description for [FITBIT_STEPS_INTRADAY]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table name or file path where the steps intraday data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file.
+
+
+
+
The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT. The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don’t have access to your participants’ Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together.
+
We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity).
Parameters description for [FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract FITBIT_STEPS_INTRADAY features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed from steps intraday data, see table below
+
+
+
[THRESHOLD_ACTIVE_BOUT]
+
Every minute with Fitbit steps data wil be labelled as sedentary if its step count is below this threshold, otherwise, active.
+
+
+
[INCLUDE_ZERO_STEP_ROWS]
+
Whether or not to include time segments with a 0 step count during the whole day.
+
+
+
+
Features description for [FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
sumsteps
+
steps
+
The total step count during a time segment.
+
+
+
maxsteps
+
steps
+
The maximum step count during a time segment.
+
+
+
minsteps
+
steps
+
The minimum step count during a time segment.
+
+
+
avgsteps
+
steps
+
The average step count during a time segment.
+
+
+
stdsteps
+
steps
+
The standard deviation of step count during a time segment.
+
+
+
countepisodesedentarybout
+
bouts
+
Number of sedentary bouts during a time segment.
+
+
+
sumdurationsedentarybout
+
minutes
+
Total duration of all sedentary bouts during a time segment.
+
+
+
maxdurationsedentarybout
+
minutes
+
The maximum duration of any sedentary bout during a time segment.
+
+
+
mindurationsedentarybout
+
minutes
+
The minimum duration of any sedentary bout during a time segment.
+
+
+
avgdurationsedentarybout
+
minutes
+
The average duration of sedentary bouts during a time segment.
+
+
+
stddurationsedentarybout
+
minutes
+
The standard deviation of the duration of sedentary bouts during a time segment.
+
+
+
countepisodeactivebout
+
bouts
+
Number of active bouts during a time segment.
+
+
+
sumdurationactivebout
+
minutes
+
Total duration of all active bouts during a time segment.
+
+
+
maxdurationactivebout
+
minutes
+
The maximum duration of any active bout during a time segment.
+
+
+
mindurationactivebout
+
minutes
+
The minimum duration of any active bout during a time segment.
+
+
+
avgdurationactivebout
+
minutes
+
The average duration of active bouts during a time segment.
+
+
+
stddurationactivebout
+
minutes
+
The standard deviation of the duration of active bouts during a time segment.
+
+
+
+
+
Assumptions/Observations
+
+
Active and sedentary bouts. If the step count per minute is smaller than THRESHOLD_ACTIVE_BOUT (default value is 10), that minute is labelled as sedentary, otherwise, is labelled as active. Active and sedentary bouts are periods of consecutive minutes labelled as active or sedentary.
Sensor parameters description for [FITBIT_STEPS_SUMMARY]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table name or file path where the steps summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file.
+
+
+
+
The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT. The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don’t have access to your participants’ Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together.
+
We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity).
Parameters description for [PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_ACCELEROMETER features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
+
Features description for [PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
maxmagnitude
+
m/s2
+
The maximum magnitude of acceleration (\(\|acceleration\| = \sqrt{x^2 + y^2 + z^2}\)).
+
+
+
minmagnitude
+
m/s2
+
The minimum magnitude of acceleration.
+
+
+
avgmagnitude
+
m/s2
+
The average magnitude of acceleration.
+
+
+
medianmagnitude
+
m/s2
+
The median magnitude of acceleration.
+
+
+
stdmagnitude
+
m/s2
+
The standard deviation of acceleration.
+
+
+
+
+
Assumptions/Observations
+
+
Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem.
Parameters description for [PHONE_ACCELEROMETER][PROVIDERS][PANDA]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_ACCELEROMETER features from the PANDA provider
+
+
+
[FEATURES]
+
Features to be computed for exertional and non-exertional activity episodes, see table below
+
+
+
+
Features description for [PHONE_ACCELEROMETER][PROVIDERS][PANDA]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
sumduration
+
minutes
+
Total duration of all exertional or non-exertional activity episodes.
+
+
+
maxduration
+
minutes
+
Longest duration of any exertional or non-exertional activity episode.
+
+
+
minduration
+
minutes
+
Shortest duration of any exertional or non-exertional activity episode.
+
+
+
avgduration
+
minutes
+
Average duration of any exertional or non-exertional activity episode.
+
+
+
medianduration
+
minutes
+
Median duration of any exertional or non-exertional activity episode.
+
+
+
stdduration
+
minutes
+
Standard deviation of the duration of all exertional or non-exertional activity episodes.
+
+
+
+
+
Assumptions/Observations
+
+
Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem.
+
See Panda et al for a definition of exertional and non-exertional activity episodes
Parameters description for [PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_ACTIVITY_RECOGNITION features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[ACTIVITY_CLASSES][STATIONARY]
+
An array of the activity labels to be considered in the STATIONARY category choose any of still, tilting
+
+
+
[ACTIVITY_CLASSES][MOBILE]
+
An array of the activity labels to be considered in the MOBILE category choose any of on_foot, walking, running, on_bicycle
+
+
+
[ACTIVITY_CLASSES][VEHICLE]
+
An array of the activity labels to be considered in the VEHICLE category choose any of in_vehicule
+
+
+
+
Features description for [PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
count
+
rows
+
Number of episodes.
+
+
+
mostcommonactivity
+
activity type
+
The most common activity type (e.g. still, on_foot, etc.). If there is a tie, the first one is chosen.
+
+
+
countuniqueactivities
+
activity type
+
Number of unique activities.
+
+
+
durationstationary
+
minutes
+
The total duration of [ACTIVITY_CLASSES][STATIONARY] episodes
+
+
+
durationmobile
+
minutes
+
The total duration of [ACTIVITY_CLASSES][MOBILE] episodes of on foot, running, and on bicycle activities
+
+
+
durationvehicle
+
minutes
+
The total duration of [ACTIVITY_CLASSES][VEHICLE] episodes of on vehicle activity
+
+
+
+
+
Assumptions/Observations
+
+
+
iOS Activity Recognition names and types are unified with Android labels:
+
+
+
+
iOS Activity Name
+
Android Activity Name
+
Android Activity Type
+
+
+
+
+
walking
+
walking
+
7
+
+
+
running
+
running
+
8
+
+
+
cycling
+
on_bicycle
+
1
+
+
+
automotive
+
in_vehicle
+
0
+
+
+
stationary
+
still
+
3
+
+
+
unknown
+
unknown
+
4
+
+
+
+
+
+
In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables, RAPIDS automatically infers what platform each participant belongs to based on their participant file.
Sensor parameters description for [PHONE_APPLICATIONS_CRASHES]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table where the applications crashes data is stored
+
+
+
[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]
+
FILE or GOOGLE. If FILE, app categories (genres) are read from [CATALOGUE_FILE]. If [GOOGLE], app categories (genres) are scrapped from the Play Store
+
+
+
[APPLICATION_CATEGORIES][CATALOGUE_FILE]
+
CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv
+
+
+
[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]
+
if [CATALOGUE_SOURCE] is equal to FILE, this flag signals whether or not to update [CATALOGUE_FILE], if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE]
This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE]. If [CATALOGUE_SOURCE] is equal to GOOGLE, all genres are scraped anyway (this flag is ignored)
+
+
+
+
+
Note
+
No feature providers have been implemented for this sensor yet, however you can use its key (PHONE_APPLICATIONS_CRASHES) to improve PHONE_DATA_YIELD or you can implement your own features.
Sensor parameters description for [PHONE_APPLICATIONS_FOREGROUND] (these parameters are used by the only provider available at the moment, RAPIDS):
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table where the applications foreground data is stored
+
+
+
[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]
+
FILE or GOOGLE. If FILE, app categories (genres) are read from [CATALOGUE_FILE]. If [GOOGLE], app categories (genres) are scrapped from the Play Store
+
+
+
[APPLICATION_CATEGORIES][CATALOGUE_FILE]
+
CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv
+
+
+
[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]
+
if [CATALOGUE_SOURCE] is equal to FILE, this flag signals whether or not to update [CATALOGUE_FILE], if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE]
This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE]. If [CATALOGUE_SOURCE] is equal to GOOGLE, all genres are scraped anyway (this flag is ignored)
Parameters description for [PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_APPLICATIONS_FOREGROUND features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[SINGLE_CATEGORIES]
+
An array of app categories to be included in the feature extraction computation. The special keyword all represents a category with all the apps from each participant. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above)
+
+
+
[MULTIPLE_CATEGORIES]
+
An array of collections representing meta-categories (a group of categories). They key of each element is the name of the meta-category and the value is an array of member app categories. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above)
+
+
+
[SINGLE_APPS]
+
An array of apps to be included in the feature extraction computation. Use their package name (e.g. com.google.android.youtube) or the reserved keyword top1global (the most used app by a participant over the whole monitoring study)
+
+
+
[EXCLUDED_CATEGORIES]
+
An array of app categories to be excluded from the feature extraction computation. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above)
+
+
+
[EXCLUDED_APPS]
+
An array of apps to be excluded from the feature extraction computation. Use their package name, for example: com.google.android.youtube
+
+
+
+
Features description for [PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
count
+
apps
+
Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to it from another app)
+
+
+
timeoffirstuse
+
minutes
+
The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a time_segment
+
+
+
timeoflastuse
+
minutes
+
The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a time_segment
+
+
+
frequencyentropy
+
nats
+
The entropy of the used apps within a category during a time_segment (each app is seen as a unique event, the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app
+
+
+
+
+
Assumptions/Observations
+
Features can be computed by app, by apps grouped under a single category (genre) and by multiple categories grouped together (meta-categories). For example, we can get features for Facebook (single app), for Social Network apps (a category including Facebook and other social media apps) or for Social (a meta-category formed by Social Network and Social Media Tools categories).
+
Apps installed by default like YouTube are considered systems apps on some phones. We do an exact match to exclude apps where “genre” == EXCLUDED_CATEGORIES or “package_name” == EXCLUDED_APPS.
+
We provide three ways of classifying and app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalogue created by Stachl et al. which we provide in RAPIDS (data/external/stachl_application_genre_catalogue.csv), or c) by manually creating a personalized catalogue. You can choose a, b or c by modifying [APPLICATION_GENRES] keys and values (see the Sensor parameters description table above).
Sensor parameters description for [PHONE_APPLICATIONS_NOTIFICATIONS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table where the applications notifications data is stored
+
+
+
[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]
+
FILE or GOOGLE. If FILE, app categories (genres) are read from [CATALOGUE_FILE]. If [GOOGLE], app categories (genres) are scrapped from the Play Store
+
+
+
[APPLICATION_CATEGORIES][CATALOGUE_FILE]
+
CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv
+
+
+
[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]
+
if [CATALOGUE_SOURCE] is equal to FILE, this flag signals whether or not to update [CATALOGUE_FILE], if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE]
This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE]. If [CATALOGUE_SOURCE] is equal to GOOGLE, all genres are scraped anyway (this flag is ignored)
+
+
+
+
+
Note
+
No feature providers have been implemented for this sensor yet, however you can use its key (PHONE_APPLICATIONS_NOTIFICATIONS) to improve PHONE_DATA_YIELD or you can implement your own features.
Sensor parameters description for [PHONE_AWARE_LOG]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table where the aware data is stored
+
+
+
+
+
Note
+
No feature providers have been implemented for this sensor yet, however you can use its key (PHONE_AWARE_LOG) to improve PHONE_DATA_YIELD or you can implement your own features.
Parameters description for [PHONE_BATTERY][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_BATTERY features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
+
Features description for [PHONE_BATTERY][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
countdischarge
+
episodes
+
Number of discharging episodes.
+
+
+
sumdurationdischarge
+
minutes
+
The total duration of all discharging episodes.
+
+
+
countcharge
+
episodes
+
Number of battery charging episodes.
+
+
+
sumdurationcharge
+
minutes
+
The total duration of all charging episodes.
+
+
+
avgconsumptionrate
+
episodes/minutes
+
The average of all episodes’ consumption rates. An episode’s consumption rate is defined as the ratio between its battery delta and duration
+
+
+
maxconsumptionrate
+
episodes/minutes
+
The highest of all episodes’ consumption rates. An episode’s consumption rate is defined as the ratio between its battery delta and duration
+
+
+
+
+
Assumptions/Observations
+
+
We convert battery data collected with iOS client v1 (autodetected because battery status 4 do not exist) to match Android battery format: we swap status 3 for 5 and 1 for 3
+
We group battery data into discharge or charge episodes considering any contiguous rows with consecutive reductions or increases of the battery level if they are logged within [EPISODE_THRESHOLD_BETWEEN_ROWS] minutes from each other.
Parameters description for [PHONE_BLUETOOTH][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_BLUETOOTH features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
+
Features description for [PHONE_BLUETOOTH][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
countscans
+
devices
+
Number of scanned devices during a time segment, a device can be detected multiple times over time and these appearances are counted separately
+
+
+
uniquedevices
+
devices
+
Number of unique devices during a time segment as identified by their hardware (bt_address) address
+
+
+
countscansmostuniquedevice
+
scans
+
Number of scans of the most sensed device within each time segment instance
+
+
+
+
+
Assumptions/Observations
+
+
From v0.2.0countscans, uniquedevices, countscansmostuniquedevice were deprecated because they overlap with the respective features for ALL devices of the PHONE_BLUETOOTHDORYAB provider
Parameters description for [PHONE_BLUETOOTH][PROVIDERS][DORYAB]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_BLUETOOTH features from the DORYAB provider
+
+
+
[FEATURES]
+
Features to be computed, see table below. These features are computed for three device categories: all devices, own devices and other devices.
+
+
+
+
Features description for [PHONE_BLUETOOTH][PROVIDERS][DORYAB]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
countscans
+
scans
+
Number of scans (rows) from the devices sensed during a time segment instance. The more scans a bluetooth device has the longer it remained within range of the participant’s phone
+
+
+
uniquedevices
+
devices
+
Number of unique bluetooth devices sensed during a time segment instance as identified by their hardware addresses (bt_address)
+
+
+
meanscans
+
scans
+
Mean of the scans of every sensed device within each time segment instance
+
+
+
stdscans
+
scans
+
Standard deviation of the scans of every sensed device within each time segment instance
+
+
+
countscansmostfrequentdevicewithinsegments
+
scans
+
Number of scans of the most sensed device within each time segment instance
+
+
+
countscansleastfrequentdevicewithinsegments
+
scans
+
Number of scans of the least sensed device within each time segment instance
+
+
+
countscansmostfrequentdeviceacrosssegments
+
scans
+
Number of scans of the most sensed device across time segment instances of the same type
+
+
+
countscansleastfrequentdeviceacrosssegments
+
scans
+
Number of scans of the least sensed device across time segment instances of the same type per device
+
+
+
countscansmostfrequentdeviceacrossdataset
+
scans
+
Number of scans of the most sensed device across the entire dataset of every participant
+
+
+
countscansleastfrequentdeviceacrossdataset
+
scans
+
Number of scans of the least sensed device across the entire dataset of every participant
+
+
+
+
+
Assumptions/Observations
+
+
Devices are classified as belonging to the participant (own) or to other people (others) using k-means based on the number of times and the number of days each device was detected across each participant’s dataset. See Doryab et al for more details.
+
If ownership cannot be computed because all devices were detected on only one day, they are all considered as other. Thus all and other features will be equal. The likelihood of this scenario decreases the more days of data you have.
+
The most and least frequent devices will be the same across time segment instances and across the entire dataset when every time segment instance covers every hour of a dataset. For example, daily segments (00:00 to 23:59) fall in this category but morning segments (06:00am to 11:59am) or periodic 30-minute segments don’t.
+
+ExampleSimplified raw bluetooth data
The following is a simplified example with bluetooth data from three days and two time segments: morning and afternoon. There are two own devices: 5C836F5-487E-405F-8E28-21DBD40FA4FF detected seven times across two days and 499A1EAF-DDF1-4657-986C-EA5032104448 detected eight times on a single day.
+
+
+The most and least frequent OTHER devices (own_device == 0) during morning segments
The most and least frequent ALL|OWN|OTHER devices are computed within each time segment instance, across time segment instances of the same type and across the entire dataset of each person. These are the most and least frequent devices for OTHER devices during morning segments.
+
most frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance)
+least frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance)
+most frequent device across 2016-11-30 morning: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8'
+least frequent device across 2016-11-30 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen)
+most frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen)
+least frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen)
+
+most frequent across morning segments: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8'
+least frequent across morning segments: '6C444841-FE64-4375-BC3F-FA410CDC0AC7' (when tied, the first occurance is chosen)
+
+most frequent across dataset: '499A1EAF-DDF1-4657-986C-EA5032104448' (only taking into account "morning" segments)
+least frequent across dataset: '4DC7A22D-9F1F-4DEF-8576-086910AABCB5' (when tied, the first occurance is chosen)
+
+
+Bluetooth features for OTHER devices and morning segments
For brevity we only show the following features for morning segments:
+
Note that countscansmostfrequentdeviceacrossdatasetothers is all 0s because 499A1EAF-DDF1-4657-986C-EA5032104448 is excluded from the count as is labelled as an own device (not other).
+
Parameters description for [PHONE_CALLS][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_CALLS features from the RAPIDS provider
+
+
+
[CALL_TYPES]
+
The particular call_type that will be analyzed. The options for this parameter are incoming, outgoing or missed.
+
+
+
[FEATURES]
+
Features to be computed for outgoing, incoming, and missed calls. Note that the same features are available for both incoming and outgoing calls, while missed calls has its own set of features. See the tables below.
+
+
+
+
Features description for [PHONE_CALLS][PROVIDERS][RAPIDS] incoming and outgoing calls:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
count
+
calls
+
Number of calls of a particular call_type occurred during a particular time_segment.
+
+
+
distinctcontacts
+
contacts
+
Number of distinct contacts that are associated with a particular call_type for a particular time_segment
+
+
+
meanduration
+
seconds
+
The mean duration of all calls of a particular call_type during a particular time_segment.
+
+
+
sumduration
+
seconds
+
The sum of the duration of all calls of a particular call_type during a particular time_segment.
+
+
+
minduration
+
seconds
+
The duration of the shortest call of a particular call_type during a particular time_segment.
+
+
+
maxduration
+
seconds
+
The duration of the longest call of a particular call_type during a particular time_segment.
+
+
+
stdduration
+
seconds
+
The standard deviation of the duration of all the calls of a particular call_type during a particular time_segment.
+
+
+
modeduration
+
seconds
+
The mode of the duration of all the calls of a particular call_type during a particular time_segment.
+
+
+
entropyduration
+
nats
+
The estimate of the Shannon entropy for the the duration of all the calls of a particular call_type during a particular time_segment.
+
+
+
timefirstcall
+
minutes
+
The time in minutes between 12:00am (midnight) and the first call of call_type.
+
+
+
timelastcall
+
minutes
+
The time in minutes between 12:00am (midnight) and the last call of call_type.
+
+
+
countmostfrequentcontact
+
calls
+
The number of calls of a particular call_type during a particular time_segment of the most frequent contact throughout the monitored period.
+
+
+
+
Features description for [PHONE_CALLS][PROVIDERS][RAPIDS] missed calls:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
count
+
calls
+
Number of missed calls that occurred during a particular time_segment.
+
+
+
distinctcontacts
+
contacts
+
Number of distinct contacts that are associated with missed calls for a particular time_segment
+
+
+
timefirstcall
+
minutes
+
The time in hours from 12:00am (Midnight) that the first missed call occurred.
+
+
+
timelastcall
+
minutes
+
The time in hours from 12:00am (Midnight) that the last missed call occurred.
+
+
+
countmostfrequentcontact
+
calls
+
The number of missed calls during a particular time_segment of the most frequent contact throughout the monitored period.
+
+
+
+
+
Assumptions/Observations
+
+
Traces for iOS calls are unique even for the same contact calling a participant more than once which renders countmostfrequentcontact meaningless and distinctcontacts equal to the total number of traces.
+
[CALL_TYPES] and [FEATURES] keys in config.yaml need to match. For example, [CALL_TYPES]outgoing matches the [FEATURES] key outgoing
+
iOS calls data is transformed to match Android calls data format. See our algorithm
Parameters description for [PHONE_CONVERSATION][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_CONVERSATION features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[RECORDING_MINUTES]
+
Minutes the plugin was recording audio (default 1 min)
+
+
+
[PAUSED_MINUTES]
+
Minutes the plugin was NOT recording audio (default 3 min)
+
+
+
+
Features description for [PHONE_CONVERSATION][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
minutessilence
+
minutes
+
Minutes labeled as silence
+
+
+
minutesnoise
+
minutes
+
Minutes labeled as noise
+
+
+
minutesvoice
+
minutes
+
Minutes labeled as voice
+
+
+
minutesunknown
+
minutes
+
Minutes labeled as unknown
+
+
+
sumconversationduration
+
minutes
+
Total duration of all conversations
+
+
+
maxconversationduration
+
minutes
+
Longest duration of all conversations
+
+
+
minconversationduration
+
minutes
+
Shortest duration of all conversations
+
+
+
avgconversationduration
+
minutes
+
Average duration of all conversations
+
+
+
sdconversationduration
+
minutes
+
Standard Deviation of the duration of all conversations
+
+
+
timefirstconversation
+
minutes
+
Minutes since midnight when the first conversation for a time segment was detected
+
+
+
timelastconversation
+
minutes
+
Minutes since midnight when the last conversation for a time segment was detected
+
+
+
noisesumenergy
+
L2-norm
+
Sum of all energy values when inference is noise
+
+
+
noiseavgenergy
+
L2-norm
+
Average of all energy values when inference is noise
+
+
+
noisesdenergy
+
L2-norm
+
Standard Deviation of all energy values when inference is noise
+
+
+
noiseminenergy
+
L2-norm
+
Minimum of all energy values when inference is noise
+
+
+
noisemaxenergy
+
L2-norm
+
Maximum of all energy values when inference is noise
+
+
+
voicesumenergy
+
L2-norm
+
Sum of all energy values when inference is voice
+
+
+
voiceavgenergy
+
L2-norm
+
Average of all energy values when inference is voice
+
+
+
voicesdenergy
+
L2-norm
+
Standard Deviation of all energy values when inference is voice
+
+
+
voiceminenergy
+
L2-norm
+
Minimum of all energy values when inference is voice
+
+
+
voicemaxenergy
+
L2-norm
+
Maximum of all energy values when inference is voice
+
+
+
silencesensedfraction
+
-
+
Ratio between minutessilence and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)
+
+
+
noisesensedfraction
+
-
+
Ratio between minutesnoise and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)
+
+
+
voicesensedfraction
+
-
+
Ratio between minutesvoice and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)
+
+
+
unknownsensedfraction
+
-
+
Ratio between minutesunknown and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)
+
+
+
silenceexpectedfraction
+
-
+
Ration between minutessilence and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes)
+
+
+
noiseexpectedfraction
+
-
+
Ration between minutesnoise and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes)
+
+
+
voiceexpectedfraction
+
-
+
Ration between minutesvoice and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes)
+
+
+
unknownexpectedfraction
+
-
+
Ration between minutesunknown and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes)
+
+
+
+
+
Assumptions/Observations
+
+
The timestamp of conversation rows in iOS is in seconds so we convert it to milliseconds to match Android’s format
This is a combinatorial sensor which means that we use the data from multiple sensors to extract data yield features. Data yield features can be used to remove rows (time segments) that do not contain enough data. You should decide what is your “enough” threshold depending on the type of sensors you collected (frequency vs event based, e.g. acceleroemter vs calls), the length of your study, and the rates of missing data that your analysis could handle.
+
+
Why is data yield important?
+
Imagine that you want to extract PHONE_CALL features on daily segments (00:00 to 23:59). Let’s say that on day 1 the phone logged 10 calls and 23 hours of data from other sensors and on day 2 the phone logged 10 calls and only 2 hours of data from other sensors. It’s more likely that other calls were placed on the 22 hours of data that you didn’t log on day 2 than on the 1 hour of data you didn’t log on day 1, and so including day 2 in your analysis could bias your results.
+
+
Sensor parameters description for [PHONE_DATA_YIELD]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[SENSORS]
+
One or more phone sensor config keys (e.g. PHONE_MESSAGE). The more keys you include the more accurately RAPIDS can approximate the time an smartphone was sensing data. The supported phone sensors you can include in this list are outlined below (do NOT include Fitbit sensors, ONLY include phone sensors).
+
+
+
+
+
Supported phone sensors for [PHONE_DATA_YIELD][SENSORS]
Before explaining the data yield features, let’s define the following relevant concepts:
+
+
A valid minute is any 60 second window when any phone sensor logged at least 1 row of data
+
A valid hour is any 60 minute window with at least X valid minutes. The X or threshold is given by [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]
+
+
The timestamps of all sensors are concatenated and then grouped per time segment. Minute and hour windows are created from the beginning of each time segment instance and these windows are marked as valid based on the definitions above. The duration of each time segment is taken into account to compute the features described below.
+
+
Available time segments and platforms
+
+
Available for all time segments
+
Available for Android and iOS
+
+
+
+
File Sequence
+
- data/raw/{pid}/{sensor}_raw.csv # one for every [PHONE_DATA_YIELD][SENSORS]
+- data/interim/{pid}/phone_yielded_timestamps.csv
+- data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv
+- data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv
+- data/processed/features/{pid}/phone_data_yield.csv
+
+
+
Parameters description for [PHONE_DATA_YIELD][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_DATA_YIELD features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]
+
The proportion [0.0 ,1.0] of valid minutes in a 60-minute window necessary to flag that window as valid.
+
+
+
+
Features description for [PHONE_DATA_YIELD][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
ratiovalidyieldedminutes
+
rows
+
The ratio between the number of valid minutes and the duration in minutes of a time segment.
+
+
+
ratiovalidyieldedhours
+
lux
+
The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1.
+
+
+
+
+
Assumptions/Observations
+
+
+
We recommend using ratiovalidyieldedminutes on time segments that are shorter than two or three hours and ratiovalidyieldedhours for longer segments. This is because relying on yielded minutes only can be misleading when a big chunk of those missing minutes are clustered together.
+
For example, let’s assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur:
+
+
the 12 missing hours are from the beginning of the segment or
+
30 minutes could be missing from every hour (24 * 30 minutes = 12 hours).
+
+
ratiovalidyieldedminutes would be 0.5 for both a and b (hinting the missing circumstances are similar). However, ratiovalidyieldedhours would be 0.5 for a and 1.0 for b if [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] is between [0.0 and 0.49] (hinting that the missing circumstances might be more favorable for b. In other words, sensed data for b is more evenly spread compared to a.
Sensor parameters description for [PHONE_KEYBOARD]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table where the keyboard data is stored
+
+
+
+
+
Note
+
No feature providers have been implemented for this sensor yet, however you can use its key (PHONE_KEYBOARD) to improve PHONE_DATA_YIELD or you can implement your own features.
Sensor parameters description for [PHONE_LOCATIONS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[TABLE]
+
Database table where the location data is stored
+
+
+
[LOCATIONS_TO_USE]
+
Type of location data to use, one of ALL, GPS, ALL_RESAMPLED or FUSED_RESAMPLED. This filter is based on the provider column of the AWARE locations table, ALL includes every row, GPS only includes rows where provider is gps, ALL_RESAMPLED includes all rows after being resampled, and FUSED_RESAMPLED only includes rows where provider is fused after being resampled.
+
+
+
[FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD]
+
if ALL_RESAMPLED or FUSED_RESAMPLED is used, the original fused data has to be resampled, a location row will be resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less or equal than this threshold (in minutes).
+
+
+
[FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION]
+
if ALL_RESAMPLED or FUSED_RESAMPLED is used, the original fused data has to be resampled, a location row will be resampled at most for this long (in minutes)
+
+
+
+
+
Assumptions/Observations
+
Types of location data to use
+AWARE Android and iOS clients can collect location coordinates through the phone's GPS, the network cellular towers around the phone, or Google's fused location API. If you want to use only the GPS provider set [LOCATIONS_TO_USE] to GPS, if you want to use all providers set [LOCATIONS_TO_USE] to ALL, if you collected location data from different providers including the fused API use ALL_RESAMPLED, if your AWARE client was configured to use fused location only or want to focus only on this provider, set [LOCATIONS_TO_USE] to RESAMPLE_FUSED. ALL_RESAMPLED and RESAMPLE_FUSED take the original location coordinates and replicate each pair forward in time as long as the phone was sensing data as indicated by the joined timestamps of [PHONE_DATA_YIELD][SENSORS], this is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one and because GPS and network providers can log data at variable rates.
+
There are two parameters associated with resampling fused location. FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair (for example, participant A's phone did not collect data between 10.30am and 10:50am and between 11:05am and 11:40am, the last known coordinate pair will be replicated during the first period but not the second, in other words, we assume that we cannot longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes). FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer that this threshold even if the phone was sensing data continuously (for example, participant A went home at 9pm and their phone was sensing data without gaps until 11am the next morning, the last known location will only be replicated until 9am). If you have suggestions to modify or improve this resampling, let us know.
Parameters description for [PHONE_LOCATIONS][PROVIDERS][BARNETT]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_LOCATIONS features from the BARNETT provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[ACCURACY_LIMIT]
+
An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there’s a 68% probability the true location is within this radius
+
+
+
[TIMEZONE]
+
Timezone where the location data was collected. By default points to the one defined in the Configuration
+
+
+
[MINUTES_DATA_USED]
+
Set to True to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
+
+
+
+
Features description for [PHONE_LOCATIONS][PROVIDERS][BARNETT] adapted from Beiwe Summary Statistics:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
hometime
+
minutes
+
Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am including any pauses within a 200-meter radius.
+
+
+
disttravelled
+
meters
+
Total distance travelled over a day (flights).
+
+
+
rog
+
meters
+
The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place.
+
+
+
maxdiam
+
meters
+
The maximum diameter is the largest distance between any two pauses.
+
+
+
maxhomedist
+
meters
+
The maximum distance from home in meters.
+
+
+
siglocsvisited
+
locations
+
The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found iterating k from 1 to 200 stopping until the centroids of two significant locations are within 400 meters of one another.
+
+
+
avgflightlen
+
meters
+
Mean length of all flights.
+
+
+
stdflightlen
+
meters
+
Standard deviation of the length of all flights.
+
+
+
avgflightdur
+
seconds
+
Mean duration of all flights.
+
+
+
stdflightdur
+
seconds
+
The standard deviation of the duration of all flights.
+
+
+
probpause
+
-
+
The fraction of a day spent in a pause (as opposed to a flight)
+
+
+
siglocentropy
+
nats
+
Shannon’s entropy measurement based on the proportion of time spent at each significant location visited during a day.
+
+
+
circdnrtn
+
-
+
A continuous metric quantifying a person’s circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day.
+
+
+
wkenddayrtn
+
-
+
Same as circdnrtn but computed separately for weekends and weekdays.
+
+
+
+
+
Assumptions/Observations
+
Barnett's et al features
+These features are based on a Pause-Flight model. A pause is defined as a mobiity trace (location pings) within a certain duration and distance (by default 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See Barnett et al for more information. In RAPIDS we only expose two parameters for these features (timezone and accuracy limit). You can change other parameters in src/features/phone_locations/barnett/library/MobilityFeatures.R.
+
Significant Locations
+Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) will count as a visit to that significant location. This description was adapted from the Supplementary Materials of Barnett et al.
+
The Circadian Calculation
+For a detailed description of how this is calculated, see Canzian et al.
Parameters description for [PHONE_LOCATIONS][PROVIDERS][DORYAB]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_LOCATIONS features from the BARNETT provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[ACCURACY_LIMIT]
+
An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there’s a 68% probability the true location is within this radius
+
+
+
[DBSCAN_EPS]
+
The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
+
+
+
[DBSCAN_MINSAMPLES]
+
The number of samples (or total weight) in a neighborhood for a point to be considered as a core point of a cluster. This includes the point itself.
+
+
+
[THRESHOLD_STATIC]
+
It is the threshold value in km/hr which labels a row as Static or Moving.
+
+
+
[MAXIMUM_GAP_ALLOWED]
+
The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the the phone was not sensing.
+
+
+
[MINUTES_DATA_USED]
+
Set to True to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
+
+
+
[SAMPLING_FREQUENCY]
+
Expected time difference between any two location rows in minutes. If set to 0, the sampling frequency will be inferred automatically as the median of all the differences between any two consecutive row timestamps (recommended if you are using FUSED_RESAMPLED data). This parameter impacts all the time calculations.
+
+
+
[CLUSTER_ON]
+
Set this flag to PARTICIPANT_DATASET to create clusters based on the entire participant’s dataset or to TIME_SEGMENT to create clusters based on all the instances of the corresponding time segment (e.g. all mornings).
+
+
+
[CLUSTERING_ALGORITHM]
+
The original Doryab et al implementation uses DBSCAN, OPTICS is also available with similar (but not identical) clustering results and lower memory consumption.
+
+
+
+
Features description for [PHONE_LOCATIONS][PROVIDERS][DORYAB]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
locationvariance
+
\(meters^2\)
+
The sum of the variances of the latitude and longitude columns.
+
+
+
loglocationvariance
+
-
+
Log of the sum of the variances of the latitude and longitude columns.
+
+
+
totaldistance
+
meters
+
Total distance travelled in a time segment using the haversine formula.
+
+
+
averagespeed
+
km/hr
+
Average speed in a time segment considering only the instances labeled as Moving.
+
+
+
varspeed
+
km/hr
+
Speed variance in a time segment considering only the instances labeled as Moving.
+
+
+
circadianmovement
+
-
+
"It encodes the extent to which a person’s location patterns follow a 24-hour circadian cycle." Doryab et al..
+
+
+
numberofsignificantplaces
+
places
+
Number of significant locations visited. It is calculated using the DBSCAN clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place.
+
+
+
numberlocationtransitions
+
transitions
+
Number of movements between any two clusters in a time segment.
+
+
+
radiusgyration
+
meters
+
Quantifies the area covered by a participant
+
+
+
timeattop1location
+
minutes
+
Time spent at the most significant location.
+
+
+
timeattop2location
+
minutes
+
Time spent at the 2nd most significant location.
+
+
+
timeattop3location
+
minutes
+
Time spent at the 3rd most significant location.
+
+
+
movingtostaticratio
+
-
+
Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labelled as stationary if it’s speed (distance/time) to the next coordinate pair is less than 1km/hr. A higher value represents a more stationary routine. These times are computed by multiplying the number of rows by [SAMPLING_FREQUENCY]
+
+
+
outlierstimepercent
+
-
+
Ratio between the time spent in non-significant clusters divided by the time spent in all clusters (total location sensed time). A higher value represents more time spent in non-significant clusters. These times are computed by multiplying the number of rows by [SAMPLING_FREQUENCY]
+
+
+
maxlengthstayatclusters
+
minutes
+
Maximum time spent in a cluster (significant location).
+
+
+
minlengthstayatclusters
+
minutes
+
Minimum time spent in a cluster (significant location).
+
+
+
meanlengthstayatclusters
+
minutes
+
Average time spent in a cluster (significant location).
+
+
+
stdlengthstayatclusters
+
minutes
+
Standard deviation of time spent in a cluster (significant location).
+
+
+
locationentropy
+
nats
+
Shannon Entropy computed over the row count of each cluster (significant location), it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location).
+
+
+
normalizedlocationentropy
+
nats
+
Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters, it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location).
+
+
+
+
+
Assumptions/Observations
+
Significant Locations Identified
+Significant locations are determined using DBSCAN clustering on locations that a patient visit over the course of the period of data collection.
+
The Circadian Calculation
+For a detailed description of how this is calculated, see Canzian et al.
+
Fine Tuning Clustering Parameters
+Based on an experiment where we collected fused location data for 7 days with a mean accuracy of 86 & SD of 350.874635, we determined that EPS/MAX_EPS=100 produced closer clustering results to reality. Higher values (>100) missed out some significant places like a short grocery visit while lower values (<100) picked up traffic lights and stop signs while driving as significant locations. We recommend you set EPS based on the accuracy of your location data (the more accurate your data is, the lower you should be able to set EPS).
Parameters description for [PHONE_SCREEN][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_SCREEN features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
[REFERENCE_HOUR_FIRST_USE]
+
The reference point from which firstuseafter is to be computed, default is midnight
+
+
+
[IGNORE_EPISODES_SHORTER_THAN]
+
Ignore episodes that are shorter than this threshold (minutes). Set to 0 to disable this filter.
+
+
+
[IGNORE_EPISODES_LONGER_THAN]
+
Ignore episodes that are longer than this threshold (minutes). Set to 0 to disable this filter.
+
+
+
[EPISODE_TYPES]
+
Currently we only support unlock episodes (from when the phone is unlocked until the screen is off)
+
+
+
+
Features description for [PHONE_SCREEN][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
sumduration
+
minutes
+
Total duration of all unlock episodes.
+
+
+
maxduration
+
minutes
+
Longest duration of any unlock episode.
+
+
+
minduration
+
minutes
+
Shortest duration of any unlock episode.
+
+
+
avgduration
+
minutes
+
Average duration of all unlock episodes.
+
+
+
stdduration
+
minutes
+
Standard deviation duration of all unlock episodes.
+
+
+
countepisode
+
episodes
+
Number of all unlock episodes
+
+
+
firstuseafter
+
minutes
+
Minutes until the first unlock episode.
+
+
+
+
+
+
+
Assumptions/Observations
+
+
+
In Android, lock events can happen right after an off event, after a few seconds of an off event, or never happen depending on the phone's settings, therefore, an unlock episode is defined as the time between an unlock and a off event. In iOS, on and off events do not exist, so an unlock episode is defined as the time between an unlock and a lock event.
+
+
+
Events in iOS are recorded reliably albeit some duplicated lock events within milliseconds from each other, so we only keep consecutive unlock/lock pairs. In Android you cand find multiple consecutive unlock or lock events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected and this happens because ACTION_SCREEN_OFF and ACTION_SCREEN_ON are sent when the device becomes non-interactive which may have nothing to do with the screen turning off. In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an unlock event as well as the total screen time (i.e. ON to OFF) but these are not implemented at the moment.
+
+
+
We transform iOS screen events to match Android’s format, we replace lock episodes with off episodes (2 with 0) in iOS. However, as mentioned above this is still computing unlock to lock episodes.
Parameters description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_WIFI_CONNECTED features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
+
Features description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
countscans
+
devices
+
Number of scanned WiFi access points connected during a time_segment, an access point can be detected multiple times over time and these appearances are counted separately
+
+
+
uniquedevices
+
devices
+
Number of unique access point during a time_segment as identified by their hardware address
+
+
+
countscansmostuniquedevice
+
scans
+
Number of scans of the most scanned access point during a time_segment across the whole monitoring period
+
+
+
+
+
Assumptions/Observations
+
+
A connected WiFI access point is one that a phone was connected to.
+
By default AWARE stores this data in the sensor_wifi table.
Parameters description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS]:
+
+
+
+
Key
+
Description
+
+
+
+
+
[COMPUTE]
+
Set to True to extract PHONE_WIFI_VISIBLE features from the RAPIDS provider
+
+
+
[FEATURES]
+
Features to be computed, see table below
+
+
+
+
Features description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS]:
+
+
+
+
Feature
+
Units
+
Description
+
+
+
+
+
countscans
+
devices
+
Number of scanned WiFi access points visible during a time_segment, an access point can be detected multiple times over time and these appearances are counted separately
+
+
+
uniquedevices
+
devices
+
Number of unique access point during a time_segment as identified by their hardware address
+
+
+
countscansmostuniquedevice
+
scans
+
Number of scans of the most scanned access point during a time_segment across the whole monitoring period
+
+
+
+
+
Assumptions/Observations
+
+
A visible WiFI access point is one that a phone sensed around itself but that it was not connected to. Due to API restrictions, this sensor is not available on iOS.
+
By default AWARE stores this data in the wifi table.
Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to Installation, then to Configuration, and then to Execution
+
All paths mentioned in this page are relative to RAPIDS’ root folder.
+
+
+
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the .env file, participants files, time segment files, and the config.yaml file as instructed in the Configuration page. The config.yaml file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
+
All data is saved in data/. The data/external/ folder stores any data imported or created by the user, data/raw/ stores sensor data as imported from your database, data/interim/ has intermediate files necessary to compute behavioral features from raw data, and data/processed/ has all the final files with the behavioral features in folders per participant and sensor.
+
RAPIDS source code is saved in src/. The src/data/ folder stores scripts to download, clean and pre-process sensor data, src/features has scripts to extract behavioral features organized in their respective sensor subfolders , src/models/ can host any script to create models or statistical analyses with the behavioral features you extract, and src/visualization/ has scripts to create plots of the raw and processed data. There are other files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.).
+
In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the Snakefile file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/0.4/img/analysis_workflow.png b/0.4/img/analysis_workflow.png
new file mode 100644
index 00000000..89aab053
Binary files /dev/null and b/0.4/img/analysis_workflow.png differ
diff --git a/0.4/img/calls.csv b/0.4/img/calls.csv
new file mode 100644
index 00000000..530f275a
--- /dev/null
+++ b/0.4/img/calls.csv
@@ -0,0 +1,9 @@
+"_id","timestamp","device_id","call_type","call_duration","trace"
+1,1587663260695,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,14,"d5e84f8af01b2728021d4f43f53a163c0c90000c"
+2,1587739118007,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"47c125dc7bd163b8612cdea13724a814917b6e93"
+5,1587746544891,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,95,"9cc793ffd6e88b1d850ce540b5d7e000ef5650d4"
+6,1587911379859,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,63,"51fb9344e988049a3fec774c7ca622358bf80264"
+7,1587992647361,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"2a862a7730cfdfaf103a9487afe3e02935fd6e02"
+8,1588020039448,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",1,11,"a2c53f6a086d98622c06107780980cf1bb4e37bd"
+11,1588176189024,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,65,"56589df8c830c70e330b644921ed38e08d8fd1f3"
+12,1588197745079,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"cab458018a8ed3b626515e794c70b6f415318adc"
diff --git a/0.4/img/files.png b/0.4/img/files.png
new file mode 100644
index 00000000..1fef7eeb
Binary files /dev/null and b/0.4/img/files.png differ
diff --git a/0.4/img/h-data-yield.html b/0.4/img/h-data-yield.html
new file mode 100644
index 00000000..082b5513
--- /dev/null
+++ b/0.4/img/h-data-yield.html
@@ -0,0 +1,39 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/0.4/img/h-data-yield.png b/0.4/img/h-data-yield.png
new file mode 100644
index 00000000..e2fcced0
Binary files /dev/null and b/0.4/img/h-data-yield.png differ
diff --git a/0.4/img/hm-data-yield-participants.html b/0.4/img/hm-data-yield-participants.html
new file mode 100644
index 00000000..bbb00c75
--- /dev/null
+++ b/0.4/img/hm-data-yield-participants.html
@@ -0,0 +1,191 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/0.4/img/hm-data-yield-participants.png b/0.4/img/hm-data-yield-participants.png
new file mode 100644
index 00000000..6778c870
Binary files /dev/null and b/0.4/img/hm-data-yield-participants.png differ
diff --git a/0.4/img/hm-feature-correlations.html b/0.4/img/hm-feature-correlations.html
new file mode 100644
index 00000000..99e596fb
--- /dev/null
+++ b/0.4/img/hm-feature-correlations.html
@@ -0,0 +1,96 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/0.4/img/hm-feature-correlations.png b/0.4/img/hm-feature-correlations.png
new file mode 100644
index 00000000..f6e60e46
Binary files /dev/null and b/0.4/img/hm-feature-correlations.png differ
diff --git a/0.4/img/hm-phone-sensors.html b/0.4/img/hm-phone-sensors.html
new file mode 100644
index 00000000..b8b3c2e0
--- /dev/null
+++ b/0.4/img/hm-phone-sensors.html
@@ -0,0 +1,627 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Sensors per Minute per Time Segment for All Participants
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Sensors per Minute per Time Segment for All Participants
Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to extract and createbehavioral features (a.k.a. digital biomarkers), visualize mobile sensor data and structure your analysis into reproducible workflows.
+
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment we support smartphone data collected with AWARE and wearable data from Fitbit devices.
+
+
Tip
+
Questions or feedback can be posted on the #rapids channel in AWARE Framework's slack.
+
Bugs and feature requests should be posted on Github.
+
Join our discussions on our algorithms and assumptions for feature processing.
RAPIDS is formed by R and Python scripts orchestrated by Snakemake. We suggest you read Snakemake’s docs but in short: every link in the analysis chain is atomic and has files as input and output. Behavioral features are processed per sensor and per participant.
Consistent analysis. Every participant sensor dataset is analyzed in the exact same way and isolated from each other.
+
Efficient analysis. Every analysis step is executed only once. Whenever your data or configuration changes only the affected files are updated.
+
Parallel execution. Thanks to Snakemake, your analysis can be executed over multiple cores without changing your code.
+
Code-free features. Extract any of the behavioral features offered by RAPIDS without writing any code.
+
Extensible code. You can easily add your own behavioral features in R or Python, share them with the community, and keep authorship and citations.
+
Timezone aware. Your data is adjusted to the specified timezone (multiple timezones suport coming soon).
+
Flexible time segments. You can extract behavioral features on time windows of any length (e.g. 5 minutes, 3 hours, 2 days), on every day or particular days (e.g. weekends, Mondays, the 1st of each month, etc.) or around events of interest (e.g. surveys or clinical relapses).
+
Tested code. We are constantly adding tests to make sure our behavioral features are correct.
+
Reproducible code. If you structure your analysis within RAPIDS, you can be sure your code will run in other computers as intended thanks to R and Python virtual environments. You can share your analysis code along your publications without any overhead.
In broad terms the config.yaml, .env file, participants files, and time segment files are the only ones that you will have to modify. All data is stored in data/ and all scripts are stored in src/. For more information see RAPIDS’ File Structure.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/0.4/javascripts/extra.js b/0.4/javascripts/extra.js
new file mode 100644
index 00000000..b6d5686d
--- /dev/null
+++ b/0.4/javascripts/extra.js
@@ -0,0 +1,14 @@
+window.addEventListener("DOMContentLoaded", function() {
+ var xhr = new XMLHttpRequest();
+ xhr.open("GET", window.location.origin + "/versions.json");
+ xhr.onload = function() {
+ var versions = JSON.parse(this.responseText);
+ latest_version = ""
+ for(id in versions)
+ if(versions[id]["aliases"].length > 0 && versions[id]["aliases"].includes("latest"))
+ latest_version = "/" + versions[id].version + "/"
+ if(!window.location.pathname.includes("/latest/") && (latest_version.length > 0 && !window.location.pathname.includes(latest_version)))
+ document.querySelector("div[data-md-component=announce]").innerHTML = "
If you were relying on the old docs and the most recent version of RAPIDS you are working with is from or before Oct 13, 2020 you are using the beta version of RAPIDS.
+
You can start using the new RAPIDS (we are starting with v0.1.0) right away, just take into account the following:
+
+
Install a new copy of RAPIDS (the R and Python virtual environments didn’t change so the cached versions will be reused)
+
Make sure you don’t skip a new Installation step to give execution permissions to the RAPIDS script: chmod +x rapids
You can proceed to reconfigure your config.yaml, its structure is more consistent and should be familiar to you.
+
+
+
Info
+
If you have any questions reach out to us on Slack.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/0.4/search/search_index.json b/0.4/search/search_index.json
new file mode 100644
index 00000000..00840fe1
--- /dev/null
+++ b/0.4/search/search_index.json
@@ -0,0 +1 @@
+{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"Welcome to RAPIDS documentation \u00b6 Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to extract and create behavioral features (a.k.a. digital biomarkers), visualize mobile sensor data and structure your analysis into reproducible workflows. RAPIDS is open source, documented, modular, tested, and reproducible. At the moment we support smartphone data collected with AWARE and wearable data from Fitbit devices. Tip Questions or feedback can be posted on the #rapids channel in AWARE Framework's slack . Bugs and feature requests should be posted on Github . Join our discussions on our algorithms and assumptions for feature processing . Ready to start? Go to Installation , then to Configuration , and then to Execution Are you upgrading from RAPIDS beta ? Follow this guide How does it work? \u00b6 RAPIDS is formed by R and Python scripts orchestrated by Snakemake . We suggest you read Snakemake\u2019s docs but in short: every link in the analysis chain is atomic and has files as input and output. Behavioral features are processed per sensor and per participant. What are the benefits of using RAPIDS? \u00b6 Consistent analysis . Every participant sensor dataset is analyzed in the exact same way and isolated from each other. Efficient analysis . Every analysis step is executed only once. Whenever your data or configuration changes only the affected files are updated. Parallel execution . Thanks to Snakemake, your analysis can be executed over multiple cores without changing your code. Code-free features . Extract any of the behavioral features offered by RAPIDS without writing any code. Extensible code . You can easily add your own behavioral features in R or Python, share them with the community, and keep authorship and citations. Timezone aware . Your data is adjusted to the specified timezone (multiple timezones suport coming soon ). Flexible time segments . You can extract behavioral features on time windows of any length (e.g. 5 minutes, 3 hours, 2 days), on every day or particular days (e.g. weekends, Mondays, the 1 st of each month, etc.) or around events of interest (e.g. surveys or clinical relapses). Tested code . We are constantly adding tests to make sure our behavioral features are correct. Reproducible code . If you structure your analysis within RAPIDS, you can be sure your code will run in other computers as intended thanks to R and Python virtual environments. You can share your analysis code along your publications without any overhead. Private . All your data is processed locally. How is it organized? \u00b6 In broad terms the config.yaml , .env file , participants files , and time segment files are the only ones that you will have to modify. All data is stored in data/ and all scripts are stored in src/ . For more information see RAPIDS\u2019 File Structure .","title":"Home"},{"location":"#welcome-to-rapids-documentation","text":"Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to extract and create behavioral features (a.k.a. digital biomarkers), visualize mobile sensor data and structure your analysis into reproducible workflows. RAPIDS is open source, documented, modular, tested, and reproducible. At the moment we support smartphone data collected with AWARE and wearable data from Fitbit devices. Tip Questions or feedback can be posted on the #rapids channel in AWARE Framework's slack . Bugs and feature requests should be posted on Github . Join our discussions on our algorithms and assumptions for feature processing . Ready to start? Go to Installation , then to Configuration , and then to Execution Are you upgrading from RAPIDS beta ? Follow this guide","title":"Welcome to RAPIDS documentation"},{"location":"#how-does-it-work","text":"RAPIDS is formed by R and Python scripts orchestrated by Snakemake . We suggest you read Snakemake\u2019s docs but in short: every link in the analysis chain is atomic and has files as input and output. Behavioral features are processed per sensor and per participant.","title":"How does it work?"},{"location":"#what-are-the-benefits-of-using-rapids","text":"Consistent analysis . Every participant sensor dataset is analyzed in the exact same way and isolated from each other. Efficient analysis . Every analysis step is executed only once. Whenever your data or configuration changes only the affected files are updated. Parallel execution . Thanks to Snakemake, your analysis can be executed over multiple cores without changing your code. Code-free features . Extract any of the behavioral features offered by RAPIDS without writing any code. Extensible code . You can easily add your own behavioral features in R or Python, share them with the community, and keep authorship and citations. Timezone aware . Your data is adjusted to the specified timezone (multiple timezones suport coming soon ). Flexible time segments . You can extract behavioral features on time windows of any length (e.g. 5 minutes, 3 hours, 2 days), on every day or particular days (e.g. weekends, Mondays, the 1 st of each month, etc.) or around events of interest (e.g. surveys or clinical relapses). Tested code . We are constantly adding tests to make sure our behavioral features are correct. Reproducible code . If you structure your analysis within RAPIDS, you can be sure your code will run in other computers as intended thanks to R and Python virtual environments. You can share your analysis code along your publications without any overhead. Private . All your data is processed locally.","title":"What are the benefits of using RAPIDS?"},{"location":"#how-is-it-organized","text":"In broad terms the config.yaml , .env file , participants files , and time segment files are the only ones that you will have to modify. All data is stored in data/ and all scripts are stored in src/ . For more information see RAPIDS\u2019 File Structure .","title":"How is it organized?"},{"location":"change-log/","text":"Change Log \u00b6 v0.4.0 \u00b6 Add four new phone sensors that can be used for PHONE_DATA_YIELD Add code so new feature providers can be added for the new four sensors Add new clustering algorithm (OPTICS) for Doryab features Update default EPS parameter for Doryab location clustering Add clearer error message for invalid phone data yield sensors Add ALL_RESAMPLED flag and accuracy limit for location features Add FAQ about null characters in phone tables Reactivate light and wifi tests and update testing docs Fix bug when parsing Fitbit steps data Fix bugs when merging features from empty time segments Fix minor issues in the documentation v0.3.2 \u00b6 Update docker and linux instructions to use RSPM binary repo for for faster installation Update CI to create a release on a tagged push that passes the tests Clarify in DB credential configuration that we only support MySQL Add Windows installation instructions Fix bugs in the create_participants_file script Fix bugs in Fitbit data parsing. Fixed Doryab location features context of clustering. Fixed the wrong shifting while calculating distance in Doryab location features. Refactored the haversine function v0.3.1 \u00b6 Update installation docs for RAPIDS\u2019 docker container Fix example analysis use of accelerometer data in a plot Update FAQ Update minimal example documentation Minor doc updates v0.3.0 \u00b6 Update R and Python virtual environments Add GH actions CI support for tests and docker Add release and test badges to README v0.2.6 \u00b6 Fix old versions banner on nested pages v0.2.5 \u00b6 Fix docs deploy typo v0.2.4 \u00b6 Fix broken links in landing page and docs deploy v0.2.3 \u00b6 Fix participant IDS in the example analysis workflow v0.2.2 \u00b6 Fix readme link to docs v0.2.1 \u00b6 FIx link to the most recent version in the old version banner v0.2.0 \u00b6 Add new PHONE_BLUETOOTH DORYAB provider Deprecate PHONE_BLUETOOTH RAPIDS provider Fix bug in filter_data_by_segment for Python when dataset was empty Minor doc updates New FAQ item v0.1.0 \u00b6 New and more consistent docs (this website). The previous docs are marked as beta Consolidate configuration instructions Flexible time segments Simplify Fitbit behavioral feature extraction and documentation Sensor\u2019s configuration and output is more consistent Update visualizations to handle flexible day segments Create a RAPIDS execution script that allows re-computation of the pipeline after configuration changes Add citation guide Update virtual environment guide Update analysis workflow example Add a Code of Conduct Update Team page","title":"Change Log"},{"location":"change-log/#change-log","text":"","title":"Change Log"},{"location":"change-log/#v040","text":"Add four new phone sensors that can be used for PHONE_DATA_YIELD Add code so new feature providers can be added for the new four sensors Add new clustering algorithm (OPTICS) for Doryab features Update default EPS parameter for Doryab location clustering Add clearer error message for invalid phone data yield sensors Add ALL_RESAMPLED flag and accuracy limit for location features Add FAQ about null characters in phone tables Reactivate light and wifi tests and update testing docs Fix bug when parsing Fitbit steps data Fix bugs when merging features from empty time segments Fix minor issues in the documentation","title":"v0.4.0"},{"location":"change-log/#v032","text":"Update docker and linux instructions to use RSPM binary repo for for faster installation Update CI to create a release on a tagged push that passes the tests Clarify in DB credential configuration that we only support MySQL Add Windows installation instructions Fix bugs in the create_participants_file script Fix bugs in Fitbit data parsing. Fixed Doryab location features context of clustering. Fixed the wrong shifting while calculating distance in Doryab location features. Refactored the haversine function","title":"v0.3.2"},{"location":"change-log/#v031","text":"Update installation docs for RAPIDS\u2019 docker container Fix example analysis use of accelerometer data in a plot Update FAQ Update minimal example documentation Minor doc updates","title":"v0.3.1"},{"location":"change-log/#v030","text":"Update R and Python virtual environments Add GH actions CI support for tests and docker Add release and test badges to README","title":"v0.3.0"},{"location":"change-log/#v026","text":"Fix old versions banner on nested pages","title":"v0.2.6"},{"location":"change-log/#v025","text":"Fix docs deploy typo","title":"v0.2.5"},{"location":"change-log/#v024","text":"Fix broken links in landing page and docs deploy","title":"v0.2.4"},{"location":"change-log/#v023","text":"Fix participant IDS in the example analysis workflow","title":"v0.2.3"},{"location":"change-log/#v022","text":"Fix readme link to docs","title":"v0.2.2"},{"location":"change-log/#v021","text":"FIx link to the most recent version in the old version banner","title":"v0.2.1"},{"location":"change-log/#v020","text":"Add new PHONE_BLUETOOTH DORYAB provider Deprecate PHONE_BLUETOOTH RAPIDS provider Fix bug in filter_data_by_segment for Python when dataset was empty Minor doc updates New FAQ item","title":"v0.2.0"},{"location":"change-log/#v010","text":"New and more consistent docs (this website). The previous docs are marked as beta Consolidate configuration instructions Flexible time segments Simplify Fitbit behavioral feature extraction and documentation Sensor\u2019s configuration and output is more consistent Update visualizations to handle flexible day segments Create a RAPIDS execution script that allows re-computation of the pipeline after configuration changes Add citation guide Update virtual environment guide Update analysis workflow example Add a Code of Conduct Update Team page","title":"v0.1.0"},{"location":"citation/","text":"Cite RAPIDS and providers \u00b6 RAPIDS and the community RAPIDS is a community effort and as such we want to continue recognizing the contributions from other researchers. Besides citing RAPIDS, we ask you to cite any of the authors listed below if you used those sensor providers in your analysis, thank you! RAPIDS \u00b6 If you used RAPIDS, please cite this paper . RAPIDS et al. citation Vega J, Li M, Aguillera K, Goel N, Joshi E, Durica KC, Kunta AR, Low CA RAPIDS: Reproducible Analysis Pipeline for Data Streams Collected with Mobile Devices JMIR Preprints. 18/08/2020:23246 DOI: 10.2196/preprints.23246 URL: https://preprints.jmir.org/preprint/23246 Panda (accelerometer) \u00b6 If you computed accelerometer features using the provider [PHONE_ACCLEROMETER][PANDA] cite this paper in addition to RAPIDS. Panda et al. citation Panda N, Solsky I, Huang EJ, Lipsitz S, Pradarelli JC, Delisle M, Cusack JC, Gadd MA, Lubitz CC, Mullen JT, Qadan M, Smith BL, Specht M, Stephen AE, Tanabe KK, Gawande AA, Onnela JP, Haynes AB. Using Smartphones to Capture Novel Recovery Metrics After Cancer Surgery. JAMA Surg. 2020 Feb 1;155(2):123-129. doi: 10.1001/jamasurg.2019.4702. PMID: 31657854; PMCID: PMC6820047. Stachl (applications foreground) \u00b6 If you computed applications foreground features using the app category (genre) catalogue in [PHONE_APPLICATIONS_FOREGROUND][RAPIDS] cite this paper in addition to RAPIDS. Stachl et al. citation Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres V\u00f6lkel, Tobias Schuwerk, Michelle Oldemeier, Theresa Ullmann, Heinrich Hussmann, Bernd Bischl, Markus B\u00fchner. Proceedings of the National Academy of Sciences Jul 2020, 117 (30) 17680-17687; DOI: 10.1073/pnas.1920484117 Doryab (bluetooth) \u00b6 If you computed bluetooth features using the provider [PHONE_BLUETOOTH][DORYAB] cite this paper in addition to RAPIDS. Doryab et al. citation Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394 Barnett (locations) \u00b6 If you computed locations features using the provider [PHONE_LOCATIONS][BARNETT] cite this paper and this paper in addition to RAPIDS. Barnett et al. citation Ian Barnett, Jukka-Pekka Onnela, Inferring mobility measures from GPS traces with missing data, Biostatistics, Volume 21, Issue 2, April 2020, Pages e98\u2013e112, https://doi.org/10.1093/biostatistics/kxy059 Canzian et al. citation Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp \u201815). Association for Computing Machinery, New York, NY, USA, 1293\u20131304. DOI: https://doi.org/10.1145/2750858.2805845 Doryab (locations) \u00b6 If you computed locations features using the provider [PHONE_LOCATIONS][DORYAB] cite this paper and this paper in addition to RAPIDS. Doryab et al. citation Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394 Canzian et al. citation Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp \u201815). Association for Computing Machinery, New York, NY, USA, 1293\u20131304. DOI: https://doi.org/10.1145/2750858.2805845","title":"Citation"},{"location":"citation/#cite-rapids-and-providers","text":"RAPIDS and the community RAPIDS is a community effort and as such we want to continue recognizing the contributions from other researchers. Besides citing RAPIDS, we ask you to cite any of the authors listed below if you used those sensor providers in your analysis, thank you!","title":"Cite RAPIDS and providers"},{"location":"citation/#rapids","text":"If you used RAPIDS, please cite this paper . RAPIDS et al. citation Vega J, Li M, Aguillera K, Goel N, Joshi E, Durica KC, Kunta AR, Low CA RAPIDS: Reproducible Analysis Pipeline for Data Streams Collected with Mobile Devices JMIR Preprints. 18/08/2020:23246 DOI: 10.2196/preprints.23246 URL: https://preprints.jmir.org/preprint/23246","title":"RAPIDS"},{"location":"citation/#panda-accelerometer","text":"If you computed accelerometer features using the provider [PHONE_ACCLEROMETER][PANDA] cite this paper in addition to RAPIDS. Panda et al. citation Panda N, Solsky I, Huang EJ, Lipsitz S, Pradarelli JC, Delisle M, Cusack JC, Gadd MA, Lubitz CC, Mullen JT, Qadan M, Smith BL, Specht M, Stephen AE, Tanabe KK, Gawande AA, Onnela JP, Haynes AB. Using Smartphones to Capture Novel Recovery Metrics After Cancer Surgery. JAMA Surg. 2020 Feb 1;155(2):123-129. doi: 10.1001/jamasurg.2019.4702. PMID: 31657854; PMCID: PMC6820047.","title":"Panda (accelerometer)"},{"location":"citation/#stachl-applications-foreground","text":"If you computed applications foreground features using the app category (genre) catalogue in [PHONE_APPLICATIONS_FOREGROUND][RAPIDS] cite this paper in addition to RAPIDS. Stachl et al. citation Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres V\u00f6lkel, Tobias Schuwerk, Michelle Oldemeier, Theresa Ullmann, Heinrich Hussmann, Bernd Bischl, Markus B\u00fchner. Proceedings of the National Academy of Sciences Jul 2020, 117 (30) 17680-17687; DOI: 10.1073/pnas.1920484117","title":"Stachl (applications foreground)"},{"location":"citation/#doryab-bluetooth","text":"If you computed bluetooth features using the provider [PHONE_BLUETOOTH][DORYAB] cite this paper in addition to RAPIDS. Doryab et al. citation Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394","title":"Doryab (bluetooth)"},{"location":"citation/#barnett-locations","text":"If you computed locations features using the provider [PHONE_LOCATIONS][BARNETT] cite this paper and this paper in addition to RAPIDS. Barnett et al. citation Ian Barnett, Jukka-Pekka Onnela, Inferring mobility measures from GPS traces with missing data, Biostatistics, Volume 21, Issue 2, April 2020, Pages e98\u2013e112, https://doi.org/10.1093/biostatistics/kxy059 Canzian et al. citation Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp \u201815). Association for Computing Machinery, New York, NY, USA, 1293\u20131304. DOI: https://doi.org/10.1145/2750858.2805845","title":"Barnett (locations)"},{"location":"citation/#doryab-locations","text":"If you computed locations features using the provider [PHONE_LOCATIONS][DORYAB] cite this paper and this paper in addition to RAPIDS. Doryab et al. citation Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394 Canzian et al. citation Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp \u201815). Association for Computing Machinery, New York, NY, USA, 1293\u20131304. DOI: https://doi.org/10.1145/2750858.2805845","title":"Doryab (locations)"},{"location":"code_of_conduct/","text":"Contributor Covenant Code of Conduct \u00b6 Our Pledge \u00b6 We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. Our Standards \u00b6 Examples of behavior that contributes to a positive environment for our community include: Demonstrating empathy and kindness toward other people Being respectful of differing opinions, viewpoints, and experiences Giving and gracefully accepting constructive feedback Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: The use of sexualized language or imagery, and sexual attention or advances of any kind Trolling, insulting or derogatory comments, and personal or political attacks Public or private harassment Publishing others\u2019 private information, such as a physical or email address, without their explicit permission Other conduct which could reasonably be considered inappropriate in a professional setting Enforcement Responsibilities \u00b6 Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate. Scope \u00b6 This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Enforcement \u00b6 Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at moshi@pitt.edu . All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident. Enforcement Guidelines \u00b6 Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct: 1. Correction \u00b6 Community Impact : Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. Consequence : A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested. 2. Warning \u00b6 Community Impact : A violation through a single incident or series of actions. Consequence : A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban. 3. Temporary Ban \u00b6 Community Impact : A serious violation of community standards, including sustained inappropriate behavior. Consequence : A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban. 4. Permanent Ban \u00b6 Community Impact : Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. Consequence : A permanent ban from any sort of public interaction within the community. Attribution \u00b6 This Code of Conduct is adapted from the Contributor Covenant , version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html . Community Impact Guidelines were inspired by Mozilla\u2019s code of conduct enforcement ladder . For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq . Translations are available at https://www.contributor-covenant.org/translations .","title":"Code of Conduct"},{"location":"code_of_conduct/#contributor-covenant-code-of-conduct","text":"","title":"Contributor Covenant Code of Conduct"},{"location":"code_of_conduct/#our-pledge","text":"We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.","title":"Our Pledge"},{"location":"code_of_conduct/#our-standards","text":"Examples of behavior that contributes to a positive environment for our community include: Demonstrating empathy and kindness toward other people Being respectful of differing opinions, viewpoints, and experiences Giving and gracefully accepting constructive feedback Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: The use of sexualized language or imagery, and sexual attention or advances of any kind Trolling, insulting or derogatory comments, and personal or political attacks Public or private harassment Publishing others\u2019 private information, such as a physical or email address, without their explicit permission Other conduct which could reasonably be considered inappropriate in a professional setting","title":"Our Standards"},{"location":"code_of_conduct/#enforcement-responsibilities","text":"Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful. Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.","title":"Enforcement Responsibilities"},{"location":"code_of_conduct/#scope","text":"This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.","title":"Scope"},{"location":"code_of_conduct/#enforcement","text":"Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at moshi@pitt.edu . All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the reporter of any incident.","title":"Enforcement"},{"location":"code_of_conduct/#enforcement-guidelines","text":"Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:","title":"Enforcement Guidelines"},{"location":"code_of_conduct/#1-correction","text":"Community Impact : Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community. Consequence : A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.","title":"1. Correction"},{"location":"code_of_conduct/#2-warning","text":"Community Impact : A violation through a single incident or series of actions. Consequence : A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.","title":"2. Warning"},{"location":"code_of_conduct/#3-temporary-ban","text":"Community Impact : A serious violation of community standards, including sustained inappropriate behavior. Consequence : A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.","title":"3. Temporary Ban"},{"location":"code_of_conduct/#4-permanent-ban","text":"Community Impact : Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. Consequence : A permanent ban from any sort of public interaction within the community.","title":"4. Permanent Ban"},{"location":"code_of_conduct/#attribution","text":"This Code of Conduct is adapted from the Contributor Covenant , version 2.0, available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html . Community Impact Guidelines were inspired by Mozilla\u2019s code of conduct enforcement ladder . For answers to common questions about this code of conduct, see the FAQ at https://www.contributor-covenant.org/faq . Translations are available at https://www.contributor-covenant.org/translations .","title":"Attribution"},{"location":"faq/","text":"Frequently Asked Questions \u00b6 Cannot connect to your MySQL server \u00b6 Problem **Error in .local ( drv, \\. .. ) :** **Failed to connect to database: Error: Can \\' t initialize character set unknown ( path: compiled \\_ in ) ** : Calls: dbConnect -> dbConnect -> .local -> .Call Execution halted [ Tue Mar 10 19 :40:15 2020 ] Error in rule download_dataset: jobid: 531 output: data/raw/p60/locations_raw.csv RuleException: CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile: Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1 . File \"/home/ubuntu/rapids/rules/preprocessing.snakefile\" , line 20 , in __rule_download_dataset File \"/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py\" , line 57 , in run Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Solution Please make sure the DATABASE_GROUP in config.yaml matches your DB credentials group in .env . Cannot start mysql in linux via brew services start mysql \u00b6 Problem Cannot start mysql in linux via brew services start mysql Solution Use mysql.server start Every time I run force the download_dataset rule all rules are executed \u00b6 Problem When running snakemake -j1 -R download_phone_data or ./rapids -j1 -R download_phone_data all the rules and files are re-computed Solution This is expected behavior. The advantage of using snakemake under the hood is that every time a file containing data is modified every rule that depends on that file will be re-executed to update their results. In this case, since download_dataset updates all the raw data, and you are forcing the rule with the flag -R every single rule that depends on those raw files will be executed. Error Table XXX doesn't exist while running the download_phone_data or download_fitbit_data rule. \u00b6 Problem Error in .local ( conn, statement, ... ) : could not run statement: Table 'db_name.table_name' doesn ' t exist Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call Execution halted Solution Please make sure the sensors listed in [PHONE_VALID_SENSED_BINS][PHONE_SENSORS] and the [TABLE] of each sensor you activated in config.yaml match your database tables. How do I install RAPIDS on Ubuntu 16.04 \u00b6 Solution Install dependencies (Homebrew - if not installed): sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev Install brew for linux and add the following line to ~/.bashrc : export PATH=$HOME/.linuxbrew/bin:$PATH source ~/.bashrc Install MySQL brew install mysql brew services start mysql Install R, pandoc and rmarkdown: brew install r brew install gcc@6 (needed due to this bug ) HOMEBREW_CC=gcc-6 brew install pandoc Install miniconda using these instructions Clone our repo: git clone https://github.com/carissalow/rapids Create a python virtual environment: cd rapids conda env create -f environment.yml -n MY_ENV_NAME conda activate MY_ENV_NAME Install R packages and virtual environment: snakemake renv_install snakemake renv_init snakemake renv_restore This step could take several minutes to complete. Please be patient and let it run until completion. mysql.h cannot be found \u00b6 Problem -------------------------- [ ERROR MESSAGE ] ---------------------------- :1:10: fatal error: mysql.h: No such file or directory compilation terminated. ----------------------------------------------------------------------- ERROR: configuration failed for package 'RMySQL' Solution sudo apt install libmariadbclient-dev No package libcurl found \u00b6 Problem libcurl cannot be found Solution Install libcurl sudo apt install libcurl4-openssl-dev Configuration failed because openssl was not found. \u00b6 Problem openssl cannot be found Solution Install openssl sudo apt install libssl-dev Configuration failed because libxml-2.0 was not found \u00b6 Problem libxml-2.0 cannot be found Solution Install libxml-2.0 sudo apt install libxml2-dev SSL connection error when running RAPIDS \u00b6 Problem You are getting the following error message when running RAPIDS: Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol. Solution This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option --ssl-mode=DISABLED but we can't do this from the R connector. If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace mysql-client and libmysqlclient-dev with mariadb-client and libmariadbclient-dev and reinstall renv. More info about this issue here DB_TABLES key not found \u00b6 Problem If you get the following error KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS' , it means that the indentation of the key [PHONE_SENSORS] is not matching the other child elements of PHONE_VALID_SENSED_BINS Solution You need to add or remove any leading whitespaces as needed on that line. PHONE_VALID_SENSED_BINS : COMPUTE : False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features BIN_SIZE : &bin_size 5 # (in minutes) PHONE_SENSORS : [] Error while updating your conda environment in Ubuntu \u00b6 Problem You get the following error: CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003 appears to be corrupted. The path 'include/mysqlStubs.h' specified in the package manifest cannot be found. ClobberError: This transaction has incompatible packages due to a shared path. packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243 path: 'lib/libiomp5.so' Solution Reinstall conda Embedded nul in string \u00b6 Problem You get the following error when downloading sensor data: Error in result_fetch ( res@ptr, n = n ) : embedded nul in string: Solution This problem is due to the way RMariaDB handles a mismatch between data types in R and MySQL (see this issue ). Since it seems this problem won\u2019t be handled by RMariaDB , you have two options: Remove the the null character from the conflictive table cell(s). You can adapt the following query on a MySQL server 8.0 or older update YOUR_TABLE set YOUR_COLUMN = regexp_replace ( YOUR_COLUMN , '\\0' , '' ); If it\u2019s not feasible to modify your data you can try swapping RMariaDB with RMySQL . Just have in mind you might have problems connecting to modern MySQL servers running in Linux: Add RMySQL to the renv environment by running the following command in a terminal open on RAPIDS root folder R -e 'renv::install(\"RMySQL\")' Go to src/data/download_phone_data.R or src/data/download_fitbit_data.R and replace library(RMariaDB) with library(RMySQL) In the same file(s) replace dbEngine <- dbConnect(MariaDB(), default.file = \"./.env\", group = group) with dbEngine <- dbConnect(MySQL(), default.file = \"./.env\", group = group) There is no package called RMariaDB \u00b6 Problem You get the following error when executing RAPIDS: Error in library ( RMariaDB ) : there is no package called 'RMariaDB' Execution halted Solution In RAPIDS v0.1.0 we replaced RMySQL R package with RMariaDB , this error means your R virtual environment is out of date, to update it run snakemake -j1 renv_restore","title":"FAQ"},{"location":"faq/#frequently-asked-questions","text":"","title":"Frequently Asked Questions"},{"location":"faq/#cannot-connect-to-your-mysql-server","text":"Problem **Error in .local ( drv, \\. .. ) :** **Failed to connect to database: Error: Can \\' t initialize character set unknown ( path: compiled \\_ in ) ** : Calls: dbConnect -> dbConnect -> .local -> .Call Execution halted [ Tue Mar 10 19 :40:15 2020 ] Error in rule download_dataset: jobid: 531 output: data/raw/p60/locations_raw.csv RuleException: CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile: Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1 . File \"/home/ubuntu/rapids/rules/preprocessing.snakefile\" , line 20 , in __rule_download_dataset File \"/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py\" , line 57 , in run Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Solution Please make sure the DATABASE_GROUP in config.yaml matches your DB credentials group in .env .","title":"Cannot connect to your MySQL server"},{"location":"faq/#cannot-start-mysql-in-linux-via-brew-services-start-mysql","text":"Problem Cannot start mysql in linux via brew services start mysql Solution Use mysql.server start","title":"Cannot start mysql in linux via brew services start mysql"},{"location":"faq/#every-time-i-run-force-the-download_dataset-rule-all-rules-are-executed","text":"Problem When running snakemake -j1 -R download_phone_data or ./rapids -j1 -R download_phone_data all the rules and files are re-computed Solution This is expected behavior. The advantage of using snakemake under the hood is that every time a file containing data is modified every rule that depends on that file will be re-executed to update their results. In this case, since download_dataset updates all the raw data, and you are forcing the rule with the flag -R every single rule that depends on those raw files will be executed.","title":"Every time I run force the download_dataset rule all rules are executed"},{"location":"faq/#error-table-xxx-doesnt-exist-while-running-the-download_phone_data-or-download_fitbit_data-rule","text":"Problem Error in .local ( conn, statement, ... ) : could not run statement: Table 'db_name.table_name' doesn ' t exist Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call Execution halted Solution Please make sure the sensors listed in [PHONE_VALID_SENSED_BINS][PHONE_SENSORS] and the [TABLE] of each sensor you activated in config.yaml match your database tables.","title":"Error Table XXX doesn't exist while running the download_phone_data or download_fitbit_data rule."},{"location":"faq/#how-do-i-install-rapids-on-ubuntu-1604","text":"Solution Install dependencies (Homebrew - if not installed): sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev Install brew for linux and add the following line to ~/.bashrc : export PATH=$HOME/.linuxbrew/bin:$PATH source ~/.bashrc Install MySQL brew install mysql brew services start mysql Install R, pandoc and rmarkdown: brew install r brew install gcc@6 (needed due to this bug ) HOMEBREW_CC=gcc-6 brew install pandoc Install miniconda using these instructions Clone our repo: git clone https://github.com/carissalow/rapids Create a python virtual environment: cd rapids conda env create -f environment.yml -n MY_ENV_NAME conda activate MY_ENV_NAME Install R packages and virtual environment: snakemake renv_install snakemake renv_init snakemake renv_restore This step could take several minutes to complete. Please be patient and let it run until completion.","title":"How do I install RAPIDS on Ubuntu 16.04"},{"location":"faq/#mysqlh-cannot-be-found","text":"Problem -------------------------- [ ERROR MESSAGE ] ---------------------------- :1:10: fatal error: mysql.h: No such file or directory compilation terminated. ----------------------------------------------------------------------- ERROR: configuration failed for package 'RMySQL' Solution sudo apt install libmariadbclient-dev","title":"mysql.h cannot be found"},{"location":"faq/#no-package-libcurl-found","text":"Problem libcurl cannot be found Solution Install libcurl sudo apt install libcurl4-openssl-dev","title":"No package libcurl found"},{"location":"faq/#configuration-failed-because-openssl-was-not-found","text":"Problem openssl cannot be found Solution Install openssl sudo apt install libssl-dev","title":"Configuration failed because openssl was not found."},{"location":"faq/#configuration-failed-because-libxml-20-was-not-found","text":"Problem libxml-2.0 cannot be found Solution Install libxml-2.0 sudo apt install libxml2-dev","title":"Configuration failed because libxml-2.0 was not found"},{"location":"faq/#ssl-connection-error-when-running-rapids","text":"Problem You are getting the following error message when running RAPIDS: Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol. Solution This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option --ssl-mode=DISABLED but we can't do this from the R connector. If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace mysql-client and libmysqlclient-dev with mariadb-client and libmariadbclient-dev and reinstall renv. More info about this issue here","title":"SSL connection error when running RAPIDS"},{"location":"faq/#db_tables-key-not-found","text":"Problem If you get the following error KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS' , it means that the indentation of the key [PHONE_SENSORS] is not matching the other child elements of PHONE_VALID_SENSED_BINS Solution You need to add or remove any leading whitespaces as needed on that line. PHONE_VALID_SENSED_BINS : COMPUTE : False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features BIN_SIZE : &bin_size 5 # (in minutes) PHONE_SENSORS : []","title":"DB_TABLES key not found"},{"location":"faq/#error-while-updating-your-conda-environment-in-ubuntu","text":"Problem You get the following error: CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003 appears to be corrupted. The path 'include/mysqlStubs.h' specified in the package manifest cannot be found. ClobberError: This transaction has incompatible packages due to a shared path. packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243 path: 'lib/libiomp5.so' Solution Reinstall conda","title":"Error while updating your conda environment in Ubuntu"},{"location":"faq/#embedded-nul-in-string","text":"Problem You get the following error when downloading sensor data: Error in result_fetch ( res@ptr, n = n ) : embedded nul in string: Solution This problem is due to the way RMariaDB handles a mismatch between data types in R and MySQL (see this issue ). Since it seems this problem won\u2019t be handled by RMariaDB , you have two options: Remove the the null character from the conflictive table cell(s). You can adapt the following query on a MySQL server 8.0 or older update YOUR_TABLE set YOUR_COLUMN = regexp_replace ( YOUR_COLUMN , '\\0' , '' ); If it\u2019s not feasible to modify your data you can try swapping RMariaDB with RMySQL . Just have in mind you might have problems connecting to modern MySQL servers running in Linux: Add RMySQL to the renv environment by running the following command in a terminal open on RAPIDS root folder R -e 'renv::install(\"RMySQL\")' Go to src/data/download_phone_data.R or src/data/download_fitbit_data.R and replace library(RMariaDB) with library(RMySQL) In the same file(s) replace dbEngine <- dbConnect(MariaDB(), default.file = \"./.env\", group = group) with dbEngine <- dbConnect(MySQL(), default.file = \"./.env\", group = group)","title":"Embedded nul in string"},{"location":"faq/#there-is-no-package-called-rmariadb","text":"Problem You get the following error when executing RAPIDS: Error in library ( RMariaDB ) : there is no package called 'RMariaDB' Execution halted Solution In RAPIDS v0.1.0 we replaced RMySQL R package with RMariaDB , this error means your R virtual environment is out of date, to update it run snakemake -j1 renv_restore","title":"There is no package called RMariaDB"},{"location":"file-structure/","text":"File Structure \u00b6 Tip Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to Installation , then to Configuration , and then to Execution All paths mentioned in this page are relative to RAPIDS\u2019 root folder. If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the .env file , participants files , time segment files , and the config.yaml file as instructed in the Configuration page . The config.yaml file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more. All data is saved in data/ . The data/external/ folder stores any data imported or created by the user, data/raw/ stores sensor data as imported from your database, data/interim/ has intermediate files necessary to compute behavioral features from raw data, and data/processed/ has all the final files with the behavioral features in folders per participant and sensor. RAPIDS source code is saved in src/ . The src/data/ folder stores scripts to download, clean and pre-process sensor data, src/features has scripts to extract behavioral features organized in their respective sensor subfolders , src/models/ can host any script to create models or statistical analyses with the behavioral features you extract, and src/visualization/ has scripts to create plots of the raw and processed data. There are other files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.). In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the Snakefile file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.). Interaction diagram between the user, and important files in RAPIDS","title":"File Structure"},{"location":"file-structure/#file-structure","text":"Tip Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to Installation , then to Configuration , and then to Execution All paths mentioned in this page are relative to RAPIDS\u2019 root folder. If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the .env file , participants files , time segment files , and the config.yaml file as instructed in the Configuration page . The config.yaml file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more. All data is saved in data/ . The data/external/ folder stores any data imported or created by the user, data/raw/ stores sensor data as imported from your database, data/interim/ has intermediate files necessary to compute behavioral features from raw data, and data/processed/ has all the final files with the behavioral features in folders per participant and sensor. RAPIDS source code is saved in src/ . The src/data/ folder stores scripts to download, clean and pre-process sensor data, src/features has scripts to extract behavioral features organized in their respective sensor subfolders , src/models/ can host any script to create models or statistical analyses with the behavioral features you extract, and src/visualization/ has scripts to create plots of the raw and processed data. There are other files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.). In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the Snakefile file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.). Interaction diagram between the user, and important files in RAPIDS","title":"File Structure"},{"location":"migrating-from-old-versions/","text":"Migrating from RAPIDS beta \u00b6 If you were relying on the old docs and the most recent version of RAPIDS you are working with is from or before Oct 13, 2020 you are using the beta version of RAPIDS. You can start using the new RAPIDS (we are starting with v0.1.0 ) right away, just take into account the following: Install a new copy of RAPIDS (the R and Python virtual environments didn\u2019t change so the cached versions will be reused) Make sure you don\u2019t skip a new Installation step to give execution permissions to the RAPIDS script: chmod +x rapids Follow the new Configuration guide. You can copy and paste your old .env file You can migrate your old participant files: python tools/update_format_participant_files.py Get familiar with the new way of Executing RAPIDS You can proceed to reconfigure your config.yaml , its structure is more consistent and should be familiar to you. Info If you have any questions reach out to us on Slack .","title":"Migrating from beta"},{"location":"migrating-from-old-versions/#migrating-from-rapids-beta","text":"If you were relying on the old docs and the most recent version of RAPIDS you are working with is from or before Oct 13, 2020 you are using the beta version of RAPIDS. You can start using the new RAPIDS (we are starting with v0.1.0 ) right away, just take into account the following: Install a new copy of RAPIDS (the R and Python virtual environments didn\u2019t change so the cached versions will be reused) Make sure you don\u2019t skip a new Installation step to give execution permissions to the RAPIDS script: chmod +x rapids Follow the new Configuration guide. You can copy and paste your old .env file You can migrate your old participant files: python tools/update_format_participant_files.py Get familiar with the new way of Executing RAPIDS You can proceed to reconfigure your config.yaml , its structure is more consistent and should be familiar to you. Info If you have any questions reach out to us on Slack .","title":"Migrating from RAPIDS beta"},{"location":"team/","text":"RAPIDS Team \u00b6 If you are interested in contributing feel free to submit a pull request or contact us. Core Team \u00b6 Julio Vega (Designer and Lead Developer) \u00b6 About Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data. vegaju at upmc . edu Personal Website Meng Li \u00b6 About Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field. lim11 at upmc . edu Linkedin Profile Github Profile Abhineeth Reddy Kunta \u00b6 About Abhineeth Reddy Kunta is a Senior Software Engineer with the Mobile Sensing + Health Institute. He is experienced in software development and specializes in building solutions using machine learning. Abhineeth likes exploring ways to leverage technology in advancing medicine and education. Previously he worked as a Computer Programmer at Georgia Department of Public Health. He has a master\u2019s degree in Computer Science from George Mason University. Kwesi Aguillera \u00b6 About Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Sciences in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full stack developer and looks forward to applying this knowledge to big data analysis. Linkedin Profile Echhit Joshi \u00b6 About Echhit Joshi is a Masters student at the School of Computing and Information at University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics. Linkedin Profile Nicolas Leo \u00b6 About Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for a MS in Computer Science with a focus on Intelligent Systems. Nikunj Goel \u00b6 About Nik is a graduate student at the University of Pittsburgh pursuing Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology from India. He is a Data Enthusiasts and passionate about finding the meaning out of raw data. In a long term, his goal is to create a breakthrough in Data Science and Deep Learning. Linkedin Profile Community Contributors \u00b6 Agam Kumar \u00b6 About Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science. He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science. His research interests include ML applications in medicine. Linkedin Profile Github Profile Yasaman S. Sefidgar \u00b6 About Linkedin Profile Advisors \u00b6 Afsaneh Doryab \u00b6 About Personal Website Carissa Low \u00b6 About Profile","title":"Team"},{"location":"team/#rapids-team","text":"If you are interested in contributing feel free to submit a pull request or contact us.","title":"RAPIDS Team"},{"location":"team/#core-team","text":"","title":"Core Team"},{"location":"team/#julio-vega-designer-and-lead-developer","text":"About Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data. vegaju at upmc . edu Personal Website","title":"Julio Vega (Designer and Lead Developer)"},{"location":"team/#meng-li","text":"About Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field. lim11 at upmc . edu Linkedin Profile Github Profile","title":"Meng Li"},{"location":"team/#abhineeth-reddy-kunta","text":"About Abhineeth Reddy Kunta is a Senior Software Engineer with the Mobile Sensing + Health Institute. He is experienced in software development and specializes in building solutions using machine learning. Abhineeth likes exploring ways to leverage technology in advancing medicine and education. Previously he worked as a Computer Programmer at Georgia Department of Public Health. He has a master\u2019s degree in Computer Science from George Mason University.","title":"Abhineeth Reddy Kunta"},{"location":"team/#kwesi-aguillera","text":"About Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Sciences in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full stack developer and looks forward to applying this knowledge to big data analysis. Linkedin Profile","title":"Kwesi Aguillera"},{"location":"team/#echhit-joshi","text":"About Echhit Joshi is a Masters student at the School of Computing and Information at University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics. Linkedin Profile","title":"Echhit Joshi"},{"location":"team/#nicolas-leo","text":"About Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for a MS in Computer Science with a focus on Intelligent Systems.","title":"Nicolas Leo"},{"location":"team/#nikunj-goel","text":"About Nik is a graduate student at the University of Pittsburgh pursuing Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology from India. He is a Data Enthusiasts and passionate about finding the meaning out of raw data. In a long term, his goal is to create a breakthrough in Data Science and Deep Learning. Linkedin Profile","title":"Nikunj Goel"},{"location":"team/#community-contributors","text":"","title":"Community Contributors"},{"location":"team/#agam-kumar","text":"About Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science. He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science. His research interests include ML applications in medicine. Linkedin Profile Github Profile","title":"Agam Kumar"},{"location":"team/#yasaman-s-sefidgar","text":"About Linkedin Profile","title":"Yasaman S. Sefidgar"},{"location":"team/#advisors","text":"","title":"Advisors"},{"location":"team/#afsaneh-doryab","text":"About Personal Website","title":"Afsaneh Doryab"},{"location":"team/#carissa-low","text":"About Profile","title":"Carissa Low"},{"location":"developers/documentation/","text":"Documentation \u00b6 We use mkdocs with the material theme to write these docs. Whenever you make any changes, just push them back to the repo and the documentation will be deployed automatically. Set up development environment \u00b6 Make sure your conda environment is active pip install mkdocs pip install mkdocs-material Preview \u00b6 Run the following command in RAPIDS root folder and go to http://127.0.0.1:8000 : mkdocs serve File Structure \u00b6 The documentation config file is /mkdocs.yml , if you are adding new .md files to the docs modify the nav attribute at the bottom of that file. You can use the hierarchy there to find all the files that appear in the documentation. Reference \u00b6 Check this page to get familiar with the different visual elements we can use in the docs (admonitions, code blocks, tables, etc.) You can also refer to /docs/setup/installation.md and /docs/setup/configuration.md to see practical examples of these elements. Hint Any links to internal pages should be relative to the current page. For example, any link from this page (documentation) which is inside ./developers should begin with ../ to go one folder level up like: [ mylink ]( ../setup/installation.md ) Extras \u00b6 You can insert emojis using this syntax :[SOURCE]-[ICON_NAME] from the following sources: https://materialdesignicons.com/ https://fontawesome.com/icons/tasks?style=solid https://primer.style/octicons/ You can use this page to create markdown tables more easily","title":"Documentation"},{"location":"developers/documentation/#documentation","text":"We use mkdocs with the material theme to write these docs. Whenever you make any changes, just push them back to the repo and the documentation will be deployed automatically.","title":"Documentation"},{"location":"developers/documentation/#set-up-development-environment","text":"Make sure your conda environment is active pip install mkdocs pip install mkdocs-material","title":"Set up development environment"},{"location":"developers/documentation/#preview","text":"Run the following command in RAPIDS root folder and go to http://127.0.0.1:8000 : mkdocs serve","title":"Preview"},{"location":"developers/documentation/#file-structure","text":"The documentation config file is /mkdocs.yml , if you are adding new .md files to the docs modify the nav attribute at the bottom of that file. You can use the hierarchy there to find all the files that appear in the documentation.","title":"File Structure"},{"location":"developers/documentation/#reference","text":"Check this page to get familiar with the different visual elements we can use in the docs (admonitions, code blocks, tables, etc.) You can also refer to /docs/setup/installation.md and /docs/setup/configuration.md to see practical examples of these elements. Hint Any links to internal pages should be relative to the current page. For example, any link from this page (documentation) which is inside ./developers should begin with ../ to go one folder level up like: [ mylink ]( ../setup/installation.md )","title":"Reference"},{"location":"developers/documentation/#extras","text":"You can insert emojis using this syntax :[SOURCE]-[ICON_NAME] from the following sources: https://materialdesignicons.com/ https://fontawesome.com/icons/tasks?style=solid https://primer.style/octicons/ You can use this page to create markdown tables more easily","title":"Extras"},{"location":"developers/git-flow/","text":"Git Flow \u00b6 We use the develop/master variation of the OneFlow git flow Add New Features \u00b6 We use feature (topic) branches to implement new features Pull the latest develop git checkout develop git pull Create your feature branch git checkout -b feature/feature1 Add, modify or delete the necessary files to add your new feature Update the change log ( docs/change-log.md ) Stage and commit your changes using VS Code git GUI or the following commands git add modified-file1 modified-file2 git commit -m \"Add my new feature\" # use a concise description Integrate your new feature to develop Internal Developer You are an internal developer if you have writing permissions to the repository. Most feature branches are never pushed to the repo, only do so if you expect that its development will take days (to avoid losing your work if you computer is damaged). Otherwise follow the following instructions to locally rebase your feature branch into develop and push those rebased changes online. git checkout feature/feature1 git fetch origin develop git rebase -i develop git checkout develop git merge --no-ff feature/feature1 # (use the default merge message) git push origin develop git branch -d feature/feature1 External Developer You are an external developer if you do NOT have writing permissions to the repository. Push your feature branch online git push --set-upstream origin feature/external-test Then open a pull request to the develop branch using Github\u2019s GUI Release a New Version \u00b6 Pull the latest develop git checkout develop git pull Create a new release branch git describe --abbrev = 0 --tags # Bump the release (0.1.0 to 0.2.0 => NEW_HOTFIX) git checkout -b release/v [ NEW_RELEASE ] develop Add new tag git tag v [ NEW_RELEASE ] Merge and push the release branch git checkout develop git merge release/v [ NEW_RELEASE ] git push --tags origin develop git branch -d release/v [ NEW_RELEASE ] Fast-forward master git checkout master git merge --ff-only develop git push Go to GitHub and create a new release based on the newest tag v[NEW_RELEASE] (remember to add the change log) Release a Hotfix \u00b6 Pull the latest master git checkout master git pull Start a hotfix branch git describe --abbrev = 0 --tags # Bump the hotfix (0.1.0 to 0.1.1 => NEW_HOTFIX) git checkout -b hotfix/v [ NEW_HOTFIX ] master Fix whatever needs to be fixed Update the change log Tag and merge the hotfix git tag v [ NEW_HOTFIX ] git checkout develop git merge hotfix/v [ NEW_HOTFIX ] git push --tags origin develop git branch -d hotfix/v [ NEW_HOTFIX ] Fast-forward master git checkout master git merge --ff-only v[NEW_HOTFIX] git push Go to GitHub and create a new release based on the newest tag v[NEW_HOTFIX] (remember to add the change log)","title":"Git Flow"},{"location":"developers/git-flow/#git-flow","text":"We use the develop/master variation of the OneFlow git flow","title":"Git Flow"},{"location":"developers/git-flow/#add-new-features","text":"We use feature (topic) branches to implement new features Pull the latest develop git checkout develop git pull Create your feature branch git checkout -b feature/feature1 Add, modify or delete the necessary files to add your new feature Update the change log ( docs/change-log.md ) Stage and commit your changes using VS Code git GUI or the following commands git add modified-file1 modified-file2 git commit -m \"Add my new feature\" # use a concise description Integrate your new feature to develop Internal Developer You are an internal developer if you have writing permissions to the repository. Most feature branches are never pushed to the repo, only do so if you expect that its development will take days (to avoid losing your work if you computer is damaged). Otherwise follow the following instructions to locally rebase your feature branch into develop and push those rebased changes online. git checkout feature/feature1 git fetch origin develop git rebase -i develop git checkout develop git merge --no-ff feature/feature1 # (use the default merge message) git push origin develop git branch -d feature/feature1 External Developer You are an external developer if you do NOT have writing permissions to the repository. Push your feature branch online git push --set-upstream origin feature/external-test Then open a pull request to the develop branch using Github\u2019s GUI","title":"Add New Features"},{"location":"developers/git-flow/#release-a-new-version","text":"Pull the latest develop git checkout develop git pull Create a new release branch git describe --abbrev = 0 --tags # Bump the release (0.1.0 to 0.2.0 => NEW_HOTFIX) git checkout -b release/v [ NEW_RELEASE ] develop Add new tag git tag v [ NEW_RELEASE ] Merge and push the release branch git checkout develop git merge release/v [ NEW_RELEASE ] git push --tags origin develop git branch -d release/v [ NEW_RELEASE ] Fast-forward master git checkout master git merge --ff-only develop git push Go to GitHub and create a new release based on the newest tag v[NEW_RELEASE] (remember to add the change log)","title":"Release a New Version"},{"location":"developers/git-flow/#release-a-hotfix","text":"Pull the latest master git checkout master git pull Start a hotfix branch git describe --abbrev = 0 --tags # Bump the hotfix (0.1.0 to 0.1.1 => NEW_HOTFIX) git checkout -b hotfix/v [ NEW_HOTFIX ] master Fix whatever needs to be fixed Update the change log Tag and merge the hotfix git tag v [ NEW_HOTFIX ] git checkout develop git merge hotfix/v [ NEW_HOTFIX ] git push --tags origin develop git branch -d hotfix/v [ NEW_HOTFIX ] Fast-forward master git checkout master git merge --ff-only v[NEW_HOTFIX] git push Go to GitHub and create a new release based on the newest tag v[NEW_HOTFIX] (remember to add the change log)","title":"Release a Hotfix"},{"location":"developers/remote-support/","text":"Remote Support \u00b6 We use the Live Share extension of Visual Studio Code to debug bugs when sharing data or database credentials is not possible. Install Visual Studio Code Open you RAPIDS root folder in a new VSCode window Open a new Terminal Terminal > New terminal Install the Live Share extension pack Press Ctrl + P or Cmd + P and run this command: >live share: start collaboration session 6. Follow the instructions and share the session link you receive","title":"Remote Support"},{"location":"developers/remote-support/#remote-support","text":"We use the Live Share extension of Visual Studio Code to debug bugs when sharing data or database credentials is not possible. Install Visual Studio Code Open you RAPIDS root folder in a new VSCode window Open a new Terminal Terminal > New terminal Install the Live Share extension pack Press Ctrl + P or Cmd + P and run this command: >live share: start collaboration session 6. Follow the instructions and share the session link you receive","title":"Remote Support"},{"location":"developers/test-cases/","text":"Test Cases \u00b6 Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a Work In Progress this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed a description of the data used for testing (test cases) are outline. Currently for all intent and testing purposes the tests/data/raw/test01/ contains all the test data files for testing android data formats and tests/data/raw/test02/ contains all the test data files for testing iOS data formats. It follows that the expected (verified output) are contained in the tests/data/processed/test01/ and tests/data/processed/test02/ for Android and iOS respectively. tests/data/raw/test03/ and tests/data/raw/test04/ contain data files for testing empty raw data files for android and iOS respectively. The following is a list of the sensors that testing is currently available. Messages (SMS) \u00b6 The raw message data file contains data for 2 separate days. The data for the first day contains records 5 records for every epoch . The second day's data contains 6 records for each of only 2 epoch (currently morning and evening ) The raw message data contains records for both message_types (i.e. recieved and sent ) in both days in all epochs. The number records with each message_types per epoch is randomly distributed There is at least one records with each message_types per epoch. There is one raw message data file each, as described above, for testing both iOS and Android data. There is also an additional empty data file for both android and iOS for testing empty data files Calls \u00b6 Due to the difference in the format of the raw call data for iOS and Android the following is the expected results the calls_with_datetime_unified.csv . This would give a better idea of the use cases being tested since the calls_with_datetime_unified.csv would make both the iOS and Android data comparable. The call data would contain data for 2 days. The data for the first day contains 6 records for every epoch . The second day's data contains 6 records for each of only 2 epoch (currently morning and evening ) The call data contains records for all call_types (i.e. incoming , outgoing and missed ) in both days in all epochs. The number records with each of the call_types per epoch is randomly distributed. There is at least one records with each call_types per epoch. There is one call data file each, as described above, for testing both iOS and Android data. There is also an additional empty data file for both android and iOS for testing empty data files Screen \u00b6 Due to the difference in the format of the raw screen data for iOS and Android the following is the expected results the screen_deltas.csv . This would give a better idea of the use cases being tested since the screen_eltas.csv would make both the iOS and Android data comparable These files are used to calculate the features for the screen sensor The screen delta data file contains data for 1 day. The screen delta data contains 1 record to represent an unlock episode that falls within an epoch for every epoch . The screen delta data contains 1 record to represent an unlock episode that falls across the boundary of 2 epochs. Namely the unlock episode starts in one epoch and ends in the next, thus there is a record for unlock episodes that fall across night to morning , morning to afternoon and finally afternoon to night The testing is done for unlock episode_type. There is one screen data file each for testing both iOS and Android data formats. There is also an additional empty data file for both android and iOS for testing empty data files Battery \u00b6 Due to the difference in the format of the raw battery data for iOS and Android as well as versions of iOS the following is the expected results the battery_deltas.csv . This would give a better idea of the use cases being tested since the battery_deltas.csv would make both the iOS and Android data comparable. These files are used to calculate the features for the battery sensor. The battery delta data file contains data for 1 day. The battery delta data contains 1 record each for a charging and discharging episode that falls within an epoch for every epoch . Thus, for the daily epoch there would be multiple charging and discharging episodes Since either a charging episode or a discharging episode and not both can occur across epochs, in order to test episodes that occur across epochs alternating episodes of charging and discharging episodes that fall across night to morning , morning to afternoon and finally afternoon to night are present in the battery delta data. This starts with a discharging episode that begins in night and end in morning . There is one battery data file each, for testing both iOS and Android data formats. There is also an additional empty data file for both android and iOS for testing empty data files Bluetooth \u00b6 The raw Bluetooth data file contains data for 1 day. The raw Bluetooth data contains at least 2 records for each epoch . Each epoch has a record with a timestamp for the beginning boundary for that epoch and a record with a timestamp for the ending boundary for that epoch . (e.g. For the morning epoch there is a record with a timestamp for 6:00AM and another record with a timestamp for 11:59:59AM . These are to test edge cases) An option of 5 Bluetooth devices are randomly distributed throughout the data records. There is one raw Bluetooth data file each, for testing both iOS and Android data formats. There is also an additional empty data file for both android and iOS for testing empty data files. WIFI \u00b6 There are 2 data files ( wifi_raw.csv and sensor_wifi_raw.csv ) for each fake participant for each phone platform. The raw WIFI data files contain data for 1 day. The sensor_wifi_raw.csv data contains at least 2 records for each epoch . Each epoch has a record with a timestamp for the beginning boundary for that epoch and a record with a timestamp for the ending boundary for that epoch . (e.g. For the morning epoch there is a record with a timestamp for 6:00AM and another record with a timestamp for 11:59:59AM . These are to test edge cases) The wifi_raw.csv data contains 3 records with random timestamps for each epoch to represent visible broadcasting WIFI network. This file is empty for the iOS phone testing data. An option of 10 access point devices is randomly distributed throughout the data records. 5 each for sensor_wifi_raw.csv and wifi_raw.csv . There data files for testing both iOS and Android data formats. There are also additional empty data files for both android and iOS for testing empty data files. Light \u00b6 The raw light data file contains data for 1 day. The raw light data contains 3 or 4 rows of data for each epoch except night . The single row of data for night is for testing features for single values inputs. (Example testing the standard deviation of one input value) Since light is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files. Application Foreground \u00b6 The raw application foreground data file contains data for 1 day. The raw application foreground data contains 7 - 9 rows of data for each epoch . The records for each epoch contains apps that are randomly selected from a list of apps that are from the MULTIPLE_CATEGORIES and SINGLE_CATEGORIES (See testing_config.yaml ). There are also records in each epoch that have apps randomly selected from a list of apps that are from the EXCLUDED_CATEGORIES and EXCLUDED_APPS . This is to test that these apps are actually being excluded from the calculations of features. There are also records to test SINGLE_APPS calculations. Since application foreground is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files. Activity Recognition \u00b6 The raw Activity Recognition data file contains data for 1 day. The raw Activity Recognition data each epoch period contains rows that records 2 - 5 different activity_types . The is such that durations of activities can be tested. Additionally, there are records that mimic the duration of an activity over the time boundary of neighboring epochs. (For example, there a set of records that mimic the participant in_vehicle from afternoon into evening ) There is one file each with raw Activity Recognition data for testing both iOS and Android data formats. (plugin_google_activity_recognition_raw.csv for android and plugin_ios_activity_recognition_raw.csv for iOS) There is also an additional empty data file for both android and iOS for testing empty data files. Conversation \u00b6 The raw conversation data file contains data for 2 day. The raw conversation data contains records with a sample of both datatypes (i.e. voice/noise = 0 , and conversation = 2 ) as well as rows with for samples of each of the inference values (i.e. silence = 0 , noise = 1 , voice = 2 , and unknown = 3 ) for each epoch . The different datatype and inference records are randomly distributed throughout the epoch . Additionally there are 2 - 5 records for conversations ( datatype = 2, and inference = -1) in each epoch and for each epoch except night, there is a conversation record that has a double_convo_start timestamp that is from the previous epoch . This is to test the calculations of features across epochs . There is a raw conversation data file for both android and iOS platforms ( plugin_studentlife_audio_android_raw.csv and plugin_studentlife_audio_raw.csv respectively). Finally, there are also additional empty data files for both android and iOS for testing empty data files","title":"Test cases"},{"location":"developers/test-cases/#test-cases","text":"Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a Work In Progress this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed a description of the data used for testing (test cases) are outline. Currently for all intent and testing purposes the tests/data/raw/test01/ contains all the test data files for testing android data formats and tests/data/raw/test02/ contains all the test data files for testing iOS data formats. It follows that the expected (verified output) are contained in the tests/data/processed/test01/ and tests/data/processed/test02/ for Android and iOS respectively. tests/data/raw/test03/ and tests/data/raw/test04/ contain data files for testing empty raw data files for android and iOS respectively. The following is a list of the sensors that testing is currently available.","title":"Test Cases"},{"location":"developers/test-cases/#messages-sms","text":"The raw message data file contains data for 2 separate days. The data for the first day contains records 5 records for every epoch . The second day's data contains 6 records for each of only 2 epoch (currently morning and evening ) The raw message data contains records for both message_types (i.e. recieved and sent ) in both days in all epochs. The number records with each message_types per epoch is randomly distributed There is at least one records with each message_types per epoch. There is one raw message data file each, as described above, for testing both iOS and Android data. There is also an additional empty data file for both android and iOS for testing empty data files","title":"Messages (SMS)"},{"location":"developers/test-cases/#calls","text":"Due to the difference in the format of the raw call data for iOS and Android the following is the expected results the calls_with_datetime_unified.csv . This would give a better idea of the use cases being tested since the calls_with_datetime_unified.csv would make both the iOS and Android data comparable. The call data would contain data for 2 days. The data for the first day contains 6 records for every epoch . The second day's data contains 6 records for each of only 2 epoch (currently morning and evening ) The call data contains records for all call_types (i.e. incoming , outgoing and missed ) in both days in all epochs. The number records with each of the call_types per epoch is randomly distributed. There is at least one records with each call_types per epoch. There is one call data file each, as described above, for testing both iOS and Android data. There is also an additional empty data file for both android and iOS for testing empty data files","title":"Calls"},{"location":"developers/test-cases/#screen","text":"Due to the difference in the format of the raw screen data for iOS and Android the following is the expected results the screen_deltas.csv . This would give a better idea of the use cases being tested since the screen_eltas.csv would make both the iOS and Android data comparable These files are used to calculate the features for the screen sensor The screen delta data file contains data for 1 day. The screen delta data contains 1 record to represent an unlock episode that falls within an epoch for every epoch . The screen delta data contains 1 record to represent an unlock episode that falls across the boundary of 2 epochs. Namely the unlock episode starts in one epoch and ends in the next, thus there is a record for unlock episodes that fall across night to morning , morning to afternoon and finally afternoon to night The testing is done for unlock episode_type. There is one screen data file each for testing both iOS and Android data formats. There is also an additional empty data file for both android and iOS for testing empty data files","title":"Screen"},{"location":"developers/test-cases/#battery","text":"Due to the difference in the format of the raw battery data for iOS and Android as well as versions of iOS the following is the expected results the battery_deltas.csv . This would give a better idea of the use cases being tested since the battery_deltas.csv would make both the iOS and Android data comparable. These files are used to calculate the features for the battery sensor. The battery delta data file contains data for 1 day. The battery delta data contains 1 record each for a charging and discharging episode that falls within an epoch for every epoch . Thus, for the daily epoch there would be multiple charging and discharging episodes Since either a charging episode or a discharging episode and not both can occur across epochs, in order to test episodes that occur across epochs alternating episodes of charging and discharging episodes that fall across night to morning , morning to afternoon and finally afternoon to night are present in the battery delta data. This starts with a discharging episode that begins in night and end in morning . There is one battery data file each, for testing both iOS and Android data formats. There is also an additional empty data file for both android and iOS for testing empty data files","title":"Battery"},{"location":"developers/test-cases/#bluetooth","text":"The raw Bluetooth data file contains data for 1 day. The raw Bluetooth data contains at least 2 records for each epoch . Each epoch has a record with a timestamp for the beginning boundary for that epoch and a record with a timestamp for the ending boundary for that epoch . (e.g. For the morning epoch there is a record with a timestamp for 6:00AM and another record with a timestamp for 11:59:59AM . These are to test edge cases) An option of 5 Bluetooth devices are randomly distributed throughout the data records. There is one raw Bluetooth data file each, for testing both iOS and Android data formats. There is also an additional empty data file for both android and iOS for testing empty data files.","title":"Bluetooth"},{"location":"developers/test-cases/#wifi","text":"There are 2 data files ( wifi_raw.csv and sensor_wifi_raw.csv ) for each fake participant for each phone platform. The raw WIFI data files contain data for 1 day. The sensor_wifi_raw.csv data contains at least 2 records for each epoch . Each epoch has a record with a timestamp for the beginning boundary for that epoch and a record with a timestamp for the ending boundary for that epoch . (e.g. For the morning epoch there is a record with a timestamp for 6:00AM and another record with a timestamp for 11:59:59AM . These are to test edge cases) The wifi_raw.csv data contains 3 records with random timestamps for each epoch to represent visible broadcasting WIFI network. This file is empty for the iOS phone testing data. An option of 10 access point devices is randomly distributed throughout the data records. 5 each for sensor_wifi_raw.csv and wifi_raw.csv . There data files for testing both iOS and Android data formats. There are also additional empty data files for both android and iOS for testing empty data files.","title":"WIFI"},{"location":"developers/test-cases/#light","text":"The raw light data file contains data for 1 day. The raw light data contains 3 or 4 rows of data for each epoch except night . The single row of data for night is for testing features for single values inputs. (Example testing the standard deviation of one input value) Since light is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files.","title":"Light"},{"location":"developers/test-cases/#application-foreground","text":"The raw application foreground data file contains data for 1 day. The raw application foreground data contains 7 - 9 rows of data for each epoch . The records for each epoch contains apps that are randomly selected from a list of apps that are from the MULTIPLE_CATEGORIES and SINGLE_CATEGORIES (See testing_config.yaml ). There are also records in each epoch that have apps randomly selected from a list of apps that are from the EXCLUDED_CATEGORIES and EXCLUDED_APPS . This is to test that these apps are actually being excluded from the calculations of features. There are also records to test SINGLE_APPS calculations. Since application foreground is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files.","title":"Application Foreground"},{"location":"developers/test-cases/#activity-recognition","text":"The raw Activity Recognition data file contains data for 1 day. The raw Activity Recognition data each epoch period contains rows that records 2 - 5 different activity_types . The is such that durations of activities can be tested. Additionally, there are records that mimic the duration of an activity over the time boundary of neighboring epochs. (For example, there a set of records that mimic the participant in_vehicle from afternoon into evening ) There is one file each with raw Activity Recognition data for testing both iOS and Android data formats. (plugin_google_activity_recognition_raw.csv for android and plugin_ios_activity_recognition_raw.csv for iOS) There is also an additional empty data file for both android and iOS for testing empty data files.","title":"Activity Recognition"},{"location":"developers/test-cases/#conversation","text":"The raw conversation data file contains data for 2 day. The raw conversation data contains records with a sample of both datatypes (i.e. voice/noise = 0 , and conversation = 2 ) as well as rows with for samples of each of the inference values (i.e. silence = 0 , noise = 1 , voice = 2 , and unknown = 3 ) for each epoch . The different datatype and inference records are randomly distributed throughout the epoch . Additionally there are 2 - 5 records for conversations ( datatype = 2, and inference = -1) in each epoch and for each epoch except night, there is a conversation record that has a double_convo_start timestamp that is from the previous epoch . This is to test the calculations of features across epochs . There is a raw conversation data file for both android and iOS platforms ( plugin_studentlife_audio_android_raw.csv and plugin_studentlife_audio_raw.csv respectively). Finally, there are also additional empty data files for both android and iOS for testing empty data files","title":"Conversation"},{"location":"developers/testing/","text":"Testing \u00b6 The following is a simple guide to testing RAPIDS. All files necessary for testing are stored in the /tests directory Steps for Testing \u00b6 To begin testing RAPIDS place the fake raw input data csv files of each fake participant in tests/data/raw/ . The fake participant files should be placed in tests/data/external/participant_files . The expected output files of RAPIDS after processing the input data should be placed in tests/data/processesd/frequency and tests/data/processesd/periodic for frequency and periodic respectively. Edit tests/settings/frequency/config.yaml and tests/settings/periodic/config.yaml to add and/or remove the rules to be run for testing from the forcerun list. Edit tests/settings/frequency/testing_config.yaml and tests/settings/frequency/testing_config.yaml to configure the settings and enable/disable sensors to be tested. Add any additional testscripts in tests/scripts . Run the testing shell script with tests/scripts/run_tests.sh run_test.sh [ -l ] [ all | periodic | frequency ] [ test ] [-l] will delete all the existing files in /data before running tests. [all | periodic | frequency] will generate feature data for all or specific type of features and save in data/processed . [test] will compare the features generated with the precomputed and verified features in /tests/data/processed . The following is a snippet of the output you should see after running your test. test_sensors_files_exist ( test_sensor_features.TestSensorFeatures ) ... periodic ok test_sensors_features_calculations ( test_sensor_features.TestSensorFeatures ) ... periodic ok test_sensors_files_exist ( test_sensor_features.TestSensorFeatures ) ... frequency ok test_sensors_features_calculations ( test_sensor_features.TestSensorFeatures ) ... frequency FAIL The results above show that the for periodic both test_sensors_files_exist and test_sensors_features_calculations passed while for frequency first test test_sensors_files_exist passed while test_sensors_features_calculations failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see Unittest Documentation Testing of the RAPIDS sensors and features is a work-in-progress. Please see test-cases for a list of sensors and features that have testing currently available. Currently the repository is set up to test a number of sensors out of the box by simply running the tests/scripts/run_tests.sh command once the RAPIDS python environment is active.","title":"Testing"},{"location":"developers/testing/#testing","text":"The following is a simple guide to testing RAPIDS. All files necessary for testing are stored in the /tests directory","title":"Testing"},{"location":"developers/testing/#steps-for-testing","text":"To begin testing RAPIDS place the fake raw input data csv files of each fake participant in tests/data/raw/ . The fake participant files should be placed in tests/data/external/participant_files . The expected output files of RAPIDS after processing the input data should be placed in tests/data/processesd/frequency and tests/data/processesd/periodic for frequency and periodic respectively. Edit tests/settings/frequency/config.yaml and tests/settings/periodic/config.yaml to add and/or remove the rules to be run for testing from the forcerun list. Edit tests/settings/frequency/testing_config.yaml and tests/settings/frequency/testing_config.yaml to configure the settings and enable/disable sensors to be tested. Add any additional testscripts in tests/scripts . Run the testing shell script with tests/scripts/run_tests.sh run_test.sh [ -l ] [ all | periodic | frequency ] [ test ] [-l] will delete all the existing files in /data before running tests. [all | periodic | frequency] will generate feature data for all or specific type of features and save in data/processed . [test] will compare the features generated with the precomputed and verified features in /tests/data/processed . The following is a snippet of the output you should see after running your test. test_sensors_files_exist ( test_sensor_features.TestSensorFeatures ) ... periodic ok test_sensors_features_calculations ( test_sensor_features.TestSensorFeatures ) ... periodic ok test_sensors_files_exist ( test_sensor_features.TestSensorFeatures ) ... frequency ok test_sensors_features_calculations ( test_sensor_features.TestSensorFeatures ) ... frequency FAIL The results above show that the for periodic both test_sensors_files_exist and test_sensors_features_calculations passed while for frequency first test test_sensors_files_exist passed while test_sensors_features_calculations failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see Unittest Documentation Testing of the RAPIDS sensors and features is a work-in-progress. Please see test-cases for a list of sensors and features that have testing currently available. Currently the repository is set up to test a number of sensors out of the box by simply running the tests/scripts/run_tests.sh command once the RAPIDS python environment is active.","title":"Steps for Testing"},{"location":"developers/virtual-environments/","text":"Python Virtual Environment \u00b6 Add new packages \u00b6 Try to install any new package using conda install -c CHANNEL PACKAGE_NAME (you can use pip if the package is only available there). Make sure your Python virtual environment is active ( conda activate YOUR_ENV ). Remove packages \u00b6 Uninstall packages using the same manager you used to install them conda remove PACKAGE_NAME or pip uninstall PACKAGE_NAME Updating all packages \u00b6 Make sure your Python virtual environment is active ( conda activate YOUR_ENV ), then run conda update --all Update your conda environment.yaml \u00b6 After installing or removing a package you can use the following command in your terminal to update your environment.yaml before publishing your pipeline. Note that we ignore the package version for libfortran and mkl to keep compatibility with Linux: conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml R Virtual Environment \u00b6 Add new packages \u00b6 Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::install(\"PACKAGE_NAME\") Remove packages \u00b6 Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::remove(\"PACKAGE_NAME\") Updating all packages \u00b6 Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::update() Update your R renv.lock \u00b6 After installing or removing a package you can use the following command in your terminal to update your renv.lock before publishing your pipeline. Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::snapshot() (renv will ask you to confirm any updates to this file)","title":"Virtual Environments"},{"location":"developers/virtual-environments/#python-virtual-environment","text":"","title":"Python Virtual Environment"},{"location":"developers/virtual-environments/#add-new-packages","text":"Try to install any new package using conda install -c CHANNEL PACKAGE_NAME (you can use pip if the package is only available there). Make sure your Python virtual environment is active ( conda activate YOUR_ENV ).","title":"Add new packages"},{"location":"developers/virtual-environments/#remove-packages","text":"Uninstall packages using the same manager you used to install them conda remove PACKAGE_NAME or pip uninstall PACKAGE_NAME","title":"Remove packages"},{"location":"developers/virtual-environments/#updating-all-packages","text":"Make sure your Python virtual environment is active ( conda activate YOUR_ENV ), then run conda update --all","title":"Updating all packages"},{"location":"developers/virtual-environments/#update-your-conda-environmentyaml","text":"After installing or removing a package you can use the following command in your terminal to update your environment.yaml before publishing your pipeline. Note that we ignore the package version for libfortran and mkl to keep compatibility with Linux: conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml","title":"Update your conda environment.yaml"},{"location":"developers/virtual-environments/#r-virtual-environment","text":"","title":"R Virtual Environment"},{"location":"developers/virtual-environments/#add-new-packages_1","text":"Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::install(\"PACKAGE_NAME\")","title":"Add new packages"},{"location":"developers/virtual-environments/#remove-packages_1","text":"Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::remove(\"PACKAGE_NAME\")","title":"Remove packages"},{"location":"developers/virtual-environments/#updating-all-packages_1","text":"Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::update()","title":"Updating all packages"},{"location":"developers/virtual-environments/#update-your-r-renvlock","text":"After installing or removing a package you can use the following command in your terminal to update your renv.lock before publishing your pipeline. Open your terminal and navigate to RAPIDS\u2019 root folder Run R to open an R interactive session Run renv::snapshot() (renv will ask you to confirm any updates to this file)","title":"Update your R renv.lock"},{"location":"features/add-new-features/","text":"Add New Features \u00b6 Hint We recommend reading the Behavioral Features Introduction before reading this page Hint You won\u2019t have to deal with time zones, dates, times, data cleaning or preprocessing. The data that RAPIDS pipes to your feature extraction code is ready to process. New Features for Existing Sensors \u00b6 You can add new features to any existing sensors (see list below) by adding a new provider in three steps: Modify the config.yaml file Create a provider folder, script and function Implement your features extraction code As a tutorial, we will add a new provider for PHONE_ACCELEROMETER called VEGA that extracts feature1 , feature2 , feature3 in Python and that it requires a parameter from the user called MY_PARAMETER . Existing Sensors An existing sensor is any of the phone or Fitbit sensors with a configuration entry in config.yaml : Phone Accelerometer Phone Activity Recognition Phone Applications Foreground Phone Battery Phone Bluetooth Phone Calls Phone Conversation Phone Data Yield Phone Light Phone Locations Phone Messages Phone Screen Phone WiFI Connected Phone WiFI Visible Fitbit Heart Rate Summary Fitbit Heart Rate Intraday Fitbit Sleep Summary Fitbit Steps Summary Fitbit Steps Intraday Modify the config.yaml file \u00b6 In this step you need to add your provider configuration section under the relevant sensor in config.yaml . See our example for our tutorial\u2019s VEGA provider for PHONE_ACCELEROMETER : Example configuration for a new accelerometer provider VEGA PHONE_ACCELEROMETER : TABLE : accelerometer PROVIDERS : RAPIDS : COMPUTE : False ... PANDA : COMPUTE : False ... VEGA : COMPUTE : False FEATURES : [ \"feature1\" , \"feature2\" , \"feature3\" ] MY_PARAMTER : a_string SRC_FOLDER : \"vega\" SRC_LANGUAGE : \"python\" Key Description [COMPUTE] Flag to activate/deactivate your provider [FEATURES] List of features your provider supports. Your provider code should only return the features on this list [MY_PARAMTER] An arbitrary parameter that our example provider VEGA needs. This can be a boolean, integer, float, string or an array of any of such types. [SRC_LANGUAGE] The programming language of your provider script, it can be python or r , in our example python [SRC_FOLDER] The name of your provider in lower case, in our example vega (this will be the name of your folder in the next step) Create a provider folder, script and function \u00b6 In this step you need to add a folder, script and function for your provider. Create your provider folder under src/feature/DEVICE_SENSOR/YOUR_PROVIDER , in our example src/feature/phone_accelerometer/vega (same as [SRC_FOLDER] in the step above). Create your provider script inside your provider folder, it can be a Python file called main.py or an R file called main.R . Add your provider function in your provider script. The name of such function should be [providername]_features , in our example vega_features Python function def [ providername ] _features ( sensor_data_files , time_segment , provider , filter_data_by_segment , * args , ** kwargs ): R function [ providername ] _ features <- function ( sensor_data , time_segment , provider ) Implement your feature extraction code \u00b6 The provider function that you created in the step above will receive the following parameters: Parameter Description sensor_data_files Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the [PIDS] array in config.yaml ) time_segment The label of the time segment that should be processed. provider The parameters you configured for your provider in config.yaml will be available in this variable as a dictionary in Python or a list in R. In our example this dictionary contains {MY_PARAMETER:\"a_string\"} filter_data_by_segment Python only. A function that you will use to filter your data. In R this function is already available in the environment. *args Python only. Not used for now **kwargs Python only. Not used for now The code to extract your behavioral features should be implemented in your provider function and in general terms it will have three stages: 1. Read a participant\u2019s data by loading the CSV data stored in the file pointed by sensor_data_files acc_data = pd . read_csv ( sensor_data_files [ \"sensor_data\" ]) Note that phone\u2019s battery, screen, and activity recognition data is given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on) 2. Filter your data to process only those rows that belong to time_segment This step is only one line of code, but to undersand why we need it, keep reading. acc_data = filter_data_by_segment ( acc_data , time_segment ) You should use the filter_data_by_segment() function to process and group those rows that belong to each of the time segments RAPIDS could be configured with . Let\u2019s understand the filter_data_by_segment() function with an example. A RAPIDS user can extract features on any arbitrary time segment . A time segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for p01 . The labels are arbritrary and the instances depend on the days a participant was monitored for: the daily segment could be named my_days and if p01 was monitored for 14 days, it would have 14 instances the weekly segment could be named my_weeks and if p01 was monitored for 14 days, it would have 2 instances. the weekend segment could be named my_weekends and if p01 was monitored for 14 days, it would have 2 instances. For this example, RAPIDS will call your provider function three times for p01 , once where time_segment is my_days , once where time_segment is my_weeks and once where time_segment is my_weekends . In this example not every row in p01 \u2018s data needs to take part in the feature computation for either segment and the rows need to be grouped differently. Thus filter_data_by_segment() comes in handy, it will return a data frame that contains the rows that were logged during a time segment plus an extra column called local_segment . This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our p01 \u2018s my_days , my_weeks , and my_weekends examples). After filtering, you should group the data frame by this column and compute any desired features , for example: acc_features [ \"maxmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . max () The reason RAPIDS does not filter the participant\u2019s data set for you is because your code might need to compute something based on a participant\u2019s complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from this number. 3. Return a data frame with your features After filtering, grouping your data, and computing your features, your provider function should return a data frame that has: One row per time segment instance (e.g. 14 our p01 \u2018s my_days example) The local_segment column added by filter_data_by_segment() One column per feature. By convention the name of your features should only contain letters or numbers ( feature1 ). RAPIDS will automatically add the right sensor and provider prefix ( phone_accelerometr_vega_ ) PHONE_ACCELEROMETER Provider Example For your reference, this a short example of our own provider ( RAPIDS ) for PHONE_ACCELEROMETER that computes five acceleration features def rapids_features ( sensor_data_files , time_segment , provider , filter_data_by_segment , * args , ** kwargs ): acc_data = pd . read_csv ( sensor_data_files [ \"sensor_data\" ]) requested_features = provider [ \"FEATURES\" ] # name of the features this function can compute base_features_names = [ \"maxmagnitude\" , \"minmagnitude\" , \"avgmagnitude\" , \"medianmagnitude\" , \"stdmagnitude\" ] # the subset of requested features this function can compute features_to_compute = list ( set ( requested_features ) & set ( base_features_names )) acc_features = pd . DataFrame ( columns = [ \"local_segment\" ] + features_to_compute ) if not acc_data . empty : acc_data = filter_data_by_segment ( acc_data , time_segment ) if not acc_data . empty : acc_features = pd . DataFrame () # get magnitude related features: magnitude = sqrt(x^2+y^2+z^2) magnitude = acc_data . apply ( lambda row : np . sqrt ( row [ \"double_values_0\" ] ** 2 + row [ \"double_values_1\" ] ** 2 + row [ \"double_values_2\" ] ** 2 ), axis = 1 ) acc_data = acc_data . assign ( magnitude = magnitude . values ) if \"maxmagnitude\" in features_to_compute : acc_features [ \"maxmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . max () if \"minmagnitude\" in features_to_compute : acc_features [ \"minmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . min () if \"avgmagnitude\" in features_to_compute : acc_features [ \"avgmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . mean () if \"medianmagnitude\" in features_to_compute : acc_features [ \"medianmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . median () if \"stdmagnitude\" in features_to_compute : acc_features [ \"stdmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . std () acc_features = acc_features . reset_index () return acc_features New Features for Non-Existing Sensors \u00b6 If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the \"Existing Sensors\" list above), contact us or request it on Slack and we can add the necessary code so you can follow the instructions above.","title":"Add New Features"},{"location":"features/add-new-features/#add-new-features","text":"Hint We recommend reading the Behavioral Features Introduction before reading this page Hint You won\u2019t have to deal with time zones, dates, times, data cleaning or preprocessing. The data that RAPIDS pipes to your feature extraction code is ready to process.","title":"Add New Features"},{"location":"features/add-new-features/#new-features-for-existing-sensors","text":"You can add new features to any existing sensors (see list below) by adding a new provider in three steps: Modify the config.yaml file Create a provider folder, script and function Implement your features extraction code As a tutorial, we will add a new provider for PHONE_ACCELEROMETER called VEGA that extracts feature1 , feature2 , feature3 in Python and that it requires a parameter from the user called MY_PARAMETER . Existing Sensors An existing sensor is any of the phone or Fitbit sensors with a configuration entry in config.yaml : Phone Accelerometer Phone Activity Recognition Phone Applications Foreground Phone Battery Phone Bluetooth Phone Calls Phone Conversation Phone Data Yield Phone Light Phone Locations Phone Messages Phone Screen Phone WiFI Connected Phone WiFI Visible Fitbit Heart Rate Summary Fitbit Heart Rate Intraday Fitbit Sleep Summary Fitbit Steps Summary Fitbit Steps Intraday","title":"New Features for Existing Sensors"},{"location":"features/add-new-features/#modify-the-configyaml-file","text":"In this step you need to add your provider configuration section under the relevant sensor in config.yaml . See our example for our tutorial\u2019s VEGA provider for PHONE_ACCELEROMETER : Example configuration for a new accelerometer provider VEGA PHONE_ACCELEROMETER : TABLE : accelerometer PROVIDERS : RAPIDS : COMPUTE : False ... PANDA : COMPUTE : False ... VEGA : COMPUTE : False FEATURES : [ \"feature1\" , \"feature2\" , \"feature3\" ] MY_PARAMTER : a_string SRC_FOLDER : \"vega\" SRC_LANGUAGE : \"python\" Key Description [COMPUTE] Flag to activate/deactivate your provider [FEATURES] List of features your provider supports. Your provider code should only return the features on this list [MY_PARAMTER] An arbitrary parameter that our example provider VEGA needs. This can be a boolean, integer, float, string or an array of any of such types. [SRC_LANGUAGE] The programming language of your provider script, it can be python or r , in our example python [SRC_FOLDER] The name of your provider in lower case, in our example vega (this will be the name of your folder in the next step)","title":"Modify the config.yaml file"},{"location":"features/add-new-features/#create-a-provider-folder-script-and-function","text":"In this step you need to add a folder, script and function for your provider. Create your provider folder under src/feature/DEVICE_SENSOR/YOUR_PROVIDER , in our example src/feature/phone_accelerometer/vega (same as [SRC_FOLDER] in the step above). Create your provider script inside your provider folder, it can be a Python file called main.py or an R file called main.R . Add your provider function in your provider script. The name of such function should be [providername]_features , in our example vega_features Python function def [ providername ] _features ( sensor_data_files , time_segment , provider , filter_data_by_segment , * args , ** kwargs ): R function [ providername ] _ features <- function ( sensor_data , time_segment , provider )","title":"Create a provider folder, script and function"},{"location":"features/add-new-features/#implement-your-feature-extraction-code","text":"The provider function that you created in the step above will receive the following parameters: Parameter Description sensor_data_files Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the [PIDS] array in config.yaml ) time_segment The label of the time segment that should be processed. provider The parameters you configured for your provider in config.yaml will be available in this variable as a dictionary in Python or a list in R. In our example this dictionary contains {MY_PARAMETER:\"a_string\"} filter_data_by_segment Python only. A function that you will use to filter your data. In R this function is already available in the environment. *args Python only. Not used for now **kwargs Python only. Not used for now The code to extract your behavioral features should be implemented in your provider function and in general terms it will have three stages: 1. Read a participant\u2019s data by loading the CSV data stored in the file pointed by sensor_data_files acc_data = pd . read_csv ( sensor_data_files [ \"sensor_data\" ]) Note that phone\u2019s battery, screen, and activity recognition data is given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on) 2. Filter your data to process only those rows that belong to time_segment This step is only one line of code, but to undersand why we need it, keep reading. acc_data = filter_data_by_segment ( acc_data , time_segment ) You should use the filter_data_by_segment() function to process and group those rows that belong to each of the time segments RAPIDS could be configured with . Let\u2019s understand the filter_data_by_segment() function with an example. A RAPIDS user can extract features on any arbitrary time segment . A time segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for p01 . The labels are arbritrary and the instances depend on the days a participant was monitored for: the daily segment could be named my_days and if p01 was monitored for 14 days, it would have 14 instances the weekly segment could be named my_weeks and if p01 was monitored for 14 days, it would have 2 instances. the weekend segment could be named my_weekends and if p01 was monitored for 14 days, it would have 2 instances. For this example, RAPIDS will call your provider function three times for p01 , once where time_segment is my_days , once where time_segment is my_weeks and once where time_segment is my_weekends . In this example not every row in p01 \u2018s data needs to take part in the feature computation for either segment and the rows need to be grouped differently. Thus filter_data_by_segment() comes in handy, it will return a data frame that contains the rows that were logged during a time segment plus an extra column called local_segment . This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our p01 \u2018s my_days , my_weeks , and my_weekends examples). After filtering, you should group the data frame by this column and compute any desired features , for example: acc_features [ \"maxmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . max () The reason RAPIDS does not filter the participant\u2019s data set for you is because your code might need to compute something based on a participant\u2019s complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from this number. 3. Return a data frame with your features After filtering, grouping your data, and computing your features, your provider function should return a data frame that has: One row per time segment instance (e.g. 14 our p01 \u2018s my_days example) The local_segment column added by filter_data_by_segment() One column per feature. By convention the name of your features should only contain letters or numbers ( feature1 ). RAPIDS will automatically add the right sensor and provider prefix ( phone_accelerometr_vega_ ) PHONE_ACCELEROMETER Provider Example For your reference, this a short example of our own provider ( RAPIDS ) for PHONE_ACCELEROMETER that computes five acceleration features def rapids_features ( sensor_data_files , time_segment , provider , filter_data_by_segment , * args , ** kwargs ): acc_data = pd . read_csv ( sensor_data_files [ \"sensor_data\" ]) requested_features = provider [ \"FEATURES\" ] # name of the features this function can compute base_features_names = [ \"maxmagnitude\" , \"minmagnitude\" , \"avgmagnitude\" , \"medianmagnitude\" , \"stdmagnitude\" ] # the subset of requested features this function can compute features_to_compute = list ( set ( requested_features ) & set ( base_features_names )) acc_features = pd . DataFrame ( columns = [ \"local_segment\" ] + features_to_compute ) if not acc_data . empty : acc_data = filter_data_by_segment ( acc_data , time_segment ) if not acc_data . empty : acc_features = pd . DataFrame () # get magnitude related features: magnitude = sqrt(x^2+y^2+z^2) magnitude = acc_data . apply ( lambda row : np . sqrt ( row [ \"double_values_0\" ] ** 2 + row [ \"double_values_1\" ] ** 2 + row [ \"double_values_2\" ] ** 2 ), axis = 1 ) acc_data = acc_data . assign ( magnitude = magnitude . values ) if \"maxmagnitude\" in features_to_compute : acc_features [ \"maxmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . max () if \"minmagnitude\" in features_to_compute : acc_features [ \"minmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . min () if \"avgmagnitude\" in features_to_compute : acc_features [ \"avgmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . mean () if \"medianmagnitude\" in features_to_compute : acc_features [ \"medianmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . median () if \"stdmagnitude\" in features_to_compute : acc_features [ \"stdmagnitude\" ] = acc_data . groupby ([ \"local_segment\" ])[ \"magnitude\" ] . std () acc_features = acc_features . reset_index () return acc_features","title":"Implement your feature extraction code"},{"location":"features/add-new-features/#new-features-for-non-existing-sensors","text":"If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the \"Existing Sensors\" list above), contact us or request it on Slack and we can add the necessary code so you can follow the instructions above.","title":"New Features for Non-Existing Sensors"},{"location":"features/feature-introduction/","text":"Behavioral Features Introduction \u00b6 Every phone or Fitbit sensor has a corresponding config section in config.yaml , these sections follow a similar structure and we\u2019ll use PHONE_ACCELEROMETER as an example to explain this structure. Hint We recommend reading this page if you are using RAPIDS for the first time All computed sensor features are stored under /data/processed/features on files per sensor, per participant and per study (all participants). Every time you change any sensor parameters, provider parameters or provider features, all the necessary files will be updated as soon as you execute RAPIDS. Config section example for PHONE_ACCELEROMETER # 1) Config section PHONE_ACCELEROMETER : # 2) Parameters for PHONE_ACCELEROMETER TABLE : accelerometer # 3) Providers for PHONE_ACCELEROMETER PROVIDERS : # 4) RAPIDS provider RAPIDS : # 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER COMPUTE : False # 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER FEATURES : [ \"maxmagnitude\" , \"minmagnitude\" , \"avgmagnitude\" , \"medianmagnitude\" , \"stdmagnitude\" ] SRC_FOLDER : \"rapids\" # inside src/features/phone_accelerometer SRC_LANGUAGE : \"python\" # 5) PANDA provider PANDA : # 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER COMPUTE : False VALID_SENSED_MINUTES : False # 5.2) Features of PANDA provider of PHONE_ACCELEROMETER FEATURES : exertional_activity_episode : [ \"sumduration\" , \"maxduration\" , \"minduration\" , \"avgduration\" , \"medianduration\" , \"stdduration\" ] nonexertional_activity_episode : [ \"sumduration\" , \"maxduration\" , \"minduration\" , \"avgduration\" , \"medianduration\" , \"stdduration\" ] SRC_FOLDER : \"panda\" # inside src/features/phone_accelerometer SRC_LANGUAGE : \"python\" Sensor Parameters \u00b6 Each sensor configuration section has a \u201cparameters\u201d subsection (see #2 in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The TABLE parameter exists for every sensor, but some sensors will have extra parameters like [PHONE_LOCATIONS] . We explain these parameters in a table at the top of each sensor documentation page. Sensor Providers \u00b6 Each sensor configuration section can have zero, one or more behavioral feature providers (see #3 in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see #4 ) and PANDA (see #5 ). Provider Parameters \u00b6 Each provider has parameters that affect the computation of the behavioral features it offers (see #4.1 or #5.1 in the example). These parameters will include at least a [COMPUTE] flag that you switch to True to extract a provider\u2019s behavioral features. We explain every provider\u2019s parameter in a table under the Parameters description heading on each provider documentation page. Provider Features \u00b6 Each provider offers a set of behavioral features (see #4.2 or #5.2 in the example). For some providers these features are grouped in an array (like those for RAPIDS provider in #4.2 ) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for PANDAS provider in #5.2 ). In either case, you can delete the features you are not interested in and they will not be included in the sensor\u2019s output feature file. We explain each behavioral feature in a table under the Features description heading on each provider documentation page.","title":"Introduction"},{"location":"features/feature-introduction/#behavioral-features-introduction","text":"Every phone or Fitbit sensor has a corresponding config section in config.yaml , these sections follow a similar structure and we\u2019ll use PHONE_ACCELEROMETER as an example to explain this structure. Hint We recommend reading this page if you are using RAPIDS for the first time All computed sensor features are stored under /data/processed/features on files per sensor, per participant and per study (all participants). Every time you change any sensor parameters, provider parameters or provider features, all the necessary files will be updated as soon as you execute RAPIDS. Config section example for PHONE_ACCELEROMETER # 1) Config section PHONE_ACCELEROMETER : # 2) Parameters for PHONE_ACCELEROMETER TABLE : accelerometer # 3) Providers for PHONE_ACCELEROMETER PROVIDERS : # 4) RAPIDS provider RAPIDS : # 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER COMPUTE : False # 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER FEATURES : [ \"maxmagnitude\" , \"minmagnitude\" , \"avgmagnitude\" , \"medianmagnitude\" , \"stdmagnitude\" ] SRC_FOLDER : \"rapids\" # inside src/features/phone_accelerometer SRC_LANGUAGE : \"python\" # 5) PANDA provider PANDA : # 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER COMPUTE : False VALID_SENSED_MINUTES : False # 5.2) Features of PANDA provider of PHONE_ACCELEROMETER FEATURES : exertional_activity_episode : [ \"sumduration\" , \"maxduration\" , \"minduration\" , \"avgduration\" , \"medianduration\" , \"stdduration\" ] nonexertional_activity_episode : [ \"sumduration\" , \"maxduration\" , \"minduration\" , \"avgduration\" , \"medianduration\" , \"stdduration\" ] SRC_FOLDER : \"panda\" # inside src/features/phone_accelerometer SRC_LANGUAGE : \"python\"","title":"Behavioral Features Introduction"},{"location":"features/feature-introduction/#sensor-parameters","text":"Each sensor configuration section has a \u201cparameters\u201d subsection (see #2 in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The TABLE parameter exists for every sensor, but some sensors will have extra parameters like [PHONE_LOCATIONS] . We explain these parameters in a table at the top of each sensor documentation page.","title":"Sensor Parameters"},{"location":"features/feature-introduction/#sensor-providers","text":"Each sensor configuration section can have zero, one or more behavioral feature providers (see #3 in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see #4 ) and PANDA (see #5 ).","title":"Sensor Providers"},{"location":"features/feature-introduction/#provider-parameters","text":"Each provider has parameters that affect the computation of the behavioral features it offers (see #4.1 or #5.1 in the example). These parameters will include at least a [COMPUTE] flag that you switch to True to extract a provider\u2019s behavioral features. We explain every provider\u2019s parameter in a table under the Parameters description heading on each provider documentation page.","title":"Provider Parameters"},{"location":"features/feature-introduction/#provider-features","text":"Each provider offers a set of behavioral features (see #4.2 or #5.2 in the example). For some providers these features are grouped in an array (like those for RAPIDS provider in #4.2 ) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for PANDAS provider in #5.2 ). In either case, you can delete the features you are not interested in and they will not be included in the sensor\u2019s output feature file. We explain each behavioral feature in a table under the Features description heading on each provider documentation page.","title":"Provider Features"},{"location":"features/fitbit-heartrate-intraday/","text":"Fitbit Heart Rate Intraday \u00b6 Sensor parameters description for [FITBIT_HEARTRATE_INTRADAY] : Key Description [TABLE] Database table name or file path where the heart rate intraday data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1200.6102,\u201dmax\u201d:88,\u201dmin\u201d:31,\u201dminutes\u201d:1058,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:760.3020,\u201dmax\u201d:120,\u201dmin\u201d:86,\u201dminutes\u201d:366,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:15.2048,\u201dmax\u201d:146,\u201dmin\u201d:120,\u201dminutes\u201d:2,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:72}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:68},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:67},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:67},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1100.1120,\u201dmax\u201d:89,\u201dmin\u201d:30,\u201dminutes\u201d:921,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:660.0012,\u201dmax\u201d:118,\u201dmin\u201d:82,\u201dminutes\u201d:361,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:23.7088,\u201dmax\u201d:142,\u201dmin\u201d:108,\u201dminutes\u201d:3,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:70}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:77},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:75},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:73},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:750.3615,\u201dmax\u201d:77,\u201dmin\u201d:30,\u201dminutes\u201d:851,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:734.1516,\u201dmax\u201d:107,\u201dmin\u201d:77,\u201dminutes\u201d:550,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:131.8579,\u201dmax\u201d:130,\u201dmin\u201d:107,\u201dminutes\u201d:29,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:220,\u201dmin\u201d:130,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:69}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:90},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:89},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:88},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_date_time heartrate heartrate_zone a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:00:00 68 outofrange a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:01:00 67 outofrange a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:02:00 67 outofrange RAPIDS provider \u00b6 Available time segments Available for all time segments File Sequence - data/raw/ { pid } /fitbit_heartrate_intraday_raw.csv - data/raw/ { pid } /fitbit_heartrate_intraday_parsed.csv - data/raw/ { pid } /fitbit_heartrate_intraday_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_heartrate_intraday.csv Parameters description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_HEARTRATE_INTRADAY features from the RAPIDS provider [FEATURES] Features to be computed from heart rate intraday data, see table below Features description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS] : Feature Units Description maxhr beats/mins The maximum heart rate during a time segment. minhr beats/mins The minimum heart rate during a time segment. avghr beats/mins The average heart rate during a time segment. medianhr beats/mins The median of heart rate during a time segment. modehr beats/mins The mode of heart rate during a time segment. stdhr beats/mins The standard deviation of heart rate during a time segment. diffmaxmodehr beats/mins The difference between the maximum and mode heart rate during a time segment. diffminmodehr beats/mins The difference between the mode and minimum heart rate during a time segment. entropyhr nats Shannon\u2019s entropy measurement based on heart rate during a time segment. minutesonZONE minutes Number of minutes the user\u2019s heart rate fell within each heartrate_zone during a time segment. Assumptions/Observations There are four heart rate zones (ZONE): outofrange , fatburn , cardio , and peak . Please refer to Fitbit documentation for more information about the way they are computed.","title":"Fitbit Heart Rate Intraday"},{"location":"features/fitbit-heartrate-intraday/#fitbit-heart-rate-intraday","text":"Sensor parameters description for [FITBIT_HEARTRATE_INTRADAY] : Key Description [TABLE] Database table name or file path where the heart rate intraday data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1200.6102,\u201dmax\u201d:88,\u201dmin\u201d:31,\u201dminutes\u201d:1058,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:760.3020,\u201dmax\u201d:120,\u201dmin\u201d:86,\u201dminutes\u201d:366,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:15.2048,\u201dmax\u201d:146,\u201dmin\u201d:120,\u201dminutes\u201d:2,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:72}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:68},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:67},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:67},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1100.1120,\u201dmax\u201d:89,\u201dmin\u201d:30,\u201dminutes\u201d:921,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:660.0012,\u201dmax\u201d:118,\u201dmin\u201d:82,\u201dminutes\u201d:361,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:23.7088,\u201dmax\u201d:142,\u201dmin\u201d:108,\u201dminutes\u201d:3,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:70}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:77},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:75},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:73},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:750.3615,\u201dmax\u201d:77,\u201dmin\u201d:30,\u201dminutes\u201d:851,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:734.1516,\u201dmax\u201d:107,\u201dmin\u201d:77,\u201dminutes\u201d:550,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:131.8579,\u201dmax\u201d:130,\u201dmin\u201d:107,\u201dminutes\u201d:29,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:220,\u201dmin\u201d:130,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:69}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:90},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:89},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:88},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_date_time heartrate heartrate_zone a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:00:00 68 outofrange a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:01:00 67 outofrange a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:02:00 67 outofrange","title":"Fitbit Heart Rate Intraday"},{"location":"features/fitbit-heartrate-intraday/#rapids-provider","text":"Available time segments Available for all time segments File Sequence - data/raw/ { pid } /fitbit_heartrate_intraday_raw.csv - data/raw/ { pid } /fitbit_heartrate_intraday_parsed.csv - data/raw/ { pid } /fitbit_heartrate_intraday_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_heartrate_intraday.csv Parameters description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_HEARTRATE_INTRADAY features from the RAPIDS provider [FEATURES] Features to be computed from heart rate intraday data, see table below Features description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS] : Feature Units Description maxhr beats/mins The maximum heart rate during a time segment. minhr beats/mins The minimum heart rate during a time segment. avghr beats/mins The average heart rate during a time segment. medianhr beats/mins The median of heart rate during a time segment. modehr beats/mins The mode of heart rate during a time segment. stdhr beats/mins The standard deviation of heart rate during a time segment. diffmaxmodehr beats/mins The difference between the maximum and mode heart rate during a time segment. diffminmodehr beats/mins The difference between the mode and minimum heart rate during a time segment. entropyhr nats Shannon\u2019s entropy measurement based on heart rate during a time segment. minutesonZONE minutes Number of minutes the user\u2019s heart rate fell within each heartrate_zone during a time segment. Assumptions/Observations There are four heart rate zones (ZONE): outofrange , fatburn , cardio , and peak . Please refer to Fitbit documentation for more information about the way they are computed.","title":"RAPIDS provider"},{"location":"features/fitbit-heartrate-summary/","text":"Fitbit Heart Rate Summary \u00b6 Sensor parameters description for [FITBIT_HEARTRATE_SUMMARY] : Key Description [TABLE] Database table name or file path where the heart rate summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1200.6102,\u201dmax\u201d:88,\u201dmin\u201d:31,\u201dminutes\u201d:1058,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:760.3020,\u201dmax\u201d:120,\u201dmin\u201d:86,\u201dminutes\u201d:366,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:15.2048,\u201dmax\u201d:146,\u201dmin\u201d:120,\u201dminutes\u201d:2,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:72}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:68},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:67},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:67},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1100.1120,\u201dmax\u201d:89,\u201dmin\u201d:30,\u201dminutes\u201d:921,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:660.0012,\u201dmax\u201d:118,\u201dmin\u201d:82,\u201dminutes\u201d:361,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:23.7088,\u201dmax\u201d:142,\u201dmin\u201d:108,\u201dminutes\u201d:3,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:70}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:77},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:75},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:73},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:750.3615,\u201dmax\u201d:77,\u201dmin\u201d:30,\u201dminutes\u201d:851,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:734.1516,\u201dmax\u201d:107,\u201dmin\u201d:77,\u201dminutes\u201d:550,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:131.8579,\u201dmax\u201d:130,\u201dmin\u201d:107,\u201dminutes\u201d:29,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:220,\u201dmin\u201d:130,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:69}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:90},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:89},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:88},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_date_time heartrate_daily_restinghr heartrate_daily_caloriesoutofrange heartrate_daily_caloriesfatburn heartrate_daily_caloriescardio heartrate_daily_caloriespeak a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 72 1200.6102 760.3020 15.2048 0 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-08 70 1100.1120 660.0012 23.7088 0 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-09 69 750.3615 734.1516 131.8579 0 RAPIDS provider \u00b6 Available time segments Only available for segments that span 1 or more complete days (e.g. Jan 1 st 00:00 to Jan 3 rd 23:59) File Sequence - data/raw/ { pid } /fitbit_heartrate_summary_raw.csv - data/raw/ { pid } /fitbit_heartrate_summary_parsed.csv - data/raw/ { pid } /fitbit_heartrate_summary_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_heartrate_summary_features/fitbit_heartrate_summary_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_heartrate_summary.csv Parameters description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_HEARTRATE_SUMMARY features from the RAPIDS provider [FEATURES] Features to be computed from heart rate summary data, see table below Features description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS] : Feature Units Description maxrestinghr beats/mins The maximum daily resting heart rate during a time segment. minrestinghr beats/mins The minimum daily resting heart rate during a time segment. avgrestinghr beats/mins The average daily resting heart rate during a time segment. medianrestinghr beats/mins The median of daily resting heart rate during a time segment. moderestinghr beats/mins The mode of daily resting heart rate during a time segment. stdrestinghr beats/mins The standard deviation of daily resting heart rate during a time segment. diffmaxmoderestinghr beats/mins The difference between the maximum and mode daily resting heart rate during a time segment. diffminmoderestinghr beats/mins The difference between the mode and minimum daily resting heart rate during a time segment. entropyrestinghr nats Shannon\u2019s entropy measurement based on daily resting heart rate during a time segment. sumcaloriesZONE cals The total daily calories burned within heartrate_zone during a time segment. maxcaloriesZONE cals The maximum daily calories burned within heartrate_zone during a time segment. mincaloriesZONE cals The minimum daily calories burned within heartrate_zone during a time segment. avgcaloriesZONE cals The average daily calories burned within heartrate_zone during a time segment. mediancaloriesZONE cals The median of daily calories burned within heartrate_zone during a time segment. stdcaloriesZONE cals The standard deviation of daily calories burned within heartrate_zone during a time segment. entropycaloriesZONE nats Shannon\u2019s entropy measurement based on daily calories burned within heartrate_zone during a time segment. Assumptions/Observations There are four heart rate zones (ZONE): outofrange , fatburn , cardio , and peak . Please refer to Fitbit documentation for more information about the way they are computed. Calories\u2019 accuracy depends on the users\u2019 Fitbit profile (weight, height, etc.).","title":"Fitbit Heart Rate Summary"},{"location":"features/fitbit-heartrate-summary/#fitbit-heart-rate-summary","text":"Sensor parameters description for [FITBIT_HEARTRATE_SUMMARY] : Key Description [TABLE] Database table name or file path where the heart rate summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1200.6102,\u201dmax\u201d:88,\u201dmin\u201d:31,\u201dminutes\u201d:1058,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:760.3020,\u201dmax\u201d:120,\u201dmin\u201d:86,\u201dminutes\u201d:366,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:15.2048,\u201dmax\u201d:146,\u201dmin\u201d:120,\u201dminutes\u201d:2,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:72}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:68},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:67},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:67},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:1100.1120,\u201dmax\u201d:89,\u201dmin\u201d:30,\u201dminutes\u201d:921,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:660.0012,\u201dmax\u201d:118,\u201dmin\u201d:82,\u201dminutes\u201d:361,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:23.7088,\u201dmax\u201d:142,\u201dmin\u201d:108,\u201dminutes\u201d:3,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:221,\u201dmin\u201d:148,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:70}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:77},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:75},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:73},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201cactivities-heart\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:{\u201ccustomHeartRateZones\u201d:[],\u201dheartRateZones\u201d:[{\u201ccaloriesOut\u201d:750.3615,\u201dmax\u201d:77,\u201dmin\u201d:30,\u201dminutes\u201d:851,\u201dname\u201d:\u201dOut of Range\u201d},{\u201ccaloriesOut\u201d:734.1516,\u201dmax\u201d:107,\u201dmin\u201d:77,\u201dminutes\u201d:550,\u201dname\u201d:\u201dFat Burn\u201d},{\u201ccaloriesOut\u201d:131.8579,\u201dmax\u201d:130,\u201dmin\u201d:107,\u201dminutes\u201d:29,\u201dname\u201d:\u201dCardio\u201d},{\u201ccaloriesOut\u201d:0,\u201dmax\u201d:220,\u201dmin\u201d:130,\u201dminutes\u201d:0,\u201dname\u201d:\u201dPeak\u201d}],\u201drestingHeartRate\u201d:69}}],\u201dactivities-heart-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:90},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:89},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:88},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_date_time heartrate_daily_restinghr heartrate_daily_caloriesoutofrange heartrate_daily_caloriesfatburn heartrate_daily_caloriescardio heartrate_daily_caloriespeak a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 72 1200.6102 760.3020 15.2048 0 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-08 70 1100.1120 660.0012 23.7088 0 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-09 69 750.3615 734.1516 131.8579 0","title":"Fitbit Heart Rate Summary"},{"location":"features/fitbit-heartrate-summary/#rapids-provider","text":"Available time segments Only available for segments that span 1 or more complete days (e.g. Jan 1 st 00:00 to Jan 3 rd 23:59) File Sequence - data/raw/ { pid } /fitbit_heartrate_summary_raw.csv - data/raw/ { pid } /fitbit_heartrate_summary_parsed.csv - data/raw/ { pid } /fitbit_heartrate_summary_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_heartrate_summary_features/fitbit_heartrate_summary_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_heartrate_summary.csv Parameters description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_HEARTRATE_SUMMARY features from the RAPIDS provider [FEATURES] Features to be computed from heart rate summary data, see table below Features description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS] : Feature Units Description maxrestinghr beats/mins The maximum daily resting heart rate during a time segment. minrestinghr beats/mins The minimum daily resting heart rate during a time segment. avgrestinghr beats/mins The average daily resting heart rate during a time segment. medianrestinghr beats/mins The median of daily resting heart rate during a time segment. moderestinghr beats/mins The mode of daily resting heart rate during a time segment. stdrestinghr beats/mins The standard deviation of daily resting heart rate during a time segment. diffmaxmoderestinghr beats/mins The difference between the maximum and mode daily resting heart rate during a time segment. diffminmoderestinghr beats/mins The difference between the mode and minimum daily resting heart rate during a time segment. entropyrestinghr nats Shannon\u2019s entropy measurement based on daily resting heart rate during a time segment. sumcaloriesZONE cals The total daily calories burned within heartrate_zone during a time segment. maxcaloriesZONE cals The maximum daily calories burned within heartrate_zone during a time segment. mincaloriesZONE cals The minimum daily calories burned within heartrate_zone during a time segment. avgcaloriesZONE cals The average daily calories burned within heartrate_zone during a time segment. mediancaloriesZONE cals The median of daily calories burned within heartrate_zone during a time segment. stdcaloriesZONE cals The standard deviation of daily calories burned within heartrate_zone during a time segment. entropycaloriesZONE nats Shannon\u2019s entropy measurement based on daily calories burned within heartrate_zone during a time segment. Assumptions/Observations There are four heart rate zones (ZONE): outofrange , fatburn , cardio , and peak . Please refer to Fitbit documentation for more information about the way they are computed. Calories\u2019 accuracy depends on the users\u2019 Fitbit profile (weight, height, etc.).","title":"RAPIDS provider"},{"location":"features/fitbit-sleep-summary/","text":"Fitbit Sleep Summary \u00b6 Sensor parameters description for [FITBIT_SLEEP_SUMMARY] : Key Description [TABLE] Database table name or file path where the sleep summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data with Fitbit\u2019s sleep API Version 1 JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d: [{\u201cawakeCount\u201d: 2, \u201cawakeDuration\u201d: 3, \u201cawakeningsCount\u201d: 10, \u201cdateOfSleep\u201d: \u201c2020-10-07\u201d, \u201cduration\u201d: 8100000, \u201cefficiency\u201d: 91, \u201cendTime\u201d: \u201c2020-10-07T18:10:00.000\u201d, \u201cisMainSleep\u201d: true, \u201clogId\u201d: 14147921940, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c15:55:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c15:56:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c15:57:00\u201d, \u201cvalue\u201d: \u201c2\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 0, \u201cminutesAsleep\u201d: 123, \u201cminutesAwake\u201d: 12, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 8, \u201crestlessDuration\u201d: 9, \u201cstartTime\u201d: \u201c2020-10-07T15:55:00.000\u201d, \u201ctimeInBed\u201d: 135}, {\u201cawakeCount\u201d: 0, \u201cawakeDuration\u201d: 0, \u201cawakeningsCount\u201d: 1, \u201cdateOfSleep\u201d: \u201c2020-10-07\u201d, \u201cduration\u201d: 3780000, \u201cefficiency\u201d: 100, \u201cendTime\u201d: \u201c2020-10-07T10:52:30.000\u201d, \u201cisMainSleep\u201d: false, \u201clogId\u201d: 14144903977, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c09:49:00\u201d, \u201cvalue\u201d: \u201c1\u201d}, {\u201cdateTime\u201d: \u201c09:50:00\u201d, \u201cvalue\u201d: \u201c1\u201d}, {\u201cdateTime\u201d: \u201c09:51:00\u201d, \u201cvalue\u201d: \u201c1\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 1, \u201cminutesAsleep\u201d: 62, \u201cminutesAwake\u201d: 0, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 1, \u201crestlessDuration\u201d: 1, \u201cstartTime\u201d: \u201c2020-10-07T09:49:00.000\u201d, \u201ctimeInBed\u201d: 63}], \u201csummary\u201d: {\u201ctotalMinutesAsleep\u201d: 185, \u201ctotalSleepRecords\u201d: 2, \u201ctotalTimeInBed\u201d: 198}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d: [{\u201cawakeCount\u201d: 3, \u201cawakeDuration\u201d: 21, \u201cawakeningsCount\u201d: 16, \u201cdateOfSleep\u201d: \u201c2020-10-08\u201d, \u201cduration\u201d: 19260000, \u201cefficiency\u201d: 89, \u201cendTime\u201d: \u201c2020-10-08T06:01:30.000\u201d, \u201cisMainSleep\u201d: true, \u201clogId\u201d: 14150613895, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c00:40:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c00:41:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c00:42:00\u201d, \u201cvalue\u201d: \u201c3\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 0, \u201cminutesAsleep\u201d: 275, \u201cminutesAwake\u201d: 33, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 13, \u201crestlessDuration\u201d: 25, \u201cstartTime\u201d: \u201c2020-10-08T00:40:00.000\u201d, \u201ctimeInBed\u201d: 321}], \u201csummary\u201d: {\u201ctotalMinutesAsleep\u201d: 275, \u201ctotalSleepRecords\u201d: 1, \u201ctotalTimeInBed\u201d: 321}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d: [{\u201cawakeCount\u201d: 1, \u201cawakeDuration\u201d: 3, \u201cawakeningsCount\u201d: 8, \u201cdateOfSleep\u201d: \u201c2020-10-09\u201d, \u201cduration\u201d: 19320000, \u201cefficiency\u201d: 96, \u201cendTime\u201d: \u201c2020-10-09T05:57:30.000\u201d, \u201cisMainSleep\u201d: true, \u201clogId\u201d: 14161136803, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c00:35:30\u201d, \u201cvalue\u201d: \u201c2\u201d}, {\u201cdateTime\u201d: \u201c00:36:30\u201d, \u201cvalue\u201d: \u201c1\u201d}, {\u201cdateTime\u201d: \u201c00:37:30\u201d, \u201cvalue\u201d: \u201c1\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 0, \u201cminutesAsleep\u201d: 309, \u201cminutesAwake\u201d: 13, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 7, \u201crestlessDuration\u201d: 10, \u201cstartTime\u201d: \u201c2020-10-09T00:35:30.000\u201d, \u201ctimeInBed\u201d: 322}], \u201csummary\u201d: {\u201ctotalMinutesAsleep\u201d: 309, \u201ctotalSleepRecords\u201d: 1, \u201ctotalTimeInBed\u201d: 322}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_start_date_time local_end_date_time efficiency minutes_after_wakeup minutes_asleep minutes_awake minutes_to_fall_asleep minutes_in_bed is_main_sleep type count_awake duration_awake count_awakenings count_restless duration_restless a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 15:55:00 2020-10-07 18:10:00 91 0 123 12 0 135 1 classic 2 3 10 8 9 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 09:49:00 2020-10-07 10:52:30 100 1 62 0 0 63 0 classic 0 0 1 1 1 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-08 00:40:00 2020-10-08 06:01:30 89 0 275 33 0 321 1 classic 3 21 16 13 25 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-09 00:35:30 2020-10-09 05:57:30 96 0 309 13 0 322 1 classic 1 3 8 7 10 Example of the structure of source data with Fitbit\u2019s sleep API Version 1.2 JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d:[{\u201cdateOfSleep\u201d:\u201d2020-10-10\u201d,\u201dduration\u201d:3600000,\u201defficiency\u201d:92,\u201dendTime\u201d:\u201d2020-10-10T16:37:00.000\u201d,\u201dinfoCode\u201d:2,\u201disMainSleep\u201d:false,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-10T15:36:30.000\u201d,\u201dlevel\u201d:\u201drestless\u201d,\u201dseconds\u201d:60},{\u201cdateTime\u201d:\u201d2020-10-10T15:37:30.000\u201d,\u201dlevel\u201d:\u201dasleep\u201d,\u201dseconds\u201d:660},{\u201cdateTime\u201d:\u201d2020-10-10T15:48:30.000\u201d,\u201dlevel\u201d:\u201drestless\u201d,\u201dseconds\u201d:60},\u2026], \u201csummary\u201d:{\u201casleep\u201d:{\u201ccount\u201d:0,\u201dminutes\u201d:56},\u201dawake\u201d:{\u201ccount\u201d:0,\u201dminutes\u201d:0},\u201drestless\u201d:{\u201ccount\u201d:3,\u201dminutes\u201d:4}}},\u201dlogId\u201d:26315914306,\u201dminutesAfterWakeup\u201d:0,\u201dminutesAsleep\u201d:55,\u201dminutesAwake\u201d:5,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-10T15:36:30.000\u201d,\u201dtimeInBed\u201d:60,\u201dtype\u201d:\u201dclassic\u201d},{\u201cdateOfSleep\u201d:\u201d2020-10-10\u201d,\u201dduration\u201d:22980000,\u201defficiency\u201d:88,\u201dendTime\u201d:\u201d2020-10-10T08:10:00.000\u201d,\u201dinfoCode\u201d:0,\u201disMainSleep\u201d:true,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-10T01:46:30.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:420},{\u201cdateTime\u201d:\u201d2020-10-10T01:53:30.000\u201d,\u201dlevel\u201d:\u201ddeep\u201d,\u201dseconds\u201d:1230},{\u201cdateTime\u201d:\u201d2020-10-10T02:14:00.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:360},\u2026], \u201csummary\u201d:{\u201cdeep\u201d:{\u201ccount\u201d:3,\u201dminutes\u201d:92,\u201dthirtyDayAvgMinutes\u201d:0},\u201dlight\u201d:{\u201ccount\u201d:29,\u201dminutes\u201d:193,\u201dthirtyDayAvgMinutes\u201d:0},\u201drem\u201d:{\u201ccount\u201d:4,\u201dminutes\u201d:33,\u201dthirtyDayAvgMinutes\u201d:0},\u201dwake\u201d:{\u201ccount\u201d:28,\u201dminutes\u201d:65,\u201dthirtyDayAvgMinutes\u201d:0}}},\u201dlogId\u201d:26311786557,\u201dminutesAfterWakeup\u201d:0,\u201dminutesAsleep\u201d:318,\u201dminutesAwake\u201d:65,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-10T01:46:30.000\u201d,\u201dtimeInBed\u201d:383,\u201dtype\u201d:\u201dstages\u201d}],\u201dsummary\u201d:{\u201cstages\u201d:{\u201cdeep\u201d:92,\u201dlight\u201d:193,\u201drem\u201d:33,\u201dwake\u201d:65},\u201dtotalMinutesAsleep\u201d:373,\u201dtotalSleepRecords\u201d:2,\u201dtotalTimeInBed\u201d:443}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d:[{\u201cdateOfSleep\u201d:\u201d2020-10-11\u201d,\u201dduration\u201d:41640000,\u201defficiency\u201d:89,\u201dendTime\u201d:\u201d2020-10-11T11:47:00.000\u201d,\u201dinfoCode\u201d:0,\u201disMainSleep\u201d:true,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-11T00:12:30.000\u201d,\u201dlevel\u201d:\u201dwake\u201d,\u201dseconds\u201d:450},{\u201cdateTime\u201d:\u201d2020-10-11T00:20:00.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:870},{\u201cdateTime\u201d:\u201d2020-10-11T00:34:30.000\u201d,\u201dlevel\u201d:\u201dwake\u201d,\u201dseconds\u201d:780},\u2026], \u201csummary\u201d:{\u201cdeep\u201d:{\u201ccount\u201d:4,\u201dminutes\u201d:52,\u201dthirtyDayAvgMinutes\u201d:62},\u201dlight\u201d:{\u201ccount\u201d:32,\u201dminutes\u201d:442,\u201dthirtyDayAvgMinutes\u201d:364},\u201drem\u201d:{\u201ccount\u201d:6,\u201dminutes\u201d:68,\u201dthirtyDayAvgMinutes\u201d:58},\u201dwake\u201d:{\u201ccount\u201d:29,\u201dminutes\u201d:132,\u201dthirtyDayAvgMinutes\u201d:94}}},\u201dlogId\u201d:26589710670,\u201dminutesAfterWakeup\u201d:1,\u201dminutesAsleep\u201d:562,\u201dminutesAwake\u201d:132,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-11T00:12:30.000\u201d,\u201dtimeInBed\u201d:694,\u201dtype\u201d:\u201dstages\u201d}],\u201dsummary\u201d:{\u201cstages\u201d:{\u201cdeep\u201d:52,\u201dlight\u201d:442,\u201drem\u201d:68,\u201dwake\u201d:132},\u201dtotalMinutesAsleep\u201d:562,\u201dtotalSleepRecords\u201d:1,\u201dtotalTimeInBed\u201d:694}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d:[{\u201cdateOfSleep\u201d:\u201d2020-10-12\u201d,\u201dduration\u201d:28980000,\u201defficiency\u201d:93,\u201dendTime\u201d:\u201d2020-10-12T09:34:30.000\u201d,\u201dinfoCode\u201d:0,\u201disMainSleep\u201d:true,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-12T01:31:00.000\u201d,\u201dlevel\u201d:\u201dwake\u201d,\u201dseconds\u201d:600},{\u201cdateTime\u201d:\u201d2020-10-12T01:41:00.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:60},{\u201cdateTime\u201d:\u201d2020-10-12T01:42:00.000\u201d,\u201dlevel\u201d:\u201ddeep\u201d,\u201dseconds\u201d:2340},\u2026], \u201csummary\u201d:{\u201cdeep\u201d:{\u201ccount\u201d:4,\u201dminutes\u201d:63,\u201dthirtyDayAvgMinutes\u201d:59},\u201dlight\u201d:{\u201ccount\u201d:27,\u201dminutes\u201d:257,\u201dthirtyDayAvgMinutes\u201d:364},\u201drem\u201d:{\u201ccount\u201d:5,\u201dminutes\u201d:94,\u201dthirtyDayAvgMinutes\u201d:58},\u201dwake\u201d:{\u201ccount\u201d:24,\u201dminutes\u201d:69,\u201dthirtyDayAvgMinutes\u201d:95}}},\u201dlogId\u201d:26589710673,\u201dminutesAfterWakeup\u201d:0,\u201dminutesAsleep\u201d:415,\u201dminutesAwake\u201d:68,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-12T01:31:00.000\u201d,\u201dtimeInBed\u201d:483,\u201dtype\u201d:\u201dstages\u201d}],\u201dsummary\u201d:{\u201cstages\u201d:{\u201cdeep\u201d:63,\u201dlight\u201d:257,\u201drem\u201d:94,\u201dwake\u201d:69},\u201dtotalMinutesAsleep\u201d:415,\u201dtotalSleepRecords\u201d:1,\u201dtotalTimeInBed\u201d:483}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_start_date_time local_end_date_time efficiency minutes_after_wakeup minutes_asleep minutes_awake minutes_to_fall_asleep minutes_in_bed is_main_sleep type a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-10 15:36:30 2020-10-10 16:37:00 92 0 55 5 0 60 0 classic a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-10 01:46:30 2020-10-10 08:10:00 88 0 318 65 0 383 1 stages a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-11 00:12:30 2020-10-11 11:47:00 89 1 562 132 0 694 1 stages a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-12 01:31:00 2020-10-12 09:34:30 93 0 415 68 0 483 1 stages RAPIDS provider \u00b6 Available time segments Only available for segments that span 1 or more complete days (e.g. Jan 1 st 00:00 to Jan 3 rd 23:59) File Sequence - data/raw/ { pid } /fitbit_sleep_summary_raw.csv - data/raw/ { pid } /fitbit_sleep_summary_parsed.csv - data/raw/ { pid } /fitbit_sleep_summary_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_sleep_summary_features/fitbit_sleep_summary_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_sleep_summary.csv Parameters description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_SLEEP_SUMMARY features from the RAPIDS provider [SLEEP_TYPES] Types of sleep to be included in the feature extraction computation. Fitbit provides 3 types of sleep: main , nap , all . [FEATURES] Features to be computed from sleep summary data, see table below Features description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS] : Feature Units Description countepisodeTYPE episodes Number of sleep episodes for a certain sleep type during a time segment. avgefficiencyTYPE scores Average sleep efficiency for a certain sleep type during a time segment. sumdurationafterwakeupTYPE minutes Total duration the user stayed in bed after waking up for a certain sleep type during a time segment. sumdurationasleepTYPE minutes Total sleep duration for a certain sleep type during a time segment. sumdurationawakeTYPE minutes Total duration the user stayed awake but still in bed for a certain sleep type during a time segment. sumdurationtofallasleepTYPE minutes Total duration the user spent to fall asleep for a certain sleep type during a time segment. sumdurationinbedTYPE minutes Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment. avgdurationafterwakeupTYPE minutes Average duration the user stayed in bed after waking up for a certain sleep type during a time segment. avgdurationasleepTYPE minutes Average sleep duration for a certain sleep type during a time segment. avgdurationawakeTYPE minutes Average duration the user stayed awake but still in bed for a certain sleep type during a time segment. avgdurationtofallasleepTYPE minutes Average duration the user spent to fall asleep for a certain sleep type during a time segment. avgdurationinbedTYPE minutes Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment. Assumptions/Observations There are three sleep types (TYPE): main , nap , all . The all type contains both main sleep and naps. There are two versions of Fitbit\u2019s sleep API ( version 1 and version 1.2 ), and each provides raw sleep data in a different format: Count & duration summaries . v1 contains count_awake , duration_awake , count_awakenings , count_restless , and duration_restless fields for every sleep record but v1.2 does not. API columns . Features are computed based on the values provided by Fitbit\u2019s API: efficiency , minutes_after_wakeup , minutes_asleep , minutes_awake , minutes_to_fall_asleep , minutes_in_bed , is_main_sleep and type .","title":"Fitbit Sleep Summary"},{"location":"features/fitbit-sleep-summary/#fitbit-sleep-summary","text":"Sensor parameters description for [FITBIT_SLEEP_SUMMARY] : Key Description [TABLE] Database table name or file path where the sleep summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data with Fitbit\u2019s sleep API Version 1 JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d: [{\u201cawakeCount\u201d: 2, \u201cawakeDuration\u201d: 3, \u201cawakeningsCount\u201d: 10, \u201cdateOfSleep\u201d: \u201c2020-10-07\u201d, \u201cduration\u201d: 8100000, \u201cefficiency\u201d: 91, \u201cendTime\u201d: \u201c2020-10-07T18:10:00.000\u201d, \u201cisMainSleep\u201d: true, \u201clogId\u201d: 14147921940, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c15:55:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c15:56:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c15:57:00\u201d, \u201cvalue\u201d: \u201c2\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 0, \u201cminutesAsleep\u201d: 123, \u201cminutesAwake\u201d: 12, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 8, \u201crestlessDuration\u201d: 9, \u201cstartTime\u201d: \u201c2020-10-07T15:55:00.000\u201d, \u201ctimeInBed\u201d: 135}, {\u201cawakeCount\u201d: 0, \u201cawakeDuration\u201d: 0, \u201cawakeningsCount\u201d: 1, \u201cdateOfSleep\u201d: \u201c2020-10-07\u201d, \u201cduration\u201d: 3780000, \u201cefficiency\u201d: 100, \u201cendTime\u201d: \u201c2020-10-07T10:52:30.000\u201d, \u201cisMainSleep\u201d: false, \u201clogId\u201d: 14144903977, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c09:49:00\u201d, \u201cvalue\u201d: \u201c1\u201d}, {\u201cdateTime\u201d: \u201c09:50:00\u201d, \u201cvalue\u201d: \u201c1\u201d}, {\u201cdateTime\u201d: \u201c09:51:00\u201d, \u201cvalue\u201d: \u201c1\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 1, \u201cminutesAsleep\u201d: 62, \u201cminutesAwake\u201d: 0, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 1, \u201crestlessDuration\u201d: 1, \u201cstartTime\u201d: \u201c2020-10-07T09:49:00.000\u201d, \u201ctimeInBed\u201d: 63}], \u201csummary\u201d: {\u201ctotalMinutesAsleep\u201d: 185, \u201ctotalSleepRecords\u201d: 2, \u201ctotalTimeInBed\u201d: 198}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d: [{\u201cawakeCount\u201d: 3, \u201cawakeDuration\u201d: 21, \u201cawakeningsCount\u201d: 16, \u201cdateOfSleep\u201d: \u201c2020-10-08\u201d, \u201cduration\u201d: 19260000, \u201cefficiency\u201d: 89, \u201cendTime\u201d: \u201c2020-10-08T06:01:30.000\u201d, \u201cisMainSleep\u201d: true, \u201clogId\u201d: 14150613895, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c00:40:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c00:41:00\u201d, \u201cvalue\u201d: \u201c3\u201d}, {\u201cdateTime\u201d: \u201c00:42:00\u201d, \u201cvalue\u201d: \u201c3\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 0, \u201cminutesAsleep\u201d: 275, \u201cminutesAwake\u201d: 33, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 13, \u201crestlessDuration\u201d: 25, \u201cstartTime\u201d: \u201c2020-10-08T00:40:00.000\u201d, \u201ctimeInBed\u201d: 321}], \u201csummary\u201d: {\u201ctotalMinutesAsleep\u201d: 275, \u201ctotalSleepRecords\u201d: 1, \u201ctotalTimeInBed\u201d: 321}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d: [{\u201cawakeCount\u201d: 1, \u201cawakeDuration\u201d: 3, \u201cawakeningsCount\u201d: 8, \u201cdateOfSleep\u201d: \u201c2020-10-09\u201d, \u201cduration\u201d: 19320000, \u201cefficiency\u201d: 96, \u201cendTime\u201d: \u201c2020-10-09T05:57:30.000\u201d, \u201cisMainSleep\u201d: true, \u201clogId\u201d: 14161136803, \u201cminuteData\u201d: [{\u201cdateTime\u201d: \u201c00:35:30\u201d, \u201cvalue\u201d: \u201c2\u201d}, {\u201cdateTime\u201d: \u201c00:36:30\u201d, \u201cvalue\u201d: \u201c1\u201d}, {\u201cdateTime\u201d: \u201c00:37:30\u201d, \u201cvalue\u201d: \u201c1\u201d},\u2026], \u201cminutesAfterWakeup\u201d: 0, \u201cminutesAsleep\u201d: 309, \u201cminutesAwake\u201d: 13, \u201cminutesToFallAsleep\u201d: 0, \u201crestlessCount\u201d: 7, \u201crestlessDuration\u201d: 10, \u201cstartTime\u201d: \u201c2020-10-09T00:35:30.000\u201d, \u201ctimeInBed\u201d: 322}], \u201csummary\u201d: {\u201ctotalMinutesAsleep\u201d: 309, \u201ctotalSleepRecords\u201d: 1, \u201ctotalTimeInBed\u201d: 322}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_start_date_time local_end_date_time efficiency minutes_after_wakeup minutes_asleep minutes_awake minutes_to_fall_asleep minutes_in_bed is_main_sleep type count_awake duration_awake count_awakenings count_restless duration_restless a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 15:55:00 2020-10-07 18:10:00 91 0 123 12 0 135 1 classic 2 3 10 8 9 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 09:49:00 2020-10-07 10:52:30 100 1 62 0 0 63 0 classic 0 0 1 1 1 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-08 00:40:00 2020-10-08 06:01:30 89 0 275 33 0 321 1 classic 3 21 16 13 25 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-09 00:35:30 2020-10-09 05:57:30 96 0 309 13 0 322 1 classic 1 3 8 7 10 Example of the structure of source data with Fitbit\u2019s sleep API Version 1.2 JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d:[{\u201cdateOfSleep\u201d:\u201d2020-10-10\u201d,\u201dduration\u201d:3600000,\u201defficiency\u201d:92,\u201dendTime\u201d:\u201d2020-10-10T16:37:00.000\u201d,\u201dinfoCode\u201d:2,\u201disMainSleep\u201d:false,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-10T15:36:30.000\u201d,\u201dlevel\u201d:\u201drestless\u201d,\u201dseconds\u201d:60},{\u201cdateTime\u201d:\u201d2020-10-10T15:37:30.000\u201d,\u201dlevel\u201d:\u201dasleep\u201d,\u201dseconds\u201d:660},{\u201cdateTime\u201d:\u201d2020-10-10T15:48:30.000\u201d,\u201dlevel\u201d:\u201drestless\u201d,\u201dseconds\u201d:60},\u2026], \u201csummary\u201d:{\u201casleep\u201d:{\u201ccount\u201d:0,\u201dminutes\u201d:56},\u201dawake\u201d:{\u201ccount\u201d:0,\u201dminutes\u201d:0},\u201drestless\u201d:{\u201ccount\u201d:3,\u201dminutes\u201d:4}}},\u201dlogId\u201d:26315914306,\u201dminutesAfterWakeup\u201d:0,\u201dminutesAsleep\u201d:55,\u201dminutesAwake\u201d:5,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-10T15:36:30.000\u201d,\u201dtimeInBed\u201d:60,\u201dtype\u201d:\u201dclassic\u201d},{\u201cdateOfSleep\u201d:\u201d2020-10-10\u201d,\u201dduration\u201d:22980000,\u201defficiency\u201d:88,\u201dendTime\u201d:\u201d2020-10-10T08:10:00.000\u201d,\u201dinfoCode\u201d:0,\u201disMainSleep\u201d:true,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-10T01:46:30.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:420},{\u201cdateTime\u201d:\u201d2020-10-10T01:53:30.000\u201d,\u201dlevel\u201d:\u201ddeep\u201d,\u201dseconds\u201d:1230},{\u201cdateTime\u201d:\u201d2020-10-10T02:14:00.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:360},\u2026], \u201csummary\u201d:{\u201cdeep\u201d:{\u201ccount\u201d:3,\u201dminutes\u201d:92,\u201dthirtyDayAvgMinutes\u201d:0},\u201dlight\u201d:{\u201ccount\u201d:29,\u201dminutes\u201d:193,\u201dthirtyDayAvgMinutes\u201d:0},\u201drem\u201d:{\u201ccount\u201d:4,\u201dminutes\u201d:33,\u201dthirtyDayAvgMinutes\u201d:0},\u201dwake\u201d:{\u201ccount\u201d:28,\u201dminutes\u201d:65,\u201dthirtyDayAvgMinutes\u201d:0}}},\u201dlogId\u201d:26311786557,\u201dminutesAfterWakeup\u201d:0,\u201dminutesAsleep\u201d:318,\u201dminutesAwake\u201d:65,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-10T01:46:30.000\u201d,\u201dtimeInBed\u201d:383,\u201dtype\u201d:\u201dstages\u201d}],\u201dsummary\u201d:{\u201cstages\u201d:{\u201cdeep\u201d:92,\u201dlight\u201d:193,\u201drem\u201d:33,\u201dwake\u201d:65},\u201dtotalMinutesAsleep\u201d:373,\u201dtotalSleepRecords\u201d:2,\u201dtotalTimeInBed\u201d:443}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d:[{\u201cdateOfSleep\u201d:\u201d2020-10-11\u201d,\u201dduration\u201d:41640000,\u201defficiency\u201d:89,\u201dendTime\u201d:\u201d2020-10-11T11:47:00.000\u201d,\u201dinfoCode\u201d:0,\u201disMainSleep\u201d:true,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-11T00:12:30.000\u201d,\u201dlevel\u201d:\u201dwake\u201d,\u201dseconds\u201d:450},{\u201cdateTime\u201d:\u201d2020-10-11T00:20:00.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:870},{\u201cdateTime\u201d:\u201d2020-10-11T00:34:30.000\u201d,\u201dlevel\u201d:\u201dwake\u201d,\u201dseconds\u201d:780},\u2026], \u201csummary\u201d:{\u201cdeep\u201d:{\u201ccount\u201d:4,\u201dminutes\u201d:52,\u201dthirtyDayAvgMinutes\u201d:62},\u201dlight\u201d:{\u201ccount\u201d:32,\u201dminutes\u201d:442,\u201dthirtyDayAvgMinutes\u201d:364},\u201drem\u201d:{\u201ccount\u201d:6,\u201dminutes\u201d:68,\u201dthirtyDayAvgMinutes\u201d:58},\u201dwake\u201d:{\u201ccount\u201d:29,\u201dminutes\u201d:132,\u201dthirtyDayAvgMinutes\u201d:94}}},\u201dlogId\u201d:26589710670,\u201dminutesAfterWakeup\u201d:1,\u201dminutesAsleep\u201d:562,\u201dminutesAwake\u201d:132,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-11T00:12:30.000\u201d,\u201dtimeInBed\u201d:694,\u201dtype\u201d:\u201dstages\u201d}],\u201dsummary\u201d:{\u201cstages\u201d:{\u201cdeep\u201d:52,\u201dlight\u201d:442,\u201drem\u201d:68,\u201dwake\u201d:132},\u201dtotalMinutesAsleep\u201d:562,\u201dtotalSleepRecords\u201d:1,\u201dtotalTimeInBed\u201d:694}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 {\u201csleep\u201d:[{\u201cdateOfSleep\u201d:\u201d2020-10-12\u201d,\u201dduration\u201d:28980000,\u201defficiency\u201d:93,\u201dendTime\u201d:\u201d2020-10-12T09:34:30.000\u201d,\u201dinfoCode\u201d:0,\u201disMainSleep\u201d:true,\u201dlevels\u201d:{\u201cdata\u201d:[{\u201cdateTime\u201d:\u201d2020-10-12T01:31:00.000\u201d,\u201dlevel\u201d:\u201dwake\u201d,\u201dseconds\u201d:600},{\u201cdateTime\u201d:\u201d2020-10-12T01:41:00.000\u201d,\u201dlevel\u201d:\u201dlight\u201d,\u201dseconds\u201d:60},{\u201cdateTime\u201d:\u201d2020-10-12T01:42:00.000\u201d,\u201dlevel\u201d:\u201ddeep\u201d,\u201dseconds\u201d:2340},\u2026], \u201csummary\u201d:{\u201cdeep\u201d:{\u201ccount\u201d:4,\u201dminutes\u201d:63,\u201dthirtyDayAvgMinutes\u201d:59},\u201dlight\u201d:{\u201ccount\u201d:27,\u201dminutes\u201d:257,\u201dthirtyDayAvgMinutes\u201d:364},\u201drem\u201d:{\u201ccount\u201d:5,\u201dminutes\u201d:94,\u201dthirtyDayAvgMinutes\u201d:58},\u201dwake\u201d:{\u201ccount\u201d:24,\u201dminutes\u201d:69,\u201dthirtyDayAvgMinutes\u201d:95}}},\u201dlogId\u201d:26589710673,\u201dminutesAfterWakeup\u201d:0,\u201dminutesAsleep\u201d:415,\u201dminutesAwake\u201d:68,\u201dminutesToFallAsleep\u201d:0,\u201dstartTime\u201d:\u201d2020-10-12T01:31:00.000\u201d,\u201dtimeInBed\u201d:483,\u201dtype\u201d:\u201dstages\u201d}],\u201dsummary\u201d:{\u201cstages\u201d:{\u201cdeep\u201d:63,\u201dlight\u201d:257,\u201drem\u201d:94,\u201dwake\u201d:69},\u201dtotalMinutesAsleep\u201d:415,\u201dtotalSleepRecords\u201d:1,\u201dtotalTimeInBed\u201d:483}} PLAIN_TEXT All columns are mandatory, however, all except device_id and local_date_time can be empty if you don\u2019t have that data. Just have in mind that some features will be empty if some of these columns are empty. device_id local_start_date_time local_end_date_time efficiency minutes_after_wakeup minutes_asleep minutes_awake minutes_to_fall_asleep minutes_in_bed is_main_sleep type a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-10 15:36:30 2020-10-10 16:37:00 92 0 55 5 0 60 0 classic a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-10 01:46:30 2020-10-10 08:10:00 88 0 318 65 0 383 1 stages a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-11 00:12:30 2020-10-11 11:47:00 89 1 562 132 0 694 1 stages a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-12 01:31:00 2020-10-12 09:34:30 93 0 415 68 0 483 1 stages","title":"Fitbit Sleep Summary"},{"location":"features/fitbit-sleep-summary/#rapids-provider","text":"Available time segments Only available for segments that span 1 or more complete days (e.g. Jan 1 st 00:00 to Jan 3 rd 23:59) File Sequence - data/raw/ { pid } /fitbit_sleep_summary_raw.csv - data/raw/ { pid } /fitbit_sleep_summary_parsed.csv - data/raw/ { pid } /fitbit_sleep_summary_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_sleep_summary_features/fitbit_sleep_summary_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_sleep_summary.csv Parameters description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_SLEEP_SUMMARY features from the RAPIDS provider [SLEEP_TYPES] Types of sleep to be included in the feature extraction computation. Fitbit provides 3 types of sleep: main , nap , all . [FEATURES] Features to be computed from sleep summary data, see table below Features description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS] : Feature Units Description countepisodeTYPE episodes Number of sleep episodes for a certain sleep type during a time segment. avgefficiencyTYPE scores Average sleep efficiency for a certain sleep type during a time segment. sumdurationafterwakeupTYPE minutes Total duration the user stayed in bed after waking up for a certain sleep type during a time segment. sumdurationasleepTYPE minutes Total sleep duration for a certain sleep type during a time segment. sumdurationawakeTYPE minutes Total duration the user stayed awake but still in bed for a certain sleep type during a time segment. sumdurationtofallasleepTYPE minutes Total duration the user spent to fall asleep for a certain sleep type during a time segment. sumdurationinbedTYPE minutes Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment. avgdurationafterwakeupTYPE minutes Average duration the user stayed in bed after waking up for a certain sleep type during a time segment. avgdurationasleepTYPE minutes Average sleep duration for a certain sleep type during a time segment. avgdurationawakeTYPE minutes Average duration the user stayed awake but still in bed for a certain sleep type during a time segment. avgdurationtofallasleepTYPE minutes Average duration the user spent to fall asleep for a certain sleep type during a time segment. avgdurationinbedTYPE minutes Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment. Assumptions/Observations There are three sleep types (TYPE): main , nap , all . The all type contains both main sleep and naps. There are two versions of Fitbit\u2019s sleep API ( version 1 and version 1.2 ), and each provides raw sleep data in a different format: Count & duration summaries . v1 contains count_awake , duration_awake , count_awakenings , count_restless , and duration_restless fields for every sleep record but v1.2 does not. API columns . Features are computed based on the values provided by Fitbit\u2019s API: efficiency , minutes_after_wakeup , minutes_asleep , minutes_awake , minutes_to_fall_asleep , minutes_in_bed , is_main_sleep and type .","title":"RAPIDS provider"},{"location":"features/fitbit-steps-intraday/","text":"Fitbit Steps Intraday \u00b6 Sensor parameters description for [FITBIT_STEPS_INTRADAY] : Key Description [TABLE] Database table name or file path where the steps intraday data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:\u201d1775\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:5},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:3},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:\u201d3201\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:14},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:11},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:10},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:\u201d998\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory. device_id local_date_time steps a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:00:00 5 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:01:00 3 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:02:00 0 RAPIDS provider \u00b6 Available time segments Available for all time segments File Sequence - data/raw/ { pid } /fitbit_steps_intraday_raw.csv - data/raw/ { pid } /fitbit_steps_intraday_parsed.csv - data/raw/ { pid } /fitbit_steps_intraday_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_steps_intraday_features/fitbit_steps_intraday_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_steps_intraday.csv Parameters description for [FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_STEPS_INTRADAY features from the RAPIDS provider [FEATURES] Features to be computed from steps intraday data, see table below [THRESHOLD_ACTIVE_BOUT] Every minute with Fitbit steps data wil be labelled as sedentary if its step count is below this threshold, otherwise, active . [INCLUDE_ZERO_STEP_ROWS] Whether or not to include time segments with a 0 step count during the whole day. Features description for [FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS] : Feature Units Description sumsteps steps The total step count during a time segment. maxsteps steps The maximum step count during a time segment. minsteps steps The minimum step count during a time segment. avgsteps steps The average step count during a time segment. stdsteps steps The standard deviation of step count during a time segment. countepisodesedentarybout bouts Number of sedentary bouts during a time segment. sumdurationsedentarybout minutes Total duration of all sedentary bouts during a time segment. maxdurationsedentarybout minutes The maximum duration of any sedentary bout during a time segment. mindurationsedentarybout minutes The minimum duration of any sedentary bout during a time segment. avgdurationsedentarybout minutes The average duration of sedentary bouts during a time segment. stddurationsedentarybout minutes The standard deviation of the duration of sedentary bouts during a time segment. countepisodeactivebout bouts Number of active bouts during a time segment. sumdurationactivebout minutes Total duration of all active bouts during a time segment. maxdurationactivebout minutes The maximum duration of any active bout during a time segment. mindurationactivebout minutes The minimum duration of any active bout during a time segment. avgdurationactivebout minutes The average duration of active bouts during a time segment. stddurationactivebout minutes The standard deviation of the duration of active bouts during a time segment. Assumptions/Observations Active and sedentary bouts . If the step count per minute is smaller than THRESHOLD_ACTIVE_BOUT (default value is 10), that minute is labelled as sedentary, otherwise, is labelled as active. Active and sedentary bouts are periods of consecutive minutes labelled as active or sedentary .","title":"Fitbit Steps Intraday"},{"location":"features/fitbit-steps-intraday/#fitbit-steps-intraday","text":"Sensor parameters description for [FITBIT_STEPS_INTRADAY] : Key Description [TABLE] Database table name or file path where the steps intraday data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:\u201d1775\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:5},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:3},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:\u201d3201\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:14},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:11},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:10},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:\u201d998\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory. device_id local_date_time steps a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:00:00 5 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:01:00 3 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 00:02:00 0","title":"Fitbit Steps Intraday"},{"location":"features/fitbit-steps-intraday/#rapids-provider","text":"Available time segments Available for all time segments File Sequence - data/raw/ { pid } /fitbit_steps_intraday_raw.csv - data/raw/ { pid } /fitbit_steps_intraday_parsed.csv - data/raw/ { pid } /fitbit_steps_intraday_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_steps_intraday_features/fitbit_steps_intraday_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_steps_intraday.csv Parameters description for [FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_STEPS_INTRADAY features from the RAPIDS provider [FEATURES] Features to be computed from steps intraday data, see table below [THRESHOLD_ACTIVE_BOUT] Every minute with Fitbit steps data wil be labelled as sedentary if its step count is below this threshold, otherwise, active . [INCLUDE_ZERO_STEP_ROWS] Whether or not to include time segments with a 0 step count during the whole day. Features description for [FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS] : Feature Units Description sumsteps steps The total step count during a time segment. maxsteps steps The maximum step count during a time segment. minsteps steps The minimum step count during a time segment. avgsteps steps The average step count during a time segment. stdsteps steps The standard deviation of step count during a time segment. countepisodesedentarybout bouts Number of sedentary bouts during a time segment. sumdurationsedentarybout minutes Total duration of all sedentary bouts during a time segment. maxdurationsedentarybout minutes The maximum duration of any sedentary bout during a time segment. mindurationsedentarybout minutes The minimum duration of any sedentary bout during a time segment. avgdurationsedentarybout minutes The average duration of sedentary bouts during a time segment. stddurationsedentarybout minutes The standard deviation of the duration of sedentary bouts during a time segment. countepisodeactivebout bouts Number of active bouts during a time segment. sumdurationactivebout minutes Total duration of all active bouts during a time segment. maxdurationactivebout minutes The maximum duration of any active bout during a time segment. mindurationactivebout minutes The minimum duration of any active bout during a time segment. avgdurationactivebout minutes The average duration of active bouts during a time segment. stddurationactivebout minutes The standard deviation of the duration of active bouts during a time segment. Assumptions/Observations Active and sedentary bouts . If the step count per minute is smaller than THRESHOLD_ACTIVE_BOUT (default value is 10), that minute is labelled as sedentary, otherwise, is labelled as active. Active and sedentary bouts are periods of consecutive minutes labelled as active or sedentary .","title":"RAPIDS provider"},{"location":"features/fitbit-steps-summary/","text":"Fitbit Steps Summary \u00b6 Sensor parameters description for [FITBIT_STEPS_SUMMARY] : Key Description [TABLE] Database table name or file path where the steps summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:\u201d1775\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:5},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:3},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:\u201d3201\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:14},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:11},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:10},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:\u201d998\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory. device_id local_date_time steps a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 1775 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-08 3201 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-09 998 RAPIDS provider \u00b6 Available time segments Only available for segments that span 1 or more complete days (e.g. Jan 1 st 00:00 to Jan 3 rd 23:59) File Sequence - data/raw/ { pid } /fitbit_steps_summary_raw.csv - data/raw/ { pid } /fitbit_steps_summary_parsed.csv - data/raw/ { pid } /fitbit_steps_summary_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_steps_summary_features/fitbit_steps_summary_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_steps_summary.csv Parameters description for [FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_STEPS_SUMMARY features from the RAPIDS provider [FEATURES] Features to be computed from steps summary data, see table below Features description for [FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS] : Feature Units Description maxsumsteps steps The maximum daily step count during a time segment. minsumsteps steps The minimum daily step count during a time segment. avgsumsteps steps The average daily step count during a time segment. mediansumsteps steps The median of daily step count during a time segment. stdsumsteps steps The standard deviation of daily step count during a time segment. Assumptions/Observations NA","title":"Fitbit Steps Summary"},{"location":"features/fitbit-steps-summary/#fitbit-steps-summary","text":"Sensor parameters description for [FITBIT_STEPS_SUMMARY] : Key Description [TABLE] Database table name or file path where the steps summary data is stored. The configuration keys in Device Data Source Configuration control whether this parameter is interpreted as table or file. The format of the column(s) containing the Fitbit sensor data can be JSON or PLAIN_TEXT . The data in JSON format is obtained directly from the Fitbit API. We support PLAIN_TEXT in case you already parsed your data and don\u2019t have access to your participants\u2019 Fitbit accounts anymore. If your data is in JSON format then summary and intraday data come packed together. We provide examples of the input format that RAPIDS expects, note that both examples for JSON and PLAIN_TEXT are tabular and the actual format difference comes in the fitbit_data column (we truncate the JSON example for brevity). Example of the structure of source data JSON device_id fitbit_data a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-07\u201d,\u201dvalue\u201d:\u201d1775\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:5},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:3},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-08\u201d,\u201dvalue\u201d:\u201d3201\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:14},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:11},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:10},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} a748ee1a-1d0b-4ae9-9074-279a2b6ba524 \u201cactivities-steps\u201d:[{\u201cdateTime\u201d:\u201d2020-10-09\u201d,\u201dvalue\u201d:\u201d998\u201d}],\u201dactivities-steps-intraday\u201d:{\u201cdataset\u201d:[{\u201ctime\u201d:\u201d00:00:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:01:00\u201d,\u201dvalue\u201d:0},{\u201ctime\u201d:\u201d00:02:00\u201d,\u201dvalue\u201d:0},\u2026],\u201ddatasetInterval\u201d:1,\u201ddatasetType\u201d:\u201dminute\u201d}} PLAIN_TEXT All columns are mandatory. device_id local_date_time steps a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-07 1775 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-08 3201 a748ee1a-1d0b-4ae9-9074-279a2b6ba524 2020-10-09 998","title":"Fitbit Steps Summary"},{"location":"features/fitbit-steps-summary/#rapids-provider","text":"Available time segments Only available for segments that span 1 or more complete days (e.g. Jan 1 st 00:00 to Jan 3 rd 23:59) File Sequence - data/raw/ { pid } /fitbit_steps_summary_raw.csv - data/raw/ { pid } /fitbit_steps_summary_parsed.csv - data/raw/ { pid } /fitbit_steps_summary_parsed_with_datetime.csv - data/interim/ { pid } /fitbit_steps_summary_features/fitbit_steps_summary_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /fitbit_steps_summary.csv Parameters description for [FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract FITBIT_STEPS_SUMMARY features from the RAPIDS provider [FEATURES] Features to be computed from steps summary data, see table below Features description for [FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS] : Feature Units Description maxsumsteps steps The maximum daily step count during a time segment. minsumsteps steps The minimum daily step count during a time segment. avgsumsteps steps The average daily step count during a time segment. mediansumsteps steps The median of daily step count during a time segment. stdsumsteps steps The standard deviation of daily step count during a time segment. Assumptions/Observations NA","title":"RAPIDS provider"},{"location":"features/phone-accelerometer/","text":"Phone Accelerometer \u00b6 Sensor parameters description for [PHONE_ACCELEROMETER] : Key Description [TABLE] Database table where the accelerometer data is stored RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_accelerometer_raw.csv - data/raw/ { pid } /phone_accelerometer_with_datetime.csv - data/interim/ { pid } /phone_accelerometer_features/phone_accelerometer_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_accelerometer.csv Parameters description for [PHONE_ACCELEROMETER][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_ACCELEROMETER features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_ACCELEROMETER][PROVIDERS][RAPIDS] : Feature Units Description maxmagnitude m/s 2 The maximum magnitude of acceleration ( \\(\\|acceleration\\| = \\sqrt{x^2 + y^2 + z^2}\\) ). minmagnitude m/s 2 The minimum magnitude of acceleration. avgmagnitude m/s 2 The average magnitude of acceleration. medianmagnitude m/s 2 The median magnitude of acceleration. stdmagnitude m/s 2 The standard deviation of acceleration. Assumptions/Observations Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem. PANDA provider \u00b6 These features are based on the work by Panda et al . Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_accelerometer_raw.csv - data/raw/ { pid } /phone_accelerometer_with_datetime.csv - data/interim/ { pid } /phone_accelerometer_features/phone_accelerometer_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_accelerometer.csv Parameters description for [PHONE_ACCELEROMETER][PROVIDERS][PANDA] : Key Description [COMPUTE] Set to True to extract PHONE_ACCELEROMETER features from the PANDA provider [FEATURES] Features to be computed for exertional and non-exertional activity episodes, see table below Features description for [PHONE_ACCELEROMETER][PROVIDERS][PANDA] : Feature Units Description sumduration minutes Total duration of all exertional or non-exertional activity episodes. maxduration minutes Longest duration of any exertional or non-exertional activity episode. minduration minutes Shortest duration of any exertional or non-exertional activity episode. avgduration minutes Average duration of any exertional or non-exertional activity episode. medianduration minutes Median duration of any exertional or non-exertional activity episode. stdduration minutes Standard deviation of the duration of all exertional or non-exertional activity episodes. Assumptions/Observations Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem. See Panda et al for a definition of exertional and non-exertional activity episodes","title":"Phone Accelerometer"},{"location":"features/phone-accelerometer/#phone-accelerometer","text":"Sensor parameters description for [PHONE_ACCELEROMETER] : Key Description [TABLE] Database table where the accelerometer data is stored","title":"Phone Accelerometer"},{"location":"features/phone-accelerometer/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_accelerometer_raw.csv - data/raw/ { pid } /phone_accelerometer_with_datetime.csv - data/interim/ { pid } /phone_accelerometer_features/phone_accelerometer_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_accelerometer.csv Parameters description for [PHONE_ACCELEROMETER][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_ACCELEROMETER features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_ACCELEROMETER][PROVIDERS][RAPIDS] : Feature Units Description maxmagnitude m/s 2 The maximum magnitude of acceleration ( \\(\\|acceleration\\| = \\sqrt{x^2 + y^2 + z^2}\\) ). minmagnitude m/s 2 The minimum magnitude of acceleration. avgmagnitude m/s 2 The average magnitude of acceleration. medianmagnitude m/s 2 The median magnitude of acceleration. stdmagnitude m/s 2 The standard deviation of acceleration. Assumptions/Observations Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem.","title":"RAPIDS provider"},{"location":"features/phone-accelerometer/#panda-provider","text":"These features are based on the work by Panda et al . Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_accelerometer_raw.csv - data/raw/ { pid } /phone_accelerometer_with_datetime.csv - data/interim/ { pid } /phone_accelerometer_features/phone_accelerometer_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_accelerometer.csv Parameters description for [PHONE_ACCELEROMETER][PROVIDERS][PANDA] : Key Description [COMPUTE] Set to True to extract PHONE_ACCELEROMETER features from the PANDA provider [FEATURES] Features to be computed for exertional and non-exertional activity episodes, see table below Features description for [PHONE_ACCELEROMETER][PROVIDERS][PANDA] : Feature Units Description sumduration minutes Total duration of all exertional or non-exertional activity episodes. maxduration minutes Longest duration of any exertional or non-exertional activity episode. minduration minutes Shortest duration of any exertional or non-exertional activity episode. avgduration minutes Average duration of any exertional or non-exertional activity episode. medianduration minutes Median duration of any exertional or non-exertional activity episode. stdduration minutes Standard deviation of the duration of all exertional or non-exertional activity episodes. Assumptions/Observations Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem. See Panda et al for a definition of exertional and non-exertional activity episodes","title":"PANDA provider"},{"location":"features/phone-activity-recognition/","text":"Phone Activity Recognition \u00b6 Sensor parameters description for [PHONE_ACTIVITY_RECOGNITION] : Key Description [TABLE][ANDROID] Database table where the activity data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS) [TABLE][IOS] Database table where the activity data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS) [EPISODE_THRESHOLD_BETWEEN_ROWS] Difference in minutes between any two rows for them to be considered part of the same activity episode RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_activity_recognition_raw.csv - data/raw/ { pid } /phone_activity_recognition_with_datetime.csv - data/raw/ { pid } /phone_activity_recognition_with_datetime_unified.csv - data/interim/ { pid } /phone_activity_recognition_episodes.csv - data/interim/ { pid } /phone_activity_recognition_episodes_resampled.csv - data/interim/ { pid } /phone_activity_recognition_episodes_resampled_with_datetime.csv - data/interim/ { pid } /phone_activity_recognition_features/phone_activity_recognition_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_activity_recognition.csv Parameters description for [PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_ACTIVITY_RECOGNITION features from the RAPIDS provider [FEATURES] Features to be computed, see table below [ACTIVITY_CLASSES][STATIONARY] An array of the activity labels to be considered in the STATIONARY category choose any of still , tilting [ACTIVITY_CLASSES][MOBILE] An array of the activity labels to be considered in the MOBILE category choose any of on_foot , walking , running , on_bicycle [ACTIVITY_CLASSES][VEHICLE] An array of the activity labels to be considered in the VEHICLE category choose any of in_vehicule Features description for [PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS] : Feature Units Description count rows Number of episodes. mostcommonactivity activity type The most common activity type (e.g. still , on_foot , etc.). If there is a tie, the first one is chosen. countuniqueactivities activity type Number of unique activities. durationstationary minutes The total duration of [ACTIVITY_CLASSES][STATIONARY] episodes durationmobile minutes The total duration of [ACTIVITY_CLASSES][MOBILE] episodes of on foot, running, and on bicycle activities durationvehicle minutes The total duration of [ACTIVITY_CLASSES][VEHICLE] episodes of on vehicle activity Assumptions/Observations iOS Activity Recognition names and types are unified with Android labels: iOS Activity Name Android Activity Name Android Activity Type walking walking 7 running running 8 cycling on_bicycle 1 automotive in_vehicle 0 stationary still 3 unknown unknown 4 In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables, RAPIDS automatically infers what platform each participant belongs to based on their participant file .","title":"Phone Activity Recognition"},{"location":"features/phone-activity-recognition/#phone-activity-recognition","text":"Sensor parameters description for [PHONE_ACTIVITY_RECOGNITION] : Key Description [TABLE][ANDROID] Database table where the activity data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS) [TABLE][IOS] Database table where the activity data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS) [EPISODE_THRESHOLD_BETWEEN_ROWS] Difference in minutes between any two rows for them to be considered part of the same activity episode","title":"Phone Activity Recognition"},{"location":"features/phone-activity-recognition/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_activity_recognition_raw.csv - data/raw/ { pid } /phone_activity_recognition_with_datetime.csv - data/raw/ { pid } /phone_activity_recognition_with_datetime_unified.csv - data/interim/ { pid } /phone_activity_recognition_episodes.csv - data/interim/ { pid } /phone_activity_recognition_episodes_resampled.csv - data/interim/ { pid } /phone_activity_recognition_episodes_resampled_with_datetime.csv - data/interim/ { pid } /phone_activity_recognition_features/phone_activity_recognition_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_activity_recognition.csv Parameters description for [PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_ACTIVITY_RECOGNITION features from the RAPIDS provider [FEATURES] Features to be computed, see table below [ACTIVITY_CLASSES][STATIONARY] An array of the activity labels to be considered in the STATIONARY category choose any of still , tilting [ACTIVITY_CLASSES][MOBILE] An array of the activity labels to be considered in the MOBILE category choose any of on_foot , walking , running , on_bicycle [ACTIVITY_CLASSES][VEHICLE] An array of the activity labels to be considered in the VEHICLE category choose any of in_vehicule Features description for [PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS] : Feature Units Description count rows Number of episodes. mostcommonactivity activity type The most common activity type (e.g. still , on_foot , etc.). If there is a tie, the first one is chosen. countuniqueactivities activity type Number of unique activities. durationstationary minutes The total duration of [ACTIVITY_CLASSES][STATIONARY] episodes durationmobile minutes The total duration of [ACTIVITY_CLASSES][MOBILE] episodes of on foot, running, and on bicycle activities durationvehicle minutes The total duration of [ACTIVITY_CLASSES][VEHICLE] episodes of on vehicle activity Assumptions/Observations iOS Activity Recognition names and types are unified with Android labels: iOS Activity Name Android Activity Name Android Activity Type walking walking 7 running running 8 cycling on_bicycle 1 automotive in_vehicle 0 stationary still 3 unknown unknown 4 In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables, RAPIDS automatically infers what platform each participant belongs to based on their participant file .","title":"RAPIDS provider"},{"location":"features/phone-applications-crashes/","text":"Phone Applications Crashes \u00b6 Sensor parameters description for [PHONE_APPLICATIONS_CRASHES] : Key Description [TABLE] Database table where the applications crashes data is stored [APPLICATION_CATEGORIES][CATALOGUE_SOURCE] FILE or GOOGLE . If FILE , app categories (genres) are read from [CATALOGUE_FILE] . If [GOOGLE] , app categories (genres) are scrapped from the Play Store [APPLICATION_CATEGORIES][CATALOGUE_FILE] CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv [APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE] if [CATALOGUE_SOURCE] is equal to FILE , this flag signals whether or not to update [CATALOGUE_FILE] , if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE] [APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES] This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE] . If [CATALOGUE_SOURCE] is equal to GOOGLE , all genres are scraped anyway (this flag is ignored) Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_APPLICATIONS_CRASHES ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Applications Crashes"},{"location":"features/phone-applications-crashes/#phone-applications-crashes","text":"Sensor parameters description for [PHONE_APPLICATIONS_CRASHES] : Key Description [TABLE] Database table where the applications crashes data is stored [APPLICATION_CATEGORIES][CATALOGUE_SOURCE] FILE or GOOGLE . If FILE , app categories (genres) are read from [CATALOGUE_FILE] . If [GOOGLE] , app categories (genres) are scrapped from the Play Store [APPLICATION_CATEGORIES][CATALOGUE_FILE] CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv [APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE] if [CATALOGUE_SOURCE] is equal to FILE , this flag signals whether or not to update [CATALOGUE_FILE] , if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE] [APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES] This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE] . If [CATALOGUE_SOURCE] is equal to GOOGLE , all genres are scraped anyway (this flag is ignored) Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_APPLICATIONS_CRASHES ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Applications Crashes"},{"location":"features/phone-applications-foreground/","text":"Phone Applications Foreground \u00b6 Sensor parameters description for [PHONE_APPLICATIONS_FOREGROUND] (these parameters are used by the only provider available at the moment, RAPIDS): Key Description [TABLE] Database table where the applications foreground data is stored [APPLICATION_CATEGORIES][CATALOGUE_SOURCE] FILE or GOOGLE . If FILE , app categories (genres) are read from [CATALOGUE_FILE] . If [GOOGLE] , app categories (genres) are scrapped from the Play Store [APPLICATION_CATEGORIES][CATALOGUE_FILE] CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv [APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE] if [CATALOGUE_SOURCE] is equal to FILE , this flag signals whether or not to update [CATALOGUE_FILE] , if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE] [APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES] This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE] . If [CATALOGUE_SOURCE] is equal to GOOGLE , all genres are scraped anyway (this flag is ignored) RAPIDS provider \u00b6 The app category (genre) catalogue used in these features was originally created by Stachl et al . Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_applications_foreground_raw.csv - data/raw/ { pid } /phone_applications_foreground_with_datetime.csv - data/raw/ { pid } /phone_applications_foreground_with_datetime_with_categories.csv - data/interim/ { pid } /phone_applications_foreground_features/phone_applications_foreground_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_applications_foreground.csv Parameters description for [PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_APPLICATIONS_FOREGROUND features from the RAPIDS provider [FEATURES] Features to be computed, see table below [SINGLE_CATEGORIES] An array of app categories to be included in the feature extraction computation. The special keyword all represents a category with all the apps from each participant. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above) [MULTIPLE_CATEGORIES] An array of collections representing meta-categories (a group of categories). They key of each element is the name of the meta-category and the value is an array of member app categories. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above) [SINGLE_APPS] An array of apps to be included in the feature extraction computation. Use their package name (e.g. com.google.android.youtube ) or the reserved keyword top1global (the most used app by a participant over the whole monitoring study) [EXCLUDED_CATEGORIES] An array of app categories to be excluded from the feature extraction computation. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above) [EXCLUDED_APPS] An array of apps to be excluded from the feature extraction computation. Use their package name, for example: com.google.android.youtube Features description for [PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS] : Feature Units Description count apps Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to it from another app) timeoffirstuse minutes The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a time_segment timeoflastuse minutes The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a time_segment frequencyentropy nats The entropy of the used apps within a category during a time_segment (each app is seen as a unique event, the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app Assumptions/Observations Features can be computed by app, by apps grouped under a single category (genre) and by multiple categories grouped together (meta-categories). For example, we can get features for Facebook (single app), for Social Network apps (a category including Facebook and other social media apps) or for Social (a meta-category formed by Social Network and Social Media Tools categories). Apps installed by default like YouTube are considered systems apps on some phones. We do an exact match to exclude apps where \u201cgenre\u201d == EXCLUDED_CATEGORIES or \u201cpackage_name\u201d == EXCLUDED_APPS . We provide three ways of classifying and app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalogue created by Stachl et al. which we provide in RAPIDS ( data/external/stachl_application_genre_catalogue.csv ), or c) by manually creating a personalized catalogue. You can choose a, b or c by modifying [APPLICATION_GENRES] keys and values (see the Sensor parameters description table above).","title":"Phone Applications Foreground"},{"location":"features/phone-applications-foreground/#phone-applications-foreground","text":"Sensor parameters description for [PHONE_APPLICATIONS_FOREGROUND] (these parameters are used by the only provider available at the moment, RAPIDS): Key Description [TABLE] Database table where the applications foreground data is stored [APPLICATION_CATEGORIES][CATALOGUE_SOURCE] FILE or GOOGLE . If FILE , app categories (genres) are read from [CATALOGUE_FILE] . If [GOOGLE] , app categories (genres) are scrapped from the Play Store [APPLICATION_CATEGORIES][CATALOGUE_FILE] CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv [APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE] if [CATALOGUE_SOURCE] is equal to FILE , this flag signals whether or not to update [CATALOGUE_FILE] , if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE] [APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES] This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE] . If [CATALOGUE_SOURCE] is equal to GOOGLE , all genres are scraped anyway (this flag is ignored)","title":"Phone Applications Foreground"},{"location":"features/phone-applications-foreground/#rapids-provider","text":"The app category (genre) catalogue used in these features was originally created by Stachl et al . Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_applications_foreground_raw.csv - data/raw/ { pid } /phone_applications_foreground_with_datetime.csv - data/raw/ { pid } /phone_applications_foreground_with_datetime_with_categories.csv - data/interim/ { pid } /phone_applications_foreground_features/phone_applications_foreground_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_applications_foreground.csv Parameters description for [PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_APPLICATIONS_FOREGROUND features from the RAPIDS provider [FEATURES] Features to be computed, see table below [SINGLE_CATEGORIES] An array of app categories to be included in the feature extraction computation. The special keyword all represents a category with all the apps from each participant. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above) [MULTIPLE_CATEGORIES] An array of collections representing meta-categories (a group of categories). They key of each element is the name of the meta-category and the value is an array of member app categories. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above) [SINGLE_APPS] An array of apps to be included in the feature extraction computation. Use their package name (e.g. com.google.android.youtube ) or the reserved keyword top1global (the most used app by a participant over the whole monitoring study) [EXCLUDED_CATEGORIES] An array of app categories to be excluded from the feature extraction computation. By default we use the category catalogue pointed by [APPLICATION_CATEGORIES][CATALOGUE_FILE] (see the Sensor parameters description table above) [EXCLUDED_APPS] An array of apps to be excluded from the feature extraction computation. Use their package name, for example: com.google.android.youtube Features description for [PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS] : Feature Units Description count apps Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to it from another app) timeoffirstuse minutes The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a time_segment timeoflastuse minutes The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a time_segment frequencyentropy nats The entropy of the used apps within a category during a time_segment (each app is seen as a unique event, the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app Assumptions/Observations Features can be computed by app, by apps grouped under a single category (genre) and by multiple categories grouped together (meta-categories). For example, we can get features for Facebook (single app), for Social Network apps (a category including Facebook and other social media apps) or for Social (a meta-category formed by Social Network and Social Media Tools categories). Apps installed by default like YouTube are considered systems apps on some phones. We do an exact match to exclude apps where \u201cgenre\u201d == EXCLUDED_CATEGORIES or \u201cpackage_name\u201d == EXCLUDED_APPS . We provide three ways of classifying and app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalogue created by Stachl et al. which we provide in RAPIDS ( data/external/stachl_application_genre_catalogue.csv ), or c) by manually creating a personalized catalogue. You can choose a, b or c by modifying [APPLICATION_GENRES] keys and values (see the Sensor parameters description table above).","title":"RAPIDS provider"},{"location":"features/phone-applications-notifications/","text":"Phone Applications Notifications \u00b6 Sensor parameters description for [PHONE_APPLICATIONS_NOTIFICATIONS] : Key Description [TABLE] Database table where the applications notifications data is stored [APPLICATION_CATEGORIES][CATALOGUE_SOURCE] FILE or GOOGLE . If FILE , app categories (genres) are read from [CATALOGUE_FILE] . If [GOOGLE] , app categories (genres) are scrapped from the Play Store [APPLICATION_CATEGORIES][CATALOGUE_FILE] CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv [APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE] if [CATALOGUE_SOURCE] is equal to FILE , this flag signals whether or not to update [CATALOGUE_FILE] , if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE] [APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES] This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE] . If [CATALOGUE_SOURCE] is equal to GOOGLE , all genres are scraped anyway (this flag is ignored) Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_APPLICATIONS_NOTIFICATIONS ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Applications Notifications"},{"location":"features/phone-applications-notifications/#phone-applications-notifications","text":"Sensor parameters description for [PHONE_APPLICATIONS_NOTIFICATIONS] : Key Description [TABLE] Database table where the applications notifications data is stored [APPLICATION_CATEGORIES][CATALOGUE_SOURCE] FILE or GOOGLE . If FILE , app categories (genres) are read from [CATALOGUE_FILE] . If [GOOGLE] , app categories (genres) are scrapped from the Play Store [APPLICATION_CATEGORIES][CATALOGUE_FILE] CSV file with a package_name and genre column. By default we provide the catalogue created by Stachl et al in data/external/stachl_application_genre_catalogue.csv [APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE] if [CATALOGUE_SOURCE] is equal to FILE , this flag signals whether or not to update [CATALOGUE_FILE] , if [CATALOGUE_SOURCE] is equal to GOOGLE all scraped genres will be saved to [CATALOGUE_FILE] [APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES] This flag signals whether or not to scrape categories (genres) missing from the [CATALOGUE_FILE] . If [CATALOGUE_SOURCE] is equal to GOOGLE , all genres are scraped anyway (this flag is ignored) Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_APPLICATIONS_NOTIFICATIONS ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Applications Notifications"},{"location":"features/phone-aware-log/","text":"Phone Aware \u00b6 Sensor parameters description for [PHONE_AWARE_LOG] : Key Description [TABLE] Database table where the aware data is stored Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_AWARE_LOG ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Aware Log"},{"location":"features/phone-aware-log/#phone-aware","text":"Sensor parameters description for [PHONE_AWARE_LOG] : Key Description [TABLE] Database table where the aware data is stored Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_AWARE_LOG ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Aware"},{"location":"features/phone-battery/","text":"Phone Battery \u00b6 Sensor parameters description for [PHONE_BATTERY] : Key Description [TABLE] Database table where the battery data is stored [EPISODE_THRESHOLD_BETWEEN_ROWS] Difference in minutes between any two rows for them to be considered part of the same battery charge or discharge episode RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_battery_raw.csv - data/interim/ { pid } /phone_battery_episodes.csv - data/interim/ { pid } /phone_battery_episodes_resampled.csv - data/interim/ { pid } /phone_battery_episodes_resampled_with_datetime.csv - data/interim/ { pid } /phone_battery_features/phone_battery_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_battery.csv Parameters description for [PHONE_BATTERY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_BATTERY features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_BATTERY][PROVIDERS][RAPIDS] : Feature Units Description countdischarge episodes Number of discharging episodes. sumdurationdischarge minutes The total duration of all discharging episodes. countcharge episodes Number of battery charging episodes. sumdurationcharge minutes The total duration of all charging episodes. avgconsumptionrate episodes/minutes The average of all episodes\u2019 consumption rates. An episode\u2019s consumption rate is defined as the ratio between its battery delta and duration maxconsumptionrate episodes/minutes The highest of all episodes\u2019 consumption rates. An episode\u2019s consumption rate is defined as the ratio between its battery delta and duration Assumptions/Observations We convert battery data collected with iOS client v1 (autodetected because battery status 4 do not exist) to match Android battery format: we swap status 3 for 5 and 1 for 3 We group battery data into discharge or charge episodes considering any contiguous rows with consecutive reductions or increases of the battery level if they are logged within [EPISODE_THRESHOLD_BETWEEN_ROWS] minutes from each other.","title":"Phone Battery"},{"location":"features/phone-battery/#phone-battery","text":"Sensor parameters description for [PHONE_BATTERY] : Key Description [TABLE] Database table where the battery data is stored [EPISODE_THRESHOLD_BETWEEN_ROWS] Difference in minutes between any two rows for them to be considered part of the same battery charge or discharge episode","title":"Phone Battery"},{"location":"features/phone-battery/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_battery_raw.csv - data/interim/ { pid } /phone_battery_episodes.csv - data/interim/ { pid } /phone_battery_episodes_resampled.csv - data/interim/ { pid } /phone_battery_episodes_resampled_with_datetime.csv - data/interim/ { pid } /phone_battery_features/phone_battery_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_battery.csv Parameters description for [PHONE_BATTERY][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_BATTERY features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_BATTERY][PROVIDERS][RAPIDS] : Feature Units Description countdischarge episodes Number of discharging episodes. sumdurationdischarge minutes The total duration of all discharging episodes. countcharge episodes Number of battery charging episodes. sumdurationcharge minutes The total duration of all charging episodes. avgconsumptionrate episodes/minutes The average of all episodes\u2019 consumption rates. An episode\u2019s consumption rate is defined as the ratio between its battery delta and duration maxconsumptionrate episodes/minutes The highest of all episodes\u2019 consumption rates. An episode\u2019s consumption rate is defined as the ratio between its battery delta and duration Assumptions/Observations We convert battery data collected with iOS client v1 (autodetected because battery status 4 do not exist) to match Android battery format: we swap status 3 for 5 and 1 for 3 We group battery data into discharge or charge episodes considering any contiguous rows with consecutive reductions or increases of the battery level if they are logged within [EPISODE_THRESHOLD_BETWEEN_ROWS] minutes from each other.","title":"RAPIDS provider"},{"location":"features/phone-bluetooth/","text":"Phone Bluetooth \u00b6 Sensor parameters description for [PHONE_BLUETOOTH] : Key Description [TABLE] Database table where the bluetooth data is stored RAPIDS provider \u00b6 Warning The features of this provider are deprecated in favor of DORYAB provider (see below). Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_bluetooth_raw.csv - data/raw/ { pid } /phone_bluetooth_with_datetime.csv - data/interim/ { pid } /phone_bluetooth_features/phone_bluetooth_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_bluetooth.csv \" Parameters description for [PHONE_BLUETOOTH][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_BLUETOOTH features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_BLUETOOTH][PROVIDERS][RAPIDS] : Feature Units Description countscans devices Number of scanned devices during a time segment, a device can be detected multiple times over time and these appearances are counted separately uniquedevices devices Number of unique devices during a time segment as identified by their hardware ( bt_address ) address countscansmostuniquedevice scans Number of scans of the most sensed device within each time segment instance Assumptions/Observations From v0.2.0 countscans , uniquedevices , countscansmostuniquedevice were deprecated because they overlap with the respective features for ALL devices of the PHONE_BLUETOOTH DORYAB provider DORYAB provider \u00b6 This provider is adapted from the work by Doryab et al . Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_bluetooth_raw.csv - data/raw/ { pid } /phone_bluetooth_with_datetime.csv - data/interim/ { pid } /phone_bluetooth_features/phone_bluetooth_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_bluetooth.csv \" Parameters description for [PHONE_BLUETOOTH][PROVIDERS][DORYAB] : Key Description [COMPUTE] Set to True to extract PHONE_BLUETOOTH features from the DORYAB provider [FEATURES] Features to be computed, see table below. These features are computed for three device categories: all devices, own devices and other devices. Features description for [PHONE_BLUETOOTH][PROVIDERS][DORYAB] : Feature Units Description countscans scans Number of scans (rows) from the devices sensed during a time segment instance. The more scans a bluetooth device has the longer it remained within range of the participant\u2019s phone uniquedevices devices Number of unique bluetooth devices sensed during a time segment instance as identified by their hardware addresses ( bt_address ) meanscans scans Mean of the scans of every sensed device within each time segment instance stdscans scans Standard deviation of the scans of every sensed device within each time segment instance countscans most frequentdevice within segments scans Number of scans of the most sensed device within each time segment instance countscans least frequentdevice within segments scans Number of scans of the least sensed device within each time segment instance countscans most frequentdevice across segments scans Number of scans of the most sensed device across time segment instances of the same type countscans least frequentdevice across segments scans Number of scans of the least sensed device across time segment instances of the same type per device countscans most frequentdevice acrossdataset scans Number of scans of the most sensed device across the entire dataset of every participant countscans least frequentdevice acrossdataset scans Number of scans of the least sensed device across the entire dataset of every participant Assumptions/Observations Devices are classified as belonging to the participant ( own ) or to other people ( others ) using k-means based on the number of times and the number of days each device was detected across each participant\u2019s dataset. See Doryab et al for more details. If ownership cannot be computed because all devices were detected on only one day, they are all considered as other . Thus all and other features will be equal. The likelihood of this scenario decreases the more days of data you have. The most and least frequent devices will be the same across time segment instances and across the entire dataset when every time segment instance covers every hour of a dataset. For example, daily segments (00:00 to 23:59) fall in this category but morning segments (06:00am to 11:59am) or periodic 30-minute segments don\u2019t. Example Simplified raw bluetooth data The following is a simplified example with bluetooth data from three days and two time segments: morning and afternoon. There are two own devices: 5C836F5-487E-405F-8E28-21DBD40FA4FF detected seven times across two days and 499A1EAF-DDF1-4657-986C-EA5032104448 detected eight times on a single day. local_date segment bt_address own_device 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 48872A52-68DE-420D-98DA-73339A1C4685 0 2016-11-29 afternoon 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 afternoon 48872A52-68DE-420D-98DA-73339A1C4685 0 2016-11-30 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-30 morning 48872A52-68DE-420D-98DA-73339A1C4685 0 2016-11-30 morning 25262DC7-780C-4AD5-AD3A-D9776AEF7FC1 0 2016-11-30 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2016-11-30 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2016-11-30 afternoon 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2017-05-07 morning 5C5A9C41-2F68-4CEB-96D0-77DE3729B729 0 2017-05-07 morning 25262DC7-780C-4AD5-AD3A-D9776AEF7FC1 0 2017-05-07 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2017-05-07 morning 6C444841-FE64-4375-BC3F-FA410CDC0AC7 0 2017-05-07 morning 4DC7A22D-9F1F-4DEF-8576-086910AABCB5 0 2017-05-07 afternoon 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 The most and least frequent OTHER devices ( own_device == 0 ) during morning segments The most and least frequent ALL | OWN | OTHER devices are computed within each time segment instance, across time segment instances of the same type and across the entire dataset of each person. These are the most and least frequent devices for OTHER devices during morning segments. most frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance) least frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance) most frequent device across 2016-11-30 morning: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8' least frequent device across 2016-11-30 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen) most frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen) least frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen) most frequent across morning segments: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8' least frequent across morning segments: '6C444841-FE64-4375-BC3F-FA410CDC0AC7' (when tied, the first occurance is chosen) most frequent across dataset: '499A1EAF-DDF1-4657-986C-EA5032104448' (only taking into account \"morning\" segments) least frequent across dataset: '4DC7A22D-9F1F-4DEF-8576-086910AABCB5' (when tied, the first occurance is chosen) Bluetooth features for OTHER devices and morning segments For brevity we only show the following features for morning segments: OTHER : DEVICES : [ \"countscans\" , \"uniquedevices\" , \"meanscans\" , \"stdscans\" ] SCANS_MOST_FREQUENT_DEVICE : [ \"withinsegments\" , \"acrosssegments\" , \"acrossdataset\" ] Note that countscansmostfrequentdeviceacrossdatasetothers is all 0 s because 499A1EAF-DDF1-4657-986C-EA5032104448 is excluded from the count as is labelled as an own device (not other ). local_segment countscansothers uniquedevicesothers meanscansothers stdscansothers countscansmostfrequentdevicewithinsegmentsothers countscansmostfrequentdeviceacrosssegmentsothers countscansmostfrequentdeviceacrossdatasetothers 2016-11-29-morning 1 1 1.000000 NaN 1 0.0 0.0 2016-11-30-morning 4 3 1.333333 0.57735 2 2.0 2.0 2017-05-07-morning 5 5 1.000000 0.00000 1 1.0 1.0","title":"Phone Bluetooth"},{"location":"features/phone-bluetooth/#phone-bluetooth","text":"Sensor parameters description for [PHONE_BLUETOOTH] : Key Description [TABLE] Database table where the bluetooth data is stored","title":"Phone Bluetooth"},{"location":"features/phone-bluetooth/#rapids-provider","text":"Warning The features of this provider are deprecated in favor of DORYAB provider (see below). Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_bluetooth_raw.csv - data/raw/ { pid } /phone_bluetooth_with_datetime.csv - data/interim/ { pid } /phone_bluetooth_features/phone_bluetooth_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_bluetooth.csv \" Parameters description for [PHONE_BLUETOOTH][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_BLUETOOTH features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_BLUETOOTH][PROVIDERS][RAPIDS] : Feature Units Description countscans devices Number of scanned devices during a time segment, a device can be detected multiple times over time and these appearances are counted separately uniquedevices devices Number of unique devices during a time segment as identified by their hardware ( bt_address ) address countscansmostuniquedevice scans Number of scans of the most sensed device within each time segment instance Assumptions/Observations From v0.2.0 countscans , uniquedevices , countscansmostuniquedevice were deprecated because they overlap with the respective features for ALL devices of the PHONE_BLUETOOTH DORYAB provider","title":"RAPIDS provider"},{"location":"features/phone-bluetooth/#doryab-provider","text":"This provider is adapted from the work by Doryab et al . Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_bluetooth_raw.csv - data/raw/ { pid } /phone_bluetooth_with_datetime.csv - data/interim/ { pid } /phone_bluetooth_features/phone_bluetooth_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_bluetooth.csv \" Parameters description for [PHONE_BLUETOOTH][PROVIDERS][DORYAB] : Key Description [COMPUTE] Set to True to extract PHONE_BLUETOOTH features from the DORYAB provider [FEATURES] Features to be computed, see table below. These features are computed for three device categories: all devices, own devices and other devices. Features description for [PHONE_BLUETOOTH][PROVIDERS][DORYAB] : Feature Units Description countscans scans Number of scans (rows) from the devices sensed during a time segment instance. The more scans a bluetooth device has the longer it remained within range of the participant\u2019s phone uniquedevices devices Number of unique bluetooth devices sensed during a time segment instance as identified by their hardware addresses ( bt_address ) meanscans scans Mean of the scans of every sensed device within each time segment instance stdscans scans Standard deviation of the scans of every sensed device within each time segment instance countscans most frequentdevice within segments scans Number of scans of the most sensed device within each time segment instance countscans least frequentdevice within segments scans Number of scans of the least sensed device within each time segment instance countscans most frequentdevice across segments scans Number of scans of the most sensed device across time segment instances of the same type countscans least frequentdevice across segments scans Number of scans of the least sensed device across time segment instances of the same type per device countscans most frequentdevice acrossdataset scans Number of scans of the most sensed device across the entire dataset of every participant countscans least frequentdevice acrossdataset scans Number of scans of the least sensed device across the entire dataset of every participant Assumptions/Observations Devices are classified as belonging to the participant ( own ) or to other people ( others ) using k-means based on the number of times and the number of days each device was detected across each participant\u2019s dataset. See Doryab et al for more details. If ownership cannot be computed because all devices were detected on only one day, they are all considered as other . Thus all and other features will be equal. The likelihood of this scenario decreases the more days of data you have. The most and least frequent devices will be the same across time segment instances and across the entire dataset when every time segment instance covers every hour of a dataset. For example, daily segments (00:00 to 23:59) fall in this category but morning segments (06:00am to 11:59am) or periodic 30-minute segments don\u2019t. Example Simplified raw bluetooth data The following is a simplified example with bluetooth data from three days and two time segments: morning and afternoon. There are two own devices: 5C836F5-487E-405F-8E28-21DBD40FA4FF detected seven times across two days and 499A1EAF-DDF1-4657-986C-EA5032104448 detected eight times on a single day. local_date segment bt_address own_device 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 morning 48872A52-68DE-420D-98DA-73339A1C4685 0 2016-11-29 afternoon 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-29 afternoon 48872A52-68DE-420D-98DA-73339A1C4685 0 2016-11-30 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2016-11-30 morning 48872A52-68DE-420D-98DA-73339A1C4685 0 2016-11-30 morning 25262DC7-780C-4AD5-AD3A-D9776AEF7FC1 0 2016-11-30 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2016-11-30 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2016-11-30 afternoon 55C836F5-487E-405F-8E28-21DBD40FA4FF 1 2017-05-07 morning 5C5A9C41-2F68-4CEB-96D0-77DE3729B729 0 2017-05-07 morning 25262DC7-780C-4AD5-AD3A-D9776AEF7FC1 0 2017-05-07 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2017-05-07 morning 6C444841-FE64-4375-BC3F-FA410CDC0AC7 0 2017-05-07 morning 4DC7A22D-9F1F-4DEF-8576-086910AABCB5 0 2017-05-07 afternoon 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1 The most and least frequent OTHER devices ( own_device == 0 ) during morning segments The most and least frequent ALL | OWN | OTHER devices are computed within each time segment instance, across time segment instances of the same type and across the entire dataset of each person. These are the most and least frequent devices for OTHER devices during morning segments. most frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance) least frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance) most frequent device across 2016-11-30 morning: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8' least frequent device across 2016-11-30 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen) most frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen) least frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurance is chosen) most frequent across morning segments: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8' least frequent across morning segments: '6C444841-FE64-4375-BC3F-FA410CDC0AC7' (when tied, the first occurance is chosen) most frequent across dataset: '499A1EAF-DDF1-4657-986C-EA5032104448' (only taking into account \"morning\" segments) least frequent across dataset: '4DC7A22D-9F1F-4DEF-8576-086910AABCB5' (when tied, the first occurance is chosen) Bluetooth features for OTHER devices and morning segments For brevity we only show the following features for morning segments: OTHER : DEVICES : [ \"countscans\" , \"uniquedevices\" , \"meanscans\" , \"stdscans\" ] SCANS_MOST_FREQUENT_DEVICE : [ \"withinsegments\" , \"acrosssegments\" , \"acrossdataset\" ] Note that countscansmostfrequentdeviceacrossdatasetothers is all 0 s because 499A1EAF-DDF1-4657-986C-EA5032104448 is excluded from the count as is labelled as an own device (not other ). local_segment countscansothers uniquedevicesothers meanscansothers stdscansothers countscansmostfrequentdevicewithinsegmentsothers countscansmostfrequentdeviceacrosssegmentsothers countscansmostfrequentdeviceacrossdatasetothers 2016-11-29-morning 1 1 1.000000 NaN 1 0.0 0.0 2016-11-30-morning 4 3 1.333333 0.57735 2 2.0 2.0 2017-05-07-morning 5 5 1.000000 0.00000 1 1.0 1.0","title":"DORYAB provider"},{"location":"features/phone-calls/","text":"Phone Calls \u00b6 Sensor parameters description for [PHONE_CALLS] : Key Description [TABLE] Database table where the calls data is stored RAPIDS Provider \u00b6 Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_calls_raw.csv - data/raw/ { pid } /phone_calls_with_datetime.csv - data/raw/ { pid } /phone_calls_with_datetime_unified.csv - data/interim/ { pid } /phone_calls_features/phone_calls_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_calls.csv Parameters description for [PHONE_CALLS][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_CALLS features from the RAPIDS provider [CALL_TYPES] The particular call_type that will be analyzed. The options for this parameter are incoming, outgoing or missed. [FEATURES] Features to be computed for outgoing , incoming , and missed calls. Note that the same features are available for both incoming and outgoing calls, while missed calls has its own set of features. See the tables below. Features description for [PHONE_CALLS][PROVIDERS][RAPIDS] incoming and outgoing calls: Feature Units Description count calls Number of calls of a particular call_type occurred during a particular time_segment . distinctcontacts contacts Number of distinct contacts that are associated with a particular call_type for a particular time_segment meanduration seconds The mean duration of all calls of a particular call_type during a particular time_segment . sumduration seconds The sum of the duration of all calls of a particular call_type during a particular time_segment . minduration seconds The duration of the shortest call of a particular call_type during a particular time_segment . maxduration seconds The duration of the longest call of a particular call_type during a particular time_segment . stdduration seconds The standard deviation of the duration of all the calls of a particular call_type during a particular time_segment . modeduration seconds The mode of the duration of all the calls of a particular call_type during a particular time_segment . entropyduration nats The estimate of the Shannon entropy for the the duration of all the calls of a particular call_type during a particular time_segment . timefirstcall minutes The time in minutes between 12:00am (midnight) and the first call of call_type . timelastcall minutes The time in minutes between 12:00am (midnight) and the last call of call_type . countmostfrequentcontact calls The number of calls of a particular call_type during a particular time_segment of the most frequent contact throughout the monitored period. Features description for [PHONE_CALLS][PROVIDERS][RAPIDS] missed calls: Feature Units Description count calls Number of missed calls that occurred during a particular time_segment . distinctcontacts contacts Number of distinct contacts that are associated with missed calls for a particular time_segment timefirstcall minutes The time in hours from 12:00am (Midnight) that the first missed call occurred. timelastcall minutes The time in hours from 12:00am (Midnight) that the last missed call occurred. countmostfrequentcontact calls The number of missed calls during a particular time_segment of the most frequent contact throughout the monitored period. Assumptions/Observations Traces for iOS calls are unique even for the same contact calling a participant more than once which renders countmostfrequentcontact meaningless and distinctcontacts equal to the total number of traces. [CALL_TYPES] and [FEATURES] keys in config.yaml need to match. For example, [CALL_TYPES] outgoing matches the [FEATURES] key outgoing iOS calls data is transformed to match Android calls data format. See our algorithm","title":"Phone Calls"},{"location":"features/phone-calls/#phone-calls","text":"Sensor parameters description for [PHONE_CALLS] : Key Description [TABLE] Database table where the calls data is stored","title":"Phone Calls"},{"location":"features/phone-calls/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_calls_raw.csv - data/raw/ { pid } /phone_calls_with_datetime.csv - data/raw/ { pid } /phone_calls_with_datetime_unified.csv - data/interim/ { pid } /phone_calls_features/phone_calls_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_calls.csv Parameters description for [PHONE_CALLS][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_CALLS features from the RAPIDS provider [CALL_TYPES] The particular call_type that will be analyzed. The options for this parameter are incoming, outgoing or missed. [FEATURES] Features to be computed for outgoing , incoming , and missed calls. Note that the same features are available for both incoming and outgoing calls, while missed calls has its own set of features. See the tables below. Features description for [PHONE_CALLS][PROVIDERS][RAPIDS] incoming and outgoing calls: Feature Units Description count calls Number of calls of a particular call_type occurred during a particular time_segment . distinctcontacts contacts Number of distinct contacts that are associated with a particular call_type for a particular time_segment meanduration seconds The mean duration of all calls of a particular call_type during a particular time_segment . sumduration seconds The sum of the duration of all calls of a particular call_type during a particular time_segment . minduration seconds The duration of the shortest call of a particular call_type during a particular time_segment . maxduration seconds The duration of the longest call of a particular call_type during a particular time_segment . stdduration seconds The standard deviation of the duration of all the calls of a particular call_type during a particular time_segment . modeduration seconds The mode of the duration of all the calls of a particular call_type during a particular time_segment . entropyduration nats The estimate of the Shannon entropy for the the duration of all the calls of a particular call_type during a particular time_segment . timefirstcall minutes The time in minutes between 12:00am (midnight) and the first call of call_type . timelastcall minutes The time in minutes between 12:00am (midnight) and the last call of call_type . countmostfrequentcontact calls The number of calls of a particular call_type during a particular time_segment of the most frequent contact throughout the monitored period. Features description for [PHONE_CALLS][PROVIDERS][RAPIDS] missed calls: Feature Units Description count calls Number of missed calls that occurred during a particular time_segment . distinctcontacts contacts Number of distinct contacts that are associated with missed calls for a particular time_segment timefirstcall minutes The time in hours from 12:00am (Midnight) that the first missed call occurred. timelastcall minutes The time in hours from 12:00am (Midnight) that the last missed call occurred. countmostfrequentcontact calls The number of missed calls during a particular time_segment of the most frequent contact throughout the monitored period. Assumptions/Observations Traces for iOS calls are unique even for the same contact calling a participant more than once which renders countmostfrequentcontact meaningless and distinctcontacts equal to the total number of traces. [CALL_TYPES] and [FEATURES] keys in config.yaml need to match. For example, [CALL_TYPES] outgoing matches the [FEATURES] key outgoing iOS calls data is transformed to match Android calls data format. See our algorithm","title":"RAPIDS Provider"},{"location":"features/phone-conversation/","text":"Phone Conversation \u00b6 Sensor parameters description for [PHONE_CONVERSATION] : Key Description [TABLE][ANDROID] Database table where the conversation data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS) [TABLE][IOS] Database table where the conversation data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS) RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_conversation_raw.csv - data/raw/ { pid } /phone_conversation_with_datetime.csv - data/raw/ { pid } /phone_conversation_with_datetime_unified.csv - data/interim/ { pid } /phone_conversation_features/phone_conversation_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_conversation.csv Parameters description for [PHONE_CONVERSATION][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_CONVERSATION features from the RAPIDS provider [FEATURES] Features to be computed, see table below [RECORDING_MINUTES] Minutes the plugin was recording audio (default 1 min) [PAUSED_MINUTES] Minutes the plugin was NOT recording audio (default 3 min) Features description for [PHONE_CONVERSATION][PROVIDERS][RAPIDS] : Feature Units Description minutessilence minutes Minutes labeled as silence minutesnoise minutes Minutes labeled as noise minutesvoice minutes Minutes labeled as voice minutesunknown minutes Minutes labeled as unknown sumconversationduration minutes Total duration of all conversations maxconversationduration minutes Longest duration of all conversations minconversationduration minutes Shortest duration of all conversations avgconversationduration minutes Average duration of all conversations sdconversationduration minutes Standard Deviation of the duration of all conversations timefirstconversation minutes Minutes since midnight when the first conversation for a time segment was detected timelastconversation minutes Minutes since midnight when the last conversation for a time segment was detected noisesumenergy L2-norm Sum of all energy values when inference is noise noiseavgenergy L2-norm Average of all energy values when inference is noise noisesdenergy L2-norm Standard Deviation of all energy values when inference is noise noiseminenergy L2-norm Minimum of all energy values when inference is noise noisemaxenergy L2-norm Maximum of all energy values when inference is noise voicesumenergy L2-norm Sum of all energy values when inference is voice voiceavgenergy L2-norm Average of all energy values when inference is voice voicesdenergy L2-norm Standard Deviation of all energy values when inference is voice voiceminenergy L2-norm Minimum of all energy values when inference is voice voicemaxenergy L2-norm Maximum of all energy values when inference is voice silencesensedfraction - Ratio between minutessilence and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) noisesensedfraction - Ratio between minutesnoise and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) voicesensedfraction - Ratio between minutesvoice and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) unknownsensedfraction - Ratio between minutesunknown and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) silenceexpectedfraction - Ration between minutessilence and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) noiseexpectedfraction - Ration between minutesnoise and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) voiceexpectedfraction - Ration between minutesvoice and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) unknownexpectedfraction - Ration between minutesunknown and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) Assumptions/Observations The timestamp of conversation rows in iOS is in seconds so we convert it to milliseconds to match Android\u2019s format","title":"Phone Conversation"},{"location":"features/phone-conversation/#phone-conversation","text":"Sensor parameters description for [PHONE_CONVERSATION] : Key Description [TABLE][ANDROID] Database table where the conversation data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS) [TABLE][IOS] Database table where the conversation data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)","title":"Phone Conversation"},{"location":"features/phone-conversation/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_conversation_raw.csv - data/raw/ { pid } /phone_conversation_with_datetime.csv - data/raw/ { pid } /phone_conversation_with_datetime_unified.csv - data/interim/ { pid } /phone_conversation_features/phone_conversation_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_conversation.csv Parameters description for [PHONE_CONVERSATION][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_CONVERSATION features from the RAPIDS provider [FEATURES] Features to be computed, see table below [RECORDING_MINUTES] Minutes the plugin was recording audio (default 1 min) [PAUSED_MINUTES] Minutes the plugin was NOT recording audio (default 3 min) Features description for [PHONE_CONVERSATION][PROVIDERS][RAPIDS] : Feature Units Description minutessilence minutes Minutes labeled as silence minutesnoise minutes Minutes labeled as noise minutesvoice minutes Minutes labeled as voice minutesunknown minutes Minutes labeled as unknown sumconversationduration minutes Total duration of all conversations maxconversationduration minutes Longest duration of all conversations minconversationduration minutes Shortest duration of all conversations avgconversationduration minutes Average duration of all conversations sdconversationduration minutes Standard Deviation of the duration of all conversations timefirstconversation minutes Minutes since midnight when the first conversation for a time segment was detected timelastconversation minutes Minutes since midnight when the last conversation for a time segment was detected noisesumenergy L2-norm Sum of all energy values when inference is noise noiseavgenergy L2-norm Average of all energy values when inference is noise noisesdenergy L2-norm Standard Deviation of all energy values when inference is noise noiseminenergy L2-norm Minimum of all energy values when inference is noise noisemaxenergy L2-norm Maximum of all energy values when inference is noise voicesumenergy L2-norm Sum of all energy values when inference is voice voiceavgenergy L2-norm Average of all energy values when inference is voice voicesdenergy L2-norm Standard Deviation of all energy values when inference is voice voiceminenergy L2-norm Minimum of all energy values when inference is voice voicemaxenergy L2-norm Maximum of all energy values when inference is voice silencesensedfraction - Ratio between minutessilence and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) noisesensedfraction - Ratio between minutesnoise and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) voicesensedfraction - Ratio between minutesvoice and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) unknownsensedfraction - Ratio between minutesunknown and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) silenceexpectedfraction - Ration between minutessilence and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) noiseexpectedfraction - Ration between minutesnoise and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) voiceexpectedfraction - Ration between minutesvoice and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) unknownexpectedfraction - Ration between minutesunknown and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / recordingMinutes+pausedMinutes) Assumptions/Observations The timestamp of conversation rows in iOS is in seconds so we convert it to milliseconds to match Android\u2019s format","title":"RAPIDS provider"},{"location":"features/phone-data-yield/","text":"Phone Data Yield \u00b6 This is a combinatorial sensor which means that we use the data from multiple sensors to extract data yield features. Data yield features can be used to remove rows ( time segments ) that do not contain enough data. You should decide what is your \u201cenough\u201d threshold depending on the type of sensors you collected (frequency vs event based, e.g. acceleroemter vs calls), the length of your study, and the rates of missing data that your analysis could handle. Why is data yield important? Imagine that you want to extract PHONE_CALL features on daily segments ( 00:00 to 23:59 ). Let\u2019s say that on day 1 the phone logged 10 calls and 23 hours of data from other sensors and on day 2 the phone logged 10 calls and only 2 hours of data from other sensors. It\u2019s more likely that other calls were placed on the 22 hours of data that you didn\u2019t log on day 2 than on the 1 hour of data you didn\u2019t log on day 1, and so including day 2 in your analysis could bias your results. Sensor parameters description for [PHONE_DATA_YIELD] : Key Description [SENSORS] One or more phone sensor config keys (e.g. PHONE_MESSAGE ). The more keys you include the more accurately RAPIDS can approximate the time an smartphone was sensing data. The supported phone sensors you can include in this list are outlined below ( do NOT include Fitbit sensors, ONLY include phone sensors ). Supported phone sensors for [PHONE_DATA_YIELD][SENSORS] PHONE_ACCELEROMETER PHONE_ACTIVITY_RECOGNITION PHONE_APPLICATIONS_CRASHES PHONE_APPLICATIONS_FOREGROUND PHONE_APPLICATIONS_NOTIFICATIONS PHONE_AWARE_LOG PHONE_BATTERY PHONE_BLUETOOTH PHONE_CALLS PHONE_CONVERSATION PHONE_MESSAGES PHONE_KEYBOARD PHONE_LIGHT PHONE_LOCATIONS PHONE_SCREEN PHONE_WIFI_VISIBLE PHONE_WIFI_CONNECTED RAPIDS provider \u00b6 Before explaining the data yield features, let\u2019s define the following relevant concepts: A valid minute is any 60 second window when any phone sensor logged at least 1 row of data A valid hour is any 60 minute window with at least X valid minutes. The X or threshold is given by [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] The timestamps of all sensors are concatenated and then grouped per time segment. Minute and hour windows are created from the beginning of each time segment instance and these windows are marked as valid based on the definitions above. The duration of each time segment is taken into account to compute the features described below. Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } / { sensor } _raw.csv # one for every [PHONE_DATA_YIELD][SENSORS] - data/interim/ { pid } /phone_yielded_timestamps.csv - data/interim/ { pid } /phone_yielded_timestamps_with_datetime.csv - data/interim/ { pid } /phone_data_yield_features/phone_data_yield_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_data_yield.csv Parameters description for [PHONE_DATA_YIELD][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_DATA_YIELD features from the RAPIDS provider [FEATURES] Features to be computed, see table below [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] The proportion [0.0 ,1.0] of valid minutes in a 60-minute window necessary to flag that window as valid. Features description for [PHONE_DATA_YIELD][PROVIDERS][RAPIDS] : Feature Units Description ratiovalidyieldedminutes rows The ratio between the number of valid minutes and the duration in minutes of a time segment. ratiovalidyieldedhours lux The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1. Assumptions/Observations We recommend using ratiovalidyieldedminutes on time segments that are shorter than two or three hours and ratiovalidyieldedhours for longer segments. This is because relying on yielded minutes only can be misleading when a big chunk of those missing minutes are clustered together. For example, let\u2019s assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur: the 12 missing hours are from the beginning of the segment or 30 minutes could be missing from every hour (24 * 30 minutes = 12 hours). ratiovalidyieldedminutes would be 0.5 for both a and b (hinting the missing circumstances are similar). However, ratiovalidyieldedhours would be 0.5 for a and 1.0 for b if [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] is between [0.0 and 0.49] (hinting that the missing circumstances might be more favorable for b . In other words, sensed data for b is more evenly spread compared to a .","title":"Phone Data Yield"},{"location":"features/phone-data-yield/#phone-data-yield","text":"This is a combinatorial sensor which means that we use the data from multiple sensors to extract data yield features. Data yield features can be used to remove rows ( time segments ) that do not contain enough data. You should decide what is your \u201cenough\u201d threshold depending on the type of sensors you collected (frequency vs event based, e.g. acceleroemter vs calls), the length of your study, and the rates of missing data that your analysis could handle. Why is data yield important? Imagine that you want to extract PHONE_CALL features on daily segments ( 00:00 to 23:59 ). Let\u2019s say that on day 1 the phone logged 10 calls and 23 hours of data from other sensors and on day 2 the phone logged 10 calls and only 2 hours of data from other sensors. It\u2019s more likely that other calls were placed on the 22 hours of data that you didn\u2019t log on day 2 than on the 1 hour of data you didn\u2019t log on day 1, and so including day 2 in your analysis could bias your results. Sensor parameters description for [PHONE_DATA_YIELD] : Key Description [SENSORS] One or more phone sensor config keys (e.g. PHONE_MESSAGE ). The more keys you include the more accurately RAPIDS can approximate the time an smartphone was sensing data. The supported phone sensors you can include in this list are outlined below ( do NOT include Fitbit sensors, ONLY include phone sensors ). Supported phone sensors for [PHONE_DATA_YIELD][SENSORS] PHONE_ACCELEROMETER PHONE_ACTIVITY_RECOGNITION PHONE_APPLICATIONS_CRASHES PHONE_APPLICATIONS_FOREGROUND PHONE_APPLICATIONS_NOTIFICATIONS PHONE_AWARE_LOG PHONE_BATTERY PHONE_BLUETOOTH PHONE_CALLS PHONE_CONVERSATION PHONE_MESSAGES PHONE_KEYBOARD PHONE_LIGHT PHONE_LOCATIONS PHONE_SCREEN PHONE_WIFI_VISIBLE PHONE_WIFI_CONNECTED","title":"Phone Data Yield"},{"location":"features/phone-data-yield/#rapids-provider","text":"Before explaining the data yield features, let\u2019s define the following relevant concepts: A valid minute is any 60 second window when any phone sensor logged at least 1 row of data A valid hour is any 60 minute window with at least X valid minutes. The X or threshold is given by [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] The timestamps of all sensors are concatenated and then grouped per time segment. Minute and hour windows are created from the beginning of each time segment instance and these windows are marked as valid based on the definitions above. The duration of each time segment is taken into account to compute the features described below. Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } / { sensor } _raw.csv # one for every [PHONE_DATA_YIELD][SENSORS] - data/interim/ { pid } /phone_yielded_timestamps.csv - data/interim/ { pid } /phone_yielded_timestamps_with_datetime.csv - data/interim/ { pid } /phone_data_yield_features/phone_data_yield_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_data_yield.csv Parameters description for [PHONE_DATA_YIELD][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_DATA_YIELD features from the RAPIDS provider [FEATURES] Features to be computed, see table below [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] The proportion [0.0 ,1.0] of valid minutes in a 60-minute window necessary to flag that window as valid. Features description for [PHONE_DATA_YIELD][PROVIDERS][RAPIDS] : Feature Units Description ratiovalidyieldedminutes rows The ratio between the number of valid minutes and the duration in minutes of a time segment. ratiovalidyieldedhours lux The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1. Assumptions/Observations We recommend using ratiovalidyieldedminutes on time segments that are shorter than two or three hours and ratiovalidyieldedhours for longer segments. This is because relying on yielded minutes only can be misleading when a big chunk of those missing minutes are clustered together. For example, let\u2019s assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur: the 12 missing hours are from the beginning of the segment or 30 minutes could be missing from every hour (24 * 30 minutes = 12 hours). ratiovalidyieldedminutes would be 0.5 for both a and b (hinting the missing circumstances are similar). However, ratiovalidyieldedhours would be 0.5 for a and 1.0 for b if [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] is between [0.0 and 0.49] (hinting that the missing circumstances might be more favorable for b . In other words, sensed data for b is more evenly spread compared to a .","title":"RAPIDS provider"},{"location":"features/phone-keyboard/","text":"Phone Keyboard \u00b6 Sensor parameters description for [PHONE_KEYBOARD] : Key Description [TABLE] Database table where the keyboard data is stored Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_KEYBOARD ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Keyboard"},{"location":"features/phone-keyboard/#phone-keyboard","text":"Sensor parameters description for [PHONE_KEYBOARD] : Key Description [TABLE] Database table where the keyboard data is stored Note No feature providers have been implemented for this sensor yet, however you can use its key ( PHONE_KEYBOARD ) to improve PHONE_DATA_YIELD or you can implement your own features .","title":"Phone Keyboard"},{"location":"features/phone-light/","text":"Phone Light \u00b6 Sensor parameters description for [PHONE_LIGHT] : Key Description [TABLE] Database table where the light data is stored RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_light_raw.csv - data/raw/ { pid } /phone_light_with_datetime.csv - data/interim/ { pid } /phone_light_features/phone_light_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_light.csv Parameters description for [PHONE_LIGHT][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_LIGHT features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_LIGHT][PROVIDERS][RAPIDS] : Feature Units Description count rows Number light sensor rows recorded. maxlux lux The maximum ambient luminance. minlux lux The minimum ambient luminance. avglux lux The average ambient luminance. medianlux lux The median ambient luminance. stdlux lux The standard deviation of ambient luminance. Assumptions/Observations NA","title":"Phone Light"},{"location":"features/phone-light/#phone-light","text":"Sensor parameters description for [PHONE_LIGHT] : Key Description [TABLE] Database table where the light data is stored","title":"Phone Light"},{"location":"features/phone-light/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_light_raw.csv - data/raw/ { pid } /phone_light_with_datetime.csv - data/interim/ { pid } /phone_light_features/phone_light_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_light.csv Parameters description for [PHONE_LIGHT][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_LIGHT features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_LIGHT][PROVIDERS][RAPIDS] : Feature Units Description count rows Number light sensor rows recorded. maxlux lux The maximum ambient luminance. minlux lux The minimum ambient luminance. avglux lux The average ambient luminance. medianlux lux The median ambient luminance. stdlux lux The standard deviation of ambient luminance. Assumptions/Observations NA","title":"RAPIDS provider"},{"location":"features/phone-locations/","text":"Phone Locations \u00b6 Sensor parameters description for [PHONE_LOCATIONS] : Key Description [TABLE] Database table where the location data is stored [LOCATIONS_TO_USE] Type of location data to use, one of ALL , GPS , ALL_RESAMPLED or FUSED_RESAMPLED . This filter is based on the provider column of the AWARE locations table, ALL includes every row, GPS only includes rows where provider is gps, ALL_RESAMPLED includes all rows after being resampled, and FUSED_RESAMPLED only includes rows where provider is fused after being resampled. [FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD] if ALL_RESAMPLED or FUSED_RESAMPLED is used, the original fused data has to be resampled, a location row will be resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less or equal than this threshold (in minutes). [FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION] if ALL_RESAMPLED or FUSED_RESAMPLED is used, the original fused data has to be resampled, a location row will be resampled at most for this long (in minutes) Assumptions/Observations Types of location data to use AWARE Android and iOS clients can collect location coordinates through the phone's GPS, the network cellular towers around the phone, or Google's fused location API. If you want to use only the GPS provider set [LOCATIONS_TO_USE] to GPS , if you want to use all providers set [LOCATIONS_TO_USE] to ALL , if you collected location data from different providers including the fused API use ALL_RESAMPLED , if your AWARE client was configured to use fused location only or want to focus only on this provider, set [LOCATIONS_TO_USE] to RESAMPLE_FUSED . ALL_RESAMPLED and RESAMPLE_FUSED take the original location coordinates and replicate each pair forward in time as long as the phone was sensing data as indicated by the joined timestamps of [PHONE_DATA_YIELD][SENSORS] , this is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one and because GPS and network providers can log data at variable rates. There are two parameters associated with resampling fused location. FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair (for example, participant A's phone did not collect data between 10.30am and 10:50am and between 11:05am and 11:40am, the last known coordinate pair will be replicated during the first period but not the second, in other words, we assume that we cannot longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes). FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer that this threshold even if the phone was sensing data continuously (for example, participant A went home at 9pm and their phone was sensing data without gaps until 11am the next morning, the last known location will only be replicated until 9am). If you have suggestions to modify or improve this resampling, let us know. BARNETT provider \u00b6 These features are based on the original open-source implementation by Barnett et al and some features created by Canzian et al . Available time segments and platforms Available only for segments that start at 00:00:00 and end at 23:59:59 of the same day (daily segments) Available for Android and iOS File Sequence - data/raw/ { pid } /phone_locations_raw.csv - data/interim/ { pid } /phone_locations_processed.csv - data/interim/ { pid } /phone_locations_processed_with_datetime.csv - data/interim/ { pid } /phone_locations_features/phone_locations_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_locations.csv Parameters description for [PHONE_LOCATIONS][PROVIDERS][BARNETT] : Key Description [COMPUTE] Set to True to extract PHONE_LOCATIONS features from the BARNETT provider [FEATURES] Features to be computed, see table below [ACCURACY_LIMIT] An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there\u2019s a 68% probability the true location is within this radius [TIMEZONE] Timezone where the location data was collected. By default points to the one defined in the Configuration [MINUTES_DATA_USED] Set to True to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough. Features description for [PHONE_LOCATIONS][PROVIDERS][BARNETT] adapted from Beiwe Summary Statistics : Feature Units Description hometime minutes Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am including any pauses within a 200-meter radius. disttravelled meters Total distance travelled over a day (flights). rog meters The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place. maxdiam meters The maximum diameter is the largest distance between any two pauses. maxhomedist meters The maximum distance from home in meters. siglocsvisited locations The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found iterating k from 1 to 200 stopping until the centroids of two significant locations are within 400 meters of one another. avgflightlen meters Mean length of all flights. stdflightlen meters Standard deviation of the length of all flights. avgflightdur seconds Mean duration of all flights. stdflightdur seconds The standard deviation of the duration of all flights. probpause - The fraction of a day spent in a pause (as opposed to a flight) siglocentropy nats Shannon\u2019s entropy measurement based on the proportion of time spent at each significant location visited during a day. circdnrtn - A continuous metric quantifying a person\u2019s circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day. wkenddayrtn - Same as circdnrtn but computed separately for weekends and weekdays. Assumptions/Observations Barnett's et al features These features are based on a Pause-Flight model. A pause is defined as a mobiity trace (location pings) within a certain duration and distance (by default 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See Barnett et al for more information. In RAPIDS we only expose two parameters for these features (timezone and accuracy limit). You can change other parameters in src/features/phone_locations/barnett/library/MobilityFeatures.R . Significant Locations Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) will count as a visit to that significant location. This description was adapted from the Supplementary Materials of Barnett et al . The Circadian Calculation For a detailed description of how this is calculated, see Canzian et al . DORYAB provider \u00b6 These features are based on the original implementation by Doryab et al. . Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_locations_raw.csv - data/interim/ { pid } /phone_locations_processed.csv - data/interim/ { pid } /phone_locations_processed_with_datetime.csv - data/interim/ { pid } /phone_locations_features/phone_locations_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_locations.csv Parameters description for [PHONE_LOCATIONS][PROVIDERS][DORYAB] : Key Description [COMPUTE] Set to True to extract PHONE_LOCATIONS features from the BARNETT provider [FEATURES] Features to be computed, see table below [ACCURACY_LIMIT] An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there\u2019s a 68% probability the true location is within this radius [DBSCAN_EPS] The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. [DBSCAN_MINSAMPLES] The number of samples (or total weight) in a neighborhood for a point to be considered as a core point of a cluster. This includes the point itself. [THRESHOLD_STATIC] It is the threshold value in km/hr which labels a row as Static or Moving. [MAXIMUM_GAP_ALLOWED] The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the the phone was not sensing. [MINUTES_DATA_USED] Set to True to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough. [SAMPLING_FREQUENCY] Expected time difference between any two location rows in minutes. If set to 0 , the sampling frequency will be inferred automatically as the median of all the differences between any two consecutive row timestamps (recommended if you are using FUSED_RESAMPLED data). This parameter impacts all the time calculations. [CLUSTER_ON] Set this flag to PARTICIPANT_DATASET to create clusters based on the entire participant\u2019s dataset or to TIME_SEGMENT to create clusters based on all the instances of the corresponding time segment (e.g. all mornings). [CLUSTERING_ALGORITHM] The original Doryab et al implementation uses DBSCAN , OPTICS is also available with similar (but not identical) clustering results and lower memory consumption. Features description for [PHONE_LOCATIONS][PROVIDERS][DORYAB] : Feature Units Description locationvariance \\(meters^2\\) The sum of the variances of the latitude and longitude columns. loglocationvariance - Log of the sum of the variances of the latitude and longitude columns. totaldistance meters Total distance travelled in a time segment using the haversine formula. averagespeed km/hr Average speed in a time segment considering only the instances labeled as Moving. varspeed km/hr Speed variance in a time segment considering only the instances labeled as Moving. circadianmovement - \"It encodes the extent to which a person\u2019s location patterns follow a 24-hour circadian cycle.\" Doryab et al. . numberofsignificantplaces places Number of significant locations visited. It is calculated using the DBSCAN clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place. numberlocationtransitions transitions Number of movements between any two clusters in a time segment. radiusgyration meters Quantifies the area covered by a participant timeattop1location minutes Time spent at the most significant location. timeattop2location minutes Time spent at the 2 nd most significant location. timeattop3location minutes Time spent at the 3 rd most significant location. movingtostaticratio - Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labelled as stationary if it\u2019s speed (distance/time) to the next coordinate pair is less than 1km/hr. A higher value represents a more stationary routine. These times are computed by multiplying the number of rows by [SAMPLING_FREQUENCY] outlierstimepercent - Ratio between the time spent in non-significant clusters divided by the time spent in all clusters (total location sensed time). A higher value represents more time spent in non-significant clusters. These times are computed by multiplying the number of rows by [SAMPLING_FREQUENCY] maxlengthstayatclusters minutes Maximum time spent in a cluster (significant location). minlengthstayatclusters minutes Minimum time spent in a cluster (significant location). meanlengthstayatclusters minutes Average time spent in a cluster (significant location). stdlengthstayatclusters minutes Standard deviation of time spent in a cluster (significant location). locationentropy nats Shannon Entropy computed over the row count of each cluster (significant location), it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location). normalizedlocationentropy nats Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters, it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location). Assumptions/Observations Significant Locations Identified Significant locations are determined using DBSCAN clustering on locations that a patient visit over the course of the period of data collection. The Circadian Calculation For a detailed description of how this is calculated, see Canzian et al . Fine Tuning Clustering Parameters Based on an experiment where we collected fused location data for 7 days with a mean accuracy of 86 & SD of 350.874635, we determined that EPS/MAX_EPS =100 produced closer clustering results to reality. Higher values (>100) missed out some significant places like a short grocery visit while lower values (<100) picked up traffic lights and stop signs while driving as significant locations. We recommend you set EPS based on the accuracy of your location data (the more accurate your data is, the lower you should be able to set EPS).","title":"Phone Locations"},{"location":"features/phone-locations/#phone-locations","text":"Sensor parameters description for [PHONE_LOCATIONS] : Key Description [TABLE] Database table where the location data is stored [LOCATIONS_TO_USE] Type of location data to use, one of ALL , GPS , ALL_RESAMPLED or FUSED_RESAMPLED . This filter is based on the provider column of the AWARE locations table, ALL includes every row, GPS only includes rows where provider is gps, ALL_RESAMPLED includes all rows after being resampled, and FUSED_RESAMPLED only includes rows where provider is fused after being resampled. [FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD] if ALL_RESAMPLED or FUSED_RESAMPLED is used, the original fused data has to be resampled, a location row will be resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less or equal than this threshold (in minutes). [FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION] if ALL_RESAMPLED or FUSED_RESAMPLED is used, the original fused data has to be resampled, a location row will be resampled at most for this long (in minutes) Assumptions/Observations Types of location data to use AWARE Android and iOS clients can collect location coordinates through the phone's GPS, the network cellular towers around the phone, or Google's fused location API. If you want to use only the GPS provider set [LOCATIONS_TO_USE] to GPS , if you want to use all providers set [LOCATIONS_TO_USE] to ALL , if you collected location data from different providers including the fused API use ALL_RESAMPLED , if your AWARE client was configured to use fused location only or want to focus only on this provider, set [LOCATIONS_TO_USE] to RESAMPLE_FUSED . ALL_RESAMPLED and RESAMPLE_FUSED take the original location coordinates and replicate each pair forward in time as long as the phone was sensing data as indicated by the joined timestamps of [PHONE_DATA_YIELD][SENSORS] , this is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one and because GPS and network providers can log data at variable rates. There are two parameters associated with resampling fused location. FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair (for example, participant A's phone did not collect data between 10.30am and 10:50am and between 11:05am and 11:40am, the last known coordinate pair will be replicated during the first period but not the second, in other words, we assume that we cannot longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes). FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer that this threshold even if the phone was sensing data continuously (for example, participant A went home at 9pm and their phone was sensing data without gaps until 11am the next morning, the last known location will only be replicated until 9am). If you have suggestions to modify or improve this resampling, let us know.","title":"Phone Locations"},{"location":"features/phone-locations/#barnett-provider","text":"These features are based on the original open-source implementation by Barnett et al and some features created by Canzian et al . Available time segments and platforms Available only for segments that start at 00:00:00 and end at 23:59:59 of the same day (daily segments) Available for Android and iOS File Sequence - data/raw/ { pid } /phone_locations_raw.csv - data/interim/ { pid } /phone_locations_processed.csv - data/interim/ { pid } /phone_locations_processed_with_datetime.csv - data/interim/ { pid } /phone_locations_features/phone_locations_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_locations.csv Parameters description for [PHONE_LOCATIONS][PROVIDERS][BARNETT] : Key Description [COMPUTE] Set to True to extract PHONE_LOCATIONS features from the BARNETT provider [FEATURES] Features to be computed, see table below [ACCURACY_LIMIT] An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there\u2019s a 68% probability the true location is within this radius [TIMEZONE] Timezone where the location data was collected. By default points to the one defined in the Configuration [MINUTES_DATA_USED] Set to True to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough. Features description for [PHONE_LOCATIONS][PROVIDERS][BARNETT] adapted from Beiwe Summary Statistics : Feature Units Description hometime minutes Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am including any pauses within a 200-meter radius. disttravelled meters Total distance travelled over a day (flights). rog meters The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place. maxdiam meters The maximum diameter is the largest distance between any two pauses. maxhomedist meters The maximum distance from home in meters. siglocsvisited locations The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found iterating k from 1 to 200 stopping until the centroids of two significant locations are within 400 meters of one another. avgflightlen meters Mean length of all flights. stdflightlen meters Standard deviation of the length of all flights. avgflightdur seconds Mean duration of all flights. stdflightdur seconds The standard deviation of the duration of all flights. probpause - The fraction of a day spent in a pause (as opposed to a flight) siglocentropy nats Shannon\u2019s entropy measurement based on the proportion of time spent at each significant location visited during a day. circdnrtn - A continuous metric quantifying a person\u2019s circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day. wkenddayrtn - Same as circdnrtn but computed separately for weekends and weekdays. Assumptions/Observations Barnett's et al features These features are based on a Pause-Flight model. A pause is defined as a mobiity trace (location pings) within a certain duration and distance (by default 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See Barnett et al for more information. In RAPIDS we only expose two parameters for these features (timezone and accuracy limit). You can change other parameters in src/features/phone_locations/barnett/library/MobilityFeatures.R . Significant Locations Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) will count as a visit to that significant location. This description was adapted from the Supplementary Materials of Barnett et al . The Circadian Calculation For a detailed description of how this is calculated, see Canzian et al .","title":"BARNETT provider"},{"location":"features/phone-locations/#doryab-provider","text":"These features are based on the original implementation by Doryab et al. . Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_locations_raw.csv - data/interim/ { pid } /phone_locations_processed.csv - data/interim/ { pid } /phone_locations_processed_with_datetime.csv - data/interim/ { pid } /phone_locations_features/phone_locations_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_locations.csv Parameters description for [PHONE_LOCATIONS][PROVIDERS][DORYAB] : Key Description [COMPUTE] Set to True to extract PHONE_LOCATIONS features from the BARNETT provider [FEATURES] Features to be computed, see table below [ACCURACY_LIMIT] An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there\u2019s a 68% probability the true location is within this radius [DBSCAN_EPS] The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. [DBSCAN_MINSAMPLES] The number of samples (or total weight) in a neighborhood for a point to be considered as a core point of a cluster. This includes the point itself. [THRESHOLD_STATIC] It is the threshold value in km/hr which labels a row as Static or Moving. [MAXIMUM_GAP_ALLOWED] The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the the phone was not sensing. [MINUTES_DATA_USED] Set to True to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough. [SAMPLING_FREQUENCY] Expected time difference between any two location rows in minutes. If set to 0 , the sampling frequency will be inferred automatically as the median of all the differences between any two consecutive row timestamps (recommended if you are using FUSED_RESAMPLED data). This parameter impacts all the time calculations. [CLUSTER_ON] Set this flag to PARTICIPANT_DATASET to create clusters based on the entire participant\u2019s dataset or to TIME_SEGMENT to create clusters based on all the instances of the corresponding time segment (e.g. all mornings). [CLUSTERING_ALGORITHM] The original Doryab et al implementation uses DBSCAN , OPTICS is also available with similar (but not identical) clustering results and lower memory consumption. Features description for [PHONE_LOCATIONS][PROVIDERS][DORYAB] : Feature Units Description locationvariance \\(meters^2\\) The sum of the variances of the latitude and longitude columns. loglocationvariance - Log of the sum of the variances of the latitude and longitude columns. totaldistance meters Total distance travelled in a time segment using the haversine formula. averagespeed km/hr Average speed in a time segment considering only the instances labeled as Moving. varspeed km/hr Speed variance in a time segment considering only the instances labeled as Moving. circadianmovement - \"It encodes the extent to which a person\u2019s location patterns follow a 24-hour circadian cycle.\" Doryab et al. . numberofsignificantplaces places Number of significant locations visited. It is calculated using the DBSCAN clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place. numberlocationtransitions transitions Number of movements between any two clusters in a time segment. radiusgyration meters Quantifies the area covered by a participant timeattop1location minutes Time spent at the most significant location. timeattop2location minutes Time spent at the 2 nd most significant location. timeattop3location minutes Time spent at the 3 rd most significant location. movingtostaticratio - Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labelled as stationary if it\u2019s speed (distance/time) to the next coordinate pair is less than 1km/hr. A higher value represents a more stationary routine. These times are computed by multiplying the number of rows by [SAMPLING_FREQUENCY] outlierstimepercent - Ratio between the time spent in non-significant clusters divided by the time spent in all clusters (total location sensed time). A higher value represents more time spent in non-significant clusters. These times are computed by multiplying the number of rows by [SAMPLING_FREQUENCY] maxlengthstayatclusters minutes Maximum time spent in a cluster (significant location). minlengthstayatclusters minutes Minimum time spent in a cluster (significant location). meanlengthstayatclusters minutes Average time spent in a cluster (significant location). stdlengthstayatclusters minutes Standard deviation of time spent in a cluster (significant location). locationentropy nats Shannon Entropy computed over the row count of each cluster (significant location), it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location). normalizedlocationentropy nats Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters, it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location). Assumptions/Observations Significant Locations Identified Significant locations are determined using DBSCAN clustering on locations that a patient visit over the course of the period of data collection. The Circadian Calculation For a detailed description of how this is calculated, see Canzian et al . Fine Tuning Clustering Parameters Based on an experiment where we collected fused location data for 7 days with a mean accuracy of 86 & SD of 350.874635, we determined that EPS/MAX_EPS =100 produced closer clustering results to reality. Higher values (>100) missed out some significant places like a short grocery visit while lower values (<100) picked up traffic lights and stop signs while driving as significant locations. We recommend you set EPS based on the accuracy of your location data (the more accurate your data is, the lower you should be able to set EPS).","title":"DORYAB provider"},{"location":"features/phone-messages/","text":"Phone Messages \u00b6 Sensor parameters description for [PHONE_MESSAGES] : Key Description [TABLE] Database table where the messages data is stored RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_messages_raw.csv - data/raw/ { pid } /phone_messages_with_datetime.csv - data/interim/ { pid } /phone_messages_features/phone_messages_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_messages.csv Parameters description for [PHONE_MESSAGES][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_MESSAGES features from the RAPIDS provider [MESSAGES_TYPES] The messages_type that will be analyzed. The options for this parameter are received or sent . [FEATURES] Features to be computed, see table below for [MESSAGES_TYPES] received and sent Features description for [PHONE_MESSAGES][PROVIDERS][RAPIDS] : Feature Units Description count messages Number of messages of type messages_type that occurred during a particular time_segment . distinctcontacts contacts Number of distinct contacts that are associated with a particular messages_type during a particular time_segment . timefirstmessages minutes Number of minutes between 12:00am (midnight) and the first message of a particular messages_type during a particular time_segment . timelastmessages minutes Number of minutes between 12:00am (midnight) and the last message of a particular messages_type during a particular time_segment . countmostfrequentcontact messages Number of messages from the contact with the most messages of messages_type during a time_segment throughout the whole dataset of each participant. Assumptions/Observations [MESSAGES_TYPES] and [FEATURES] keys in config.yaml need to match. For example, [MESSAGES_TYPES] sent matches the [FEATURES] key sent","title":"Phone Messages"},{"location":"features/phone-messages/#phone-messages","text":"Sensor parameters description for [PHONE_MESSAGES] : Key Description [TABLE] Database table where the messages data is stored","title":"Phone Messages"},{"location":"features/phone-messages/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_messages_raw.csv - data/raw/ { pid } /phone_messages_with_datetime.csv - data/interim/ { pid } /phone_messages_features/phone_messages_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_messages.csv Parameters description for [PHONE_MESSAGES][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_MESSAGES features from the RAPIDS provider [MESSAGES_TYPES] The messages_type that will be analyzed. The options for this parameter are received or sent . [FEATURES] Features to be computed, see table below for [MESSAGES_TYPES] received and sent Features description for [PHONE_MESSAGES][PROVIDERS][RAPIDS] : Feature Units Description count messages Number of messages of type messages_type that occurred during a particular time_segment . distinctcontacts contacts Number of distinct contacts that are associated with a particular messages_type during a particular time_segment . timefirstmessages minutes Number of minutes between 12:00am (midnight) and the first message of a particular messages_type during a particular time_segment . timelastmessages minutes Number of minutes between 12:00am (midnight) and the last message of a particular messages_type during a particular time_segment . countmostfrequentcontact messages Number of messages from the contact with the most messages of messages_type during a time_segment throughout the whole dataset of each participant. Assumptions/Observations [MESSAGES_TYPES] and [FEATURES] keys in config.yaml need to match. For example, [MESSAGES_TYPES] sent matches the [FEATURES] key sent","title":"RAPIDS provider"},{"location":"features/phone-screen/","text":"Phone Screen \u00b6 Sensor parameters description for [PHONE_SCREEN] : Key Description [TABLE] Database table where the screen data is stored RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_screen_raw.csv - data/raw/ { pid } /phone_screen_with_datetime.csv - data/raw/ { pid } /phone_screen_with_datetime_unified.csv - data/interim/ { pid } /phone_screen_episodes.csv - data/interim/ { pid } /phone_screen_episodes_resampled.csv - data/interim/ { pid } /phone_screen_episodes_resampled_with_datetime.csv - data/interim/ { pid } /phone_screen_features/phone_screen_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_screen.csv Parameters description for [PHONE_SCREEN][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_SCREEN features from the RAPIDS provider [FEATURES] Features to be computed, see table below [REFERENCE_HOUR_FIRST_USE] The reference point from which firstuseafter is to be computed, default is midnight [IGNORE_EPISODES_SHORTER_THAN] Ignore episodes that are shorter than this threshold (minutes). Set to 0 to disable this filter. [IGNORE_EPISODES_LONGER_THAN] Ignore episodes that are longer than this threshold (minutes). Set to 0 to disable this filter. [EPISODE_TYPES] Currently we only support unlock episodes (from when the phone is unlocked until the screen is off) Features description for [PHONE_SCREEN][PROVIDERS][RAPIDS] : Feature Units Description sumduration minutes Total duration of all unlock episodes. maxduration minutes Longest duration of any unlock episode. minduration minutes Shortest duration of any unlock episode. avgduration minutes Average duration of all unlock episodes. stdduration minutes Standard deviation duration of all unlock episodes. countepisode episodes Number of all unlock episodes firstuseafter minutes Minutes until the first unlock episode. Assumptions/Observations In Android, lock events can happen right after an off event, after a few seconds of an off event, or never happen depending on the phone's settings, therefore, an unlock episode is defined as the time between an unlock and a off event. In iOS, on and off events do not exist, so an unlock episode is defined as the time between an unlock and a lock event. Events in iOS are recorded reliably albeit some duplicated lock events within milliseconds from each other, so we only keep consecutive unlock/lock pairs. In Android you cand find multiple consecutive unlock or lock events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected and this happens because ACTION_SCREEN_OFF and ACTION_SCREEN_ON are sent when the device becomes non-interactive which may have nothing to do with the screen turning off . In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an unlock event as well as the total screen time (i.e. ON to OFF ) but these are not implemented at the moment. We transform iOS screen events to match Android\u2019s format, we replace lock episodes with off episodes (2 with 0) in iOS. However, as mentioned above this is still computing unlock to lock episodes.","title":"Phone Screen"},{"location":"features/phone-screen/#phone-screen","text":"Sensor parameters description for [PHONE_SCREEN] : Key Description [TABLE] Database table where the screen data is stored","title":"Phone Screen"},{"location":"features/phone-screen/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_screen_raw.csv - data/raw/ { pid } /phone_screen_with_datetime.csv - data/raw/ { pid } /phone_screen_with_datetime_unified.csv - data/interim/ { pid } /phone_screen_episodes.csv - data/interim/ { pid } /phone_screen_episodes_resampled.csv - data/interim/ { pid } /phone_screen_episodes_resampled_with_datetime.csv - data/interim/ { pid } /phone_screen_features/phone_screen_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_screen.csv Parameters description for [PHONE_SCREEN][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_SCREEN features from the RAPIDS provider [FEATURES] Features to be computed, see table below [REFERENCE_HOUR_FIRST_USE] The reference point from which firstuseafter is to be computed, default is midnight [IGNORE_EPISODES_SHORTER_THAN] Ignore episodes that are shorter than this threshold (minutes). Set to 0 to disable this filter. [IGNORE_EPISODES_LONGER_THAN] Ignore episodes that are longer than this threshold (minutes). Set to 0 to disable this filter. [EPISODE_TYPES] Currently we only support unlock episodes (from when the phone is unlocked until the screen is off) Features description for [PHONE_SCREEN][PROVIDERS][RAPIDS] : Feature Units Description sumduration minutes Total duration of all unlock episodes. maxduration minutes Longest duration of any unlock episode. minduration minutes Shortest duration of any unlock episode. avgduration minutes Average duration of all unlock episodes. stdduration minutes Standard deviation duration of all unlock episodes. countepisode episodes Number of all unlock episodes firstuseafter minutes Minutes until the first unlock episode. Assumptions/Observations In Android, lock events can happen right after an off event, after a few seconds of an off event, or never happen depending on the phone's settings, therefore, an unlock episode is defined as the time between an unlock and a off event. In iOS, on and off events do not exist, so an unlock episode is defined as the time between an unlock and a lock event. Events in iOS are recorded reliably albeit some duplicated lock events within milliseconds from each other, so we only keep consecutive unlock/lock pairs. In Android you cand find multiple consecutive unlock or lock events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected and this happens because ACTION_SCREEN_OFF and ACTION_SCREEN_ON are sent when the device becomes non-interactive which may have nothing to do with the screen turning off . In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an unlock event as well as the total screen time (i.e. ON to OFF ) but these are not implemented at the moment. We transform iOS screen events to match Android\u2019s format, we replace lock episodes with off episodes (2 with 0) in iOS. However, as mentioned above this is still computing unlock to lock episodes.","title":"RAPIDS provider"},{"location":"features/phone-wifi-connected/","text":"Phone WiFi Connected \u00b6 Sensor parameters description for [PHONE_WIFI_CONNECTED] : Key Description [TABLE] Database table where the wifi (connected) data is stored RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_wifi_connected_raw.csv - data/raw/ { pid } /phone_wifi_connected_with_datetime.csv - data/interim/ { pid } /phone_wifi_connected_features/phone_wifi_connected_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_wifi_connected.csv Parameters description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_WIFI_CONNECTED features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS] : Feature Units Description countscans devices Number of scanned WiFi access points connected during a time_segment, an access point can be detected multiple times over time and these appearances are counted separately uniquedevices devices Number of unique access point during a time_segment as identified by their hardware address countscansmostuniquedevice scans Number of scans of the most scanned access point during a time_segment across the whole monitoring period Assumptions/Observations A connected WiFI access point is one that a phone was connected to. By default AWARE stores this data in the sensor_wifi table.","title":"Phone WiFI Connected"},{"location":"features/phone-wifi-connected/#phone-wifi-connected","text":"Sensor parameters description for [PHONE_WIFI_CONNECTED] : Key Description [TABLE] Database table where the wifi (connected) data is stored","title":"Phone WiFi Connected"},{"location":"features/phone-wifi-connected/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android and iOS File Sequence - data/raw/ { pid } /phone_wifi_connected_raw.csv - data/raw/ { pid } /phone_wifi_connected_with_datetime.csv - data/interim/ { pid } /phone_wifi_connected_features/phone_wifi_connected_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_wifi_connected.csv Parameters description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_WIFI_CONNECTED features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS] : Feature Units Description countscans devices Number of scanned WiFi access points connected during a time_segment, an access point can be detected multiple times over time and these appearances are counted separately uniquedevices devices Number of unique access point during a time_segment as identified by their hardware address countscansmostuniquedevice scans Number of scans of the most scanned access point during a time_segment across the whole monitoring period Assumptions/Observations A connected WiFI access point is one that a phone was connected to. By default AWARE stores this data in the sensor_wifi table.","title":"RAPIDS provider"},{"location":"features/phone-wifi-visible/","text":"Phone WiFi Visible \u00b6 Sensor parameters description for [PHONE_WIFI_VISIBLE] : Key Description [TABLE] Database table where the wifi (visible) data is stored RAPIDS provider \u00b6 Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_wifi_visible_raw.csv - data/raw/ { pid } /phone_wifi_visible_with_datetime.csv - data/interim/ { pid } /phone_wifi_visible_features/phone_wifi_visible_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_wifi_visible.csv Parameters description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_WIFI_VISIBLE features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS] : Feature Units Description countscans devices Number of scanned WiFi access points visible during a time_segment, an access point can be detected multiple times over time and these appearances are counted separately uniquedevices devices Number of unique access point during a time_segment as identified by their hardware address countscansmostuniquedevice scans Number of scans of the most scanned access point during a time_segment across the whole monitoring period Assumptions/Observations A visible WiFI access point is one that a phone sensed around itself but that it was not connected to. Due to API restrictions, this sensor is not available on iOS. By default AWARE stores this data in the wifi table.","title":"Phone WiFI Visible"},{"location":"features/phone-wifi-visible/#phone-wifi-visible","text":"Sensor parameters description for [PHONE_WIFI_VISIBLE] : Key Description [TABLE] Database table where the wifi (visible) data is stored","title":"Phone WiFi Visible"},{"location":"features/phone-wifi-visible/#rapids-provider","text":"Available time segments and platforms Available for all time segments Available for Android only File Sequence - data/raw/ { pid } /phone_wifi_visible_raw.csv - data/raw/ { pid } /phone_wifi_visible_with_datetime.csv - data/interim/ { pid } /phone_wifi_visible_features/phone_wifi_visible_ { language } _ { provider_key } .csv - data/processed/features/ { pid } /phone_wifi_visible.csv Parameters description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS] : Key Description [COMPUTE] Set to True to extract PHONE_WIFI_VISIBLE features from the RAPIDS provider [FEATURES] Features to be computed, see table below Features description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS] : Feature Units Description countscans devices Number of scanned WiFi access points visible during a time_segment, an access point can be detected multiple times over time and these appearances are counted separately uniquedevices devices Number of unique access point during a time_segment as identified by their hardware address countscansmostuniquedevice scans Number of scans of the most scanned access point during a time_segment across the whole monitoring period Assumptions/Observations A visible WiFI access point is one that a phone sensed around itself but that it was not connected to. Due to API restrictions, this sensor is not available on iOS. By default AWARE stores this data in the wifi table.","title":"RAPIDS provider"},{"location":"setup/configuration/","text":"Configuration \u00b6 You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features Add your database credentials Choose the timezone of your study Create your participants files Select what time segments you want to extract features on Modify your device data source configuration Select what sensors and features you want to process When you are done with this configuration, go to executing RAPIDS . Hint Every time you see config[\"KEY\"] or [KEY] in these docs we are referring to the corresponding key in the config.yaml file. Database credentials \u00b6 Create an empty file called .env in your RAPIDS root directory Add the following lines and replace your database-specific credentials (user, password, host, and database): [ MY_GROUP ] user=MY_USER password=MY_PASSWORD host=MY_HOST port=3306 database=MY_DATABASE Warning The label MY_GROUP is arbitrary but it has to match the following config.yaml key: DATABASE_GROUP : &database_group MY_GROUP Hint If you are using RAPIDS\u2019 docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using host.docker.internal instead of 127.0.0.1 or localhost . In a Linux host you need to run our docker container using docker run --network=\"host\" -d moshiresearch/rapids:latest and then 127.0.0.1 will point to your host machine. Note You can ignore this step if you are only processing Fitbit data in CSV files. RAPIDS only supports MySQL/MariaDB databases. If you would like to add support for a different database engine get in touch and we can discuss how to implement it. Timezone of your study \u00b6 Single timezone \u00b6 If your study only happened in a single time zone, select the appropriate code form this list and change the following config key. Double check your timezone code pick, for example US Eastern Time is America/New_York not EST TIMEZONE : &timezone America/New_York Multiple timezones \u00b6 Support coming soon. Participant files \u00b6 Participant files link together multiple devices (smartphones and wearables) to specific participants and identify them throughout RAPIDS. You can create these files manually or automatically . Participant files are stored in data/external/participant_files/pxx.yaml and follow a unified structure . Note The list PIDS in config.yaml needs to have the participant file names of the people you want to process. For example, if you created p01.yaml , p02.yaml and p03.yaml files in /data/external/participant_files/ , then PIDS should be: PIDS : [ p01 , p02 , p03 ] Tip Attribute values of the [PHONE] and [FITBIT] sections in every participant file are optional which allows you to analyze data from participants that only carried smartphones, only Fitbit devices, or both. Optional: Migrating participants files with the old format If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in data/external/participant_files/ python tools/update_format_participant_files.py Structure of participants files \u00b6 Example of the structure of a participant file In this example, the participant used an android phone, an ios phone, and a fitbit device throughout the study between Apr 23 rd 2020 and Oct 28 th 2020 PHONE : DEVICE_IDS : [ a748ee1a-1d0b-4ae9-9074-279a2b6ba524 , dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43 ] PLATFORMS : [ android , ios ] LABEL : test01 START_DATE : 2020-04-23 END_DATE : 2020-10-28 FITBIT : DEVICE_IDS : [ fitbit1 ] LABEL : test01 START_DATE : 2020-04-23 END_DATE : 2020-10-28 For [PHONE] Key Description [DEVICE_IDS] An array of the strings that uniquely identify each smartphone, you can have more than one for when participants changed phones in the middle of the study, in this case, data from all their devices will be joined and relabeled with the last 1 on this list. [PLATFORMS] An array that specifies the OS of each smartphone in [DEVICE_IDS] , use a combination of android or ios (we support participants that changed platforms in the middle of your study!). If you have an aware_device table in your database you can set [PLATFORMS]: [multiple] and RAPIDS will infer them automatically. [LABEL] A string that is used in reports and visualizations. [START_DATE] A string with format YYY-MM-DD . Only data collected after this date will be included in the analysis [END_DATE] A string with format YYY-MM-DD . Only data collected before this date will be included in the analysis For [FITBIT] Key Description [DEVICE_IDS] An array of the strings that uniquely identify each Fitbit, you can have more than one in case the participant changed devices in the middle of the study, in this case, data from all devices will be joined and relabeled with the last device_id on this list. [LABEL] A string that is used in reports and visualizations. [START_DATE] A string with format YYY-MM-DD . Only data collected after this date will be included in the analysis [END_DATE] A string with format YYY-MM-DD . Only data collected before this date will be included in the analysis Automatic creation of participant files \u00b6 You have two options a) use the aware_device table in your database or b) use a CSV file. In either case, in your config.yaml , set [PHONE_SECTION][ADD] or [FITBIT_SECTION][ADD] to TRUE depending on what devices you used in your study. Set [DEVICE_ID_COLUMN] to the name of the column that uniquely identifies each device and include any device ids you want to ignore in [IGNORED_DEVICE_IDS] . aware_device table Set the following keys in your config.yaml CREATE_PARTICIPANT_FILES : SOURCE : TYPE : AWARE_DEVICE_TABLE DATABASE_GROUP : *database_group CSV_FILE_PATH : \"\" TIMEZONE : *timezone PHONE_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : device_id # column name IGNORED_DEVICE_IDS : [] FITBIT_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : fitbit_id # column name IGNORED_DEVICE_IDS : [] Then run snakemake -j1 create_participants_files CSV file Set the following keys in your config.yaml . CREATE_PARTICIPANT_FILES : SOURCE : TYPE : CSV_FILE DATABASE_GROUP : \"\" CSV_FILE_PATH : \"your_path/to_your.csv\" TIMEZONE : *timezone PHONE_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : device_id # column name IGNORED_DEVICE_IDS : [] FITBIT_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : fitbit_id # column name IGNORED_DEVICE_IDS : [] Your CSV file ( [SOURCE][CSV_FILE_PATH] ) should have the following columns but you can omit any values you don\u2019t have on each column: Column Description phone device id The name of this column has to match [PHONE_SECTION][DEVICE_ID_COLUMN] . Separate multiple ids with ; fitbit device id The name of this column has to match [FITBIT_SECTION][DEVICE_ID_COLUMN] . Separate multiple ids with ; pid Unique identifiers with the format pXXX (your participant files will be named with this string platform Use android , ios or multiple as explained above, separate values with ; label A human readable string that is used in reports and visualizations. start_date A string with format YYY-MM-DD . end_date A string with format YYY-MM-DD . Example device_id,pid,label,platform,start_date,end_date,fitbit_id a748ee1a-1d0b-4ae9-9074-279a2b6ba524;dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43,p01,julio,android;ios,2020-01-01,2021-01-01,fitbit1 4c4cf7a1-0340-44bc-be0f-d5053bf7390c,p02,meng,ios,2021-01-01,2022-01-01,fitbit2 Then run snakemake -j1 create_participants_files Time Segments \u00b6 Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: frequency (short time windows every day), periodic (arbitrary time windows on any day), and event (arbitrary time windows around events of interest). See also our examples . Frequency Segments These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your config.yaml TIME_SEGMENTS : &time_segments TYPE : FREQUENCY FILE : \"data/external/your_frequency_segments.csv\" INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can only have 1 row. Column Description label A string that is used as a prefix in the name of your time segments length An integer representing the duration of your time segments in minutes Example label,length thirtyminutes,30 This configuration will compute 48 time segments for every day when any data from any participant was sensed. For example: start_time,length,label 00:00,30,thirtyminutes0000 00:30,30,thirtyminutes0001 01:00,30,thirtyminutes0002 01:30,30,thirtyminutes0003 ... Periodic Segments These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your config.yaml . TIME_SEGMENTS : &time_segments TYPE : PERIODIC FILE : \"data/external/your_periodic_segments.csv\" INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE # or TRUE If [INCLUDE_PAST_PERIODIC_SEGMENTS] is set to TRUE , RAPIDS will consider instances of your segments back enough in the past as to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7 th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1 st if [INCLUDE_PAST_PERIODIC_SEGMENTS] is TRUE or on Sunday March 8 th if FALSE . The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can have multiple rows. Column Description label A string that is used as a prefix in the name of your time segments. It has to be unique between rows start_time A string with format HH:MM:SS representing the starting time of this segment on any day length A string representing the length of this segment.It can have one or more of the following strings XXD XXH XXM XXS to represent days, hours, minutes and seconds. For example 7D 23H 59M 59S repeats_on One of the follow options every_day , wday , qday , mday , and yday . The last four represent a week, quarter, month and year day repeats_value An integer complementing repeats_on . If you set repeats_on to every_day set this to 0 , otherwise 1-7 represent a wday starting from Mondays, 1-31 represent a mday , 1-91 represent a qday , and 1-366 represent a yday Example label,start_time,length,repeats_on,repeats_value daily,00:00:00,23H 59M 59S,every_day,0 morning,06:00:00,5H 59M 59S,every_day,0 afternoon,12:00:00,5H 59M 59S,every_day,0 evening,18:00:00,5H 59M 59S,every_day,0 night,00:00:00,5H 59M 59S,every_day,0 This configuration will create five segments instances ( daily , morning , afternoon , evening , night ) on any given day ( every_day set to 0). The daily segment will start at midnight and will last 23:59:59 , the other four segments will start at 6am, 12pm, 6pm, and 12am respectively and last for 05:59:59 . Event segments These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your config.yaml . TIME_SEGMENTS : &time_segments TYPE : EVENT FILE : \"data/external/your_event_segments.csv\" INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE # or TRUE The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can have multiple rows. Column Description label A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the most frequent contact for calls (the most frequent contact will be computed across all these segments). There cannot be two overlaping event segments with the same label (RAPIDS will throw an error) event_timestamp A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using length , shift , and shift_direction length A string representing the length of this segment. It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes, and seconds. For example 7D 23H 59M 59S shift A string representing the time shift from event_timestamp . It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes and seconds. For example 7D 23H 59M 59S . Use this value to change the start of a segment with respect to its event_timestamp . For example, set this variable to 1H to create a segment that starts 1 hour from an event of interest ( shift_direction determines if it\u2019s before or after). shift_direction An integer representing whether the shift is before ( -1 ) or after ( 1 ) an event_timestamp device_id The device id (smartphone or fitbit) to whom this segment belongs to. You have to create a line in this event segment file for each event of a participant that you want to analyse. If you have participants with multiple device ids you can choose any of them Example label,event_timestamp,length,shift,shift_direction,device_id stress1,1587661220000,1H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress2,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress3,1587906020000,3H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress4,1584291600000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress5,1588172420000,9H,5M,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 mood,1587661220000,1H,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 mood,1587747620000,1D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 This example will create eight segments for a single participant ( a748ee1a... ), five independent stressX segments with various lengths (1,4,3,7, and 9 hours). Segments stress1 , stress3 , and stress5 are shifted forwards by 5 minutes and stress2 and stress4 are shifted backwards by 4 hours (that is, if the stress4 event happened on March 15 th at 1pm EST ( 1584291600000 ), the time segment will start on that day at 9am and end at 4pm). The three mood segments are 1 hour, 1 day and 7 days long and have no shift. In addition, these mood segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some necessary information to compute a few of such features will be extracted from all three segments, for example the phone contact that called a participant the most or the location clusters visited by a participant. Segment Examples \u00b6 5-minutes Use the following Frequency segment file to create 288 (12 * 60 * 24) 5-minute segments starting from midnight of every day in your study label,length fiveminutes,5 Daily Use the following Periodic segment file to create daily segments starting from midnight of every day in your study label,start_time,length,repeats_on,repeats_value daily,00:00:00,23H 59M 59S,every_day,0 Morning Use the following Periodic segment file to create morning segments starting at 06:00:00 and ending at 11:59:59 of every day in your study label,start_time,length,repeats_on,repeats_value morning,06:00:00,5H 59M 59S,every_day,0 Overnight Use the following Periodic segment file to create overnight segments starting at 20:00:00 and ending at 07:59:59 (next day) of every day in your study label,start_time,length,repeats_on,repeats_value morning,20:00:00,11H 59M 59S,every_day,0 Weekly Use the following Periodic segment file to create non-overlapping weekly segments starting at midnight of every Monday in your study label,start_time,length,repeats_on,repeats_value weekly,00:00:00,6D 23H 59M 59S,wday,1 Use the following Periodic segment file to create overlapping weekly segments starting at midnight of every day in your study label,start_time,length,repeats_on,repeats_value weekly,00:00:00,6D 23H 59M 59S,every_day,0 Week-ends Use the following Periodic segment file to create week-end segments starting at midnight of every Saturday in your study label,start_time,length,repeats_on,repeats_value weekend,00:00:00,1D 23H 59M 59S,wday,6 Around surveys Use the following Event segment file to create two 2-hour segments that start 1 hour before surveys answered by 3 participants label,event_timestamp,length,shift,shift_direction,device_id survey1,1587661220000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 survey2,1587747620000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 survey1,1587906020000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr survey2,1584291600000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr survey1,1588172420000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3 survey2,1584291600000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3 Device Data Source Configuration \u00b6 You might need to modify the following config keys in your config.yaml depending on what devices your participants used and where you are storing your data. You can ignore [PHONE_DATA_CONFIGURATION] or [FITBIT_DATA_CONFIGURATION] if you are not working with either devices. Phone The relevant config.yaml section looks like this by default: PHONE_DATA_CONFIGURATION : SOURCE : TYPE : DATABASE DATABASE_GROUP : *database_group DEVICE_ID_COLUMN : device_id # column name TIMEZONE : TYPE : SINGLE # SINGLE (MULTIPLE support coming soon) VALUE : *timezone Parameters for [PHONE_DATA_CONFIGURATION] Key Description [SOURCE] [TYPE] Only DATABASE is supported (phone data will be pulled from a database) [SOURCE] [DATABASE_GROUP] *database_group points to the value defined before in Database credentials [SOURCE] [DEVICE_ID_COLUMN] A column that contains strings that uniquely identify smartphones. For data collected with AWARE this is usually device_id [TIMEZONE] [TYPE] Only SINGLE is supported for now [TIMEZONE] [VALUE] *timezone points to the value defined before in Timezone of your study Fitbit The relevant config.yaml section looks like this by default: FITBIT_DATA_CONFIGURATION : SOURCE : TYPE : DATABASE # DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute with a table name or a file path accordingly) COLUMN_FORMAT : JSON # JSON or PLAIN_TEXT DATABASE_GROUP : *database_group DEVICE_ID_COLUMN : device_id # column name TIMEZONE : TYPE : SINGLE # Fitbit devices don't support time zones so we read this data in the timezone indicated by VALUE VALUE : *timezone Parameters for For [FITBIT_DATA_CONFIGURATION] Key Description [SOURCE] [TYPE] DATABASE or FILES (set each [FITBIT_SENSOR] [TABLE] attribute accordingly with a table name or a file path) [SOURCE] [COLUMN_FORMAT] JSON or PLAIN_TEXT . Column format of the source data. If you pulled your data directly from the Fitbit API the column containing the sensor data will be in JSON format [SOURCE] [DATABASE_GROUP] *database_group points to the value defined before in Database credentials . Only used if [TYPE] is DATABASE . [SOURCE] [DEVICE_ID_COLUMN] A column that contains strings that uniquely identify Fitbit devices. [TIMEZONE] [TYPE] Only SINGLE is supported (Fitbit devices always store data in local time). [TIMEZONE] [VALUE] *timezone points to the value defined before in Timezone of your study Sensor and Features to Process \u00b6 Finally, you need to modify the config.yaml section of the sensors you want to extract behavioral features from. All sensors follow the same naming nomenclature ( DEVICE_SENSOR ) and parameter structure which we explain in the Behavioral Features Introduction . Done Head over to Execution to learn how to execute RAPIDS.","title":"Configuration"},{"location":"setup/configuration/#configuration","text":"You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features Add your database credentials Choose the timezone of your study Create your participants files Select what time segments you want to extract features on Modify your device data source configuration Select what sensors and features you want to process When you are done with this configuration, go to executing RAPIDS . Hint Every time you see config[\"KEY\"] or [KEY] in these docs we are referring to the corresponding key in the config.yaml file.","title":"Configuration"},{"location":"setup/configuration/#database-credentials","text":"Create an empty file called .env in your RAPIDS root directory Add the following lines and replace your database-specific credentials (user, password, host, and database): [ MY_GROUP ] user=MY_USER password=MY_PASSWORD host=MY_HOST port=3306 database=MY_DATABASE Warning The label MY_GROUP is arbitrary but it has to match the following config.yaml key: DATABASE_GROUP : &database_group MY_GROUP Hint If you are using RAPIDS\u2019 docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using host.docker.internal instead of 127.0.0.1 or localhost . In a Linux host you need to run our docker container using docker run --network=\"host\" -d moshiresearch/rapids:latest and then 127.0.0.1 will point to your host machine. Note You can ignore this step if you are only processing Fitbit data in CSV files. RAPIDS only supports MySQL/MariaDB databases. If you would like to add support for a different database engine get in touch and we can discuss how to implement it.","title":"Database credentials"},{"location":"setup/configuration/#timezone-of-your-study","text":"","title":"Timezone of your study"},{"location":"setup/configuration/#single-timezone","text":"If your study only happened in a single time zone, select the appropriate code form this list and change the following config key. Double check your timezone code pick, for example US Eastern Time is America/New_York not EST TIMEZONE : &timezone America/New_York","title":"Single timezone"},{"location":"setup/configuration/#multiple-timezones","text":"Support coming soon.","title":"Multiple timezones"},{"location":"setup/configuration/#participant-files","text":"Participant files link together multiple devices (smartphones and wearables) to specific participants and identify them throughout RAPIDS. You can create these files manually or automatically . Participant files are stored in data/external/participant_files/pxx.yaml and follow a unified structure . Note The list PIDS in config.yaml needs to have the participant file names of the people you want to process. For example, if you created p01.yaml , p02.yaml and p03.yaml files in /data/external/participant_files/ , then PIDS should be: PIDS : [ p01 , p02 , p03 ] Tip Attribute values of the [PHONE] and [FITBIT] sections in every participant file are optional which allows you to analyze data from participants that only carried smartphones, only Fitbit devices, or both. Optional: Migrating participants files with the old format If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in data/external/participant_files/ python tools/update_format_participant_files.py","title":"Participant files"},{"location":"setup/configuration/#structure-of-participants-files","text":"Example of the structure of a participant file In this example, the participant used an android phone, an ios phone, and a fitbit device throughout the study between Apr 23 rd 2020 and Oct 28 th 2020 PHONE : DEVICE_IDS : [ a748ee1a-1d0b-4ae9-9074-279a2b6ba524 , dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43 ] PLATFORMS : [ android , ios ] LABEL : test01 START_DATE : 2020-04-23 END_DATE : 2020-10-28 FITBIT : DEVICE_IDS : [ fitbit1 ] LABEL : test01 START_DATE : 2020-04-23 END_DATE : 2020-10-28 For [PHONE] Key Description [DEVICE_IDS] An array of the strings that uniquely identify each smartphone, you can have more than one for when participants changed phones in the middle of the study, in this case, data from all their devices will be joined and relabeled with the last 1 on this list. [PLATFORMS] An array that specifies the OS of each smartphone in [DEVICE_IDS] , use a combination of android or ios (we support participants that changed platforms in the middle of your study!). If you have an aware_device table in your database you can set [PLATFORMS]: [multiple] and RAPIDS will infer them automatically. [LABEL] A string that is used in reports and visualizations. [START_DATE] A string with format YYY-MM-DD . Only data collected after this date will be included in the analysis [END_DATE] A string with format YYY-MM-DD . Only data collected before this date will be included in the analysis For [FITBIT] Key Description [DEVICE_IDS] An array of the strings that uniquely identify each Fitbit, you can have more than one in case the participant changed devices in the middle of the study, in this case, data from all devices will be joined and relabeled with the last device_id on this list. [LABEL] A string that is used in reports and visualizations. [START_DATE] A string with format YYY-MM-DD . Only data collected after this date will be included in the analysis [END_DATE] A string with format YYY-MM-DD . Only data collected before this date will be included in the analysis","title":"Structure of participants files"},{"location":"setup/configuration/#automatic-creation-of-participant-files","text":"You have two options a) use the aware_device table in your database or b) use a CSV file. In either case, in your config.yaml , set [PHONE_SECTION][ADD] or [FITBIT_SECTION][ADD] to TRUE depending on what devices you used in your study. Set [DEVICE_ID_COLUMN] to the name of the column that uniquely identifies each device and include any device ids you want to ignore in [IGNORED_DEVICE_IDS] . aware_device table Set the following keys in your config.yaml CREATE_PARTICIPANT_FILES : SOURCE : TYPE : AWARE_DEVICE_TABLE DATABASE_GROUP : *database_group CSV_FILE_PATH : \"\" TIMEZONE : *timezone PHONE_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : device_id # column name IGNORED_DEVICE_IDS : [] FITBIT_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : fitbit_id # column name IGNORED_DEVICE_IDS : [] Then run snakemake -j1 create_participants_files CSV file Set the following keys in your config.yaml . CREATE_PARTICIPANT_FILES : SOURCE : TYPE : CSV_FILE DATABASE_GROUP : \"\" CSV_FILE_PATH : \"your_path/to_your.csv\" TIMEZONE : *timezone PHONE_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : device_id # column name IGNORED_DEVICE_IDS : [] FITBIT_SECTION : ADD : TRUE # or FALSE DEVICE_ID_COLUMN : fitbit_id # column name IGNORED_DEVICE_IDS : [] Your CSV file ( [SOURCE][CSV_FILE_PATH] ) should have the following columns but you can omit any values you don\u2019t have on each column: Column Description phone device id The name of this column has to match [PHONE_SECTION][DEVICE_ID_COLUMN] . Separate multiple ids with ; fitbit device id The name of this column has to match [FITBIT_SECTION][DEVICE_ID_COLUMN] . Separate multiple ids with ; pid Unique identifiers with the format pXXX (your participant files will be named with this string platform Use android , ios or multiple as explained above, separate values with ; label A human readable string that is used in reports and visualizations. start_date A string with format YYY-MM-DD . end_date A string with format YYY-MM-DD . Example device_id,pid,label,platform,start_date,end_date,fitbit_id a748ee1a-1d0b-4ae9-9074-279a2b6ba524;dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43,p01,julio,android;ios,2020-01-01,2021-01-01,fitbit1 4c4cf7a1-0340-44bc-be0f-d5053bf7390c,p02,meng,ios,2021-01-01,2022-01-01,fitbit2 Then run snakemake -j1 create_participants_files","title":"Automatic creation of participant files"},{"location":"setup/configuration/#time-segments","text":"Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: frequency (short time windows every day), periodic (arbitrary time windows on any day), and event (arbitrary time windows around events of interest). See also our examples . Frequency Segments These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your config.yaml TIME_SEGMENTS : &time_segments TYPE : FREQUENCY FILE : \"data/external/your_frequency_segments.csv\" INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can only have 1 row. Column Description label A string that is used as a prefix in the name of your time segments length An integer representing the duration of your time segments in minutes Example label,length thirtyminutes,30 This configuration will compute 48 time segments for every day when any data from any participant was sensed. For example: start_time,length,label 00:00,30,thirtyminutes0000 00:30,30,thirtyminutes0001 01:00,30,thirtyminutes0002 01:30,30,thirtyminutes0003 ... Periodic Segments These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your config.yaml . TIME_SEGMENTS : &time_segments TYPE : PERIODIC FILE : \"data/external/your_periodic_segments.csv\" INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE # or TRUE If [INCLUDE_PAST_PERIODIC_SEGMENTS] is set to TRUE , RAPIDS will consider instances of your segments back enough in the past as to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7 th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1 st if [INCLUDE_PAST_PERIODIC_SEGMENTS] is TRUE or on Sunday March 8 th if FALSE . The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can have multiple rows. Column Description label A string that is used as a prefix in the name of your time segments. It has to be unique between rows start_time A string with format HH:MM:SS representing the starting time of this segment on any day length A string representing the length of this segment.It can have one or more of the following strings XXD XXH XXM XXS to represent days, hours, minutes and seconds. For example 7D 23H 59M 59S repeats_on One of the follow options every_day , wday , qday , mday , and yday . The last four represent a week, quarter, month and year day repeats_value An integer complementing repeats_on . If you set repeats_on to every_day set this to 0 , otherwise 1-7 represent a wday starting from Mondays, 1-31 represent a mday , 1-91 represent a qday , and 1-366 represent a yday Example label,start_time,length,repeats_on,repeats_value daily,00:00:00,23H 59M 59S,every_day,0 morning,06:00:00,5H 59M 59S,every_day,0 afternoon,12:00:00,5H 59M 59S,every_day,0 evening,18:00:00,5H 59M 59S,every_day,0 night,00:00:00,5H 59M 59S,every_day,0 This configuration will create five segments instances ( daily , morning , afternoon , evening , night ) on any given day ( every_day set to 0). The daily segment will start at midnight and will last 23:59:59 , the other four segments will start at 6am, 12pm, 6pm, and 12am respectively and last for 05:59:59 . Event segments These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your config.yaml . TIME_SEGMENTS : &time_segments TYPE : EVENT FILE : \"data/external/your_event_segments.csv\" INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE # or TRUE The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can have multiple rows. Column Description label A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the most frequent contact for calls (the most frequent contact will be computed across all these segments). There cannot be two overlaping event segments with the same label (RAPIDS will throw an error) event_timestamp A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using length , shift , and shift_direction length A string representing the length of this segment. It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes, and seconds. For example 7D 23H 59M 59S shift A string representing the time shift from event_timestamp . It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes and seconds. For example 7D 23H 59M 59S . Use this value to change the start of a segment with respect to its event_timestamp . For example, set this variable to 1H to create a segment that starts 1 hour from an event of interest ( shift_direction determines if it\u2019s before or after). shift_direction An integer representing whether the shift is before ( -1 ) or after ( 1 ) an event_timestamp device_id The device id (smartphone or fitbit) to whom this segment belongs to. You have to create a line in this event segment file for each event of a participant that you want to analyse. If you have participants with multiple device ids you can choose any of them Example label,event_timestamp,length,shift,shift_direction,device_id stress1,1587661220000,1H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress2,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress3,1587906020000,3H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress4,1584291600000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 stress5,1588172420000,9H,5M,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 mood,1587661220000,1H,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 mood,1587747620000,1D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 This example will create eight segments for a single participant ( a748ee1a... ), five independent stressX segments with various lengths (1,4,3,7, and 9 hours). Segments stress1 , stress3 , and stress5 are shifted forwards by 5 minutes and stress2 and stress4 are shifted backwards by 4 hours (that is, if the stress4 event happened on March 15 th at 1pm EST ( 1584291600000 ), the time segment will start on that day at 9am and end at 4pm). The three mood segments are 1 hour, 1 day and 7 days long and have no shift. In addition, these mood segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some necessary information to compute a few of such features will be extracted from all three segments, for example the phone contact that called a participant the most or the location clusters visited by a participant.","title":"Time Segments"},{"location":"setup/configuration/#segment-examples","text":"5-minutes Use the following Frequency segment file to create 288 (12 * 60 * 24) 5-minute segments starting from midnight of every day in your study label,length fiveminutes,5 Daily Use the following Periodic segment file to create daily segments starting from midnight of every day in your study label,start_time,length,repeats_on,repeats_value daily,00:00:00,23H 59M 59S,every_day,0 Morning Use the following Periodic segment file to create morning segments starting at 06:00:00 and ending at 11:59:59 of every day in your study label,start_time,length,repeats_on,repeats_value morning,06:00:00,5H 59M 59S,every_day,0 Overnight Use the following Periodic segment file to create overnight segments starting at 20:00:00 and ending at 07:59:59 (next day) of every day in your study label,start_time,length,repeats_on,repeats_value morning,20:00:00,11H 59M 59S,every_day,0 Weekly Use the following Periodic segment file to create non-overlapping weekly segments starting at midnight of every Monday in your study label,start_time,length,repeats_on,repeats_value weekly,00:00:00,6D 23H 59M 59S,wday,1 Use the following Periodic segment file to create overlapping weekly segments starting at midnight of every day in your study label,start_time,length,repeats_on,repeats_value weekly,00:00:00,6D 23H 59M 59S,every_day,0 Week-ends Use the following Periodic segment file to create week-end segments starting at midnight of every Saturday in your study label,start_time,length,repeats_on,repeats_value weekend,00:00:00,1D 23H 59M 59S,wday,6 Around surveys Use the following Event segment file to create two 2-hour segments that start 1 hour before surveys answered by 3 participants label,event_timestamp,length,shift,shift_direction,device_id survey1,1587661220000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 survey2,1587747620000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524 survey1,1587906020000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr survey2,1584291600000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr survey1,1588172420000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3 survey2,1584291600000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3","title":"Segment Examples"},{"location":"setup/configuration/#device-data-source-configuration","text":"You might need to modify the following config keys in your config.yaml depending on what devices your participants used and where you are storing your data. You can ignore [PHONE_DATA_CONFIGURATION] or [FITBIT_DATA_CONFIGURATION] if you are not working with either devices. Phone The relevant config.yaml section looks like this by default: PHONE_DATA_CONFIGURATION : SOURCE : TYPE : DATABASE DATABASE_GROUP : *database_group DEVICE_ID_COLUMN : device_id # column name TIMEZONE : TYPE : SINGLE # SINGLE (MULTIPLE support coming soon) VALUE : *timezone Parameters for [PHONE_DATA_CONFIGURATION] Key Description [SOURCE] [TYPE] Only DATABASE is supported (phone data will be pulled from a database) [SOURCE] [DATABASE_GROUP] *database_group points to the value defined before in Database credentials [SOURCE] [DEVICE_ID_COLUMN] A column that contains strings that uniquely identify smartphones. For data collected with AWARE this is usually device_id [TIMEZONE] [TYPE] Only SINGLE is supported for now [TIMEZONE] [VALUE] *timezone points to the value defined before in Timezone of your study Fitbit The relevant config.yaml section looks like this by default: FITBIT_DATA_CONFIGURATION : SOURCE : TYPE : DATABASE # DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute with a table name or a file path accordingly) COLUMN_FORMAT : JSON # JSON or PLAIN_TEXT DATABASE_GROUP : *database_group DEVICE_ID_COLUMN : device_id # column name TIMEZONE : TYPE : SINGLE # Fitbit devices don't support time zones so we read this data in the timezone indicated by VALUE VALUE : *timezone Parameters for For [FITBIT_DATA_CONFIGURATION] Key Description [SOURCE] [TYPE] DATABASE or FILES (set each [FITBIT_SENSOR] [TABLE] attribute accordingly with a table name or a file path) [SOURCE] [COLUMN_FORMAT] JSON or PLAIN_TEXT . Column format of the source data. If you pulled your data directly from the Fitbit API the column containing the sensor data will be in JSON format [SOURCE] [DATABASE_GROUP] *database_group points to the value defined before in Database credentials . Only used if [TYPE] is DATABASE . [SOURCE] [DEVICE_ID_COLUMN] A column that contains strings that uniquely identify Fitbit devices. [TIMEZONE] [TYPE] Only SINGLE is supported (Fitbit devices always store data in local time). [TIMEZONE] [VALUE] *timezone points to the value defined before in Timezone of your study","title":"Device Data Source Configuration"},{"location":"setup/configuration/#sensor-and-features-to-process","text":"Finally, you need to modify the config.yaml section of the sensors you want to extract behavioral features from. All sensors follow the same naming nomenclature ( DEVICE_SENSOR ) and parameter structure which we explain in the Behavioral Features Introduction . Done Head over to Execution to learn how to execute RAPIDS.","title":"Sensor and Features to Process"},{"location":"setup/execution/","text":"Execution \u00b6 After you have installed and configured RAPIDS, use the following command to execute it. ./rapids -j1 Ready to extract behavioral features If you are ready to extract features head over to the Behavioral Features Introduction Info The script ./rapids is a wrapper around Snakemake so you can pass any parameters that Snakemake accepts (e.g. -j1 ). Updating RAPIDS output after modifying config.yaml Any changes to the config.yaml file will be applied automatically and only the relevant files will be updated. This means that after modifying the features list for PHONE_MESSAGE for example, RAPIDS will update the output file with the correct features. Multi-core You can run RAPIDS over multiple cores by modifying the -j argument (e.g. use -j8 to use 8 cores). However , take into account that this means multiple sensor datasets for different participants will be load in memory at the same time. If RAPIDS crashes because it ran out of memory reduce the number of cores and try again. As reference, we have run RAPIDS over 12 cores and 32 Gb of RAM without problems for a study with 200 participants with 14 days of low-frequency smartphone data (no accelerometer, gyroscope, or magnetometer). Forcing a complete rerun If you want to update your data from your database or rerun the whole pipeline from scratch run one or both of the following commands depending on the devices you are using: ./rapids -j1 -R download_phone_data ./rapids -j1 -R download_fitbit_data Deleting RAPIDS output If you want to delete all the output files RAPIDS produces you can execute the following command: ./rapids -j1 --delete-all-output","title":"Execution"},{"location":"setup/execution/#execution","text":"After you have installed and configured RAPIDS, use the following command to execute it. ./rapids -j1 Ready to extract behavioral features If you are ready to extract features head over to the Behavioral Features Introduction Info The script ./rapids is a wrapper around Snakemake so you can pass any parameters that Snakemake accepts (e.g. -j1 ). Updating RAPIDS output after modifying config.yaml Any changes to the config.yaml file will be applied automatically and only the relevant files will be updated. This means that after modifying the features list for PHONE_MESSAGE for example, RAPIDS will update the output file with the correct features. Multi-core You can run RAPIDS over multiple cores by modifying the -j argument (e.g. use -j8 to use 8 cores). However , take into account that this means multiple sensor datasets for different participants will be load in memory at the same time. If RAPIDS crashes because it ran out of memory reduce the number of cores and try again. As reference, we have run RAPIDS over 12 cores and 32 Gb of RAM without problems for a study with 200 participants with 14 days of low-frequency smartphone data (no accelerometer, gyroscope, or magnetometer). Forcing a complete rerun If you want to update your data from your database or rerun the whole pipeline from scratch run one or both of the following commands depending on the devices you are using: ./rapids -j1 -R download_phone_data ./rapids -j1 -R download_fitbit_data Deleting RAPIDS output If you want to delete all the output files RAPIDS produces you can execute the following command: ./rapids -j1 --delete-all-output","title":"Execution"},{"location":"setup/installation/","text":"Installation \u00b6 You can install RAPIDS using Docker (the fastest), or native instructions for MacOS and Linux (Ubuntu). Windows is supported through Docker or WSL. Docker Install Docker Pull our RAPIDS container docker pull moshiresearch/rapids:latest Run RAPIDS' container (after this step is done you should see a prompt in the main RAPIDS folder with its python environment active) docker run -it moshiresearch/rapids:latest Pull the latest version of RAPIDS git pull Make RAPIDS script executable chmod +x rapids Check that RAPIDS is working ./rapids -j1 Optional . You can edit RAPIDS files with vim but we recommend using Visual Studio Code and its Remote Containers extension How to configure Remote Containers extension Make sure RAPIDS container is running Install the Remote - Containers extension Go to the Remote Explorer panel on the left hand sidebar On the top right dropdown menu choose Containers Double click on the moshiresearch/rapids container in the CONTAINERS tree A new VS Code session should open on RAPIDS main folder inside the container. Warning If you installed RAPIDS using Docker for Windows on Windows 10, the container will have limits on the amount of RAM it can use. If you find that RAPIDS crashes due to running out of memory, increase this limit. MacOS We tested these instructions in Catalina Install brew Install MySQL brew install mysql brew services start mysql Install R 4.0, pandoc and rmarkdown. If you have other instances of R, we recommend uninstalling them brew install r brew install pandoc Rscript --vanilla -e 'install.packages(\"rmarkdown\", repos=\"http://cran.us.r-project.org\")' Install miniconda (restart your terminal afterwards) brew cask install miniconda conda init zsh # (or conda init bash) Clone our repo git clone https://github.com/carissalow/rapids Create a python virtual environment cd rapids conda env create -f environment.yml -n rapids conda activate rapids Install R packages and virtual environment: snakemake -j1 renv_install snakemake -j1 renv_restore Note This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion. Make RAPIDS script executable chmod +x rapids Check that RAPIDS is working ./rapids -j1 Ubuntu We tested RAPIDS on Ubuntu 18.04 & 20.04. Note that the necessary Python and R packages are available in other Linux distributions, so if you decide to give it a try, let us know and we can update these docs. Install dependencies sudo apt install libcurl4-openssl-dev sudo apt install libssl-dev sudo apt install libxml2-dev sudo apt install libglpk40 Install MySQL sudo apt install libmysqlclient-dev sudo apt install mysql-server Add key for R\u2019s repository. sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 Add R\u2019s repository Ubuntu 18.04 Bionic sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' Ubuntu 20.04 Focal sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' Install R 4.0. If you have other instances of R, we recommend uninstalling them sudo apt update sudo apt install r-base Install Pandoc and rmarkdown sudo apt install pandoc Rscript --vanilla -e 'install.packages(\"rmarkdown\", repos=\"http://cran.us.r-project.org\")' Install git sudo apt install git Install miniconda Restart your current shell Clone our repo: git clone https://github.com/carissalow/rapids Create a python virtual environment: cd rapids conda env create -f environment.yml -n MY_ENV_NAME conda activate MY_ENV_NAME Install the R virtual environment management package (renv) snakemake -j1 renv_install Restore the R virtual environment Ubuntu 18.04 Bionic (fast) Run the following command to restore the R virtual environment using RSPM binaries R -e 'renv::restore(repos = c(CRAN = \"https://packagemanager.rstudio.com/all/__linux__/bionic/latest\"))' Ubuntu 20.04 Focal (fast) Run the following command to restore the R virtual environment using RSPM binaries R -e 'renv::restore(repos = c(CRAN = \"https://packagemanager.rstudio.com/all/__linux__/focal/latest\"))' Ubuntu (slow) If the fast installation command failed for some reason, you can restore the R virtual environment from source: R -e 'renv::restore()' Note This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion. Make RAPIDS script executable chmod +x rapids Check that RAPIDS is working ./rapids -j1 Windows There are several options varying in complexity: You can use our Docker instructions (tested) You can use our Ubuntu 20.04 instructions on WSL2 (not tested but it will likely work) Native installation (experimental). If you would like to contribute to RAPIDS you could try to install MySQL, miniconda, Python, and R 4.0+ in Windows and restore the Python and R virtual environments using steps 6 and 7 of the instructions for Mac. You can get in touch if you would like to discuss this with the team.","title":"Installation"},{"location":"setup/installation/#installation","text":"You can install RAPIDS using Docker (the fastest), or native instructions for MacOS and Linux (Ubuntu). Windows is supported through Docker or WSL. Docker Install Docker Pull our RAPIDS container docker pull moshiresearch/rapids:latest Run RAPIDS' container (after this step is done you should see a prompt in the main RAPIDS folder with its python environment active) docker run -it moshiresearch/rapids:latest Pull the latest version of RAPIDS git pull Make RAPIDS script executable chmod +x rapids Check that RAPIDS is working ./rapids -j1 Optional . You can edit RAPIDS files with vim but we recommend using Visual Studio Code and its Remote Containers extension How to configure Remote Containers extension Make sure RAPIDS container is running Install the Remote - Containers extension Go to the Remote Explorer panel on the left hand sidebar On the top right dropdown menu choose Containers Double click on the moshiresearch/rapids container in the CONTAINERS tree A new VS Code session should open on RAPIDS main folder inside the container. Warning If you installed RAPIDS using Docker for Windows on Windows 10, the container will have limits on the amount of RAM it can use. If you find that RAPIDS crashes due to running out of memory, increase this limit. MacOS We tested these instructions in Catalina Install brew Install MySQL brew install mysql brew services start mysql Install R 4.0, pandoc and rmarkdown. If you have other instances of R, we recommend uninstalling them brew install r brew install pandoc Rscript --vanilla -e 'install.packages(\"rmarkdown\", repos=\"http://cran.us.r-project.org\")' Install miniconda (restart your terminal afterwards) brew cask install miniconda conda init zsh # (or conda init bash) Clone our repo git clone https://github.com/carissalow/rapids Create a python virtual environment cd rapids conda env create -f environment.yml -n rapids conda activate rapids Install R packages and virtual environment: snakemake -j1 renv_install snakemake -j1 renv_restore Note This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion. Make RAPIDS script executable chmod +x rapids Check that RAPIDS is working ./rapids -j1 Ubuntu We tested RAPIDS on Ubuntu 18.04 & 20.04. Note that the necessary Python and R packages are available in other Linux distributions, so if you decide to give it a try, let us know and we can update these docs. Install dependencies sudo apt install libcurl4-openssl-dev sudo apt install libssl-dev sudo apt install libxml2-dev sudo apt install libglpk40 Install MySQL sudo apt install libmysqlclient-dev sudo apt install mysql-server Add key for R\u2019s repository. sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 Add R\u2019s repository Ubuntu 18.04 Bionic sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' Ubuntu 20.04 Focal sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' Install R 4.0. If you have other instances of R, we recommend uninstalling them sudo apt update sudo apt install r-base Install Pandoc and rmarkdown sudo apt install pandoc Rscript --vanilla -e 'install.packages(\"rmarkdown\", repos=\"http://cran.us.r-project.org\")' Install git sudo apt install git Install miniconda Restart your current shell Clone our repo: git clone https://github.com/carissalow/rapids Create a python virtual environment: cd rapids conda env create -f environment.yml -n MY_ENV_NAME conda activate MY_ENV_NAME Install the R virtual environment management package (renv) snakemake -j1 renv_install Restore the R virtual environment Ubuntu 18.04 Bionic (fast) Run the following command to restore the R virtual environment using RSPM binaries R -e 'renv::restore(repos = c(CRAN = \"https://packagemanager.rstudio.com/all/__linux__/bionic/latest\"))' Ubuntu 20.04 Focal (fast) Run the following command to restore the R virtual environment using RSPM binaries R -e 'renv::restore(repos = c(CRAN = \"https://packagemanager.rstudio.com/all/__linux__/focal/latest\"))' Ubuntu (slow) If the fast installation command failed for some reason, you can restore the R virtual environment from source: R -e 'renv::restore()' Note This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion. Make RAPIDS script executable chmod +x rapids Check that RAPIDS is working ./rapids -j1 Windows There are several options varying in complexity: You can use our Docker instructions (tested) You can use our Ubuntu 20.04 instructions on WSL2 (not tested but it will likely work) Native installation (experimental). If you would like to contribute to RAPIDS you could try to install MySQL, miniconda, Python, and R 4.0+ in Windows and restore the Python and R virtual environments using steps 6 and 7 of the instructions for Mac. You can get in touch if you would like to discuss this with the team.","title":"Installation"},{"location":"visualizations/data-quality-visualizations/","text":"Data Quality Visualizations \u00b6 We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (t01 and t02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night). Note Time segments (e.g. daily , morning , etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.) 1. Histograms of phone data yield \u00b6 RAPIDS provides two histograms that show the number of time segment instances that had a certain ratio of valid yielded minutes and hours , respectively. A valid yielded minute has at least 1 row of data from any smartphone sensor and a valid yielded hour contains at least M valid minutes. These plots can be used as a rough indication of the smartphone monitoring coverage during a study aggregated across all participants. For example, the figure below shows a valid yielded minutes histogram for daily segments and we can infer that the monitoring coverage was very good since almost all segments contain at least 90 to 100% of the expected sensed minutes. Example Click here to see an example of these interactive visualizations in HTML format Histogram of the data yielded minute ratio for a single participant during five time segments (daily, afternoon, evening, and night) 2. Heatmaps of overall data yield \u00b6 These heatmaps are a break down per time segment and per participant of Visualization 1 . Heatmap\u2019s rows represent participants, columns represent time segment instances and the cells\u2019 color represent the valid yielded minute or hour ratio for a participant during a time segment instance. As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis is labelled with the time delta between the start of each time segment instance minus the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment. The figure below shows the heatmap of the valid yielded minute ratio for participants t01 and t02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicate both phones sensed data without interruptions for most days (except for the first and last ones). Example Click here to see an example of these interactive visualizations in HTML format Overall compliance heatmap for all participants 3. Heatmap of recorded phone sensors \u00b6 In these heatmaps rows represent time segment instances, columns represent minutes since the start of a time segment instance, and cells\u2019 color shows the number of phone sensors that logged at least one row of data during those 1-minute windows. RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes). The figure below shows this heatmap for phone sensors collected by participant t01 in daily time segments from Apr 23 rd 2020 to May 4 th 2020. We can infer that for most of the monitoring time, the participant\u2019s phone logged data from at least 8 sensors each minute. Example Click here to see an example of these interactive visualizations in HTML format Heatmap of the recorded phone sensors per minute and per time segment of a single participant 4. Heatmap of sensor row count \u00b6 These heatmaps are a per-sensor breakdown of Visualization 1 and Visualization 2 . Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant (bottom) row the screenshot in Visualization 2. In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell\u2019s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis. The figure below shows data for 16 phone sensors (including data yield) of t01\u2019s daily segments (only half of the sensor names and dates are visible in the screenshot but all can be accessed in the interactive plot). From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23 rd ), peaks of location coordinates on Apr 26 th and Apr 30 th , and no sent or received SMS except for Apr 23 rd , Apr 29 th and Apr 30 th (unlabeled row between screen and locations). Example Click here to see an example of these interactive visualizations in HTML format Heatmap of the sensor row count per time segment of a single participant","title":"Data Quality"},{"location":"visualizations/data-quality-visualizations/#data-quality-visualizations","text":"We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (t01 and t02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night). Note Time segments (e.g. daily , morning , etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.)","title":"Data Quality Visualizations"},{"location":"visualizations/data-quality-visualizations/#1-histograms-of-phone-data-yield","text":"RAPIDS provides two histograms that show the number of time segment instances that had a certain ratio of valid yielded minutes and hours , respectively. A valid yielded minute has at least 1 row of data from any smartphone sensor and a valid yielded hour contains at least M valid minutes. These plots can be used as a rough indication of the smartphone monitoring coverage during a study aggregated across all participants. For example, the figure below shows a valid yielded minutes histogram for daily segments and we can infer that the monitoring coverage was very good since almost all segments contain at least 90 to 100% of the expected sensed minutes. Example Click here to see an example of these interactive visualizations in HTML format Histogram of the data yielded minute ratio for a single participant during five time segments (daily, afternoon, evening, and night)","title":"1. Histograms of phone data yield"},{"location":"visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield","text":"These heatmaps are a break down per time segment and per participant of Visualization 1 . Heatmap\u2019s rows represent participants, columns represent time segment instances and the cells\u2019 color represent the valid yielded minute or hour ratio for a participant during a time segment instance. As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis is labelled with the time delta between the start of each time segment instance minus the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment. The figure below shows the heatmap of the valid yielded minute ratio for participants t01 and t02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicate both phones sensed data without interruptions for most days (except for the first and last ones). Example Click here to see an example of these interactive visualizations in HTML format Overall compliance heatmap for all participants","title":"2. Heatmaps of overall data yield"},{"location":"visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors","text":"In these heatmaps rows represent time segment instances, columns represent minutes since the start of a time segment instance, and cells\u2019 color shows the number of phone sensors that logged at least one row of data during those 1-minute windows. RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes). The figure below shows this heatmap for phone sensors collected by participant t01 in daily time segments from Apr 23 rd 2020 to May 4 th 2020. We can infer that for most of the monitoring time, the participant\u2019s phone logged data from at least 8 sensors each minute. Example Click here to see an example of these interactive visualizations in HTML format Heatmap of the recorded phone sensors per minute and per time segment of a single participant","title":"3. Heatmap of recorded phone sensors"},{"location":"visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count","text":"These heatmaps are a per-sensor breakdown of Visualization 1 and Visualization 2 . Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant (bottom) row the screenshot in Visualization 2. In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell\u2019s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis. The figure below shows data for 16 phone sensors (including data yield) of t01\u2019s daily segments (only half of the sensor names and dates are visible in the screenshot but all can be accessed in the interactive plot). From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23 rd ), peaks of location coordinates on Apr 26 th and Apr 30 th , and no sent or received SMS except for Apr 23 rd , Apr 29 th and Apr 30 th (unlabeled row between screen and locations). Example Click here to see an example of these interactive visualizations in HTML format Heatmap of the sensor row count per time segment of a single participant","title":"4. Heatmap of sensor row count"},{"location":"visualizations/feature-visualizations/","text":"Feature Visualizations \u00b6 1. Heatmap Correlation Matrix \u00b6 Columns and rows are the behavioral features computed in RAPIDS, cells\u2019 color represents the correlation coefficient between all days of data for every pair of features of all participants. The user can specify a minimum number of observations ( time segment instances) required to compute the correlation between two features using the MIN_ROWS_RATIO parameter (0.5 by default) and the correlation method (Pearson, Spearman or Kendall) with the CORR_METHOD parameter. In addition, this plot can be configured to only display correlation coefficients above a threshold using the CORR_THRESHOLD parameter (0.1 by default). Example Click here to see an example of these interactive visualizations in HTML format Correlation matrix heatmap for all the features of all participants","title":"Features"},{"location":"visualizations/feature-visualizations/#feature-visualizations","text":"","title":"Feature Visualizations"},{"location":"visualizations/feature-visualizations/#1-heatmap-correlation-matrix","text":"Columns and rows are the behavioral features computed in RAPIDS, cells\u2019 color represents the correlation coefficient between all days of data for every pair of features of all participants. The user can specify a minimum number of observations ( time segment instances) required to compute the correlation between two features using the MIN_ROWS_RATIO parameter (0.5 by default) and the correlation method (Pearson, Spearman or Kendall) with the CORR_METHOD parameter. In addition, this plot can be configured to only display correlation coefficients above a threshold using the CORR_THRESHOLD parameter (0.1 by default). Example Click here to see an example of these interactive visualizations in HTML format Correlation matrix heatmap for all the features of all participants","title":"1. Heatmap Correlation Matrix"},{"location":"workflow-examples/analysis/","text":"Analysis Workflow Example \u00b6 TL;DR In addition to using RAPIDS to extract behavioral features and create plots, you can structure your data analysis within RAPIDS (i.e. cleaning your features and creating ML/statistical models) We include an analysis example in RAPIDS that covers raw data processing, cleaning, feature extraction, machine learning modeling, and evaluation Use this example as a guide to structure your own analysis within RAPIDS RAPIDS analysis workflows are compatible with your favorite data science tools and libraries RAPIDS analysis workflows are reproducible and we encourage you to publish them along with your research papers Why should I integrate my analysis in RAPIDS? \u00b6 Even though the bulk of RAPIDS current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced in other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS. We clarify that to create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS, then, they can use Jupyter notebooks to explore feature cleaning approaches and once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models and once a model is finished, it can also be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is. In the following sections we share an example of how we structured an analysis workflow in RAPIDS. Analysis workflow structure \u00b6 To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files and scripts are grouped into Snakemake rules that are described on smk files in the rules folder (we point the reader to the relevant rule(s) of each step). Researchers can use these rules and scripts as a guide to create their own as it is expected every modeling project will have different requirements, data and goals but ultimately most follow a similar chainned pattern. Hint The example\u2019s config file is example_profile/example_config.yaml and its Snakefile is in example_profile/Snakefile . The config file is already configured to process the sensor data as explained in Analysis workflow modules . Description of the study modeled in our analysis workflow example \u00b6 Our example is based on a hypothetical study that recruited 2 participants that underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client in their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10 that we summed up into a daily symptom burden score. The goal of this workflow is to find out if we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes, high and low symptom burden based on the scores above and below average of each participant. We also want to compare the performance of individual (personalized) models vs a population model. In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share a database with test data in an Open Science Framework repository. Modules of RAPIDS example workflow, from raw data to model evaluation Configure and run the analysis workflow example \u00b6 Install RAPIDS Configure the user credentials of a local or remote MySQL server with writing permissions in your .env file. The config file where you need to modify the DATABASE_GROUP is at example_profile/example_config.yaml . Skip this step if you are using RAPIDS docker container . Unzip the test database to data/external/rapids_example.sql and run: ./rapids -j1 restore_sql_file --profile example_profile Create the participant files for this example by running: ./rapids -j1 create_example_participant_files Run the example pipeline with: ./rapids -j1 --profile example_profile Modules of our analysis workflow example \u00b6 1. Feature extraction We extract daily behavioral features for data yield, received and sent messages, missed, incoming and outgoing calls, resample fused location data using Doryab provider, activity recognition, battery, Bluetooth, screen, light, applications foreground, conversations, Wi-Fi connected, Wi-Fi visible, Fitbit heart rate summary and intraday data, Fitbit sleep summary data, and Fitbit step summary and intraday data without excluding sleep periods with an active bout threshold of 10 steps. In total, we obtained 237 daily sensor features over 12 days per participant. 2. Extract demographic data. It is common to have demographic data in addition to mobile and target (ground truth) data. In this example we include participants\u2019 age, gender and the number of days they spent in hospital after their surgery as features in our model. We extract these three columns from the participant_info table of our test database . As these three features remain the same within participants, they are used only on the population model. Refer to the demographic_features rule in rules/models.smk . 3. Create target labels. The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the participant_target table of our test database and transferred to a CSV file. A new rule/script can be created if further manipulation is necessary. Refer to the parse_targets rule in rules/models.smk . 4. Feature merging. These daily features are stored on a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case each column represents a feature and each row represents a day). Refer to the merge_sensor_features_for_individual_participants and merge_features_for_population_model rules in rules/features.smk . 5. Data visualization. At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to rules/reports.smk to find the rules that generate these plots. 6. Feature cleaning. In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in example_profile/example_config.yaml . After this step, we kept 162 features over 11 days for the individual model of p01, 107 features over 12 days for the individual model of p02 and 101 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stops researchers from collecting the same number of sensors than in Android phones. Feature cleaning for the individual models is done in the clean_sensor_features_for_individual_participants rule and for the population model in the clean_sensor_features_for_all_participants rule in rules/models.smk . 7. Merge features and targets. In this step we merge the cleaned features and target labels for our individual models in the merge_features_and_targets_for_individual_model rule in rules/models.smk . Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the merge_features_and_targets_for_population_model rule in rules/models.smk . These two merged files are the input for our individual and population models. 8. Modelling. This stage has three phases: model building, training and evaluation. In the building phase we impute, normalize and oversample our dataset. Missing numeric values in each column are imputed with their mean and we impute missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, and scikit-learn package\u2019s robust scaler) and we one-hot encode each categorial feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a Random Over sampler from scikit-learn. All these parameters are exposed in example_profile/example_config.yaml . In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in example_profile/example_config.yaml . Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle. Refer to the modelling_for_individual_participants rule for the individual modeling and to the modelling_for_all_participants rule for the population modeling, both in rules/models.smk . 9. Compute model baselines. We create three baselines to evaluate our classification models. First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features for individual model, this baseline is only available for population model. Our baseline metrics (e.g. accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the baselines_for_individual_model rule for the individual model baselines and to the baselines_for_population_model rule for population model baselines, both in rules/models.smk .","title":"Analysis"},{"location":"workflow-examples/analysis/#analysis-workflow-example","text":"TL;DR In addition to using RAPIDS to extract behavioral features and create plots, you can structure your data analysis within RAPIDS (i.e. cleaning your features and creating ML/statistical models) We include an analysis example in RAPIDS that covers raw data processing, cleaning, feature extraction, machine learning modeling, and evaluation Use this example as a guide to structure your own analysis within RAPIDS RAPIDS analysis workflows are compatible with your favorite data science tools and libraries RAPIDS analysis workflows are reproducible and we encourage you to publish them along with your research papers","title":"Analysis Workflow Example"},{"location":"workflow-examples/analysis/#why-should-i-integrate-my-analysis-in-rapids","text":"Even though the bulk of RAPIDS current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced in other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS. We clarify that to create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS, then, they can use Jupyter notebooks to explore feature cleaning approaches and once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models and once a model is finished, it can also be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is. In the following sections we share an example of how we structured an analysis workflow in RAPIDS.","title":"Why should I integrate my analysis in RAPIDS?"},{"location":"workflow-examples/analysis/#analysis-workflow-structure","text":"To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files and scripts are grouped into Snakemake rules that are described on smk files in the rules folder (we point the reader to the relevant rule(s) of each step). Researchers can use these rules and scripts as a guide to create their own as it is expected every modeling project will have different requirements, data and goals but ultimately most follow a similar chainned pattern. Hint The example\u2019s config file is example_profile/example_config.yaml and its Snakefile is in example_profile/Snakefile . The config file is already configured to process the sensor data as explained in Analysis workflow modules .","title":"Analysis workflow structure"},{"location":"workflow-examples/analysis/#description-of-the-study-modeled-in-our-analysis-workflow-example","text":"Our example is based on a hypothetical study that recruited 2 participants that underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client in their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10 that we summed up into a daily symptom burden score. The goal of this workflow is to find out if we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes, high and low symptom burden based on the scores above and below average of each participant. We also want to compare the performance of individual (personalized) models vs a population model. In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share a database with test data in an Open Science Framework repository. Modules of RAPIDS example workflow, from raw data to model evaluation","title":"Description of the study modeled in our analysis workflow example"},{"location":"workflow-examples/analysis/#configure-and-run-the-analysis-workflow-example","text":"Install RAPIDS Configure the user credentials of a local or remote MySQL server with writing permissions in your .env file. The config file where you need to modify the DATABASE_GROUP is at example_profile/example_config.yaml . Skip this step if you are using RAPIDS docker container . Unzip the test database to data/external/rapids_example.sql and run: ./rapids -j1 restore_sql_file --profile example_profile Create the participant files for this example by running: ./rapids -j1 create_example_participant_files Run the example pipeline with: ./rapids -j1 --profile example_profile","title":"Configure and run the analysis workflow example"},{"location":"workflow-examples/analysis/#modules-of-our-analysis-workflow-example","text":"1. Feature extraction We extract daily behavioral features for data yield, received and sent messages, missed, incoming and outgoing calls, resample fused location data using Doryab provider, activity recognition, battery, Bluetooth, screen, light, applications foreground, conversations, Wi-Fi connected, Wi-Fi visible, Fitbit heart rate summary and intraday data, Fitbit sleep summary data, and Fitbit step summary and intraday data without excluding sleep periods with an active bout threshold of 10 steps. In total, we obtained 237 daily sensor features over 12 days per participant. 2. Extract demographic data. It is common to have demographic data in addition to mobile and target (ground truth) data. In this example we include participants\u2019 age, gender and the number of days they spent in hospital after their surgery as features in our model. We extract these three columns from the participant_info table of our test database . As these three features remain the same within participants, they are used only on the population model. Refer to the demographic_features rule in rules/models.smk . 3. Create target labels. The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the participant_target table of our test database and transferred to a CSV file. A new rule/script can be created if further manipulation is necessary. Refer to the parse_targets rule in rules/models.smk . 4. Feature merging. These daily features are stored on a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case each column represents a feature and each row represents a day). Refer to the merge_sensor_features_for_individual_participants and merge_features_for_population_model rules in rules/features.smk . 5. Data visualization. At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to rules/reports.smk to find the rules that generate these plots. 6. Feature cleaning. In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in example_profile/example_config.yaml . After this step, we kept 162 features over 11 days for the individual model of p01, 107 features over 12 days for the individual model of p02 and 101 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stops researchers from collecting the same number of sensors than in Android phones. Feature cleaning for the individual models is done in the clean_sensor_features_for_individual_participants rule and for the population model in the clean_sensor_features_for_all_participants rule in rules/models.smk . 7. Merge features and targets. In this step we merge the cleaned features and target labels for our individual models in the merge_features_and_targets_for_individual_model rule in rules/models.smk . Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the merge_features_and_targets_for_population_model rule in rules/models.smk . These two merged files are the input for our individual and population models. 8. Modelling. This stage has three phases: model building, training and evaluation. In the building phase we impute, normalize and oversample our dataset. Missing numeric values in each column are imputed with their mean and we impute missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, and scikit-learn package\u2019s robust scaler) and we one-hot encode each categorial feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a Random Over sampler from scikit-learn. All these parameters are exposed in example_profile/example_config.yaml . In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in example_profile/example_config.yaml . Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle. Refer to the modelling_for_individual_participants rule for the individual modeling and to the modelling_for_all_participants rule for the population modeling, both in rules/models.smk . 9. Compute model baselines. We create three baselines to evaluate our classification models. First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features for individual model, this baseline is only available for population model. Our baseline metrics (e.g. accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the baselines_for_individual_model rule for the individual model baselines and to the baselines_for_population_model rule for population model baselines, both in rules/models.smk .","title":"Modules of our analysis workflow example"},{"location":"workflow-examples/minimal/","text":"Minimal Working Example \u00b6 This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming call features for daily ( 00:00:00 to 23:59:59 ) and night ( 00:00:00 to 05:59:59 ) epochs of every day of data of one participant monitored on the US East coast with an Android smartphone. Hint If you don\u2019t have call data that you can use to try this example you can restore this CSV file as a table in a MySQL database. Install RAPIDS and make sure your conda environment is active (see Installation ) Make the changes listed below for the corresponding Configuration step (we provide an example of what the relevant sections in your config.yml will look like after you are done) Required configuration changes Add your database credentials . Setup your database connection credentials in .env , we assume your credentials group in the .env file is called MY_GROUP . Choose the timezone of your study . Since this example is processing data collected on the US East cost, America/New_York should be the configured timezone, change this according to your data. Create your participants files . Since we are processing data from a single participant, you only need to create a single participant file called p01.yaml . This participant file only has a PHONE section because this hypothetical participant was only monitored with an smartphone. You also need to add p01 to [PIDS] in config.yaml . The following would be the content of your p01.yaml participant file: PHONE : DEVICE_IDS : [ a748ee1a-1d0b-4ae9-9074-279a2b6ba524 ] # the participant's AWARE device id PLATFORMS : [ android ] # or ios LABEL : MyTestP01 # any string START_DATE : 2020-01-01 # this can also be empty END_DATE : 2021-01-01 # this can also be empty Select what time segments you want to extract features on. [TIME_SEGMENTS][TYPE] should be the default PERIODIC . Change [TIME_SEGMENTS][FILE] with the path (for example data/external/timesegments_periodic.csv ) of a file containing the following lines: label,start_time,length,repeats_on,repeats_value daily,00:00:00,23H 59M 59S,every_day,0 night,00:00:00,5H 59M 59S,every_day,0 Modify your device data source configuration In this example we do not need to modify this section because we are using smartphone data collected with AWARE stored on a MySQL database. Select what sensors and features you want to process. Set [PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE] to True in the config.yaml file. Example of the config.yaml sections after the changes outlined above Highlighted lines are related to the configuration steps above. PIDS : [ p01 ] TIMEZONE : &timezone America/New_York DATABASE_GROUP : &database_group MY_GROUP # ... other irrelevant sections TIME_SEGMENTS : &time_segments TYPE : PERIODIC FILE : \"data/external/timesegments_periodic.csv\" # make sure the three lines specified above are in the file INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE # No need to change this if you collected AWARE data on a database and your credentials are grouped under `MY_GROUP` in `.env` DEVICE_DATA : PHONE : SOURCE : TYPE : DATABASE DATABASE_GROUP : *database_group DEVICE_ID_COLUMN : device_id # column name TIMEZONE : TYPE : SINGLE # SINGLE or MULTIPLE VALUE : *timezone ############## PHONE ########################################################### ################################################################################ # ... other irrelevant sections # Communication call features config, TYPES and FEATURES keys need to match PHONE_CALLS : TABLE : calls # change if your calls table has a different name PROVIDERS : RAPIDS : COMPUTE : True # set this to True! CALL_TYPES : ... Run RAPIDS ./rapids -j1 The call features for daily and morning time segments will be in /data/processed/features/p01/phone_calls.csv","title":"Minimal"},{"location":"workflow-examples/minimal/#minimal-working-example","text":"This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming call features for daily ( 00:00:00 to 23:59:59 ) and night ( 00:00:00 to 05:59:59 ) epochs of every day of data of one participant monitored on the US East coast with an Android smartphone. Hint If you don\u2019t have call data that you can use to try this example you can restore this CSV file as a table in a MySQL database. Install RAPIDS and make sure your conda environment is active (see Installation ) Make the changes listed below for the corresponding Configuration step (we provide an example of what the relevant sections in your config.yml will look like after you are done) Required configuration changes Add your database credentials . Setup your database connection credentials in .env , we assume your credentials group in the .env file is called MY_GROUP . Choose the timezone of your study . Since this example is processing data collected on the US East cost, America/New_York should be the configured timezone, change this according to your data. Create your participants files . Since we are processing data from a single participant, you only need to create a single participant file called p01.yaml . This participant file only has a PHONE section because this hypothetical participant was only monitored with an smartphone. You also need to add p01 to [PIDS] in config.yaml . The following would be the content of your p01.yaml participant file: PHONE : DEVICE_IDS : [ a748ee1a-1d0b-4ae9-9074-279a2b6ba524 ] # the participant's AWARE device id PLATFORMS : [ android ] # or ios LABEL : MyTestP01 # any string START_DATE : 2020-01-01 # this can also be empty END_DATE : 2021-01-01 # this can also be empty Select what time segments you want to extract features on. [TIME_SEGMENTS][TYPE] should be the default PERIODIC . Change [TIME_SEGMENTS][FILE] with the path (for example data/external/timesegments_periodic.csv ) of a file containing the following lines: label,start_time,length,repeats_on,repeats_value daily,00:00:00,23H 59M 59S,every_day,0 night,00:00:00,5H 59M 59S,every_day,0 Modify your device data source configuration In this example we do not need to modify this section because we are using smartphone data collected with AWARE stored on a MySQL database. Select what sensors and features you want to process. Set [PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE] to True in the config.yaml file. Example of the config.yaml sections after the changes outlined above Highlighted lines are related to the configuration steps above. PIDS : [ p01 ] TIMEZONE : &timezone America/New_York DATABASE_GROUP : &database_group MY_GROUP # ... other irrelevant sections TIME_SEGMENTS : &time_segments TYPE : PERIODIC FILE : \"data/external/timesegments_periodic.csv\" # make sure the three lines specified above are in the file INCLUDE_PAST_PERIODIC_SEGMENTS : FALSE # No need to change this if you collected AWARE data on a database and your credentials are grouped under `MY_GROUP` in `.env` DEVICE_DATA : PHONE : SOURCE : TYPE : DATABASE DATABASE_GROUP : *database_group DEVICE_ID_COLUMN : device_id # column name TIMEZONE : TYPE : SINGLE # SINGLE or MULTIPLE VALUE : *timezone ############## PHONE ########################################################### ################################################################################ # ... other irrelevant sections # Communication call features config, TYPES and FEATURES keys need to match PHONE_CALLS : TABLE : calls # change if your calls table has a different name PROVIDERS : RAPIDS : COMPUTE : True # set this to True! CALL_TYPES : ... Run RAPIDS ./rapids -j1 The call features for daily and morning time segments will be in /data/processed/features/p01/phone_calls.csv","title":"Minimal Working Example"}]}
\ No newline at end of file
diff --git a/0.4/setup/configuration/index.html b/0.4/setup/configuration/index.html
new file mode 100644
index 00000000..becf3691
--- /dev/null
+++ b/0.4/setup/configuration/index.html
@@ -0,0 +1,1989 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Configuration - RAPIDS
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
The label MY_GROUP is arbitrary but it has to match the following config.yaml key:
+
DATABASE_GROUP:&database_group
+ MY_GROUP
+
+
+
+
Hint
If you are using RAPIDS’ docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using host.docker.internal instead of 127.0.0.1 or localhost. In a Linux host you need to run our docker container using docker run --network="host" -d moshiresearch/rapids:latest and then 127.0.0.1 will point to your host machine.
+
+
+
+
Note
+
+
You can ignore this step if you are only processing Fitbit data in CSV files.
+
RAPIDS only supports MySQL/MariaDB databases. If you would like to add support for a different database engine get in touch and we can discuss how to implement it.
If your study only happened in a single time zone, select the appropriate code form this list and change the following config key. Double check your timezone code pick, for example US Eastern Time is America/New_York not EST
Participant files link together multiple devices (smartphones and wearables) to specific participants and identify them throughout RAPIDS. You can create these files manually or automatically. Participant files are stored in data/external/participant_files/pxx.yaml and follow a unified structure.
+
+
Note
+
The list PIDS in config.yaml needs to have the participant file names of the people you want to process. For example, if you created p01.yaml, p02.yaml and p03.yaml files in /data/external/participant_files/, then PIDS should be:
+
PIDS:[p01,p02,p03]
+
+
+
+
Tip
+
Attribute values of the [PHONE] and [FITBIT] sections in every participant file are optional which allows you to analyze data from participants that only carried smartphones, only Fitbit devices, or both.
+
+Optional: Migrating participants files with the old format
If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in data/external/participant_files/
An array of the strings that uniquely identify each smartphone, you can have more than one for when participants changed phones in the middle of the study, in this case, data from all their devices will be joined and relabeled with the last 1 on this list.
+
+
+
[PLATFORMS]
+
An array that specifies the OS of each smartphone in [DEVICE_IDS] , use a combination of android or ios (we support participants that changed platforms in the middle of your study!). If you have an aware_device table in your database you can set [PLATFORMS]: [multiple] and RAPIDS will infer them automatically.
+
+
+
[LABEL]
+
A string that is used in reports and visualizations.
+
+
+
[START_DATE]
+
A string with format YYY-MM-DD . Only data collected after this date will be included in the analysis
+
+
+
[END_DATE]
+
A string with format YYY-MM-DD . Only data collected before this date will be included in the analysis
+
+
+
+
For [FITBIT]
+
+
+
+
Key
+
Description
+
+
+
+
+
[DEVICE_IDS]
+
An array of the strings that uniquely identify each Fitbit, you can have more than one in case the participant changed devices in the middle of the study, in this case, data from all devices will be joined and relabeled with the last device_id on this list.
+
+
+
[LABEL]
+
A string that is used in reports and visualizations.
+
+
+
[START_DATE]
+
A string with format YYY-MM-DD . Only data collected after this date will be included in the analysis
+
+
+
[END_DATE]
+
A string with format YYY-MM-DD . Only data collected before this date will be included in the analysis
You have two options a) use the aware_device table in your database or b) use a CSV file. In either case, in your config.yaml, set [PHONE_SECTION][ADD] or [FITBIT_SECTION][ADD] to TRUE depending on what devices you used in your study. Set [DEVICE_ID_COLUMN] to the name of the column that uniquely identifies each device and include any device ids you want to ignore in [IGNORED_DEVICE_IDS].
+
+
Set the following keys in your config.yaml
+
CREATE_PARTICIPANT_FILES:
+ SOURCE:
+ TYPE:AWARE_DEVICE_TABLE
+ DATABASE_GROUP:*database_group
+ CSV_FILE_PATH:""
+ TIMEZONE:*timezone
+ PHONE_SECTION:
+ ADD:TRUE# or FALSE
+ DEVICE_ID_COLUMN:device_id# column name
+ IGNORED_DEVICE_IDS:[]
+ FITBIT_SECTION:
+ ADD:TRUE# or FALSE
+ DEVICE_ID_COLUMN:fitbit_id# column name
+ IGNORED_DEVICE_IDS:[]
+
+
Then run
+
snakemake -j1 create_participants_files
+
+
+
+
Set the following keys in your config.yaml.
+
CREATE_PARTICIPANT_FILES:
+ SOURCE:
+ TYPE:CSV_FILE
+ DATABASE_GROUP:""
+ CSV_FILE_PATH:"your_path/to_your.csv"
+ TIMEZONE:*timezone
+ PHONE_SECTION:
+ ADD:TRUE# or FALSE
+ DEVICE_ID_COLUMN:device_id# column name
+ IGNORED_DEVICE_IDS:[]
+ FITBIT_SECTION:
+ ADD:TRUE# or FALSE
+ DEVICE_ID_COLUMN:fitbit_id# column name
+ IGNORED_DEVICE_IDS:[]
+
+Your CSV file ([SOURCE][CSV_FILE_PATH]) should have the following columns but you can omit any values you don’t have on each column:
+
+
+
+
Column
+
Description
+
+
+
+
+
phone device id
+
The name of this column has to match [PHONE_SECTION][DEVICE_ID_COLUMN]. Separate multiple ids with ;
+
+
+
fitbit device id
+
The name of this column has to match [FITBIT_SECTION][DEVICE_ID_COLUMN]. Separate multiple ids with ;
+
+
+
pid
+
Unique identifiers with the format pXXX (your participant files will be named with this string
+
+
+
platform
+
Use android, ios or multiple as explained above, separate values with ;
+
+
+
label
+
A human readable string that is used in reports and visualizations.
Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: frequency (short time windows every day), periodic (arbitrary time windows on any day), and event (arbitrary time windows around events of interest). See also our examples.
+
+
These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your config.yaml
These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your config.yaml.
+
TIME_SEGMENTS:&time_segments
+ TYPE:PERIODIC
+ FILE:"data/external/your_periodic_segments.csv"
+ INCLUDE_PAST_PERIODIC_SEGMENTS:FALSE# or TRUE
+
+
If [INCLUDE_PAST_PERIODIC_SEGMENTS] is set to TRUE, RAPIDS will consider instances of your segments back enough in the past as to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1st if [INCLUDE_PAST_PERIODIC_SEGMENTS] is TRUE or on Sunday March 8th if FALSE.
+
The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can have multiple rows.
+
+
+
+
Column
+
Description
+
+
+
+
+
label
+
A string that is used as a prefix in the name of your time segments. It has to be unique between rows
+
+
+
start_time
+
A string with format HH:MM:SS representing the starting time of this segment on any day
+
+
+
length
+
A string representing the length of this segment.It can have one or more of the following strings XXD XXH XXM XXS to represent days, hours, minutes and seconds. For example 7D 23H 59M 59S
+
+
+
repeats_on
+
One of the follow options every_day, wday, qday, mday, and yday. The last four represent a week, quarter, month and year day
+
+
+
repeats_value
+
An integer complementing repeats_on. If you set repeats_on to every_day set this to 0, otherwise 1-7 represent a wday starting from Mondays, 1-31 represent a mday, 1-91 represent a qday, and 1-366 represent a yday
This configuration will create five segments instances (daily, morning, afternoon, evening, night) on any given day (every_day set to 0). The daily segment will start at midnight and will last 23:59:59, the other four segments will start at 6am, 12pm, 6pm, and 12am respectively and last for 05:59:59.
+
+
+
+
These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your config.yaml.
+
TIME_SEGMENTS:&time_segments
+ TYPE:EVENT
+ FILE:"data/external/your_event_segments.csv"
+ INCLUDE_PAST_PERIODIC_SEGMENTS:FALSE# or TRUE
+
+
The file pointed by [TIME_SEGMENTS][FILE] should have the following format and can have multiple rows.
+
+
+
+
Column
+
Description
+
+
+
+
+
label
+
A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the most frequent contact for calls (the most frequent contact will be computed across all these segments). There cannot be two overlaping event segments with the same label (RAPIDS will throw an error)
+
+
+
event_timestamp
+
A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using length, shift, and shift_direction
+
+
+
length
+
A string representing the length of this segment. It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes, and seconds. For example 7D 23H 59M 59S
+
+
+
shift
+
A string representing the time shift from event_timestamp. It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes and seconds. For example 7D 23H 59M 59S. Use this value to change the start of a segment with respect to its event_timestamp. For example, set this variable to 1H to create a segment that starts 1 hour from an event of interest (shift_direction determines if it’s before or after).
+
+
+
shift_direction
+
An integer representing whether the shift is before (-1) or after (1) an event_timestamp
+
+
+
device_id
+
The device id (smartphone or fitbit) to whom this segment belongs to. You have to create a line in this event segment file for each event of a participant that you want to analyse. If you have participants with multiple device ids you can choose any of them
This example will create eight segments for a single participant (a748ee1a...), five independent stressX segments with various lengths (1,4,3,7, and 9 hours). Segments stress1, stress3, and stress5 are shifted forwards by 5 minutes and stress2 and stress4 are shifted backwards by 4 hours (that is, if the stress4 event happened on March 15th at 1pm EST (1584291600000), the time segment will start on that day at 9am and end at 4pm).
+
The three mood segments are 1 hour, 1 day and 7 days long and have no shift. In addition, these mood segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some necessary information to compute a few of such features will be extracted from all three segments, for example the phone contact that called a participant the most or the location clusters visited by a participant.
Use the following Periodic segment file to create overnight segments starting at 20:00:00 and ending at 07:59:59 (next day) of every day in your study
+
You might need to modify the following config keys in your config.yaml depending on what devices your participants used and where you are storing your data. You can ignore [PHONE_DATA_CONFIGURATION] or [FITBIT_DATA_CONFIGURATION] if you are not working with either devices.
+
+
The relevant config.yaml section looks like this by default:
+
PHONE_DATA_CONFIGURATION:
+ SOURCE:
+ TYPE:DATABASE
+ DATABASE_GROUP:*database_group
+ DEVICE_ID_COLUMN:device_id# column name
+ TIMEZONE:
+ TYPE:SINGLE# SINGLE (MULTIPLE support coming soon)
+ VALUE:*timezone
+
+
Parameters for [PHONE_DATA_CONFIGURATION]
+
+
+
+
Key
+
Description
+
+
+
+
+
[SOURCE] [TYPE]
+
Only DATABASE is supported (phone data will be pulled from a database)
The relevant config.yaml section looks like this by default:
+
FITBIT_DATA_CONFIGURATION:
+ SOURCE:
+ TYPE:DATABASE# DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute with a table name or a file path accordingly)
+ COLUMN_FORMAT:JSON# JSON or PLAIN_TEXT
+ DATABASE_GROUP:*database_group
+ DEVICE_ID_COLUMN:device_id# column name
+ TIMEZONE:
+ TYPE:SINGLE# Fitbit devices don't support time zones so we read this data in the timezone indicated by VALUE
+ VALUE:*timezone
+
+
Parameters for For [FITBIT_DATA_CONFIGURATION]
+
+
+
+
Key
+
Description
+
+
+
+
+
[SOURCE][TYPE]
+
DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute accordingly with a table name or a file path)
+
+
+
[SOURCE][COLUMN_FORMAT]
+
JSON or PLAIN_TEXT. Column format of the source data. If you pulled your data directly from the Fitbit API the column containing the sensor data will be in JSON format
+
+
+
[SOURCE][DATABASE_GROUP]
+
*database_group points to the value defined before in Database credentials. Only used if [TYPE] is DATABASE .
+
+
+
[SOURCE][DEVICE_ID_COLUMN]
+
A column that contains strings that uniquely identify Fitbit devices.
+
+
+
[TIMEZONE][TYPE]
+
Only SINGLE is supported (Fitbit devices always store data in local time).
Finally, you need to modify the config.yaml section of the sensors you want to extract behavioral features from. All sensors follow the same naming nomenclature (DEVICE_SENSOR) and parameter structure which we explain in the Behavioral Features Introduction.
+
+
Done
+
Head over to Execution to learn how to execute RAPIDS.
The script ./rapids is a wrapper around Snakemake so you can pass any parameters that Snakemake accepts (e.g. -j1).
+
+
+
Updating RAPIDS output after modifying config.yaml
+
Any changes to the config.yaml file will be applied automatically and only the relevant files will be updated. This means that after modifying the features list for PHONE_MESSAGE for example, RAPIDS will update the output file with the correct features.
+
+
+
Multi-core
+
You can run RAPIDS over multiple cores by modifying the -j argument (e.g. use -j8 to use 8 cores). However, take into account that this means multiple sensor datasets for different participants will be load in memory at the same time. If RAPIDS crashes because it ran out of memory reduce the number of cores and try again.
+
As reference, we have run RAPIDS over 12 cores and 32 Gb of RAM without problems for a study with 200 participants with 14 days of low-frequency smartphone data (no accelerometer, gyroscope, or magnetometer).
+
+
+
Forcing a complete rerun
+
If you want to update your data from your database or rerun the whole pipeline from scratch run one or both of the following commands depending on the devices you are using:
Go to the Remote Explorer panel on the left hand sidebar
+
On the top right dropdown menu choose Containers
+
Double click on the moshiresearch/rapids container in theCONTAINERS tree
+
A new VS Code session should open on RAPIDS main folder inside the container.
+
+
+
+
+
+
+
+
Warning
+
If you installed RAPIDS using Docker for Windows on Windows 10, the container will have limits on the amount of RAM it can use. If you find that RAPIDS crashes due to running out of memory, increase this limit.
This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
+
+
+
+
+
Make RAPIDS script executable
+
chmod +x rapids
+
+
+
+
Check that RAPIDS is working
+
./rapids -j1
+
+
+
+
+
+
We tested RAPIDS on Ubuntu 18.04 & 20.04. Note that the necessary Python and R packages are available in other Linux distributions, so if you decide to give it a try, let us know and we can update these docs.
Install the R virtual environment management package (renv)
+
snakemake -j1 renv_install
+
+
+
+
Restore the R virtual environment
+
+
Run the following command to restore the R virtual environment using RSPM binaries
+
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/bionic/latest"))'
+
+
+
+
Run the following command to restore the R virtual environment using RSPM binaries
+
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
+
+
+
+
If the fast installation command failed for some reason, you can restore the R virtual environment from source:
+
R -e 'renv::restore()'
+
+
+
+
+
Note
This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
+
+
+
+
+
Make RAPIDS script executable
+
chmod +x rapids
+
+
+
+
Check that RAPIDS is working
+
./rapids -j1
+
+
+
+
+
+
There are several options varying in complexity:
+
+
You can use our Docker instructions (tested)
+
You can use our Ubuntu 20.04 instructions on WSL2 (not tested but it will likely work)
+
Native installation (experimental). If you would like to contribute to RAPIDS you could try to install MySQL, miniconda, Python, and R 4.0+ in Windows and restore the Python and R virtual environments using steps 6 and 7 of the instructions for Mac. You can get in touch if you would like to discuss this with the team.
Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data.
Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field.
Abhineeth Reddy Kunta is a Senior Software Engineer with the Mobile Sensing + Health Institute. He is experienced in software development and specializes in building solutions using machine learning. Abhineeth likes exploring ways to leverage technology in advancing medicine and education. Previously he worked as a Computer Programmer at Georgia Department of Public Health. He has a master’s degree in Computer Science from George Mason University.
Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Sciences in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full stack developer and looks forward to applying this knowledge to big data analysis.
Echhit Joshi is a Masters student at the School of Computing and Information at University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics.
Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for a MS in Computer Science with a focus on Intelligent Systems.
Nik is a graduate student at the University of Pittsburgh pursuing Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology from India. He is a Data Enthusiasts and passionate about finding the meaning out of raw data. In a long term, his goal is to create a breakthrough in Data Science and Deep Learning.
Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science. He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science. His research interests include ML applications in medicine.
We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (t01 and t02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night).
+
+
Note
+
Time segments (e.g. daily, morning, etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.)
RAPIDS provides two histograms that show the number of time segment instances that had a certain ratio of valid yielded minutes and hours, respectively. A valid yielded minute has at least 1 row of data from any smartphone sensor and a valid yielded hour contains at least M valid minutes.
+
These plots can be used as a rough indication of the smartphone monitoring coverage during a study aggregated across all participants. For example, the figure below shows a valid yielded minutes histogram for daily segments and we can infer that the monitoring coverage was very good since almost all segments contain at least 90 to 100% of the expected sensed minutes.
+
+
Example
+
Click here to see an example of these interactive visualizations in HTML format
These heatmaps are a break down per time segment and per participant of Visualization 1. Heatmap’s rows represent participants, columns represent time segment instances and the cells’ color represent the valid yielded minute or hour ratio for a participant during a time segment instance.
+
As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis is labelled with the time delta between the start of each time segment instance minus the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment.
+
The figure below shows the heatmap of the valid yielded minute ratio for participants t01 and t02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicate both phones sensed data without interruptions for most days (except for the first and last ones).
+
+
Example
+
Click here to see an example of these interactive visualizations in HTML format
In these heatmaps rows represent time segment instances, columns represent minutes since the start of a time segment instance, and cells’ color shows the number of phone sensors that logged at least one row of data during those 1-minute windows.
+
RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes).
+
The figure below shows this heatmap for phone sensors collected by participant t01 in daily time segments from Apr 23rd 2020 to May 4th 2020. We can infer that for most of the monitoring time, the participant’s phone logged data from at least 8 sensors each minute.
+
+
Example
+
Click here to see an example of these interactive visualizations in HTML format
These heatmaps are a per-sensor breakdown of Visualization 1 and Visualization 2. Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant (bottom) row the screenshot in Visualization 2.
+
In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell’s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis.
+
The figure below shows data for 16 phone sensors (including data yield) of t01’s daily segments (only half of the sensor names and dates are visible in the screenshot but all can be accessed in the interactive plot). From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
+
+
Example
+
Click here to see an example of these interactive visualizations in HTML format
Columns and rows are the behavioral features computed in RAPIDS, cells’ color represents the correlation coefficient between all days of data for every pair of features of all participants.
+
The user can specify a minimum number of observations (time segment instances) required to compute the correlation between two features using the MIN_ROWS_RATIO parameter (0.5 by default) and the correlation method (Pearson, Spearman or Kendall) with the CORR_METHOD parameter. In addition, this plot can be configured to only display correlation coefficients above a threshold using the CORR_THRESHOLD parameter (0.1 by default).
+
+
Example
+
Click here to see an example of these interactive visualizations in HTML format
In addition to using RAPIDS to extract behavioral features and create plots, you can structure your data analysis within RAPIDS (i.e. cleaning your features and creating ML/statistical models)
+
We include an analysis example in RAPIDS that covers raw data processing, cleaning, feature extraction, machine learning modeling, and evaluation
+
Use this example as a guide to structure your own analysis within RAPIDS
+
RAPIDS analysis workflows are compatible with your favorite data science tools and libraries
+
RAPIDS analysis workflows are reproducible and we encourage you to publish them along with your research papers
Even though the bulk of RAPIDS current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced in other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS.
+
We clarify that to create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS, then, they can use Jupyter notebooks to explore feature cleaning approaches and once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models and once a model is finished, it can also be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is.
+
In the following sections we share an example of how we structured an analysis workflow in RAPIDS.
To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files and scripts are grouped into Snakemake rules that are described on smk files in the rules folder (we point the reader to the relevant rule(s) of each step).
+
Researchers can use these rules and scripts as a guide to create their own as it is expected every modeling project will have different requirements, data and goals but ultimately most follow a similar chainned pattern.
+
+
Hint
+
The example’s config file is example_profile/example_config.yaml and its Snakefile is in example_profile/Snakefile. The config file is already configured to process the sensor data as explained in Analysis workflow modules.
+
+
Description of the study modeled in our analysis workflow example¶
+
Our example is based on a hypothetical study that recruited 2 participants that underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client in their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10 that we summed up into a daily symptom burden score.
+
The goal of this workflow is to find out if we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes, high and low symptom burden based on the scores above and below average of each participant. We also want to compare the performance of individual (personalized) models vs a population model.
+
In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share a database with test data in an Open Science Framework repository.
Configure the user credentials of a local or remote MySQL server with writing permissions in your .env file. The config file where you need to modify the DATABASE_GROUP is at example_profile/example_config.yaml.
+
Skip this step if you are using RAPIDS docker container. Unzip the test database to data/external/rapids_example.sql and run:
+
We extract daily behavioral features for data yield, received and sent messages, missed, incoming and outgoing calls, resample fused location data using Doryab provider, activity recognition, battery, Bluetooth, screen, light, applications foreground, conversations, Wi-Fi connected, Wi-Fi visible, Fitbit heart rate summary and intraday data, Fitbit sleep summary data, and Fitbit step summary and intraday data without excluding sleep periods with an active bout threshold of 10 steps. In total, we obtained 237 daily sensor features over 12 days per participant.
+
+2. Extract demographic data.
It is common to have demographic data in addition to mobile and target (ground truth) data. In this example we include participants’ age, gender and the number of days they spent in hospital after their surgery as features in our model. We extract these three columns from the participant_info table of our test database . As these three features remain the same within participants, they are used only on the population model. Refer to the demographic_features rule in rules/models.smk.
+
+3. Create target labels.
The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the participant_target table of our test database and transferred to a CSV file. A new rule/script can be created if further manipulation is necessary. Refer to the parse_targets rule in rules/models.smk.
+
+4. Feature merging.
These daily features are stored on a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case each column represents a feature and each row represents a day). Refer to the merge_sensor_features_for_individual_participants and merge_features_for_population_model rules in rules/features.smk.
+
+5. Data visualization.
At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to rules/reports.smk to find the rules that generate these plots.
+
+6. Feature cleaning.
In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in example_profile/example_config.yaml.
+
After this step, we kept 162 features over 11 days for the individual model of p01, 107 features over 12 days for the individual model of p02 and 101 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stops researchers from collecting the same number of sensors than in Android phones.
+
Feature cleaning for the individual models is done in the clean_sensor_features_for_individual_participants rule and for the population model in the clean_sensor_features_for_all_participants rule in rules/models.smk.
+
+7. Merge features and targets.
In this step we merge the cleaned features and target labels for our individual models in the merge_features_and_targets_for_individual_model rule in rules/models.smk. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the merge_features_and_targets_for_population_model rule in rules/models.smk. These two merged files are the input for our individual and population models.
+
+8. Modelling.
This stage has three phases: model building, training and evaluation.
+
In the building phase we impute, normalize and oversample our dataset. Missing numeric values in each column are imputed with their mean and we impute missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, and scikit-learn package’s robust scaler) and we one-hot encode each categorial feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a Random Over sampler from scikit-learn. All these parameters are exposed in example_profile/example_config.yaml.
+
In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in example_profile/example_config.yaml.
+
Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle.
+
Refer to the modelling_for_individual_participants rule for the individual modeling and to the modelling_for_all_participants rule for the population modeling, both in rules/models.smk.
+
+9. Compute model baselines.
We create three baselines to evaluate our classification models.
+
First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features for individual model, this baseline is only available for population model.
+
Our baseline metrics (e.g. accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the baselines_for_individual_model rule for the individual model baselines and to the baselines_for_population_model rule for population model baselines, both in rules/models.smk.
This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming call features for daily (00:00:00 to 23:59:59) and night (00:00:00 to 05:59:59) epochs of every day of data of one participant monitored on the US East coast with an Android smartphone.
+
+
Hint
+
If you don’t have call data that you can use to try this example you can restore this CSV file as a table in a MySQL database.
+
+
+
Install RAPIDS and make sure your conda environment is active (see Installation)
+
+
Make the changes listed below for the corresponding Configuration step (we provide an example of what the relevant sections in your config.yml will look like after you are done)
Since this example is processing data collected on the US East cost, America/New_York should be the configured timezone, change this according to your data.
Since we are processing data from a single participant, you only need to create a single participant file called p01.yaml. This participant file only has a PHONE section because this hypothetical participant was only monitored with an smartphone. You also need to add p01 to [PIDS] in config.yaml. The following would be the content of your p01.yaml participant file:
+
PHONE:
+ DEVICE_IDS:[a748ee1a-1d0b-4ae9-9074-279a2b6ba524]# the participant's AWARE device id
+ PLATFORMS:[android]# or ios
+ LABEL:MyTestP01# any string
+ START_DATE:2020-01-01# this can also be empty
+ END_DATE:2021-01-01# this can also be empty
+
+
+
+
Select what time segments you want to extract features on.
+
[TIME_SEGMENTS][TYPE] should be the default PERIODIC. Change [TIME_SEGMENTS][FILE] with the path (for example data/external/timesegments_periodic.csv) of a file containing the following lines:
+
Set [PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE] to True in the config.yaml file.
+
+
+
+Example of the config.yaml sections after the changes outlined above
Highlighted lines are related to the configuration steps above.
+
PIDS:[p01]
+
+TIMEZONE:&timezone
+America/New_York
+
+DATABASE_GROUP:&database_group
+MY_GROUP
+
+# ... other irrelevant sections
+
+TIME_SEGMENTS:&time_segments
+TYPE:PERIODIC
+FILE:"data/external/timesegments_periodic.csv"# make sure the three lines specified above are in the file
+INCLUDE_PAST_PERIODIC_SEGMENTS:FALSE
+
+# No need to change this if you collected AWARE data on a database and your credentials are grouped under `MY_GROUP` in `.env`
+DEVICE_DATA:
+ PHONE:
+ SOURCE:
+ TYPE:DATABASE
+ DATABASE_GROUP:*database_group
+ DEVICE_ID_COLUMN:device_id# column name
+ TIMEZONE:
+ TYPE:SINGLE# SINGLE or MULTIPLE
+ VALUE:*timezone
+
+
+############## PHONE ###########################################################
+################################################################################
+
+# ... other irrelevant sections
+
+# Communication call features config, TYPES and FEATURES keys need to match
+PHONE_CALLS:
+ TABLE:calls# change if your calls table has a different name
+ PROVIDERS:
+ RAPIDS:
+COMPUTE:True# set this to True!
+CALL_TYPES:...
+
+
+
+
+
Run RAPIDS
+
./rapids -j1
+
+
+
The call features for daily and morning time segments will be in
+
/data/processed/features/p01/phone_calls.csv
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/latest/404.html b/latest/404.html
index 45ee5e14..e4550ca2 100644
--- a/latest/404.html
+++ b/latest/404.html
@@ -469,6 +469,19 @@
+
You are an internal developer if you have writing permissions to the repository.
Most feature branches are never pushed to the repo, only do so if you expect that its development will take days (to avoid losing your work if you computer is damaged). Otherwise follow the following instructions to locally rebase your feature branch into develop and push those rebased changes online.
@@ -1172,6 +1224,8 @@ git branch -d feature/feature1
Then open a pull request to the develop branch using Github’s GUI
To begin testing RAPIDS place the fake raw input data csv files in
+
To begin testing RAPIDS place the fake raw input data csv files of each fake participant in
tests/data/raw/. The fake participant files should be placed in
- tests/data/external/. The expected output files of RAPIDS after
- processing the input data should be placed in
- tests/data/processesd/.
-
The Snakemake rule(s) that are to be tested must be placed in the
- tests/Snakemake file. The current tests/Snakemake is a good
- example of how to define them. (At the time of writing this
- documentation the snakefile contains rules messages (SMS), calls and
- screen)
-
Edit the tests/settings/config.yaml. Add and/or remove the rules
+ tests/data/external/participant_files. The expected output files of RAPIDS after
+ processing the input data should be placed in tests/data/processesd/frequency and tests/data/processesd/periodic for frequency and periodic respectively.
+
Edit tests/settings/frequency/config.yaml and tests/settings/periodic/config.yaml to add and/or remove the rules
to be run for testing from the forcerun list.
-
Edit the tests/settings/testing_config.yaml with the necessary
- configuration settings for running the rules to be tested.
+
Edit tests/settings/frequency/testing_config.yaml and tests/settings/frequency/testing_config.yaml to configure the settings and enable/disable sensors to be tested.
Add any additional testscripts in tests/scripts.
-
Uncomment or comment off lines in the testing shell script
- tests/scripts/run_tests.sh.
+ [-l] will delete all the existing files in /data before running tests.
+ [all | periodic | frequency] will generate feature data for all or specific type of features and save in data/processed.
+ [test] will compare the features generated with the precomputed and verified features in /tests/data/processed.
The following is a snippet of the output you should see after running your test.
-
test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... ok
-test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... FAIL
+
The results above show that the first test test_sensors_files_exist passed while test_sensors_features_calculations failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see Unittest Documentation
+
The results above show that the for periodic both test_sensors_files_exist and test_sensors_features_calculations passed while for frequency first test test_sensors_files_exist passed while test_sensors_features_calculations failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see Unittest Documentation
Testing of the RAPIDS sensors and features is a work-in-progress. Please see test-cases for a list of sensors and features that have testing currently available.
Currently the repository is set up to test a number of sensors out of the box by simply running the tests/scripts/run_tests.sh command once the RAPIDS python environment is active.
**Error in .local(drv, \...) :** **Failed to connect to database: Error:
+Problem
**Error in .local(drv, \...) :** **Failed to connect to database: Error:
Can\'t initialize character set unknown (path: compiled\_in)** :
Calls: dbConnect -> dbConnect -> .local -> .Call
Execution halted
[Tue Mar 1019:40:15 2020]
-Error in rule download_dataset:
+Error in rule download_dataset:
jobid: 531
output: data/raw/p60/locations_raw.csv
RuleException:
-CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
+CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
-File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
-File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
+File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
+File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
@@ -1323,7 +1375,7 @@ Exiting because a job execution failed. Look above for er
Error Table XXX doesn't exist while running the download_phone_data or download_fitbit_data rule.¶
-Problem
Error in .local(conn, statement, ...) :
+Problem
Error in .local(conn, statement, ...) :
could not run statement: Table 'db_name.table_name' doesn't exist
Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted
@@ -1446,7 +1498,7 @@ ERROR: configuration failed for package Problem
You get the following error:
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
appears to be corrupted. The path 'include/mysqlStubs.h'
- specified in the package manifest cannot be found.
+ specified in the package manifest cannot be found.
ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
path: 'lib/libiomp5.so'
@@ -1456,26 +1508,28 @@ ClobberError: This transaction has incompatible packages due to a shared path.
You get the following error when downloading sensor data:
-
Error in result_fetch(res@ptr, n= n) :
- embedded nul in string:
+
Error in result_fetch(res@ptr, n= n) :
+ embedded nul in string:
Solution
This problem is due to the way RMariaDB handles a mismatch between data types in R and MySQL (see this issue). Since it seems this problem won’t be handled by RMariaDB, you have two options:
-
If it’s only a few rows that are causing this problem, remove the the null character from the conflictive table cell.
+
Remove the the null character from the conflictive table cell(s). You can adapt the following query on a MySQL server 8.0 or older
+
If it’s not feasible to modify your data you can try swapping RMariaDB with RMySQL. Just have in mind you might have problems connecting to modern MySQL servers running in Linux:
Add RMySQL to the renv environment by running the following command in a terminal open on RAPIDS root folder
R -e 'renv::install("RMySQL")'
Go to src/data/download_phone_data.R or src/data/download_fitbit_data.R and replace library(RMariaDB) with library(RMySQL)
-
In the same file replace dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group) with dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)
+
In the same file(s) replace dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group) with dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)