RAPIDS and the community
RAPIDS is a community effort and as such we want to continue recognizing the contributions from other researchers. Besides citing RAPIDS, we ask you to cite any of the authors listed below if you used those sensor providers in your analysis. Thank you!
If you used RAPIDS, please cite this paper.
RAPIDS et al. citation
Vega J, Li M, Aguillera K, Goel N, Joshi E, Durica KC, Kunta AR, Low CA. RAPIDS: Reproducible Analysis Pipeline for Data Streams Collected with Mobile Devices. JMIR Preprints. 18/08/2020:23246. DOI: 10.2196/preprints.23246. URL: https://preprints.jmir.org/preprint/23246
If you computed features using the provider [DBDP] of any of the Empatica sensors (accelerometer, heart rate, temperature, EDA, BVP, IBI, tags), cite this paper in addition to RAPIDS.
Bent et al. citation
Bent, B., Wang, K., Grzesiak, E., Jiang, C., Qi, Y., Jiang, Y., Cho, P., Zingler, K., Ogbeide, F.I., Zhao, A., Runge, R., Sim, I., Dunn, J. (2020). The Digital Biomarker Discovery Pipeline: An open source software platform for the development of digital biomarkers using mHealth and wearables data. Journal of Clinical and Translational Science, 1-28. doi:10.1017/cts.2020.511
If you computed accelerometer features using the provider [PHONE_ACCELEROMETER][PANDA], cite this paper in addition to RAPIDS.
Panda et al. citation
Panda N, Solsky I, Huang EJ, Lipsitz S, Pradarelli JC, Delisle M, Cusack JC, Gadd MA, Lubitz CC, Mullen JT, Qadan M, Smith BL, Specht M, Stephen AE, Tanabe KK, Gawande AA, Onnela JP, Haynes AB. Using Smartphones to Capture Novel Recovery Metrics After Cancer Surgery. JAMA Surg. 2020 Feb 1;155(2):123-129. doi: 10.1001/jamasurg.2019.4702. PMID: 31657854; PMCID: PMC6820047.
If you computed applications foreground features using the app category (genre) catalogue in [PHONE_APPLICATIONS_FOREGROUND][RAPIDS], cite this paper in addition to RAPIDS.
Stachl et al. citation
Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres Völkel, Tobias Schuwerk, Michelle Oldemeier, Theresa Ullmann, Heinrich Hussmann, Bernd Bischl, Markus Bühner. Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences Jul 2020, 117 (30) 17680-17687; DOI: 10.1073/pnas.1920484117
If you computed bluetooth features using the provider [PHONE_BLUETOOTH][DORYAB], cite this paper in addition to RAPIDS.
Doryab et al. citation
Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
If you computed locations features using the provider [PHONE_LOCATIONS][BARNETT], cite this paper and this paper in addition to RAPIDS.
Barnett et al. citation
Ian Barnett, Jukka-Pekka Onnela, Inferring mobility measures from GPS traces with missing data, Biostatistics, Volume 21, Issue 2, April 2020, Pages e98–e112, https://doi.org/10.1093/biostatistics/kxy059
Canzian et al. citation
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI: https://doi.org/10.1145/2750858.2805845
If you computed locations features using the provider [PHONE_LOCATIONS][DORYAB], cite this paper and this paper in addition to RAPIDS.
Doryab et al. citation
Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
Canzian et al. citation
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI: https://doi.org/10.1145/2750858.2805845
+We as members, contributors, and leaders pledge to make participation in our +community a harassment-free experience for everyone, regardless of age, body +size, visible or invisible disability, ethnicity, sex characteristics, gender +identity and expression, level of experience, education, socio-economic status, +nationality, personal appearance, race, religion, or sexual identity +and orientation.
+We pledge to act and interact in ways that contribute to an open, welcoming, +diverse, inclusive, and healthy community.
Examples of behavior that contributes to a positive environment for our community include:
- Demonstrating empathy and kindness toward other people
- Being respectful of differing opinions, viewpoints, and experiences
- Giving and gracefully accepting constructive feedback
- Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
- Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:
- The use of sexualized language or imagery, and sexual attention or advances of any kind
- Trolling, insulting or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or email address, without their explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
+Community leaders are responsible for clarifying and enforcing our standards of +acceptable behavior and will take appropriate and fair corrective action in +response to any behavior that they deem inappropriate, threatening, offensive, +or harmful.
+Community leaders have the right and responsibility to remove, edit, or reject +comments, commits, code, wiki edits, issues, and other contributions that are +not aligned to this Code of Conduct, and will communicate reasons for moderation +decisions when appropriate.
+This Code of Conduct applies within all community spaces, and also applies when +an individual is officially representing the community in public spaces. +Examples of representing our community include using an official e-mail address, +posting via an official social media account, or acting as an appointed +representative at an online or offline event.
+Instances of abusive, harassing, or otherwise unacceptable behavior may be +reported to the community leaders responsible for enforcement at +moshi@pitt.edu. +All complaints will be reviewed and investigated promptly and fairly.
+All community leaders are obligated to respect the privacy and security of the +reporter of any incident.
+Community leaders will follow these Community Impact Guidelines in determining +the consequences for any action they deem in violation of this Code of Conduct:
1. Correction
Community Impact: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
+Consequence: A private, written warning from community leaders, providing +clarity around the nature of the violation and an explanation of why the +behavior was inappropriate. A public apology may be requested.
2. Warning
Community Impact: A violation through a single incident or series of actions.
+Consequence: A warning with consequences for continued behavior. No +interaction with the people involved, including unsolicited interaction with +those enforcing the Code of Conduct, for a specified period of time. This +includes avoiding interactions in community spaces as well as external channels +like social media. Violating these terms may lead to a temporary or +permanent ban.
3. Temporary Ban
Community Impact: A serious violation of community standards, including sustained inappropriate behavior.
+Consequence: A temporary ban from any sort of interaction or public +communication with the community for a specified period of time. No public or +private interaction with the people involved, including unsolicited interaction +with those enforcing the Code of Conduct, is allowed during this period. +Violating these terms may lead to a permanent ban.
4. Permanent Ban
Community Impact: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
+Consequence: A permanent ban from any sort of public interaction within +the community.
+This Code of Conduct is adapted from the Contributor Covenant, +version 2.0, available at +https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
+Community Impact Guidelines were inspired by +Mozilla’s code of conduct enforcement ladder.
+For answers to common questions about this code of conduct, see the FAQ at +https://www.contributor-covenant.org/faq. Translations are available +at https://www.contributor-covenant.org/translations.
Error in .local(drv, ...) : Failed to connect to database: Error: Can't initialize character set unknown (path: compiled_in)
Calls: dbConnect -> dbConnect -> .local -> .Call
Execution halted
[Tue Mar 10 19:40:15 2020]
Error in rule download_dataset:
    jobid: 531
    output: data/raw/p60/locations_raw.csv

RuleException:
CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Please make sure the DATABASE_GROUP
in config.yaml
matches your DB credentials group in .env
.
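For reference, this is a minimal sketch of what a matching credentials group in .env can look like (the group name and values are placeholders, not your actual credentials):

[MY_GROUP]
user=rapids_user
password=rapids_password
host=127.0.0.1
port=3306
database=rapids_db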
Cannot start mysql in linux via brew services start mysql
Use mysql.server start instead.
When running snakemake -j1 -R pull_phone_data or ./rapids -j1 -R pull_phone_data, all the rules and files are re-computed.
This is expected behavior. The advantage of using snakemake under the hood is that every time a file containing data is modified, every rule that depends on that file is re-executed to update its results. In this case, since download_dataset updates all the raw data, and you are forcing the rule with the flag -R, every single rule that depends on those raw files will be executed.
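If you only want to preview which rules would be re-executed, Snakemake's standard dry-run flag can help; this is a generic Snakemake sketch, not a RAPIDS-specific command:

snakemake -j1 -n -R pull_phone_data  # -n prints the execution plan without running anything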
Table XXX doesn't exist while running the download_phone_data or download_fitbit_data rule.

Error in .local(conn, statement, ...) :
  could not run statement: Table 'db_name.table_name' doesn't exist
Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted
Please make sure the sensors listed in [PHONE_VALID_SENSED_BINS][PHONE_SENSORS]
and the [CONTAINER]
of each sensor you activated in config.yaml
match your database tables or files.
Install dependencies:
sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev
Install Homebrew if it is not installed, then add it to your path in ~/.bashrc: export PATH=$HOME/.linuxbrew/bin:$PATH, followed by source ~/.bashrc
Install MySQL:
brew install mysql
brew services start mysql
Install R, pandoc and rmarkdown:
brew install r
brew install gcc@6 (needed due to this bug)
HOMEBREW_CC=gcc-6 brew install pandoc
Install miniconda using these instructions
Clone our repo:
git clone https://github.com/carissalow/rapids
Create a Python virtual environment:
cd rapids
conda env create -f environment.yml -n MY_ENV_NAME
conda activate MY_ENV_NAME
Install R packages and virtual environment:
snakemake renv_install
snakemake renv_init
snakemake renv_restore
This step could take several minutes to complete. Please be patient and let it run until completion.
mysql.h cannot be found
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: mysql.h: No such file or directory
compilation terminated.
-----------------------------------------------------------------------
ERROR: configuration failed for package 'RMySQL'

Install the missing dependency:
sudo apt install libmariadbclient-dev
libcurl cannot be found
Install libcurl:
sudo apt install libcurl4-openssl-dev
openssl cannot be found
Install openssl:
sudo apt install libssl-dev
libxml-2.0 cannot be found
Install libxml-2.0:
sudo apt install libxml2-dev
You are getting the following error message when running RAPIDS:
Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol.
This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with a MySQL 8.0 client. You should get the same error message if you try to connect from the command line; there you can add the option --ssl-mode=DISABLED, but we can't do this from the R connector.
If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace mysql-client
and libmysqlclient-dev
with mariadb-client
and libmariadbclient-dev
and reinstall renv. More info about this issue here
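As a sketch of that swap on Debian/Ubuntu (package names can vary by release, so treat these as assumptions):

sudo apt remove mysql-client libmysqlclient-dev
sudo apt install mariadb-client libmariadbclient-dev
snakemake -j1 renv_restore  # reinstall the R virtual environment afterwards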
DB_TABLES key not found
If you get the following error: KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS', it means that the indentation of the key [PHONE_SENSORS] does not match the other child elements of PHONE_VALID_SENSED_BINS. You need to add or remove leading whitespace as needed on that line.

PHONE_VALID_SENSED_BINS:
    COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
    BIN_SIZE: &bin_size 5 # (in minutes)
    PHONE_SENSORS: []
You get the following error:
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
appears to be corrupted. The path 'include/mysqlStubs.h'
specified in the package manifest cannot be found.
ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
path: 'lib/libiomp5.so'

Reinstall conda
You get the following error when downloading sensor data:
Error in result_fetch(res@ptr, n = n) :
  embedded nul in string:

This problem is due to the way RMariaDB handles a mismatch between data types in R and MySQL (see this issue). Since it seems this problem won't be handled by RMariaDB, you have two options:
1. Strip the offending null characters in your database with a query like:
update YOUR_TABLE set YOUR_COLUMN = regexp_replace(YOUR_COLUMN, '\0', '');
2. Replace RMariaDB with RMySQL. Have in mind you might have problems connecting to modern MySQL servers running in Linux:
- Add RMySQL to the renv environment by running the following command in a terminal open on RAPIDS root folder:
R -e 'renv::install("RMySQL")'
- Go to src/data/streams/pull_phone_data.R or src/data/streams/pull_fitbit_data.R and replace library(RMariaDB) with library(RMySQL)
- In the same file(s), replace dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group) with dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)
There is no package called RMariaDB
You get the following error when executing RAPIDS:
Error in library(RMariaDB) : there is no package called 'RMariaDB'
Execution halted

In RAPIDS v0.1.0 we replaced the RMySQL R package with RMariaDB; this error means your R virtual environment is out of date. To update it, run snakemake -j1 renv_restore.
When running RAPIDS with R 4.0.3 on macOS on M1, lubridate may throw an error associated with the timezone.
Error in C_force_tz(time, tz = tzone, roll):
  CCTZ: Unrecognized output timezone: "America/New_York"
Calls: get_timestamp_filter ... .parse_date_time -> .strptime -> force_tz -> C_force_tz

This is because the R timezone library is not set. Please add Sys.setenv("TZDIR" = file.path(R.home(), "share", "zoneinfo")) to the file active.R in the renv folder to set the timezone library. For further details on how to test whether TZDIR is properly set, please refer to https://github.com/tidyverse/lubridate/issues/928#issuecomment-720059233.
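A quick way to check that the call behaves as expected from the command line (this assumes R is on your PATH; the expected output is the zoneinfo folder under R's home):

R -e 'Sys.setenv("TZDIR" = file.path(R.home(), "share", "zoneinfo")); Sys.getenv("TZDIR")'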
You get the following error when downloading Fitbit data:
Error: Unimplemented MAX_NO_FIELD_TYPES
Execution halted

At the moment, RMariaDB cannot handle MySQL columns of JSON type. Change the type of your Fitbit data column to longtext (note that the content will not change and will still be a JSON object, just interpreted as a string).
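A minimal sketch of that conversion in MySQL, with a hypothetical table and column name:

ALTER TABLE fitbit_data MODIFY COLUMN fitbit_json LONGTEXT;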
You get the following error when installing pandoc or running RAPIDS:
MoSHI/rapids/renv/staging/1/00LOCK-KernSmooth/00new/KernSmooth/libs/KernSmooth.so: mach-o, but wrong architecture

As of Feb 2021, on M1 Macs R needs to be installed via brew under Rosetta (x86 arch) due to an incompatibility with selected R libraries. To do this, run your terminal via Rosetta, then proceed with the usual brew installation command. x86 homebrew should be installed in /usr/local/bin/brew; you can check which brew you are using by typing which brew. Then use x86 homebrew to install R and restore RAPIDS packages (renv_restore).
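A sketch of that workflow (it assumes zsh and that Rosetta 2 is installed):

arch -x86_64 /bin/zsh       # open an x86_64 shell under Rosetta
which brew                  # should print /usr/local/bin/brew (x86 Homebrew)
brew install r
snakemake -j1 renv_restore  # restore RAPIDS R packages with the x86 toolchain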
A data stream is a set of sensor data collected using a specific type of device with a specific format and stored in a specific container. RAPIDS is agnostic to data streams' formats and containers; see the Data Streams Introduction for a list of supported streams.
+A container is queried with an R or Python script that connects to the database, API or file where your stream’s raw data is stored.
+A format is described using a format.yaml
file that specifies how to map and mutate your stream’s raw data to match the data and format RAPIDS needs.
The most common cases when you would want to implement a new data stream are:
- You collected data with a sensing app RAPIDS does not support yet; you will need to define a new format file and implement a new container script.
- You collected data with a sensing app RAPIDS supports, but you stored it in a container RAPIDS cannot connect to yet; in that case you can reuse the format of the aware_mysql stream, but you will need to implement a new container script.

Hint
Both the container.[R|py] and the format.yaml are stored in ./src/data/streams/[stream_name] where [stream_name] can be aware_mysql for example.
The container script of a data stream can be implemented in R (strongly recommended) or Python. This script must have two functions if you are implementing a stream for phone data, or one function otherwise. The script can contain other auxiliary functions.
First of all, add any parameters your script might need in config.yaml
under (device)_DATA_STREAMS
. These parameters will be available in the stream_parameters
argument of the one or two functions you implement. For example, if you are adding support for Beiwe
data stored in PostgreSQL
and your container needs a set of credentials to connect to a database, your new data stream configuration would be:
PHONE_DATA_STREAMS:
+ USE: aware_python
+
+ # AVAILABLE:
+ aware_mysql:
+ DATABASE_GROUP: MY_GROUP
+ beiwe_postgresql:
+ DATABASE_GROUP: MY_GROUP # users define this group (user, password, host, etc.) in credentials.yaml
+
Then implement one or both of the following functions:
pull_data
This function returns the data columns for a specific sensor and participant. It has the following parameters:
| Param | Description |
|---|---|
| stream_parameters | Any parameters (keys/values) set by the user in any [DEVICE_DATA_STREAMS][stream_name] key of config.yaml. For example, [DATABASE_GROUP] inside [FITBIT_DATA_STREAMS][fitbitjson_mysql] |
| sensor_container | The value set by the user in any [DEVICE_SENSOR][CONTAINER] key of config.yaml. It can be a table, file path, or whatever data source you want to support that contains the data from a single sensor for all participants. For example, [PHONE_ACCELEROMETER][CONTAINER] |
| device | The device id that you need to get the data for (this is set by the user in the participant files). For example, in AWARE this device id is a uuid |
| columns | A list of the columns that you need to get from sensor_container. You specify these columns in your stream's format.yaml |
Example
+This is the pull_data
function we implemented for aware_mysql
. Note that we can message
, warn
or stop
the user during execution.
pull_data <- function(stream_parameters, device, sensor_container, columns){
+  # get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
+ dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
+ query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
+ # Letting the user know what we are doing
+ message(paste0("Executing the following query to download data: ", query))
+ sensor_data <- dbGetQuery(dbEngine, query)
+
+ dbDisconnect(dbEngine)
+
+ if(nrow(sensor_data) == 0)
+ warning(paste("The device '", device,"' did not have data in ", sensor_container))
+
+ return(sensor_data)
+}
+
Warning
+This function is only necessary for phone data streams.
RAPIDS allows users to use the keyword infer (previously multiple) to automatically infer the mobile operating system a phone was running.
If you have a way to infer the OS of a device id, implement this function. For example, for AWARE data we use the aware_device
table.
If you don’t have a way to infer the OS, call stop("Error Message")
so other users know they can’t use infer
or the inference failed, and they have to assign the OS manually in the participant file.
infer_device_os
This function returns the operating system (android or ios) for a specific phone device id. It has the following parameters:
| Param | Description |
|---|---|
| stream_parameters | Any parameters (keys/values) set by the user in any [DEVICE_DATA_STREAMS][stream_name] key of config.yaml. For example, [DATABASE_GROUP] inside [FITBIT_DATA_STREAMS][fitbitjson_mysql] |
| device | The device id that you need to infer the OS for (this is set by the user in the participant files). For example, in AWARE this device id is a uuid |
Example
+This is the infer_device_os
function we implemented for aware_mysql
. Note that we can message
, warn
or stop
the user during execution.
infer_device_os <- function(stream_parameters, device){
+  # get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
+ group <- stream_parameters$DATABASE_GROUP
+
+ dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)
+ query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
+ message(paste0("Executing the following query to infer phone OS: ", query))
+ os <- dbGetQuery(dbEngine, query)
+ dbDisconnect(dbEngine)
+
+  if(nrow(os) > 0)
+    return(os %>% mutate(os = ifelse(brand == "iPhone", "ios", "android")) %>% pull(os))
+  else
+    stop(paste("We cannot infer the OS of the following device id because it does not exist in the aware_device table:", device))
+}
+
A format file format.yaml
describes the mapping between your stream’s raw data and the data that RAPIDS needs. This file has a section per sensor (e.g. PHONE_ACCELEROMETER
), and each section has two attributes (keys):
- RAPIDS_COLUMN_MAPPINGS are mappings between the columns RAPIDS needs and the columns your raw data already has.
  - The FLAG_TO_MUTATE keyword flags columns that RAPIDS requires but that are not initially present in your container (database, CSV file). These columns have to be created by your mutation scripts.
- MUTATION. Sometimes your raw data needs to be transformed to match the format RAPIDS can handle (including creating columns marked as FLAG_TO_MUTATE):
  - COLUMN_MAPPINGS are mappings between the columns a mutation SCRIPT needs and the columns your raw data has.
  - SCRIPTS are a collection of R or Python scripts that transform one or more raw data columns into the format RAPIDS needs.
Hint
Both [RAPIDS_COLUMN_MAPPINGS] and [MUTATE][COLUMN_MAPPINGS] have a key (left-hand side string) and a value (right-hand side string). The values are the names used to pull columns from a container (e.g., columns in a database table). All values are renamed to their keys in lower case. The renamed columns are sent to every mutation script within the data argument, and the final output is the input RAPIDS processes further.
For example, let’s assume we are implementing beiwe_mysql
and defining the following format for PHONE_FAKESENSOR
:
PHONE_FAKESENSOR:
+ ANDROID:
+ RAPIDS_COLUMN_MAPPINGS:
+ TIMESTAMP: beiwe_timestamp
+ DEVICE_ID: beiwe_deviceID
+ MAGNITUDE_SQUARED: FLAG_TO_MUTATE
+ MUTATE:
+ COLUMN_MAPPINGS:
+ MAGNITUDE: beiwe_value
+ SCRIPTS:
+ - src/data/streams/mutations/phone/square_magnitude.py
+
RAPIDS will:
1. Pull the columns beiwe_timestamp, beiwe_deviceID, and beiwe_value from the container of beiwe_mysql (MySQL DB).
2. Rename those columns to timestamp, device_id, and magnitude, respectively.
3. Execute square_magnitude.py with a data frame as an argument containing the renamed columns. This script will square magnitude and rename it to magnitude_squared.
4. Verify that the data frame returned by square_magnitude.py has the columns RAPIDS needs: timestamp, device_id, and magnitude_squared.

Note that although RAPIDS_COLUMN_MAPPINGS and [MUTATE][COLUMN_MAPPINGS] keys are in capital letters for readability (e.g. MAGNITUDE_SQUARED), the names of the final columns you mutate in your scripts should be lower case.
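A minimal sketch of what square_magnitude.py could look like under this format (the script itself is hypothetical; it receives the already-renamed, lower-case columns):

import pandas as pd

def main(data, stream_parameters):
    # magnitude arrives renamed from beiwe_value per [MUTATE][COLUMN_MAPPINGS]
    data["magnitude_squared"] = data["magnitude"] ** 2
    # drop the source column; RAPIDS only expects the mapped output columns
    return data.drop(columns=["magnitude"])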
Let's explain this column mapping in more depth with examples.
+The mapping for some sensors is straightforward. For example, accelerometer data most of the time has a timestamp, three axes (x,y,z), and a device id that produced it. AWARE and a different sensing app like Beiwe likely logged accelerometer data in the same way but with different column names. In this case, we only need to match Beiwe data columns to RAPIDS columns one-to-one:
+PHONE_ACCELEROMETER:
+ ANDROID:
+ RAPIDS_COLUMN_MAPPINGS:
+ TIMESTAMP: beiwe_timestamp
+ DEVICE_ID: beiwe_deviceID
+ DOUBLE_VALUES_0: beiwe_x
+ DOUBLE_VALUES_1: beiwe_y
+ DOUBLE_VALUES_2: beiwe_z
+ MUTATE:
+ COLUMN_MAPPINGS:
+ SCRIPTS: # it's ok if this is empty
+
For some sensors, we need to map column names and values. For example, screen data has ON and OFF events; let's suppose Beiwe represents an ON event with the number 1, but RAPIDS identifies ON events with the number 2. In this case, we need to mutate the raw data coming from Beiwe and replace all 1s with 2s.
We do this by listing one or more R or Python scripts in MUTATION_SCRIPTS
that will be executed in order. We usually store all mutation scripts under src/data/streams/mutations/[device]/[platform]/
and they can be reused across data streams.
PHONE_SCREEN:
+ ANDROID:
+ RAPIDS_COLUMN_MAPPINGS:
+ TIMESTAMP: beiwe_timestamp
+ DEVICE_ID: beiwe_deviceID
+ EVENT: beiwe_event
+ MUTATE:
+ COLUMN_MAPPINGS:
+ SCRIPTS:
+ - src/data/streams/mutations/phone/beiwe/beiwe_screen_map.py
+
Hint
- A MUTATION_SCRIPT can also be used to clean/preprocess your data before extracting behavioral features.
- A mutation script has to have a main function that receives two arguments, data and stream_parameters.
- The stream_parameters argument contains the config.yaml key/values of your data stream (this is the same argument that your container.[py|R] script receives, see Implement a Container).

Example of a Python mutation script
import pandas as pd
+
+def main(data, stream_parameters):
+ # mutate data
+ return(data)
+
Example of a R mutation script +
source("renv/activate.R") # needed to use RAPIDS renv environment
+library(dplyr)
+
+main <- function(data, stream_parameters){
+ # mutate data
+ return(data)
+}
+
Sometimes, your raw data doesn’t even have the same columns RAPIDS expects for a sensor. For example, let’s pretend Beiwe stores PHONE_ACCELEROMETER
axis data in a single column called acc_col
instead of three. You have to create a MUTATION_SCRIPT
to split acc_col
into three columns x
, y
, and z
.
For this, you mark the three axes columns RAPIDS needs in [RAPIDS_COLUMN_MAPPINGS]
with the word FLAG_TO_MUTATE
, map acc_col
in [MUTATION][COLUMN_MAPPINGS]
, and list a Python script under [MUTATION][SCRIPTS]
with the code to split acc_col
. See an example below.
RAPIDS expects that every column mapped as FLAG_TO_MUTATE
will be generated by your mutation script, so it won’t try to retrieve them from your container (database, CSV file, etc.).
In our example, acc_col
will be fetched from the stream’s container and renamed to JOINED_AXES
because beiwe_split_acc.py
will split it into double_values_0
, double_values_1
, and double_values_2
.
PHONE_ACCELEROMETER:
+ ANDROID:
+ RAPIDS_COLUMN_MAPPINGS:
+ TIMESTAMP: beiwe_timestamp
+ DEVICE_ID: beiwe_deviceID
+ DOUBLE_VALUES_0: FLAG_TO_MUTATE
+ DOUBLE_VALUES_1: FLAG_TO_MUTATE
+ DOUBLE_VALUES_2: FLAG_TO_MUTATE
+ MUTATE:
+ COLUMN_MAPPINGS:
+ JOINED_AXES: acc_col
+ SCRIPTS:
+ - src/data/streams/mutations/phone/beiwe/beiwe_split_acc.py
+
This is a draft of the beiwe_split_acc.py MUTATION_SCRIPT (a sketch that assumes acc_col packs the three axes as a comma-separated string; adapt the parsing to your raw data):

import pandas as pd

def main(data, stream_parameters):
    # acc_col arrives renamed to joined_axes per [MUTATE][COLUMN_MAPPINGS]
    # split it into double_values_0, double_values_1, double_values_2 to match RAPIDS format
    data[["double_values_0", "double_values_1", "double_values_2"]] = (
        data["joined_axes"].str.split(",", expand=True).astype(float)
    )
    # remove joined_axes since we don't need it anymore
    return data.drop(columns=["joined_axes"])
There is a special case for a complex mapping scenario for smartphone data streams. The Android and iOS sensor APIs return data in different formats for certain sensors (like screen, activity recognition, battery, among others).
+In case you didn’t notice, the examples we have used so far are grouped under an ANDROID
key, which means they will be applied to data collected by Android phones. Additionally, each sensor has an IOS
key for a similar purpose. We use the complex mapping described above to transform iOS data into an Android format (it’s always iOS to Android and any new phone data stream must do the same).
For example, this is the format.yaml key for PHONE_ACTIVITY_RECOGNITION. Note that the ANDROID mapping is simple (one-to-one) but the IOS mapping is complex, with three FLAG_TO_MUTATE columns, two [MUTATION][COLUMN_MAPPINGS] mappings, and one [MUTATION][SCRIPTS] script.
PHONE_ACTIVITY_RECOGNITION:
+ ANDROID:
+ RAPIDS_COLUMN_MAPPINGS:
+ TIMESTAMP: timestamp
+ DEVICE_ID: device_id
+ ACTIVITY_TYPE: activity_type
+ ACTIVITY_NAME: activity_name
+ CONFIDENCE: confidence
+ MUTATION:
+ COLUMN_MAPPINGS:
+ SCRIPTS:
+ IOS:
+ RAPIDS_COLUMN_MAPPINGS:
+ TIMESTAMP: timestamp
+ DEVICE_ID: device_id
+ ACTIVITY_TYPE: FLAG_TO_MUTATE
+ ACTIVITY_NAME: FLAG_TO_MUTATE
+ CONFIDENCE: FLAG_TO_MUTATE
+ MUTATION:
+ COLUMN_MAPPINGS:
+ ACTIVITIES: activities
+ CONFIDENCE: confidence
+ SCRIPTS:
+ - "src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R"
+
In this MUTATION_SCRIPT
we create ACTIVITY_NAME
and ACTIVITY_TYPE
based on activities
, and map confidence
iOS values to Android values.
+
source("renv/activate.R")
+library("dplyr", warn.conflicts = F)
+library(stringr)
+
+clean_ios_activity_column <- function(ios_gar){
+ ios_gar <- ios_gar %>%
+ mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))
+
+  existent_multiple_activities <- ios_gar %>%
+    filter(str_detect(activities, ",")) %>%
+    group_by(activities) %>%
+    summarise(multiple_activities = unique(activities), .groups = "drop_last") %>%
+    pull(multiple_activities)
+
+  known_multiple_activities <- c("stationary,automotive")
+  unknown_multiple_activities <- setdiff(existent_multiple_activities, known_multiple_activities)
+  if(length(unknown_multiple_activities) > 0){
+    stop(paste0("There are unknown combinations of ios activities, you need to implement the decision of the ones to keep: ", unknown_multiple_activities))
+  }
+
+ ios_gar <- ios_gar %>%
+ mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))
+
+ return(ios_gar)
+}
+
+unify_ios_activity_recognition <- function(ios_gar){
+ # We only need to unify Google Activity Recognition data for iOS
+  # discard rows where activities column is blank (-which() would drop every row when none are blank)
+  ios_gar <- ios_gar[ios_gar$activities != "", ]
+ # clean "activities" column of ios_gar
+ ios_gar <- clean_ios_activity_column(ios_gar)
+
+ # make it compatible with android version: generate "activity_name" and "activity_type" columns
+ ios_gar <- ios_gar %>%
+ mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
+ activities == "cycling" ~ "on_bicycle",
+ activities == "walking" ~ "walking",
+ activities == "running" ~ "running",
+ activities == "stationary" ~ "still"),
+ activity_type = case_when(activities == "automotive" ~ 0,
+ activities == "cycling" ~ 1,
+ activities == "walking" ~ 7,
+ activities == "running" ~ 8,
+ activities == "stationary" ~ 3,
+ activities == "unknown" ~ 4),
+ confidence = case_when(confidence == 0 ~ 0,
+ confidence == 1 ~ 50,
+ confidence == 2 ~ 100)
+ ) %>%
+ select(-activities)
+
+ return(ios_gar)
+}
+
+main <- function(data, stream_parameters){
+  return(unify_ios_activity_recognition(data))
+}
+
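A quick hedged smoke test for this script from an R session (toy data; it assumes you run it from the RAPIDS root so renv activates):

source("src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R")
toy <- data.frame(activities = c("[\"walking\"]", "[\"stationary,automotive\"]"),
                  confidence = c(2, 1))
main(toy, list())  # returns activity_name, activity_type, and rescaled confidence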
aware_csv
This data stream handles iOS and Android sensor data collected with the AWARE Framework and stored in CSV files.
Warning
The CSV files have to use , as the separator and \ as the escape character (do not escape " with ""), and wrap any string columns with ".
See examples in the CSV files inside rapids_example_csv.zip
+"_id","timestamp","device_id","activities","confidence","stationary","walking","running","automotive","cycling","unknown","label"
+1,1587528000000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,""
+2,1587528060000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+3,1587528120000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+4,1587528180000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+5,1587528240000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+6,1587528300000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+7,1587528360000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+
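As a sketch, files in this format can be read in R with readr, which supports backslash escapes (the file name here is hypothetical):

library(readr)
data <- read_delim("plugin_ios_activity_recognition.csv", delim = ",",
                   escape_backslash = TRUE, escape_double = FALSE)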
A CSV file per sensor, each containing the data for all participants.
+The script to connect and download data from this container is at: +
src/data/streams/aware_csv/container.R
+
If you collected sensor data with the vanilla (original) AWARE mobile clients, you shouldn’t need to modify this format (described below).
+Remember that a format maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs.
+The yaml file that describes the format of this data stream is at: +
src/data/streams/aware_csv/format.yaml
+
For some sensors, we need to transform iOS data into Android format; you can refer to OS complex mapping to learn how this works.
+Hint
The mappings in this stream (RAPIDS/Stream) share the same names because AWARE data was the first stream RAPIDS supported, meaning that it treats AWARE column names as the default.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_VALUES_0 | double_values_0 |
| DOUBLE_VALUES_1 | double_values_1 |
| DOUBLE_VALUES_2 | double_values_2 |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| ACTIVITY_NAME | activity_name |
| ACTIVITY_TYPE | activity_type |
| CONFIDENCE | confidence |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| ACTIVITY_NAME | FLAG_TO_MUTATE |
| ACTIVITY_TYPE | FLAG_TO_MUTATE |
| CONFIDENCE | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| ACTIVITIES | activities |
| CONFIDENCE | confidence |

src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R
Note
For RAPIDS columns of ACTIVITY_NAME and ACTIVITY_TYPE:
- if the activities field is automotive, set ACTIVITY_NAME = in_vehicle and ACTIVITY_TYPE = 0
- if the activities field is cycling, set ACTIVITY_NAME = on_bicycle and ACTIVITY_TYPE = 1
- if the activities field is walking, set ACTIVITY_NAME = walking and ACTIVITY_TYPE = 7
- if the activities field is running, set ACTIVITY_NAME = running and ACTIVITY_TYPE = 8
- if the activities field is stationary, set ACTIVITY_NAME = still and ACTIVITY_TYPE = 3
- if the activities field is unknown, set ACTIVITY_NAME = unknown and ACTIVITY_TYPE = 4

For the RAPIDS CONFIDENCE column:
- if the confidence field is 0, set CONFIDENCE = 0
- if the confidence field is 1, set CONFIDENCE = 50
- if the confidence field is 2, set CONFIDENCE = 100

RAPIDS_COLUMN_MAPPINGS
| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| APPLICATION_NAME | application_name |
| APPLICATION_VERSION | application_version |
| ERROR_SHORT | error_short |
| ERROR_LONG | error_long |
| ERROR_CONDITION | error_condition |
| IS_SYSTEM_APP | is_system_app |

MUTATION
This sensor is not supported by iOS devices.

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| APPLICATION_NAME | application_name |
| IS_SYSTEM_APP | is_system_app |

MUTATION
This sensor is not supported by iOS devices.

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| APPLICATION_NAME | application_name |
| TEXT | text |
| SOUND | sound |
| VIBRATE | vibrate |
| DEFAULTS | defaults |
| FLAGS | flags |

MUTATION
This sensor is not supported by iOS devices.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| BATTERY_STATUS | battery_status |
| BATTERY_LEVEL | battery_level |
| BATTERY_SCALE | battery_scale |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| BATTERY_STATUS | FLAG_TO_MUTATE |
| BATTERY_LEVEL | battery_level |
| BATTERY_SCALE | battery_scale |

MUTATION

| Script column | Stream column |
|---|---|
| BATTERY_STATUS | battery_status |

src/data/streams/mutations/phone/aware/battery_ios_unification.R

Note
For the RAPIDS BATTERY_STATUS column:
- if the battery_status field is 3, set BATTERY_STATUS = 5 (full status)
- if the battery_status field is 1, set BATTERY_STATUS = 3 (discharge)

Same as ANDROID
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| BT_ADDRESS | bt_address |
| BT_NAME | bt_name |
| BT_RSSI | bt_rssi |

MUTATION
Only old iOS versions supported this sensor (same mapping as Android).
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| CALL_TYPE | call_type |
| CALL_DURATION | call_duration |
| TRACE | trace |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| CALL_TYPE | FLAG_TO_MUTATE |
| CALL_DURATION | call_duration |
| TRACE | trace |

MUTATION

| Script column | Stream column |
|---|---|
| CALL_TYPE | call_type |

src/data/streams/mutations/phone/aware/calls_ios_unification.R
Note
We transform iOS call logs into Android's format. iOS stores call statuses: 1=incoming, 2=connected, 3=dialing, 4=disconnected, as opposed to Android's events: 1=incoming, 2=outgoing, 3=missed.
We follow this algorithm to convert iOS call data (there are some inaccuracies in the way we handle sequences, see new rules below):
Tested with an Android phone (OnePlus 7T) and an iPhone XR

| Call type | Android (duration) | iOS (duration) | New Rule |
|---|---|---|---|
| Outgoing missed ended by me | 2 (0) | 3,4 (0,X) | 3,4 is converted to 2 with duration 0 |
| Outgoing missed ended by them | 2 (0) | 3,2,4 (0,X,X2) | 3,2,4 is converted to 2 with duration X2* |
| Incoming missed ended by me | NA** | 1,4 (0,X) | 1,4 is converted to 3 with duration 0 |
| Incoming missed ended by them | 3 (0) | 1,4 (0,X) | 1,4 is converted to 3 with duration 0 |
| Outgoing answered | 2 (X excluding dialing time) | 3,2,4 (0,X,X2) | 3,2,4 is converted to 2 with duration X2 |
| Incoming answered | 1 (X excluding dialing time) | 1,2,4 (0,X,X2) | 1,2,4 is converted to 1 with duration X2 |

* There is no way to differentiate an outgoing missed call ended by them from an outgoing answered call because the phone goes directly to voice mail and it counts as call time (essentially the voice mail answered).
** Android does not record incoming missed calls ended by the participant, just those ended by the person calling or ignored by the participant.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_ENERGY | double_energy |
| INFERENCE | inference |
| DOUBLE_CONVO_START | double_convo_start |
| DOUBLE_CONVO_END | double_convo_end |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_ENERGY | double_energy |
| INFERENCE | inference |
| DOUBLE_CONVO_START | FLAG_TO_MUTATE |
| DOUBLE_CONVO_END | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| DOUBLE_CONVO_START | double_convo_start |
| DOUBLE_CONVO_END | double_convo_end |

src/data/streams/mutations/phone/aware/conversation_ios_timestamp.R
Note
For RAPIDS columns of DOUBLE_CONVO_START and DOUBLE_CONVO_END:
- if the double_convo_start field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_START = 1000 * double_convo_start.
- if the double_convo_end field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_END = 1000 * double_convo_end.

RAPIDS_COLUMN_MAPPINGS
| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| BEFORE_TEXT | before_text |
| CURRENT_TEXT | current_text |
| IS_PASSWORD | is_password |

MUTATION
This sensor is not supported by iOS devices.

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_LIGHT_LUX | double_light_lux |
| ACCURACY | accuracy |

MUTATION
This sensor is not supported by iOS devices.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_LATITUDE | double_latitude |
| DOUBLE_LONGITUDE | double_longitude |
| DOUBLE_BEARING | double_bearing |
| DOUBLE_SPEED | double_speed |
| DOUBLE_ALTITUDE | double_altitude |
| PROVIDER | provider |
| ACCURACY | accuracy |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| LOG_MESSAGE | log_message |

MUTATION
Same as ANDROID
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| MESSAGE_TYPE | message_type |
| TRACE | trace |

MUTATION
This sensor is not supported by iOS devices.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| SCREEN_STATUS | screen_status |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| SCREEN_STATUS | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| SCREEN_STATUS | screen_status |

src/data/streams/mutations/phone/aware/screen_ios_unification.R

Note
For the SCREEN_STATUS RAPIDS column:
- if the screen_status field is 2 (lock episode), set SCREEN_STATUS = 0 (off episode).

RAPIDS_COLUMN_MAPPINGS
| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| MAC_ADDRESS | mac_address |
| SSID | ssid |
| BSSID | bssid |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| SSID | ssid |
| BSSID | bssid |
| SECURITY | security |
| FREQUENCY | frequency |
| RSSI | rssi |

MUTATION
Only old iOS versions supported this sensor (same mapping as Android).
aware_influxdb (beta)
Warning
This data stream is being released in beta while we test it thoroughly.
This data stream handles iOS and Android sensor data collected with the AWARE Framework and stored in an InfluxDB database.
+An InfluxDB database with a table per sensor, each containing the data for all participants.
+The script to connect and download data from this container is at: +
src/data/streams/aware_influxdb/container.R
+
If you collected sensor data with the vanilla (original) AWARE mobile clients, you shouldn’t need to modify this format (described below).
+Remember that a format maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs.
+The yaml file that describes the format of this data stream is at: +
src/data/streams/aware_influxdb/format.yaml
+
For some sensors, we need to transform iOS data into Android format; you can refer to OS complex mapping to learn how this works.
+Hint
The mappings in this stream (RAPIDS/Stream) share the same names because AWARE data was the first stream RAPIDS supported, meaning that it treats AWARE column names as the default.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_VALUES_0 | double_values_0 |
| DOUBLE_VALUES_1 | double_values_1 |
| DOUBLE_VALUES_2 | double_values_2 |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| ACTIVITY_NAME | activity_name |
| ACTIVITY_TYPE | activity_type |
| CONFIDENCE | confidence |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| ACTIVITY_NAME | FLAG_TO_MUTATE |
| ACTIVITY_TYPE | FLAG_TO_MUTATE |
| CONFIDENCE | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| ACTIVITIES | activities |
| CONFIDENCE | confidence |

src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R
Note
For RAPIDS columns of ACTIVITY_NAME and ACTIVITY_TYPE:
- if the activities field is automotive, set ACTIVITY_NAME = in_vehicle and ACTIVITY_TYPE = 0
- if the activities field is cycling, set ACTIVITY_NAME = on_bicycle and ACTIVITY_TYPE = 1
- if the activities field is walking, set ACTIVITY_NAME = walking and ACTIVITY_TYPE = 7
- if the activities field is running, set ACTIVITY_NAME = running and ACTIVITY_TYPE = 8
- if the activities field is stationary, set ACTIVITY_NAME = still and ACTIVITY_TYPE = 3
- if the activities field is unknown, set ACTIVITY_NAME = unknown and ACTIVITY_TYPE = 4

For the RAPIDS CONFIDENCE column:
- if the confidence field is 0, set CONFIDENCE = 0
- if the confidence field is 1, set CONFIDENCE = 50
- if the confidence field is 2, set CONFIDENCE = 100

RAPIDS_COLUMN_MAPPINGS
| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| APPLICATION_NAME | application_name |
| APPLICATION_VERSION | application_version |
| ERROR_SHORT | error_short |
| ERROR_LONG | error_long |
| ERROR_CONDITION | error_condition |
| IS_SYSTEM_APP | is_system_app |

MUTATION
This sensor is not supported by iOS devices.

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| APPLICATION_NAME | application_name |
| IS_SYSTEM_APP | is_system_app |

MUTATION
This sensor is not supported by iOS devices.

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| APPLICATION_NAME | application_name |
| TEXT | text |
| SOUND | sound |
| VIBRATE | vibrate |
| DEFAULTS | defaults |
| FLAGS | flags |

MUTATION
This sensor is not supported by iOS devices.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| BATTERY_STATUS | battery_status |
| BATTERY_LEVEL | battery_level |
| BATTERY_SCALE | battery_scale |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| BATTERY_STATUS | FLAG_TO_MUTATE |
| BATTERY_LEVEL | battery_level |
| BATTERY_SCALE | battery_scale |

MUTATION

| Script column | Stream column |
|---|---|
| BATTERY_STATUS | battery_status |

src/data/streams/mutations/phone/aware/battery_ios_unification.R

Note
For the RAPIDS BATTERY_STATUS column:
- if the battery_status field is 3, set BATTERY_STATUS = 5 (full status)
- if the battery_status field is 1, set BATTERY_STATUS = 3 (discharge)

Same as ANDROID
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| BT_ADDRESS | bt_address |
| BT_NAME | bt_name |
| BT_RSSI | bt_rssi |

MUTATION
Only old iOS versions supported this sensor (same mapping as Android).
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| CALL_TYPE | call_type |
| CALL_DURATION | call_duration |
| TRACE | trace |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| CALL_TYPE | FLAG_TO_MUTATE |
| CALL_DURATION | call_duration |
| TRACE | trace |

MUTATION

| Script column | Stream column |
|---|---|
| CALL_TYPE | call_type |

src/data/streams/mutations/phone/aware/calls_ios_unification.R
Note
We transform iOS call logs into Android's format. iOS stores call statuses: 1=incoming, 2=connected, 3=dialing, 4=disconnected, as opposed to Android's events: 1=incoming, 2=outgoing, 3=missed.
We follow this algorithm to convert iOS call data (there are some inaccuracies in the way we handle sequences, see new rules below):
Tested with an Android phone (OnePlus 7T) and an iPhone XR

| Call type | Android (duration) | iOS (duration) | New Rule |
|---|---|---|---|
| Outgoing missed ended by me | 2 (0) | 3,4 (0,X) | 3,4 is converted to 2 with duration 0 |
| Outgoing missed ended by them | 2 (0) | 3,2,4 (0,X,X2) | 3,2,4 is converted to 2 with duration X2* |
| Incoming missed ended by me | NA** | 1,4 (0,X) | 1,4 is converted to 3 with duration 0 |
| Incoming missed ended by them | 3 (0) | 1,4 (0,X) | 1,4 is converted to 3 with duration 0 |
| Outgoing answered | 2 (X excluding dialing time) | 3,2,4 (0,X,X2) | 3,2,4 is converted to 2 with duration X2 |
| Incoming answered | 1 (X excluding dialing time) | 1,2,4 (0,X,X2) | 1,2,4 is converted to 1 with duration X2 |

* There is no way to differentiate an outgoing missed call ended by them from an outgoing answered call because the phone goes directly to voice mail and it counts as call time (essentially the voice mail answered).
** Android does not record incoming missed calls ended by the participant, just those ended by the person calling or ignored by the participant.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_ENERGY | double_energy |
| INFERENCE | inference |
| DOUBLE_CONVO_START | double_convo_start |
| DOUBLE_CONVO_END | double_convo_end |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_ENERGY | double_energy |
| INFERENCE | inference |
| DOUBLE_CONVO_START | FLAG_TO_MUTATE |
| DOUBLE_CONVO_END | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| DOUBLE_CONVO_START | double_convo_start |
| DOUBLE_CONVO_END | double_convo_end |

src/data/streams/mutations/phone/aware/conversation_ios_timestamp.R
Note
For RAPIDS columns of DOUBLE_CONVO_START and DOUBLE_CONVO_END:
- if the double_convo_start field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_START = 1000 * double_convo_start.
- if the double_convo_end field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_END = 1000 * double_convo_end.

RAPIDS_COLUMN_MAPPINGS
| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| PACKAGE_NAME | package_name |
| BEFORE_TEXT | before_text |
| CURRENT_TEXT | current_text |
| IS_PASSWORD | is_password |

MUTATION
This sensor is not supported by iOS devices.

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_LIGHT_LUX | double_light_lux |
| ACCURACY | accuracy |

MUTATION
This sensor is not supported by iOS devices.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_LATITUDE | double_latitude |
| DOUBLE_LONGITUDE | double_longitude |
| DOUBLE_BEARING | double_bearing |
| DOUBLE_SPEED | double_speed |
| DOUBLE_ALTITUDE | double_altitude |
| PROVIDER | provider |
| ACCURACY | accuracy |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| LOG_MESSAGE | log_message |

MUTATION
Same as ANDROID
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| MESSAGE_TYPE | message_type |
| TRACE | trace |

MUTATION
This sensor is not supported by iOS devices.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| SCREEN_STATUS | screen_status |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| SCREEN_STATUS | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| SCREEN_STATUS | screen_status |

src/data/streams/mutations/phone/aware/screen_ios_unification.R

Note
For the SCREEN_STATUS RAPIDS column:
- if the screen_status field is 2 (lock episode), set SCREEN_STATUS = 0 (off episode).

RAPIDS_COLUMN_MAPPINGS
| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| MAC_ADDRESS | mac_address |
| SSID | ssid |
| BSSID | bssid |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| SSID | ssid |
| BSSID | bssid |
| SECURITY | security |
| FREQUENCY | frequency |
| RSSI | rssi |

MUTATION
Only old iOS versions supported this sensor (same mapping as Android).
aware_mysql
This data stream handles iOS and Android sensor data collected with the AWARE Framework and stored in a MySQL database.
+A MySQL database with a table per sensor, each containing the data for all participants. This is the default database created by the old PHP AWARE server (as opposed to the new JavaScript Micro server).
+The script to connect and download data from this container is at: +
src/data/streams/aware_mysql/container.R
+
If you collected sensor data with the vanilla (original) AWARE mobile clients, you shouldn’t need to modify this format (described below).
+Remember that a format maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs.
+The yaml file that describes the format of this data stream is at: +
src/data/streams/aware_mysql/format.yaml
+
For some sensors, we need to transform iOS data into Android format; you can refer to OS complex mapping to learn how this works.
+Hint
The mappings in this stream (RAPIDS/Stream) share the same names because AWARE data was the first stream RAPIDS supported, meaning that it treats AWARE column names as the default.
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| DOUBLE_VALUES_0 | double_values_0 |
| DOUBLE_VALUES_1 | double_values_1 |
| DOUBLE_VALUES_2 | double_values_2 |

MUTATION
Same as ANDROID

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| ACTIVITY_NAME | activity_name |
| ACTIVITY_TYPE | activity_type |
| CONFIDENCE | confidence |

MUTATION

RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | timestamp |
| DEVICE_ID | device_id |
| ACTIVITY_NAME | FLAG_TO_MUTATE |
| ACTIVITY_TYPE | FLAG_TO_MUTATE |
| CONFIDENCE | FLAG_TO_MUTATE |

MUTATION

| Script column | Stream column |
|---|---|
| ACTIVITIES | activities |
| CONFIDENCE | confidence |

src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R
Note
For the RAPIDS columns ACTIVITY_NAME and ACTIVITY_TYPE:

- if the stream's activities field is automotive, set ACTIVITY_NAME = in_vehicle and ACTIVITY_TYPE = 0
- if the activities field is cycling, set ACTIVITY_NAME = on_bicycle and ACTIVITY_TYPE = 1
- if the activities field is walking, set ACTIVITY_NAME = walking and ACTIVITY_TYPE = 7
- if the activities field is running, set ACTIVITY_NAME = running and ACTIVITY_TYPE = 8
- if the activities field is stationary, set ACTIVITY_NAME = still and ACTIVITY_TYPE = 3
- if the activities field is unknown, set ACTIVITY_NAME = unknown and ACTIVITY_TYPE = 4

For the RAPIDS CONFIDENCE column:

- if the stream's confidence field is 0, set CONFIDENCE = 0
- if the confidence field is 1, set CONFIDENCE = 50
- if the confidence field is 2, set CONFIDENCE = 100
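Conceptually, this unification is just two lookup tables. A minimal Python sketch of the rules above (illustrative only; the shipped implementation is the R mutation script listed above):

```python
# Illustrative sketch of the iOS -> Android activity unification rules;
# the shipped implementation is the R mutation script referenced above.
IOS_TO_ANDROID_ACTIVITY = {
    "automotive": ("in_vehicle", 0),
    "cycling": ("on_bicycle", 1),
    "walking": ("walking", 7),
    "running": ("running", 8),
    "stationary": ("still", 3),
    "unknown": ("unknown", 4),
}
IOS_TO_ANDROID_CONFIDENCE = {0: 0, 1: 50, 2: 100}

def unify_ios_activity(activities, confidence):
    activity_name, activity_type = IOS_TO_ANDROID_ACTIVITY[activities]
    return activity_name, activity_type, IOS_TO_ANDROID_CONFIDENCE[confidence]
```

RAPIDS_COLUMN_MAPPINGS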
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
APPLICATION_NAME | +application_name | +
APPLICATION_VERSION | +application_version | +
ERROR_SHORT | +error_short | +
ERROR_LONG | +error_long | +
ERROR_CONDITION | +error_condition | +
IS_SYSTEM_APP | +is_system_app | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
APPLICATION_NAME | +application_name | +
IS_SYSTEM_APP | +is_system_app | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
APPLICATION_NAME | +application_name | +
TEXT | +text | +
SOUND | +sound | +
VIBRATE | +vibrate | +
DEFAULTS | +defaults | +
FLAGS | +flags | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BATTERY_STATUS | +battery_status | +
BATTERY_LEVEL | +battery_level | +
BATTERY_SCALE | +battery_scale | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BATTERY_STATUS | +FLAG_TO_MUTATE | +
BATTERY_LEVEL | +battery_level | +
BATTERY_SCALE | +battery_scale | +
MUTATION
+Script column | +Stream column | +
---|---|
BATTERY_STATUS | +battery_status | +
src/data/streams/mutations/phone/aware/battery_ios_unification.R
+
Note
For the RAPIDS BATTERY_STATUS column:

- if the stream's battery_status field is 3, set BATTERY_STATUS = 5 (full status)
- if the battery_status field is 1, set BATTERY_STATUS = 3 (discharge)

Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BT_ADDRESS | +bt_address | +
BT_NAME | +bt_name | +
BT_RSSI | +bt_rssi | +
MUTATION
+Only old iOS versions supported this sensor (same mapping as Android).
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
CALL_TYPE | +call_type | +
CALL_DURATION | +call_duration | +
TRACE | +trace | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
CALL_TYPE | +FLAG_TO_MUTATE | +
CALL_DURATION | +call_duration | +
TRACE | +trace | +
MUTATION
+Script column | +Stream column | +
---|---|
CALL_TYPE | +call_type | +
src/data/streams/mutations/phone/aware/calls_ios_unification.R
+
Note
+We transform iOS call logs into Android’s format. iOS stores call status: 1=incoming, 2=connected, 3=dialing, 4=disconnected, as opposed to Android’s events: 1=incoming, 2=outgoing, 3=missed.
We follow this algorithm to convert iOS call data (there are some inaccuracies in how we handle sequences; see the new rules in the table below and the sketch that follows it):
+Tested with an Android (OnePlus 7T) and an iPhone XR
+Call type | +Android (duration) | +iOS (duration) | +New Rule | +
---|---|---|---|
Outgoing missed ended by me | +2 (0) | +3,4 (0,X) | +3,4 is converted to 2 with duration 0 | +
Outgoing missed ended by them | +2(0) | +3,2,4 (0,X,X2) | +3,2,4 is converted to 2 with duration X2* | +
Incoming missed ended by me | +NA** | +1,4 (0,X) | +1,4 is converted to 3 with duration 0 | +
Incoming missed ended by them | +3(0) | +1,4 (0,X) | +1,4 is converted to 3 with duration 0 | +
Outgoing answered | +2(X excluding dialing time) | +3,2,4 (0,X,X2) | +3,2,4 is converted to 2 with duration X2 | +
Incoming answered | +1(X excluding dialing time) | +1,2,4 (0,X,X2) | +1,2,4 is converted to 1 with duration X2 | +
* There is no way to differentiate an outgoing missed call ended by them from an outgoing answered call because the phone goes directly to voice mail and it counts as call time (essentially the voice mail answered).
** Android does not record incoming missed calls ended by the participant, just those ended by the person calling or ignored by the participant.
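The following Python sketch condenses the rules in the table above for a single call's ordered (status, duration) sequence. It is illustrative only; the shipped implementation is calls_ios_unification.R.

```python
# Collapse one iOS call sequence, ordered in time, into an Android-style
# (call_type, duration) event following the "New Rule" column above.
# Illustrative sketch; the shipped implementation is calls_ios_unification.R.
def unify_ios_call(sequence):
    statuses = [status for status, _ in sequence]
    last_duration = sequence[-1][1]
    if statuses == [3, 4]:       # outgoing, never connected
        return 2, 0
    if statuses == [3, 2, 4]:    # outgoing, connected (or voice mail)
        return 2, last_duration
    if statuses == [1, 4]:       # incoming, never connected -> missed
        return 3, 0
    if statuses == [1, 2, 4]:    # incoming, answered
        return 1, last_duration
    return None                  # sequence not covered by the rules above

print(unify_ios_call([(3, 0), (2, 35), (4, 42)]))  # -> (2, 42)
```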
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_ENERGY | +double_energy | +
INFERENCE | +inference | +
DOUBLE_CONVO_START | +double_convo_start | +
DOUBLE_CONVO_END | +double_convo_end | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_ENERGY | +double_energy | +
INFERENCE | +inference | +
DOUBLE_CONVO_START | +FLAG_TO_MUTATE | +
DOUBLE_CONVO_END | +FLAG_TO_MUTATE | +
MUTATION
+Script column | +Stream column | +
---|---|
DOUBLE_CONVO_START | +double_convo_start | +
DOUBLE_CONVO_END | +double_convo_end | +
src/data/streams/mutations/phone/aware/conversation_ios_timestamp.R
+
Note
For the RAPIDS columns DOUBLE_CONVO_START and DOUBLE_CONVO_END:

- if the stream's double_convo_start field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_START = 1000 * double_convo_start
- if the double_convo_end field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_END = 1000 * double_convo_end
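In pandas terms, this unit correction amounts to something like the sketch below (illustrative only; the shipped implementation is conversation_ios_timestamp.R):

```python
import pandas as pd

# Values below 9999999999 cannot be millisecond timestamps for recent
# dates, so they are assumed to be seconds and scaled up. Sketch only.
def fix_conversation_units(data: pd.DataFrame) -> pd.DataFrame:
    for column in ("double_convo_start", "double_convo_end"):
        in_seconds = data[column] < 9999999999
        data.loc[in_seconds, column] *= 1000
    return data
```

RAPIDS_COLUMN_MAPPINGS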
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
BEFORE_TEXT | +before_text | +
CURRENT_TEXT | +current_text | +
IS_PASSWORD | +is_password | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_LIGHT_LUX | +double_light_lux | +
ACCURACY | +accuracy | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_LATITUDE | +double_latitude | +
DOUBLE_LONGITUDE | +double_longitude | +
DOUBLE_BEARING | +double_bearing | +
DOUBLE_SPEED | +double_speed | +
DOUBLE_ALTITUDE | +double_altitude | +
PROVIDER | +provider | +
ACCURACY | +accuracy | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
LOG_MESSAGE | +log_message | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
MESSAGE_TYPE | +message_type | +
TRACE | +trace | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
SCREEN_STATUS | +screen_status | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
SCREEN_STATUS | +FLAG_TO_MUTATE | +
MUTATION
+Script column | +Stream column | +
---|---|
SCREEN_STATUS | +screen_status | +
src/data/streams/mutations/phone/aware/screen_ios_unification.R
+
Note
For the SCREEN_STATUS RAPIDS column: if the stream's screen_status field is 2 (lock episode), set SCREEN_STATUS = 0 (off episode).
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
MAC_ADDRESS | +mac_address | +
SSID | +ssid | +
BSSID | +bssid | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
SSID | +ssid | +
BSSID | +bssid | +
SECURITY | +security | +
FREQUENCY | +frequency | +
RSSI | +rssi | +
MUTATION
+Only old iOS versions supported this sensor (same mapping as Android).
+A data stream is a set of sensor data collected using a specific type of device with a specific format and stored in a specific container.
+For example, the aware_mysql
data stream handles smartphone data (device) collected with the AWARE Framework (format) stored in a MySQL database (container). Similarly, smartphone data collected with Beiwe will have a different format and could be stored in a container like a PostgreSQL database or a CSV file.
If you want to process a data stream using RAPIDS, make sure that your data is stored in a supported format and container (see table below).
If RAPIDS doesn't support your data stream yet (e.g. Beiwe data stored in PostgreSQL, or AWARE data stored in SQLite), you can always implement a new data stream. If it's something you think other people might be interested in, we will be happy to include your new data stream in RAPIDS, so get in touch!
+Hint
+Currently, you can add new data streams for smartphones, Fitbit, and Empatica devices. If you need RAPIDS to process data from other devices, like Oura Rings or Actigraph wearables, get in touch. It is a more complicated process that could take a couple of days to implement for someone familiar with R or Python, but we would be happy to work on it together.
+For reference, these are the data streams we currently support:
+Data Stream | +Device | +Format | +Container | +Docs | +
---|---|---|---|---|
aware_mysql |
+Phone | +AWARE app | +MySQL | +link | +
aware_csv |
+Phone | +AWARE app | +CSV files | +link | +
aware_influxdb (beta) |
+Phone | +AWARE app | +InfluxDB | +link | +
fitbitjson_mysql |
+Fitbit | +JSON (per Fitbit’s API) | +MySQL | +link | +
fitbitjson_csv |
+Fitbit | +JSON (per Fitbit’s API) | +CSV files | +link | +
fitbitparsed_mysql |
+Fitbit | +Parsed (parsed API data) | +MySQL | +link | +
fitbitparsed_csv |
+Fitbit | +Parsed (parsed API data) | +CSV files | +link | +
empatica_zip |
+Empatica | +E4 Connect | +ZIP files | +link | +
empatica_zip
This data stream handles Empatica sensor data downloaded as zip files using the E4 Connect.
+You need to create a subfolder for every participant named after their device id
inside the folder specified by [EMPATICA_DATA_STREAMS][empatica_zipfiles][FOLDER]
. You can add one or more Empatica zip files to any subfolder.
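For example, assuming [EMPATICA_DATA_STREAMS][empatica_zipfiles][FOLDER] points to data/external/empatica (a hypothetical value), you could sanity-check the expected layout with a short Python snippet:

```python
from pathlib import Path

# Hypothetical folder; replace with your configured FOLDER value.
# Expected layout: one subfolder per participant device id, each
# containing one or more Empatica zip files.
folder = Path("data/external/empatica")

for participant in sorted(p for p in folder.iterdir() if p.is_dir()):
    zips = list(participant.glob("*.zip"))
    print(f"{participant.name}: {len(zips)} zip file(s)")
```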
The script to connect and download data from this container is at: +
src/data/streams/empatica_zip/container.R
+
The format.yaml
maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Empatica sensors. This file is at:
src/data/streams/empatica_zip/format.yaml
+
All columns are mutated from the raw data in the zip files so you don’t need to modify any column mappings.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_VALUES_0 | +double_values_0 | +
DOUBLE_VALUES_1 | +double_values_1 | +
DOUBLE_VALUES_2 | +double_values_2 | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
HEARTRATE | +heartrate | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
TEMPERATURE | +temperature | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
ELECTRODERMAL_ACTIVITY | +electrodermal_activity | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BLOOD_VOLUME_PULSE | +blood_volume_pulse | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
INTER_BEAT_INTERVAL | +inter_beat_interval | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
TAGS | +tags | +
MUTATION
+fitbitjson_csv
This data stream handles Fitbit sensor data downloaded using the Fitbit Web API and stored in a CSV file. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your sensor data in a CSV file, RAPIDS can process it.
+Warning
+The CSV files have to use ,
as separator, \
as escape character (do not escape "
with ""
), and wrap any string columns with "
.
"timestamp","device_id","label","fitbit_id","fitbit_data_type","fitbit_data"
+1587614400000,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524","5S","5ZKN9B","steps","{\"activities-steps\":[{\"dateTime\":\"2020-04-23\",\"value\":\"7881\"}]}"
+
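If you assemble these CSV files yourself, a pandas call along these lines (a sketch; the file name and columns follow the example above) produces the expected separator, escaping, and quoting:

```python
import csv
import pandas as pd

row = {
    "timestamp": 1587614400000,
    "device_id": "a748ee1a-1d0b-4ae9-9074-279a2b6ba524",
    "label": "5S",
    "fitbit_id": "5ZKN9B",
    "fitbit_data_type": "steps",
    "fitbit_data": '{"activities-steps":[{"dateTime":"2020-04-23","value":"7881"}]}',
}

# Quote string columns with ", escape embedded quotes with \ instead of
# doubling them, and keep , as the separator.
pd.DataFrame([row]).to_csv("fitbit_steps.csv", index=False, sep=",",
                           quoting=csv.QUOTE_NONNUMERIC,
                           escapechar="\\", doublequote=False)
```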
The container should be a CSV file per Fitbit sensor, each containing all participants’ data.
+The script to connect and download data from this container is at: +
src/data/streams/fitbitjson_csv/container.R
+
The format.yaml
maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Fitbit sensors. This file is at:
src/data/streams/fitbitjson_csv/format.yaml
+
If you want RAPIDS to process Fitbit sensor data using this stream, you will need to map DEVICE_ID
and JSON_FITBIT_COLUMN
to your own raw data columns inside each sensor section in format.yaml
.
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
HEARTRATE_DAILY_RESTINGHR | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESOUTOFRANGE | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESFATBURN | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESCARDIO | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESPEAK | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py
+
Note
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
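Conceptually, the mutation script walks that JSON string and extracts each mapped value. A simplified Python sketch (not the shipped parse_heartrate_summary_json.py):

```python
import json

# Extract the daily summary fields from one fitbit_data JSON string.
# Simplified sketch of what parse_heartrate_summary_json.py produces.
def parse_heartrate_summary(fitbit_data):
    day = json.loads(fitbit_data)["activities-heart"][0]
    zones = {z["name"]: z["caloriesOut"] for z in day["value"]["heartRateZones"]}
    return {
        "local_date_time": day["dateTime"],
        "heartrate_daily_restinghr": day["value"].get("restingHeartRate"),
        "heartrate_daily_caloriesoutofrange": zones.get("Out of Range"),
        "heartrate_daily_caloriesfatburn": zones.get("Fat Burn"),
        "heartrate_daily_caloriescardio": zones.get("Cardio"),
        "heartrate_daily_caloriespeak": zones.get("Peak"),
    }
```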
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
HEARTRATE | +FLAG_TO_MUTATE | +
HEARTRATE_ZONE | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_heartrate_intraday_json.py
+
Note
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
LOCAL_START_DATE_TIME | +FLAG_TO_MUTATE | +
LOCAL_END_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
EFFICIENCY | +FLAG_TO_MUTATE | +
MINUTES_AFTER_WAKEUP | +FLAG_TO_MUTATE | +
MINUTES_ASLEEP | +FLAG_TO_MUTATE | +
MINUTES_AWAKE | +FLAG_TO_MUTATE | +
MINUTES_TO_FALL_ASLEEP | +FLAG_TO_MUTATE | +
MINUTES_IN_BED | +FLAG_TO_MUTATE | +
IS_MAIN_SLEEP | +FLAG_TO_MUTATE | +
TYPE | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_sleep_summary_json.py
+
Note
The Fitbit API has two versions of sleep data, v1 and v1.2. We support both but ignore v1's count_awake, duration_awake, count_awakenings, count_restless, and duration_restless columns.
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-10”,”duration”:3600000,”efficiency”:92,”endTime”:”2020-10-10T16:37:00.000”,”infoCode”:2,”isMainSleep”:false,”levels”:{“data”:[{“dateTime”:”2020-10-10T15:36:30.000”,”level”:”restless”,”seconds”:60},{“dateTime”:”2020-10-10T15:37:30.000”,”level”:”asleep”,”seconds”:660},{“dateTime”:”2020-10-10T15:48:30.000”,”level”:”restless”,”seconds”:60},…], “summary”:{“asleep”:{“count”:0,”minutes”:56},”awake”:{“count”:0,”minutes”:0},”restless”:{“count”:3,”minutes”:4}}},”logId”:26315914306,”minutesAfterWakeup”:0,”minutesAsleep”:55,”minutesAwake”:5,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T15:36:30.000”,”timeInBed”:60,”type”:”classic”},{“dateOfSleep”:”2020-10-10”,”duration”:22980000,”efficiency”:88,”endTime”:”2020-10-10T08:10:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-10T01:46:30.000”,”level”:”light”,”seconds”:420},{“dateTime”:”2020-10-10T01:53:30.000”,”level”:”deep”,”seconds”:1230},{“dateTime”:”2020-10-10T02:14:00.000”,”level”:”light”,”seconds”:360},…], “summary”:{“deep”:{“count”:3,”minutes”:92,”thirtyDayAvgMinutes”:0},”light”:{“count”:29,”minutes”:193,”thirtyDayAvgMinutes”:0},”rem”:{“count”:4,”minutes”:33,”thirtyDayAvgMinutes”:0},”wake”:{“count”:28,”minutes”:65,”thirtyDayAvgMinutes”:0}}},”logId”:26311786557,”minutesAfterWakeup”:0,”minutesAsleep”:318,”minutesAwake”:65,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T01:46:30.000”,”timeInBed”:383,”type”:”stages”}],”summary”:{“stages”:{“deep”:92,”light”:193,”rem”:33,”wake”:65},”totalMinutesAsleep”:373,”totalSleepRecords”:2,”totalTimeInBed”:443}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-11”,”duration”:41640000,”efficiency”:89,”endTime”:”2020-10-11T11:47:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-11T00:12:30.000”,”level”:”wake”,”seconds”:450},{“dateTime”:”2020-10-11T00:20:00.000”,”level”:”light”,”seconds”:870},{“dateTime”:”2020-10-11T00:34:30.000”,”level”:”wake”,”seconds”:780},…], “summary”:{“deep”:{“count”:4,”minutes”:52,”thirtyDayAvgMinutes”:62},”light”:{“count”:32,”minutes”:442,”thirtyDayAvgMinutes”:364},”rem”:{“count”:6,”minutes”:68,”thirtyDayAvgMinutes”:58},”wake”:{“count”:29,”minutes”:132,”thirtyDayAvgMinutes”:94}}},”logId”:26589710670,”minutesAfterWakeup”:1,”minutesAsleep”:562,”minutesAwake”:132,”minutesToFallAsleep”:0,”startTime”:”2020-10-11T00:12:30.000”,”timeInBed”:694,”type”:”stages”}],”summary”:{“stages”:{“deep”:52,”light”:442,”rem”:68,”wake”:132},”totalMinutesAsleep”:562,”totalSleepRecords”:1,”totalTimeInBed”:694}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-12”,”duration”:28980000,”efficiency”:93,”endTime”:”2020-10-12T09:34:30.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-12T01:31:00.000”,”level”:”wake”,”seconds”:600},{“dateTime”:”2020-10-12T01:41:00.000”,”level”:”light”,”seconds”:60},{“dateTime”:”2020-10-12T01:42:00.000”,”level”:”deep”,”seconds”:2340},…], “summary”:{“deep”:{“count”:4,”minutes”:63,”thirtyDayAvgMinutes”:59},”light”:{“count”:27,”minutes”:257,”thirtyDayAvgMinutes”:364},”rem”:{“count”:5,”minutes”:94,”thirtyDayAvgMinutes”:58},”wake”:{“count”:24,”minutes”:69,”thirtyDayAvgMinutes”:95}}},”logId”:26589710673,”minutesAfterWakeup”:0,”minutesAsleep”:415,”minutesAwake”:68,”minutesToFallAsleep”:0,”startTime”:”2020-10-12T01:31:00.000”,”timeInBed”:483,”type”:”stages”}],”summary”:{“stages”:{“deep”:63,”light”:257,”rem”:94,”wake”:69},”totalMinutesAsleep”:415,”totalSleepRecords”:1,”totalTimeInBed”:483}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
TYPE_EPISODE_ID | +FLAG_TO_MUTATE | +
DURATION | +FLAG_TO_MUTATE | +
IS_MAIN_SLEEP | +FLAG_TO_MUTATE | +
TYPE | +FLAG_TO_MUTATE | +
LEVEL | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_sleep_intraday_json.py
+
Note
The Fitbit API has two versions of sleep data, v1 and v1.2; we support both.
+All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-10”,”duration”:3600000,”efficiency”:92,”endTime”:”2020-10-10T16:37:00.000”,”infoCode”:2,”isMainSleep”:false,”levels”:{“data”:[{“dateTime”:”2020-10-10T15:36:30.000”,”level”:”restless”,”seconds”:60},{“dateTime”:”2020-10-10T15:37:30.000”,”level”:”asleep”,”seconds”:660},{“dateTime”:”2020-10-10T15:48:30.000”,”level”:”restless”,”seconds”:60},…], “summary”:{“asleep”:{“count”:0,”minutes”:56},”awake”:{“count”:0,”minutes”:0},”restless”:{“count”:3,”minutes”:4}}},”logId”:26315914306,”minutesAfterWakeup”:0,”minutesAsleep”:55,”minutesAwake”:5,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T15:36:30.000”,”timeInBed”:60,”type”:”classic”},{“dateOfSleep”:”2020-10-10”,”duration”:22980000,”efficiency”:88,”endTime”:”2020-10-10T08:10:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-10T01:46:30.000”,”level”:”light”,”seconds”:420},{“dateTime”:”2020-10-10T01:53:30.000”,”level”:”deep”,”seconds”:1230},{“dateTime”:”2020-10-10T02:14:00.000”,”level”:”light”,”seconds”:360},…], “summary”:{“deep”:{“count”:3,”minutes”:92,”thirtyDayAvgMinutes”:0},”light”:{“count”:29,”minutes”:193,”thirtyDayAvgMinutes”:0},”rem”:{“count”:4,”minutes”:33,”thirtyDayAvgMinutes”:0},”wake”:{“count”:28,”minutes”:65,”thirtyDayAvgMinutes”:0}}},”logId”:26311786557,”minutesAfterWakeup”:0,”minutesAsleep”:318,”minutesAwake”:65,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T01:46:30.000”,”timeInBed”:383,”type”:”stages”}],”summary”:{“stages”:{“deep”:92,”light”:193,”rem”:33,”wake”:65},”totalMinutesAsleep”:373,”totalSleepRecords”:2,”totalTimeInBed”:443}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-11”,”duration”:41640000,”efficiency”:89,”endTime”:”2020-10-11T11:47:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-11T00:12:30.000”,”level”:”wake”,”seconds”:450},{“dateTime”:”2020-10-11T00:20:00.000”,”level”:”light”,”seconds”:870},{“dateTime”:”2020-10-11T00:34:30.000”,”level”:”wake”,”seconds”:780},…], “summary”:{“deep”:{“count”:4,”minutes”:52,”thirtyDayAvgMinutes”:62},”light”:{“count”:32,”minutes”:442,”thirtyDayAvgMinutes”:364},”rem”:{“count”:6,”minutes”:68,”thirtyDayAvgMinutes”:58},”wake”:{“count”:29,”minutes”:132,”thirtyDayAvgMinutes”:94}}},”logId”:26589710670,”minutesAfterWakeup”:1,”minutesAsleep”:562,”minutesAwake”:132,”minutesToFallAsleep”:0,”startTime”:”2020-10-11T00:12:30.000”,”timeInBed”:694,”type”:”stages”}],”summary”:{“stages”:{“deep”:52,”light”:442,”rem”:68,”wake”:132},”totalMinutesAsleep”:562,”totalSleepRecords”:1,”totalTimeInBed”:694}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-12”,”duration”:28980000,”efficiency”:93,”endTime”:”2020-10-12T09:34:30.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-12T01:31:00.000”,”level”:”wake”,”seconds”:600},{“dateTime”:”2020-10-12T01:41:00.000”,”level”:”light”,”seconds”:60},{“dateTime”:”2020-10-12T01:42:00.000”,”level”:”deep”,”seconds”:2340},…], “summary”:{“deep”:{“count”:4,”minutes”:63,”thirtyDayAvgMinutes”:59},”light”:{“count”:27,”minutes”:257,”thirtyDayAvgMinutes”:364},”rem”:{“count”:5,”minutes”:94,”thirtyDayAvgMinutes”:58},”wake”:{“count”:24,”minutes”:69,”thirtyDayAvgMinutes”:95}}},”logId”:26589710673,”minutesAfterWakeup”:0,”minutesAsleep”:415,”minutesAwake”:68,”minutesToFallAsleep”:0,”startTime”:”2020-10-12T01:31:00.000”,”timeInBed”:483,”type”:”stages”}],”summary”:{“stages”:{“deep”:63,”light”:257,”rem”:94,”wake”:69},”totalMinutesAsleep”:415,”totalSleepRecords”:1,”totalTimeInBed”:483}} | +
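The intraday rows come from the levels data inside each sleep record. A condensed Python sketch of that flattening (not the shipped parse_sleep_intraday_json.py):

```python
import json

# Flatten each sleep record's "levels" data into one row per entry.
# Condensed sketch of what parse_sleep_intraday_json.py produces.
def parse_sleep_intraday(fitbit_data):
    rows = []
    for episode_id, record in enumerate(json.loads(fitbit_data)["sleep"]):
        for entry in record["levels"]["data"]:
            rows.append({
                "type_episode_id": episode_id,
                "local_date_time": entry["dateTime"],
                "duration": entry["seconds"],
                "level": entry["level"],
                "is_main_sleep": int(record["isMainSleep"]),
                "type": record["type"],
            })
    return rows
```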
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
STEPS | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_steps_summary_json.py
+
Note
TIMESTAMP
, LOCAL_DATE_TIME
, and STEPS
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-07”,”value”:”1775”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:5},{“time”:”00:01:00”,”value”:3},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-08”,”value”:”3201”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:14},{“time”:”00:01:00”,”value”:11},{“time”:”00:02:00”,”value”:10},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-09”,”value”:”998”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:0},{“time”:”00:01:00”,”value”:0},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
STEPS | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_steps_intraday_json.py
+
Note
TIMESTAMP
, LOCAL_DATE_TIME
, and STEPS
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-07”,”value”:”1775”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:5},{“time”:”00:01:00”,”value”:3},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-08”,”value”:”3201”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:14},{“time”:”00:01:00”,”value”:11},{“time”:”00:02:00”,”value”:10},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-09”,”value”:”998”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:0},{“time”:”00:01:00”,”value”:0},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
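A minimal Python sketch of that intraday extraction, joining the day's date with each minute entry (not the shipped parse_steps_intraday_json.py):

```python
import json

# Build one row per minute from the intraday dataset. Minimal sketch of
# what parse_steps_intraday_json.py produces.
def parse_steps_intraday(fitbit_data):
    record = json.loads(fitbit_data)
    date = record["activities-steps"][0]["dateTime"]
    return [
        {"local_date_time": f"{date} {entry['time']}", "steps": entry["value"]}
        for entry in record["activities-steps-intraday"]["dataset"]
    ]
```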
fitbitjson_mysql
This data stream handles Fitbit sensor data downloaded using the Fitbit Web API and stored in a MySQL database. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your sensor data in a MySQL database, RAPIDS can process it.
+The container should be a MySQL database with a table per sensor, each containing all participants’ data.
+The script to connect and download data from this container is at: +
src/data/streams/fitbitjson_mysql/container.R
+
The format.yaml
maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Fitbit sensors. This file is at:
src/data/streams/fitbitjson_mysql/format.yaml
+
If you want RAPIDS to process Fitbit sensor data using this stream, you will need to map DEVICE_ID
and JSON_FITBIT_COLUMN
to your own raw data columns inside each sensor section in format.yaml
.
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
HEARTRATE_DAILY_RESTINGHR | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESOUTOFRANGE | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESFATBURN | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESCARDIO | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESPEAK | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py
+
Note
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
HEARTRATE | +FLAG_TO_MUTATE | +
HEARTRATE_ZONE | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_heartrate_intraday_json.py
+
Note
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
LOCAL_START_DATE_TIME | +FLAG_TO_MUTATE | +
LOCAL_END_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
EFFICIENCY | +FLAG_TO_MUTATE | +
MINUTES_AFTER_WAKEUP | +FLAG_TO_MUTATE | +
MINUTES_ASLEEP | +FLAG_TO_MUTATE | +
MINUTES_AWAKE | +FLAG_TO_MUTATE | +
MINUTES_TO_FALL_ASLEEP | +FLAG_TO_MUTATE | +
MINUTES_IN_BED | +FLAG_TO_MUTATE | +
IS_MAIN_SLEEP | +FLAG_TO_MUTATE | +
TYPE | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_sleep_summary_json.py
+
Note
The Fitbit API has two versions of sleep data, v1 and v1.2. We support both but ignore v1's count_awake, duration_awake, count_awakenings, count_restless, and duration_restless columns.
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-10”,”duration”:3600000,”efficiency”:92,”endTime”:”2020-10-10T16:37:00.000”,”infoCode”:2,”isMainSleep”:false,”levels”:{“data”:[{“dateTime”:”2020-10-10T15:36:30.000”,”level”:”restless”,”seconds”:60},{“dateTime”:”2020-10-10T15:37:30.000”,”level”:”asleep”,”seconds”:660},{“dateTime”:”2020-10-10T15:48:30.000”,”level”:”restless”,”seconds”:60},…], “summary”:{“asleep”:{“count”:0,”minutes”:56},”awake”:{“count”:0,”minutes”:0},”restless”:{“count”:3,”minutes”:4}}},”logId”:26315914306,”minutesAfterWakeup”:0,”minutesAsleep”:55,”minutesAwake”:5,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T15:36:30.000”,”timeInBed”:60,”type”:”classic”},{“dateOfSleep”:”2020-10-10”,”duration”:22980000,”efficiency”:88,”endTime”:”2020-10-10T08:10:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-10T01:46:30.000”,”level”:”light”,”seconds”:420},{“dateTime”:”2020-10-10T01:53:30.000”,”level”:”deep”,”seconds”:1230},{“dateTime”:”2020-10-10T02:14:00.000”,”level”:”light”,”seconds”:360},…], “summary”:{“deep”:{“count”:3,”minutes”:92,”thirtyDayAvgMinutes”:0},”light”:{“count”:29,”minutes”:193,”thirtyDayAvgMinutes”:0},”rem”:{“count”:4,”minutes”:33,”thirtyDayAvgMinutes”:0},”wake”:{“count”:28,”minutes”:65,”thirtyDayAvgMinutes”:0}}},”logId”:26311786557,”minutesAfterWakeup”:0,”minutesAsleep”:318,”minutesAwake”:65,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T01:46:30.000”,”timeInBed”:383,”type”:”stages”}],”summary”:{“stages”:{“deep”:92,”light”:193,”rem”:33,”wake”:65},”totalMinutesAsleep”:373,”totalSleepRecords”:2,”totalTimeInBed”:443}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-11”,”duration”:41640000,”efficiency”:89,”endTime”:”2020-10-11T11:47:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-11T00:12:30.000”,”level”:”wake”,”seconds”:450},{“dateTime”:”2020-10-11T00:20:00.000”,”level”:”light”,”seconds”:870},{“dateTime”:”2020-10-11T00:34:30.000”,”level”:”wake”,”seconds”:780},…], “summary”:{“deep”:{“count”:4,”minutes”:52,”thirtyDayAvgMinutes”:62},”light”:{“count”:32,”minutes”:442,”thirtyDayAvgMinutes”:364},”rem”:{“count”:6,”minutes”:68,”thirtyDayAvgMinutes”:58},”wake”:{“count”:29,”minutes”:132,”thirtyDayAvgMinutes”:94}}},”logId”:26589710670,”minutesAfterWakeup”:1,”minutesAsleep”:562,”minutesAwake”:132,”minutesToFallAsleep”:0,”startTime”:”2020-10-11T00:12:30.000”,”timeInBed”:694,”type”:”stages”}],”summary”:{“stages”:{“deep”:52,”light”:442,”rem”:68,”wake”:132},”totalMinutesAsleep”:562,”totalSleepRecords”:1,”totalTimeInBed”:694}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-12”,”duration”:28980000,”efficiency”:93,”endTime”:”2020-10-12T09:34:30.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-12T01:31:00.000”,”level”:”wake”,”seconds”:600},{“dateTime”:”2020-10-12T01:41:00.000”,”level”:”light”,”seconds”:60},{“dateTime”:”2020-10-12T01:42:00.000”,”level”:”deep”,”seconds”:2340},…], “summary”:{“deep”:{“count”:4,”minutes”:63,”thirtyDayAvgMinutes”:59},”light”:{“count”:27,”minutes”:257,”thirtyDayAvgMinutes”:364},”rem”:{“count”:5,”minutes”:94,”thirtyDayAvgMinutes”:58},”wake”:{“count”:24,”minutes”:69,”thirtyDayAvgMinutes”:95}}},”logId”:26589710673,”minutesAfterWakeup”:0,”minutesAsleep”:415,”minutesAwake”:68,”minutesToFallAsleep”:0,”startTime”:”2020-10-12T01:31:00.000”,”timeInBed”:483,”type”:”stages”}],”summary”:{“stages”:{“deep”:63,”light”:257,”rem”:94,”wake”:69},”totalMinutesAsleep”:415,”totalSleepRecords”:1,”totalTimeInBed”:483}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
TYPE_EPISODE_ID | +FLAG_TO_MUTATE | +
DURATION | +FLAG_TO_MUTATE | +
IS_MAIN_SLEEP | +FLAG_TO_MUTATE | +
TYPE | +FLAG_TO_MUTATE | +
LEVEL | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_sleep_intraday_json.py
+
Note
The Fitbit API has two versions of sleep data, v1 and v1.2; we support both.
+All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-10”,”duration”:3600000,”efficiency”:92,”endTime”:”2020-10-10T16:37:00.000”,”infoCode”:2,”isMainSleep”:false,”levels”:{“data”:[{“dateTime”:”2020-10-10T15:36:30.000”,”level”:”restless”,”seconds”:60},{“dateTime”:”2020-10-10T15:37:30.000”,”level”:”asleep”,”seconds”:660},{“dateTime”:”2020-10-10T15:48:30.000”,”level”:”restless”,”seconds”:60},…], “summary”:{“asleep”:{“count”:0,”minutes”:56},”awake”:{“count”:0,”minutes”:0},”restless”:{“count”:3,”minutes”:4}}},”logId”:26315914306,”minutesAfterWakeup”:0,”minutesAsleep”:55,”minutesAwake”:5,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T15:36:30.000”,”timeInBed”:60,”type”:”classic”},{“dateOfSleep”:”2020-10-10”,”duration”:22980000,”efficiency”:88,”endTime”:”2020-10-10T08:10:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-10T01:46:30.000”,”level”:”light”,”seconds”:420},{“dateTime”:”2020-10-10T01:53:30.000”,”level”:”deep”,”seconds”:1230},{“dateTime”:”2020-10-10T02:14:00.000”,”level”:”light”,”seconds”:360},…], “summary”:{“deep”:{“count”:3,”minutes”:92,”thirtyDayAvgMinutes”:0},”light”:{“count”:29,”minutes”:193,”thirtyDayAvgMinutes”:0},”rem”:{“count”:4,”minutes”:33,”thirtyDayAvgMinutes”:0},”wake”:{“count”:28,”minutes”:65,”thirtyDayAvgMinutes”:0}}},”logId”:26311786557,”minutesAfterWakeup”:0,”minutesAsleep”:318,”minutesAwake”:65,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T01:46:30.000”,”timeInBed”:383,”type”:”stages”}],”summary”:{“stages”:{“deep”:92,”light”:193,”rem”:33,”wake”:65},”totalMinutesAsleep”:373,”totalSleepRecords”:2,”totalTimeInBed”:443}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-11”,”duration”:41640000,”efficiency”:89,”endTime”:”2020-10-11T11:47:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-11T00:12:30.000”,”level”:”wake”,”seconds”:450},{“dateTime”:”2020-10-11T00:20:00.000”,”level”:”light”,”seconds”:870},{“dateTime”:”2020-10-11T00:34:30.000”,”level”:”wake”,”seconds”:780},…], “summary”:{“deep”:{“count”:4,”minutes”:52,”thirtyDayAvgMinutes”:62},”light”:{“count”:32,”minutes”:442,”thirtyDayAvgMinutes”:364},”rem”:{“count”:6,”minutes”:68,”thirtyDayAvgMinutes”:58},”wake”:{“count”:29,”minutes”:132,”thirtyDayAvgMinutes”:94}}},”logId”:26589710670,”minutesAfterWakeup”:1,”minutesAsleep”:562,”minutesAwake”:132,”minutesToFallAsleep”:0,”startTime”:”2020-10-11T00:12:30.000”,”timeInBed”:694,”type”:”stages”}],”summary”:{“stages”:{“deep”:52,”light”:442,”rem”:68,”wake”:132},”totalMinutesAsleep”:562,”totalSleepRecords”:1,”totalTimeInBed”:694}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-12”,”duration”:28980000,”efficiency”:93,”endTime”:”2020-10-12T09:34:30.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-12T01:31:00.000”,”level”:”wake”,”seconds”:600},{“dateTime”:”2020-10-12T01:41:00.000”,”level”:”light”,”seconds”:60},{“dateTime”:”2020-10-12T01:42:00.000”,”level”:”deep”,”seconds”:2340},…], “summary”:{“deep”:{“count”:4,”minutes”:63,”thirtyDayAvgMinutes”:59},”light”:{“count”:27,”minutes”:257,”thirtyDayAvgMinutes”:364},”rem”:{“count”:5,”minutes”:94,”thirtyDayAvgMinutes”:58},”wake”:{“count”:24,”minutes”:69,”thirtyDayAvgMinutes”:95}}},”logId”:26589710673,”minutesAfterWakeup”:0,”minutesAsleep”:415,”minutesAwake”:68,”minutesToFallAsleep”:0,”startTime”:”2020-10-12T01:31:00.000”,”timeInBed”:483,”type”:”stages”}],”summary”:{“stages”:{“deep”:63,”light”:257,”rem”:94,”wake”:69},”totalMinutesAsleep”:415,”totalSleepRecords”:1,”totalTimeInBed”:483}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
STEPS | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_steps_summary_json.py
+
Note
TIMESTAMP
, LOCAL_DATE_TIME
, and STEPS
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-07”,”value”:”1775”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:5},{“time”:”00:01:00”,”value”:3},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-08”,”value”:”3201”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:14},{“time”:”00:01:00”,”value”:11},{“time”:”00:02:00”,”value”:10},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-09”,”value”:”998”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:0},{“time”:”00:01:00”,”value”:0},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
STEPS | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_steps_intraday_json.py
+
Note
TIMESTAMP
, LOCAL_DATE_TIME
, and STEPS
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-07”,”value”:”1775”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:5},{“time”:”00:01:00”,”value”:3},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-08”,”value”:”3201”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:14},{“time”:”00:01:00”,”value”:11},{“time”:”00:02:00”,”value”:10},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-09”,”value”:”998”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:0},{“time”:”00:01:00”,”value”:0},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
fitbitparsed_csv
This data stream handles Fitbit sensor data downloaded using the Fitbit Web API, parsed, and stored in a CSV file. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your parsed sensor data in a CSV file, RAPIDS can process it.
+What is the difference between JSON and plain data streams
+Most people will only need fitbitjson_*
because they downloaded and stored their data directly from Fitbit’s API. However, if, for some reason, you don’t have access to that JSON data and instead only have the parsed data (columns and rows), you can use this data stream.
Warning
+The CSV files have to use ,
as separator, \
as escape character (do not escape "
with ""
), and wrap any string columns with "
.
"device_id","heartrate","heartrate_zone","local_date_time","timestamp"
+"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:00:00",0
+"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:01:00",0
+"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",67,"outofrange","2020-04-23 00:02:00",0
+"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:03:00",0
+
The container should be a CSV file per sensor, each containing all participants’ data.
+The script to connect and download data from this container is at: +
src/data/streams/fitbitparsed_csv/container.R
+
The format.yaml
maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Fitbit sensors. This file is at:
src/data/streams/fitbitparsed_csv/format.yaml
+
If you want to use this stream with your data, modify every sensor in format.yaml
to map all columns except TIMESTAMP
in [RAPIDS_COLUMN_MAPPINGS]
to your raw data column names.
All columns are mandatory; however, all except device_id
and local_date_time
can be empty if you don't have that data. Just keep in mind that some features will be empty if those columns are empty.
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
DEVICE_ID | +device_id | +
HEARTRATE_DAILY_RESTINGHR | +heartrate_daily_restinghr | +
HEARTRATE_DAILY_CALORIESOUTOFRANGE | +heartrate_daily_caloriesoutofrange | +
HEARTRATE_DAILY_CALORIESFATBURN | +heartrate_daily_caloriesfatburn | +
HEARTRATE_DAILY_CALORIESCARDIO | +heartrate_daily_caloriescardio | +
HEARTRATE_DAILY_CALORIESPEAK | +heartrate_daily_caloriespeak | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
+add_zero_timestamp
adds an all-zero column called timestamp
that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.
device_id | +local_date_time | +heartrate_daily_restinghr | +heartrate_daily_caloriesoutofrange | +heartrate_daily_caloriesfatburn | +heartrate_daily_caloriescardio | +heartrate_daily_caloriespeak | +
---|---|---|---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 | +72 | +1200.6102 | +760.3020 | +15.2048 | +0 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-08 | +70 | +1100.1120 | +660.0012 | +23.7088 | +0 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-09 | +69 | +750.3615 | +734.1516 | +131.8579 | +0 | +
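The mutation itself is deliberately trivial; its effect amounts to the following Python sketch (the shipped script is add_zero_timestamp.py):

```python
import pandas as pd

# Add a placeholder timestamp column; readable_time.R later overwrites it
# with real Unix timestamps derived from local_date_time and the study's
# time zone(s). Sketch of the effect of add_zero_timestamp.py.
def add_zero_timestamp(data: pd.DataFrame) -> pd.DataFrame:
    data["timestamp"] = 0
    return data
```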
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
DEVICE_ID | +device_id | +
HEARTRATE | +heartrate | +
HEARTRATE_ZONE | +heartrate_zone | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
+add_zero_timestamp
adds an all-zero column called timestamp
that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.
device_id | +local_date_time | +heartrate | +heartrate_zone | +
---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:00:00 | +68 | +outofrange | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:01:00 | +67 | +outofrange | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:02:00 | +67 | +outofrange | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
LOCAL_START_DATE_TIME | +local_start_date_time | +
LOCAL_END_DATE_TIME | +local_end_date_time | +
DEVICE_ID | +device_id | +
EFFICIENCY | +efficiency | +
MINUTES_AFTER_WAKEUP | +minutes_after_wakeup | +
MINUTES_ASLEEP | +minutes_asleep | +
MINUTES_AWAKE | +minutes_awake | +
MINUTES_TO_FALL_ASLEEP | +minutes_to_fall_asleep | +
MINUTES_IN_BED | +minutes_in_bed | +
IS_MAIN_SLEEP | +is_main_sleep | +
TYPE | +type | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
+add_zero_timestamp
adds an all-zero column called timestamp
that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.
The Fitbit API has two versions of sleep data, v1 and v1.2. We support both but ignore v1's count_awake, duration_awake, count_awakenings, count_restless, and duration_restless columns.
device_id | +local_start_date_time | +local_end_date_time | +efficiency | +minutes_after_wakeup | +minutes_asleep | +minutes_awake | +minutes_to_fall_asleep | +minutes_in_bed | +is_main_sleep | +type | +
---|---|---|---|---|---|---|---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-10 15:36:30 | +2020-10-10 16:37:00 | +92 | +0 | +55 | +5 | +0 | +60 | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-10 01:46:30 | +2020-10-10 08:10:00 | +88 | +0 | +318 | +65 | +0 | +383 | +1 | +stages | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-11 00:12:30 | +2020-10-11 11:47:00 | +89 | +1 | +562 | +132 | +0 | +694 | +1 | +stages | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-12 01:31:00 | +2020-10-12 09:34:30 | +93 | +0 | +415 | +68 | +0 | +483 | +1 | +stages | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
DEVICE_ID | +device_id | +
TYPE_EPISODE_ID | +type_episode_id | +
DURATION | +duration | +
IS_MAIN_SLEEP | +is_main_sleep | +
TYPE | +type | +
LEVEL | +level | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
+add_zero_timestamp
adds an all-zero column called timestamp
that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.
The Fitbit API has two versions of sleep data, v1 and v1.2; we support both.
+device_id | +type_episode_id | +local_date_time | +duration | +level | +is_main_sleep | +type | +
---|---|---|---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +0 | +2020-10-10 15:36:30 | +60 | +restless | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +0 | +2020-10-10 15:37:30 | +660 | +asleep | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +0 | +2020-10-10 15:48:30 | +60 | +restless | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +… | +… | +… | +… | +… | +… | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +1 | +2020-10-10 01:46:30 | +420 | +light | +1 | +stages | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +1 | +2020-10-10 01:53:30 | +1230 | +deep | +1 | +stages | +
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| DEVICE_ID | device_id |
| LOCAL_DATE_TIME | local_date_time |
| STEPS | steps |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

| device_id | local_date_time | steps |
|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 | 1775 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-08 | 3201 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-09 | 998 |
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| DEVICE_ID | device_id |
| LOCAL_DATE_TIME | local_date_time |
| STEPS | steps |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

| device_id | local_date_time | steps |
|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:00:00 | 5 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:01:00 | 3 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:02:00 | 0 |
fitbitparsed_mysql

This data stream handles Fitbit sensor data downloaded using the Fitbit Web API, parsed, and stored in a MySQL database. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once your parsed sensor data is in a MySQL database, RAPIDS can process it.

What is the difference between JSON and plain data streams?

Most people will only need `fitbitjson_*` because they downloaded and stored their data directly from Fitbit's API. However, if for some reason you don't have access to that JSON data and instead only have the parsed data (columns and rows), you can use this data stream.

The container should be a MySQL database with a table per sensor, each containing all participants' data.

The script to connect to and download data from this container is at:
src/data/streams/fitbitparsed_mysql/container.R

The format.yaml maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Fitbit sensors. This file is at:
src/data/streams/fitbitparsed_mysql/format.yaml

If you want to use this stream with your data, modify every sensor in format.yaml to map all columns except TIMESTAMP in [RAPIDS_COLUMN_MAPPINGS] to your raw data column names.

All columns are mandatory; however, all except device_id and local_date_time can be empty if you don't have that data. Keep in mind that some features will be empty if some of these columns are empty.
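To make the mapping concrete, the entry for the heart rate summary sensor in this stream's format.yaml could look like the following sketch. It mirrors the RAPIDS_COLUMN_MAPPINGS and MUTATION blocks listed below; the exact key names and file layout may differ from the real file:

```yaml
FITBIT_HEARTRATE_SUMMARY:
  RAPIDS_COLUMN_MAPPINGS:
    TIMESTAMP: FLAG_TO_MUTATE
    LOCAL_DATE_TIME: local_date_time
    DEVICE_ID: device_id
    HEARTRATE_DAILY_RESTINGHR: heartrate_daily_restinghr
    HEARTRATE_DAILY_CALORIESOUTOFRANGE: heartrate_daily_caloriesoutofrange
    HEARTRATE_DAILY_CALORIESFATBURN: heartrate_daily_caloriesfatburn
    HEARTRATE_DAILY_CALORIESCARDIO: heartrate_daily_caloriescardio
    HEARTRATE_DAILY_CALORIESPEAK: heartrate_daily_caloriespeak
  MUTATION:
    COLUMN_MAPPINGS:
    SCRIPTS:
      - src/data/streams/mutations/fitbit/add_zero_timestamp.py
```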
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| LOCAL_DATE_TIME | local_date_time |
| DEVICE_ID | device_id |
| HEARTRATE_DAILY_RESTINGHR | heartrate_daily_restinghr |
| HEARTRATE_DAILY_CALORIESOUTOFRANGE | heartrate_daily_caloriesoutofrange |
| HEARTRATE_DAILY_CALORIESFATBURN | heartrate_daily_caloriesfatburn |
| HEARTRATE_DAILY_CALORIESCARDIO | heartrate_daily_caloriescardio |
| HEARTRATE_DAILY_CALORIESPEAK | heartrate_daily_caloriespeak |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

| device_id | local_date_time | heartrate_daily_restinghr | heartrate_daily_caloriesoutofrange | heartrate_daily_caloriesfatburn | heartrate_daily_caloriescardio | heartrate_daily_caloriespeak |
|---|---|---|---|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 | 72 | 1200.6102 | 760.3020 | 15.2048 | 0 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-08 | 70 | 1100.1120 | 660.0012 | 23.7088 | 0 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-09 | 69 | 750.3615 | 734.1516 | 131.8579 | 0 |
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| LOCAL_DATE_TIME | local_date_time |
| DEVICE_ID | device_id |
| HEARTRATE | heartrate |
| HEARTRATE_ZONE | heartrate_zone |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

| device_id | local_date_time | heartrate | heartrate_zone |
|---|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:00:00 | 68 | outofrange |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:01:00 | 67 | outofrange |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:02:00 | 67 | outofrange |
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| LOCAL_DATE_TIME | local_date_time |
| LOCAL_START_DATE_TIME | local_start_date_time |
| LOCAL_END_DATE_TIME | local_end_date_time |
| DEVICE_ID | device_id |
| EFFICIENCY | efficiency |
| MINUTES_AFTER_WAKEUP | minutes_after_wakeup |
| MINUTES_ASLEEP | minutes_asleep |
| MINUTES_AWAKE | minutes_awake |
| MINUTES_TO_FALL_ASLEEP | minutes_to_fall_asleep |
| MINUTES_IN_BED | minutes_in_bed |
| IS_MAIN_SLEEP | is_main_sleep |
| TYPE | type |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

The Fitbit API has two versions of sleep data, v1 and v1.2. We support both but ignore v1's `count_awake`, `duration_awake`, `count_awakenings`, `count_restless`, and `duration_restless` columns.

| device_id | local_start_date_time | local_end_date_time | efficiency | minutes_after_wakeup | minutes_asleep | minutes_awake | minutes_to_fall_asleep | minutes_in_bed | is_main_sleep | type |
|---|---|---|---|---|---|---|---|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-10 15:36:30 | 2020-10-10 16:37:00 | 92 | 0 | 55 | 5 | 0 | 60 | 0 | classic |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-10 01:46:30 | 2020-10-10 08:10:00 | 88 | 0 | 318 | 65 | 0 | 383 | 1 | stages |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-11 00:12:30 | 2020-10-11 11:47:00 | 89 | 1 | 562 | 132 | 0 | 694 | 1 | stages |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-12 01:31:00 | 2020-10-12 09:34:30 | 93 | 0 | 415 | 68 | 0 | 483 | 1 | stages |
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| LOCAL_DATE_TIME | local_date_time |
| DEVICE_ID | device_id |
| TYPE_EPISODE_ID | type_episode_id |
| DURATION | duration |
| IS_MAIN_SLEEP | is_main_sleep |
| TYPE | type |
| LEVEL | level |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

The Fitbit API has two versions of sleep data, v1 and v1.2; we support both.

| device_id | type_episode_id | local_date_time | duration | level | is_main_sleep | type |
|---|---|---|---|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 0 | 2020-10-10 15:36:30 | 60 | restless | 0 | classic |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 0 | 2020-10-10 15:37:30 | 660 | asleep | 0 | classic |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 0 | 2020-10-10 15:48:30 | 60 | restless | 0 | classic |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | … | … | … | … | … | … |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 1 | 2020-10-10 01:46:30 | 420 | light | 1 | stages |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 1 | 2020-10-10 01:53:30 | 1230 | deep | 1 | stages |
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| DEVICE_ID | device_id |
| LOCAL_DATE_TIME | local_date_time |
| STEPS | steps |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

| device_id | local_date_time | steps |
|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 | 1775 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-08 | 3201 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-09 | 998 |
RAPIDS_COLUMN_MAPPINGS

| RAPIDS column | Stream column |
|---|---|
| TIMESTAMP | FLAG_TO_MUTATE |
| DEVICE_ID | device_id |
| LOCAL_DATE_TIME | local_date_time |
| STEPS | steps |

MUTATION

- COLUMN_MAPPINGS (None)
- SCRIPTS: src/data/streams/mutations/fitbit/add_zero_timestamp.py

Note

`add_zero_timestamp` adds an all-zero column called `timestamp` that is filled in later in the pipeline by `readable_time.R`, which converts LOCAL_DATE_TIME to a Unix timestamp taking into account single or multiple time zones.

| device_id | local_date_time | steps |
|---|---|---|
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:00:00 | 5 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:01:00 | 3 |
| a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | 2020-10-07 00:02:00 | 0 |
This is a description of the format RAPIDS needs to process data for the following Empatica sensors.

**EMPATICA_ACCELEROMETER**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_VALUES_0 | x axis of acceleration |
| DOUBLE_VALUES_1 | y axis of acceleration |
| DOUBLE_VALUES_2 | z axis of acceleration |

**EMPATICA_HEARTRATE**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| HEARTRATE | Intraday heartrate |

**EMPATICA_TEMPERATURE**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| TEMPERATURE | temperature |

**EMPATICA_ELECTRODERMAL_ACTIVITY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| ELECTRODERMAL_ACTIVITY | electrical conductance |

**EMPATICA_BLOOD_VOLUME_PULSE**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| BLOOD_VOLUME_PULSE | blood volume pulse |

**EMPATICA_INTER_BEAT_INTERVAL**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| INTER_BEAT_INTERVAL | inter beat interval |

**EMPATICA_TAGS**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| TAGS | tags |
This is a description of the format RAPIDS needs to process data for the following Fitbit sensors.

**FITBIT_HEARTRATE_SUMMARY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| HEARTRATE_DAILY_RESTINGHR | Daily resting heartrate |
| HEARTRATE_DAILY_CALORIESOUTOFRANGE | Calories spent while heartrate was outside a heartrate zone |
| HEARTRATE_DAILY_CALORIESFATBURN | Calories spent while heartrate was inside the fat burn zone |
| HEARTRATE_DAILY_CALORIESCARDIO | Calories spent while heartrate was inside the cardio zone |
| HEARTRATE_DAILY_CALORIESPEAK | Calories spent while heartrate was inside the peak zone |

**FITBIT_HEARTRATE_INTRADAY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| HEARTRATE | Intraday heartrate |
| HEARTRATE_ZONE | Heartrate zone that HEARTRATE belongs to. It is based on the heartrate zone ranges of each device |

**FITBIT_SLEEP_SUMMARY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss`; this is either a copy of LOCAL_START_DATE_TIME or LOCAL_END_DATE_TIME, depending on which column is used to assign an episode to a specific day |
| LOCAL_START_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` representing the start of a daily sleep episode |
| LOCAL_END_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` representing the end of a daily sleep episode |
| DEVICE_ID | A string that uniquely identifies a device |
| EFFICIENCY | Sleep efficiency computed by Fitbit as time asleep / (total time in bed - time to fall asleep) |
| MINUTES_AFTER_WAKEUP | Minutes the participant spent in bed after waking up |
| MINUTES_ASLEEP | Minutes the participant was asleep |
| MINUTES_AWAKE | Minutes the participant was awake |
| MINUTES_TO_FALL_ASLEEP | Minutes the participant spent in bed before falling asleep |
| MINUTES_IN_BED | Minutes the participant spent in bed across the sleep episode |
| IS_MAIN_SLEEP | 0 if this episode is a nap, or 1 if it is a main sleep episode |
| TYPE | stages or classic sleep data |

**FITBIT_SLEEP_INTRADAY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss`; this is either a copy of LOCAL_START_DATE_TIME or LOCAL_END_DATE_TIME, depending on which column is used to assign an episode to a specific day |
| DEVICE_ID | A string that uniquely identifies a device |
| TYPE_EPISODE_ID | An id for each unique main or nap episode. Main and nap episodes have different levels; each row in this table is one such level, so multiple rows can have the same TYPE_EPISODE_ID |
| DURATION | Duration of the episode level in minutes |
| IS_MAIN_SLEEP | 0 if this episode level belongs to a nap, or 1 if it belongs to a main sleep episode |
| TYPE | Type of level: stages or classic sleep data |
| LEVEL | For stages levels, one of `wake`, `deep`, `light`, or `rem`. For classic levels, one of `awake`, `restless`, or `asleep` |

**FITBIT_STEPS_SUMMARY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| STEPS | Daily step count |

**FITBIT_STEPS_INTRADAY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| STEPS | Intraday step count (usually every minute) |
This is a description of the format RAPIDS needs to process data for the following PHONE sensors.

See examples in the CSV files inside rapids_example_csv.zip.

**PHONE_ACCELEROMETER**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_VALUES_0 | x axis of acceleration |
| DOUBLE_VALUES_1 | y axis of acceleration |
| DOUBLE_VALUES_2 | z axis of acceleration |

**PHONE_ACTIVITY_RECOGNITION**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| ACTIVITY_NAME | A string that denotes the current activity name: `in_vehicle`, `on_bicycle`, `on_foot`, `still`, `unknown`, `tilting`, `walking`, or `running` |
| ACTIVITY_TYPE | An integer (ranging from 0 to 8) that denotes the current activity type |
| CONFIDENCE | An integer (ranging from 0 to 100) that denotes the prediction accuracy |
**PHONE_APPLICATIONS_CRASHES**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | Application's package name |
| APPLICATION_NAME | Application's localized name |
| APPLICATION_VERSION | Application's version code |
| ERROR_SHORT | Short description of the error |
| ERROR_LONG | More verbose version of the error description |
| ERROR_CONDITION | 1 = code error; 2 = non-responsive (ANR error) |
| IS_SYSTEM_APP | Device's pre-installed application |

**PHONE_APPLICATIONS_FOREGROUND**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | Application's package name |
| APPLICATION_NAME | Application's localized name |
| IS_SYSTEM_APP | Device's pre-installed application |

**PHONE_APPLICATIONS_NOTIFICATIONS**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | Application's package name |
| APPLICATION_NAME | Application's localized name |
| TEXT | Notification's header text, not the content |
| SOUND | Notification's sound source (if applicable) |
| VIBRATE | Notification's vibration pattern (if applicable) |
| DEFAULTS | If the notification was delivered according to the device's default settings |
| FLAGS | An integer that denotes the Android notification flag |
**PHONE_BATTERY**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| BATTERY_STATUS | An integer that denotes battery status: 0 or 1 = unknown, 2 = charging, 3 = discharging, 4 = not charging, 5 = full |
| BATTERY_LEVEL | An integer that denotes battery level, between 0 and BATTERY_SCALE |
| BATTERY_SCALE | An integer that denotes the maximum battery level |

**PHONE_BLUETOOTH**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| BT_ADDRESS | MAC address of the device's Bluetooth sensor |
| BT_NAME | User-assigned name of the device's Bluetooth sensor |
| BT_RSSI | The RSSI (dB) to the scanned device |

**PHONE_CALLS**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| CALL_TYPE | An integer that denotes call type: 1 = incoming, 2 = outgoing, 3 = missed |
| CALL_DURATION | Length of the call session |
| TRACE | SHA-1 one-way source/target of the call |

**PHONE_CONVERSATION**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_ENERGY | A number that denotes the amplitude of an audio sample (L2-norm of the audio frame) |
| INFERENCE | An integer (ranging from 0 to 3) that denotes the type of an audio sample: 0 = silence, 1 = noise, 2 = voice, 3 = unknown |
| DOUBLE_CONVO_START | UNIX timestamp (13 digits) of the beginning of a conversation |
| DOUBLE_CONVO_END | UNIX timestamp (13 digits) of the end of a conversation |
**PHONE_KEYBOARD**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | The application's package name of the keyboard interaction |
| BEFORE_TEXT | The previous keyboard input (empty if password) |
| CURRENT_TEXT | The current keyboard input (empty if password) |
| IS_PASSWORD | An integer: 0 = not password; 1 = password |

**PHONE_LIGHT**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_LIGHT_LUX | The ambient luminance in lux units |
| ACCURACY | An integer that denotes the sensor's accuracy level: 3 = maximum accuracy, 2 = medium accuracy, 1 = low accuracy |

**PHONE_LOCATIONS**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_LATITUDE | The location's latitude, in degrees |
| DOUBLE_LONGITUDE | The location's longitude, in degrees |
| DOUBLE_BEARING | The location's bearing, in degrees |
| DOUBLE_SPEED | The speed if available, in meters/second over ground |
| DOUBLE_ALTITUDE | The altitude if available, in meters above sea level |
| PROVIDER | A string that denotes the provider: `gps`, `fused`, or `network` |
| ACCURACY | The estimated location accuracy |
**PHONE_LOG**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| LOG_MESSAGE | A string that denotes the log message |

**PHONE_MESSAGES**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| MESSAGE_TYPE | An integer that denotes message type: 1 = received, 2 = sent |
| TRACE | SHA-1 one-way source/target of the message |

**PHONE_SCREEN**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| SCREEN_STATUS | An integer that denotes screen status: 0 = off, 1 = on, 2 = locked, 3 = unlocked |

**PHONE_WIFI_CONNECTED**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| MAC_ADDRESS | Device's MAC address |
| SSID | Currently connected access point network name |
| BSSID | Currently connected access point MAC address |

**PHONE_WIFI_VISIBLE**

| RAPIDS column | Description |
|---|---|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| SSID | Detected access point network name |
| BSSID | Detected access point MAC address |
| SECURITY | Active security protocols |
| FREQUENCY | Wi-Fi band frequency (e.g., 2427, 5180), in MHz |
| RSSI | RSSI (dB) to the scanned device |
We use mkdocs with the material theme to write these docs. Whenever you make any changes, just push them back to the repo and the documentation will be deployed automatically.

Install the documentation dependencies:

```bash
pip install mkdocs
pip install mkdocs-material
```

Preview the docs locally by running the following command in RAPIDS' root folder and going to http://127.0.0.1:8000:

```bash
mkdocs serve
```

The documentation config file is /mkdocs.yml; if you are adding new .md files to the docs, modify the nav attribute at the bottom of that file. You can use the hierarchy there to find all the files that appear in the documentation.

Check this page to get familiar with the different visual elements we can use in the docs (admonitions, code blocks, tables, etc.). You can also refer to /docs/setup/installation.md and /docs/setup/configuration.md to see practical examples of these elements.

Hint

Any links to internal pages should be relative to the current page. For example, any link from this page (documentation), which is inside ./developers, should begin with ../ to go one folder level up, like:

```
[mylink](../setup/installation.md)
```

You can insert emojis using the syntax :[SOURCE]-[ICON_NAME] from the following sources: …

You can use this page to create markdown tables more easily.
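As a quick illustration, here is a sketch of a few of these elements in a docs page. The admonition title and the icon name are illustrative, and both assume the material theme's admonition and emoji markdown extensions are enabled:

```markdown
!!! hint "A helpful hint"
    Use relative links to internal pages: [installation](../setup/installation.md)

This line renders a material icon: :material-information-outline:
```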
We use the develop/master variation of the OneFlow git flow, and we use feature (topic) branches to implement new features.

You are an internal developer if you have writing permissions to the repository.

Most feature branches are never pushed to the repo; only do so if you expect its development to take days (to avoid losing your work if your computer is damaged). Otherwise, follow these instructions to locally rebase your feature branch onto develop and push those rebased changes online.

**Starting your feature branch**

1. Update your local develop branch:
```bash
git checkout develop
git pull
```
2. Create your feature branch:
```bash
git checkout -b feature/feature1
```
3. Add, modify, or delete files as needed and update the change log (docs/change-log.md).
4. Stage and commit your changes:
```bash
git add modified-file1 modified-file2
git commit -m "Add my new feature" # use a concise description
```
**Merging back your feature branch**

If your changes took time to implement, it is possible that there are new commits in our develop branch, so we need to rebase your feature branch.

1. Fetch the latest changes to develop:
```bash
git fetch origin develop
```
2. Rebase your feature branch:
```bash
git checkout feature/feature1
git rebase -i develop
```
3. Integrate your new feature into develop:
```bash
git checkout develop
git merge --no-ff feature/feature1 # (use the default merge message)
git push origin develop
git branch -d feature/feature1
```
You are an external developer if you do NOT have writing permissions to the repository.

**Starting your feature branch**

1. Update your local develop branch:
```bash
git checkout develop
```
2. Create your feature branch:
```bash
git checkout -b feature/external-test
```
3. Stage and commit your changes:
```bash
git add modified-file1 modified-file2
git commit -m "Add my new feature" # use a concise description
```

**Merging back your feature branch**

If your changes took time to implement, it is possible that there are new commits in our develop branch, so we need to rebase your feature branch.

1. Add our repo as another remote:
```bash
git remote add upstream https://github.com/carissalow/rapids/
```
2. Fetch the latest changes to develop:
```bash
git fetch upstream develop
```
3. Rebase your feature branch:
```bash
git checkout feature/external-test
git rebase -i develop
```
4. Push your feature branch online:
```bash
git push --set-upstream origin feature/external-test
```
5. Open a pull request to the develop branch using GitHub's GUI.
**Releasing a new version**

1. Update your local develop branch:
```bash
git checkout develop
git pull
```
2. Check the latest tag and create a release branch with the bumped version:
```bash
git describe --abbrev=0 --tags # Bump the release (0.1.0 to 0.2.0 => NEW_RELEASE)
git checkout -b release/v[NEW_RELEASE] develop
```
3. Tag the release and merge it back into develop:
```bash
git tag v[NEW_RELEASE]
git checkout develop
git merge release/v[NEW_RELEASE]
git push --tags origin develop
git branch -d release/v[NEW_RELEASE]
```
4. Fast-forward master and push:
```bash
git checkout master
git merge --ff-only develop
git push
```
5. Create a new release on GitHub for the tag v[NEW_RELEASE] (remember to add the change log).

**Releasing a hotfix**

1. Update your local master branch:
```bash
git checkout master
git pull
```
2. Check the latest tag and create a hotfix branch with the bumped version:
```bash
git describe --abbrev=0 --tags # Bump the hotfix (0.1.0 to 0.1.1 => NEW_HOTFIX)
git checkout -b hotfix/v[NEW_HOTFIX] master
```
3. Tag the hotfix and merge it back into develop:
```bash
git tag v[NEW_HOTFIX]
git checkout develop
git merge hotfix/v[NEW_HOTFIX]
git push --tags origin develop
git branch -d hotfix/v[NEW_HOTFIX]
```
4. Fast-forward master and push:
```bash
git checkout master
git merge --ff-only v[NEW_HOTFIX]
git push
```
5. Create a new release on GitHub for the tag v[NEW_HOTFIX] (remember to add the change log).

We use the Live Share extension of Visual Studio Code to debug issues when sharing data or database credentials is not possible.
1. Open a new terminal: Terminal > New terminal
2. Press Ctrl+P or Cmd+P and run this command:
```
>live share: start collaboration session
```
Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a work in progress, this page will be updated with the list of sensors and features for which testing is available. For each sensor listed, a description of the data used for testing (test cases) is outlined. Currently, tests/data/raw/test01/ contains all the test data files for testing Android data formats, and tests/data/raw/test02/ contains all the test data files for testing iOS data formats. The expected (verified) output is contained in tests/data/processed/test01/ and tests/data/processed/test02/ for Android and iOS, respectively. tests/data/raw/test03/ and tests/data/raw/test04/ contain data files for testing empty raw data files for Android and iOS, respectively.

The following is a list of the sensors for which testing is currently available.
| Sensor | Provider | Periodic | Frequency | Event |
|---|---|---|---|---|
| Phone Accelerometer | Panda | N | N | N |
| Phone Accelerometer | RAPIDS | N | N | N |
| Phone Activity Recognition | RAPIDS | N | N | N |
| Phone Applications Foreground | RAPIDS | N | N | N |
| Phone Battery | RAPIDS | Y | Y | N |
| Phone Bluetooth | Doryab | N | N | N |
| Phone Bluetooth | RAPIDS | Y | Y | Y |
| Phone Calls | RAPIDS | Y | Y | N |
| Phone Conversation | RAPIDS | Y | Y | N |
| Phone Data Yield | RAPIDS | N | N | N |
| Phone Light | RAPIDS | Y | Y | N |
| Phone Locations | Doryab | N | N | N |
| Phone Locations | Barnett | N | N | N |
| Phone Messages | RAPIDS | Y | Y | N |
| Phone Screen | RAPIDS | Y | N | N |
| Phone WiFi Connected | RAPIDS | Y | Y | N |
| Phone WiFi Visible | RAPIDS | Y | Y | N |
| Fitbit Data Yield | RAPIDS | N | N | N |
| Fitbit Heart Rate Summary | RAPIDS | N | N | N |
| Fitbit Heart Rate Intraday | RAPIDS | N | N | N |
| Fitbit Sleep Summary | RAPIDS | N | N | N |
| Fitbit Steps Summary | RAPIDS | N | N | N |
| Fitbit Steps Intraday | RAPIDS | N | N | N |
**Messages (SMS)**

- There are records for both `message_types` (i.e., `received` and `sent`) in both days in all epochs (currently `morning` and `evening`). The number of records with each `message_types` per epoch is randomly distributed, and there is at least one record with each `message_types` per epoch.

**Calls**

Due to the difference in the format of the raw call data for iOS and Android, the following are the expected results in calls_with_datetime_unified.csv. This gives a better idea of the use cases being tested, since calls_with_datetime_unified.csv makes the iOS and Android data comparable.

- There are records for each of the `call_types` (i.e., `incoming`, `outgoing`, and `missed`) in both days in all epochs (currently `morning` and `evening`). The number of records with each of the `call_types` per epoch is randomly distributed, and there is at least one record with each `call_types` per epoch.

**Screen**

Due to the difference in the format of the raw screen data for iOS and Android, the following are the expected results in screen_deltas.csv. This gives a better idea of the use cases being tested, since screen_deltas.csv makes the iOS and Android data comparable. These files are used to calculate the features for the screen sensor.

- There is at least one `unlock` episode that falls within an epoch, for every epoch.
- There is at least one `unlock` episode that falls across the boundary of two epochs; namely, the `unlock` episode starts in one epoch and ends in the next. Thus there is a record for `unlock` episodes that fall across `night` to `morning`, `morning` to `afternoon`, and finally `afternoon` to `night`.
- … the `unlock` episode_type.

**Battery**

Due to the difference in the format of the raw battery data for iOS and Android, as well as across iOS versions, the following are the expected results in battery_deltas.csv. This gives a better idea of the use cases being tested, since battery_deltas.csv makes the iOS and Android data comparable. These files are used to calculate the features for the battery sensor.

- There is at least one `charging` and one `discharging` episode that falls within an epoch, for every epoch. Thus, for the `daily` epoch there are multiple `charging` and `discharging` episodes.
- Since either a `charging` episode or a `discharging` episode (and not both) can occur across epochs, alternating `charging` and `discharging` episodes that fall across `night` to `morning`, `morning` to `afternoon`, and finally `afternoon` to `night` are present in the battery delta data in order to test episodes that occur across epochs. This starts with a `discharging` episode that begins in `night` and ends in `morning`.
- … Each epoch has a record with a `timestamp` for the beginning boundary of that epoch and a record with a `timestamp` for the ending boundary of that epoch (e.g., for the `morning` epoch there is a record with a `timestamp` for 6:00AM and another record with a `timestamp` for 11:59:59AM; these test edge cases).

**WiFi**

- There are two raw data files (wifi_raw.csv and sensor_wifi_raw.csv) for each fake participant for each phone platform.
- The sensor_wifi_raw.csv data contains at least 2 records for each epoch. Each epoch has a record with a `timestamp` for the beginning boundary of that epoch and a record with a `timestamp` for the ending boundary of that epoch (e.g., for the `morning` epoch there is a record with a `timestamp` for 6:00AM and another record with a `timestamp` for 11:59:59AM; these test edge cases).
- The wifi_raw.csv data contains 3 records with random timestamps for each epoch to represent visible broadcasting Wi-Fi networks. This file is empty for the iOS phone testing data.
- … sensor_wifi_raw.csv and wifi_raw.csv.

**Light**

- … for each epoch except `night`. The single row of data for `night` is for testing features on single-value inputs (for example, testing the standard deviation of one input value).

**Applications foreground**

- … for each epoch. The records for each epoch contain apps that are randomly selected from a list of apps from the `MULTIPLE_CATEGORIES` and `SINGLE_CATEGORIES` (see testing_config.yaml). There are also records in each epoch with apps randomly selected from a list of apps from the `EXCLUDED_CATEGORIES` and `EXCLUDED_APPS`; this tests that these apps are actually excluded from the feature calculations. There are also records to test `SINGLE_APPS` calculations.

**Activity recognition**

- Each epoch period contains rows that record 2–5 different `activity_types`, such that durations of activities can be tested. Additionally, there are records that mimic the duration of an activity over the time boundary of neighboring epochs (for example, a set of records that mimic the participant being `in_vehicle` from `afternoon` into `evening`).

**Conversation**

- There are rows with samples of each of the `datatypes` (i.e., voice/noise = 0, and conversation = 2), as well as rows with samples of each of the `inference` values (i.e., silence = 0, noise = 1, voice = 2, and unknown = 3), for each epoch. The different `datatype` and `inference` records are randomly distributed throughout the epoch.
- There is at least one conversation record (`datatype` = 2, and `inference` = -1) in each epoch, and for each epoch except night, there is a conversation record with a `double_convo_start` `timestamp` that is from the previous epoch. This tests the calculation of features across epochs.
- … (the Android and iOS file names are plugin_studentlife_audio_android_raw.csv and plugin_studentlife_audio_raw.csv, respectively).

Description

Checklist

| time segment | single tz | multi tz | platform |
|---|---|---|---|
| 30min | OK | OK | fitbit |
| morning | OK | OK | fitbit |
| daily | OK | OK | fitbit |
| threeday | OK | OK | fitbit |
| weekend | OK | OK | fitbit |
| beforeMarchEvent | OK | OK | fitbit |
| beforeNovemberEvent | OK | OK | fitbit |

The following is a simple guide to run RAPIDS' tests. All files necessary for testing are stored in the ./tests/ directory.

- Test data timestamps are assigned by tests/script/assign_test_timestamps.py to fit Fri March 6th 2020 - Mon March 9th 2020 and Fri Oct 30th 2020 - Mon Nov 2nd 2020. We test daylight saving times with these dates.
- … (`pids`: `android`, `ios`, `fitbit`, `empatica`, `empty`). The data's `device_id` should be equal to the `pid`.
- We test `frequency`, `periodic`, and `event` time segments in a single time zone, and the same three in multiple time zones:
  - 30min (`30min,30`)
  - morning (`morning,06:00:00,5H 59M 59S,every_day,0`)
  - daily (`daily,00:00:00,23H 59M 59S,every_day,0`)
  - threeday (`threeday,00:00:00,71H 59M 59S,every_day,0`)
  - weekend (`weekend,00:00:00,71H 59M 59S,wday,5`)
  - beforeMarchEvent (`beforeMarchEvent,1583625600000,22H,3H,-1,android`)
  - beforeNovemberEvent (`beforeNovemberEvent,1604185200000,22H,3H,-1,android`)
- Time zones tested:
  - America/New_York
  - 0
- Key timestamps:
  - 1583600400000 (Sat Mar 07 2020 12:00:00 EST)
  - 1583683200000 (Sun Mar 08 2020 12:00:00 EST)
  - 1604160000000 (Sat Oct 31 2020 12:00:00 EST)
  - 1604250000000 (Sun Nov 01 2020 12:00:00 EST)
- Document your test data in docs/developers/test-cases.md under the corresponding sensor, adding a Description and a Checklist.
- … (tests/script/assign_test_timestamps.py handles this)

Description is a list where every item describes the different scenarios your test data covers. For example, if we are testing PHONE_BATTERY:

```
- We test 24 discharge episodes, 24 charge episodes and 2 episodes with a 0 discharge rate
- One episode is shorter than 30 minutes (`start timestamp` to `end timestamp`)
- One episode is 120 minutes long from 11:00 to 13:00 (`start timestamp` to `end timestamp`). This one covers the case when an episode has to be chunked for 30min frequency segments and for morning segments
- One episode is 60 minutes long from 23:30 to 00:30 (`start timestamp` to `end timestamp`). This one covers the case when an episode has to be chunked for 30min frequency segments and for daily segments (overnight)
- One 0 discharge rate episode 10 minutes long that happens within a 30-minute segment (10:00 to 10:29) (`start timestamp` to `end timestamp`)
- Three discharge episodes that happen during beforeMarchEvent (start/end timestamps of those discharge episodes)
- Three charge episodes that happen during beforeMarchEvent (start/end timestamps of those charge episodes)
- One discharge episode that happens between 00:30 and 04:00 to test for daylight saving times in March and November 2020.
- ... any other test corner cases you can think of
```

Describe your test cases in as much detail as possible, so that in the future, if we find a bug in RAPIDS, we know which test case we did not include and should add.

Checklist is a table where you confirm you have verified the output of your dataset for the different time segments and time zones:

| time segment | single tz | multi tz | platform |
|---|---|---|---|
| 30min | OK | OK | android and iOS |
| morning | OK | OK | android and iOS |
| daily | OK | OK | android and iOS |
| threeday | OK | OK | android and iOS |
| weekend | OK | OK | android and iOS |
| beforeMarchEvent | OK | OK | android and iOS |
| beforeNovemberEvent | OK | OK | android and iOS |
- Add your manual test data to tests/data/manual/aware_csv/SENSOR_raw.csv; create the CSV if it does not exist.
- In manual test data, `test_time` replaces `timestamp`. To make your life easier, you can place a test data row in time using the `test_time` column with the following format: `Day HH:MM:SS.XXX`, for example `Fri 22:54:30.597`.

You can convert your manual test data to actual raw test data with the following commands.

For selected files (a single file name or multiple file names separated by whitespace):

```bash
python tests/script/assign_test_timestamps.py -f file_name_1 file_name_2
```

For all files under the tests/data/manual/aware_csv folder:

```bash
python tests/script/assign_test_timestamps.py -a
```

The script assign_test_timestamps.py converts your `test_time` column into a `timestamp`. For example, `Fri 22:54:30.597` is converted to 1583553270597 (`Fri Mar 06 2020 22:54:30 GMT-0500`) and to 1604112870597 (`Fri Oct 30 2020 22:54:30 GMT-0400`). Note that you can include milliseconds.

- The `device_id` should be the same as the `pid`.

The following is an example of manual test data; the `test_time` column will be automatically converted by tests/script/assign_test_timestamps.py to a timestamp that fits our testing periods in March and November:

```
test_time,device_id,battery_level,battery_scale,battery_status
Fri 01:00:00.000,ios,90,100,4
Fri 01:00:30.500,ios,89,100,4
Fri 01:01:00.000,ios,80,100,4
Fri 01:01:45.500,ios,79,100,4
...
Sat 08:00:00.000,ios,78,100,4
Sat 08:01:00.000,ios,50,100,4
Sat 08:02:00.000,ios,49,100,4
```
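If you want to double-check such a conversion yourself, a few lines of Python reproduce it. This is a sketch using only the standard library; `zoneinfo` requires Python 3.9+ and is not necessarily what assign_test_timestamps.py uses internally:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# "Fri 22:54:30.597" placed in the March testing period, America/New_York time
dt = datetime(2020, 3, 6, 22, 54, 30, 597000, tzinfo=ZoneInfo("America/New_York"))
print(int(dt.timestamp() * 1000))  # 1583553270597 (Fri Mar 06 2020 22:54:30 GMT-0500)
```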
Add the expected output of your test data in the corresponding file:

```
tests/data/processed/features/{type_of_time_segment}/{pid}/device_sensor.csv

# this example is expected output data for battery tests for periodic segments in a single timezone
tests/data/processed/features/stz_periodic/android/phone_sensor.csv

# this example is expected output data for battery tests for periodic segments in multi timezones
tests/data/processed/features/mtz_periodic/android/phone_sensor.csv
```

Set [SENSOR][PROVIDER][COMPUTE] to TRUE in the config.yaml of the time segments and time zones you are testing:

```
- tests/settings/stz_frequency_config.yaml # For single-timezone frequency time segments
- tests/settings/stz_periodic_config.yaml # For single-timezone periodic time segments
- tests/settings/stz_event_config.yaml # For single-timezone event time segments

- tests/settings/mtz_frequency_config.yaml # For multi-timezone frequency time segments
- tests/settings/mtz_periodic_config.yaml # For multi-timezone periodic time segments
- tests/settings/mtz_event_config.yaml # For multi-timezone event time segments
```

To run all tests:

```bash
bash tests/scripts/run_tests.sh -t all
```

To run the pipeline and tests for one time segment type and time zone:

```bash
bash tests/scripts/run_tests.sh -t stz_frequency -a both # swap stz_frequency for mtz_frequency, stz_event, mtz_event, etc
```

To run the pipeline and the tests as two separate steps:

```bash
bash tests/scripts/run_tests.sh -t stz_frequency -a run
bash tests/scripts/run_tests.sh -t stz_frequency -a test
```
The bash script tests/scripts/run_tests.sh executes one or all test pipelines for different time segment types (`frequency`, `periodic`, and `events`) and single or multiple time zones.

The Python script tests/scripts/run_tests.py runs the tests. It parses the involved participants and active sensor providers in the config.yaml file of the time segment type and time zone being tested. We test that the expected output file exists and that its content matches the expected values.
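For illustration, the content check could be as simple as the following sketch (a hypothetical helper; the actual assertions in run_tests.py may differ):

```python
import pandas as pd

def assert_feature_file_matches(actual_path: str, expected_path: str) -> None:
    # The output file must exist (read_csv raises if it does not)...
    actual = pd.read_csv(actual_path)
    expected = pd.read_csv(expected_path)
    # ...and its content must match the expected values (ignoring column order).
    pd.testing.assert_frame_equal(actual, expected, check_like=True)
```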
The following is a snippet of the output you should see after running your tests:

```
test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... stz_periodic
ok
test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... stz_periodic
ok

test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... stz_frequency
ok
test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... stz_frequency
FAIL
```

The results above show that for stz_periodic, both test_sensors_files_exist and test_sensors_features_calculations passed, while for stz_frequency, the first test test_sensors_files_exist passed and test_sensors_features_calculations failed. Additionally, you should get the traceback of the failure (not shown here).
Validation schema of the config.yaml

Why do we need to validate the config.yaml?

Most of the key/values in the config.yaml are constrained to a set of possible values or types. For example, [TIME_SEGMENTS][TYPE] can only be one of ["FREQUENCY", "PERIODIC", "EVENT"], and [TIMEZONE] has to be a string.

We should show the user an error if that's not the case. We could validate this in Python or R, but since we reuse scripts and keys in multiple places, tracking these validations can be time-consuming and get out of control. Thus, we do these validations through a schema and check that schema before RAPIDS starts processing any data, so the user can see the error right away.

Keep in mind these validations can only cover certain base cases. Some validations that require more complex logic should still be done in the respective script. For example, we can check that a CSV file path actually ends in .csv, but we can only check that the file actually exists in a Python script.

The structure and values of the config.yaml file are validated using a YAML schema stored in tools/config.schema.yaml. Each key in config.yaml, for example PIDS, has a corresponding entry in the schema where we can validate its type, possible values, required properties, and min and max values, among other things.

The config.yaml is validated against the schema every time RAPIDS runs (see the top of the Snakefile):

```python
validate(config, "tools/config.schema.yaml")
```
The schema has three main sections: required, definitions, and properties. All of them are just nested key/value YAML pairs, where the value can be a primitive type (integer, string, boolean, number) or another key/value pair (object).

- required lists the properties that should be present in the config.yaml. We will almost always add every config.yaml key to this list (meaning that the user cannot delete any of those keys, like TIMEZONE or PIDS).
- definitions lists key/values that are common to different properties so we can reuse them. You can define a key/value under definitions and use $ref to refer to it in any property.

For example, every sensor like [PHONE_ACCELEROMETER] has one or more providers like RAPIDS and PANDA; these providers have some common properties like the COMPUTE flag or the SRC_SCRIPT string. Therefore, we define a shared provider "template" that is used by every provider and extended with properties exclusive to each one. For example:
The PROVIDER definition will be used later in different properties:

```yaml
PROVIDER:
  type: object
  required: [COMPUTE, SRC_SCRIPT, FEATURES]
  properties:
    COMPUTE:
      type: boolean
    FEATURES:
      type: [array, object]
    SRC_SCRIPT:
      type: string
      pattern: "^.*\\.(py|R)$"
```
Notice that RAPIDS (a provider) uses and extends the PROVIDER template in this example. The FEATURES key overrides the FEATURES key from the #/definitions/PROVIDER template but keeps the validation for COMPUTE and SRC_SCRIPT. For more details about reusing properties, go to this link.

```yaml
PHONE_ACCELEROMETER:
  type: object
  # .. other properties
  PROVIDERS:
    type: ["null", object]
    properties:
      RAPIDS:
        allOf:
          - $ref: "#/definitions/PROVIDER"
          - properties:
              FEATURES:
                type: array
                uniqueItems: True
                items:
                  type: string
                  enum: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
```
- properties are nested key/values that describe the different components of our config.yaml file. Values can be of one or more primitive types like string, number, array, boolean, and null. Values can also be another key/value pair (of type object), similar to a dictionary in Python.

For example, the following property validates the PIDS of our config.yaml. It checks that PIDS is an array with unique items of type string:

```yaml
PIDS:
  type: array
  uniqueItems: True
  items:
    type: string
```
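To see what this validates in practice, here is a sketch of a compliant value and a non-compliant one for this property (the second would make the schema check fail):

```yaml
# compliant: an array of unique strings
PIDS: [p01, p02, p03]

# non-compliant: duplicated items and a non-string item
# PIDS: [p01, p01, 2]
```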
Validating the config.yaml during development

If you updated the schema and want to check that the config.yaml is compliant, you can run the command `snakemake --list-params-changes`. You will see `Building DAG of jobs...` if there are no problems, or an error message otherwise (try setting any COMPUTE flag to a string like `test` instead of `False`/`True`).

You can use this command without having to configure RAPIDS to process any participants or sensors.

You can validate different aspects of each key/value in our config.yaml file:
Including min and max values:

```yaml
MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS:
  type: number
  minimum: 0
  maximum: 1

FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD:
  type: integer
  exclusiveMinimum: 0
```
Including valid values (enum):

```yaml
items:
  type: string
  enum: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
```

Validating a boolean flag:

```yaml
MINUTES_DATA_USED:
  type: boolean
```

Including whether or not it should have unique values, the type of the array's elements (strings, numbers), and valid values (enum):

```yaml
MESSAGES_TYPES:
  type: array
  uniqueItems: True
  items:
    type: string
    enum: ["received", "sent"]
```
In the following example, PARENT is an object with two properties. KID1 is one of those properties, which is, in turn, another object that reuses the "#/definitions/PROVIDER" definition AND also includes (extends) two extra properties: GRAND_KID1 of type array and GRAND_KID2 of type number. KID2 is another property of PARENT of type boolean.

The schema validation looks like this:

```yaml
PARENT:
  type: object
  properties:
    KID1:
      allOf:
        - $ref: "#/definitions/PROVIDER"
        - properties:
            GRAND_KID1:
              type: array
              uniqueItems: True
            GRAND_KID2:
              type: number
    KID2:
      type: boolean
```
The config.yaml key that the previous schema validates looks like this:

```yaml
PARENT:
  KID1:
    # These three come from the `PROVIDER` definition (template)
    COMPUTE: False
    FEATURES: [x, y] # an array
    SRC_SCRIPT: "a path to a py or R script"

    # These two come from the extension
    GRAND_KID1: [a, b] # an array
    GRAND_KID2: 5.1 # a number
  KID2: True # a boolean
```
We recommend that before you start modifying the schema, you modify the config.yaml key that you want to validate with an invalid value. For example, if you want to validate that COMPUTE is boolean, set COMPUTE: 123. Then create your validation, run `snakemake --list-params-changes`, and make sure your validation fails (123 is not a boolean); then set the key to the correct value. In other words, make sure it's broken first so that you know your validation works.

Warning

Be careful. You can check that the schema config.schema.yaml has a valid format by running `python tools/check_schema.py`. You will see the message `Schema is OK` if its structure is correct. However, we don't have a way to detect typos: for example, `allOf` will work but `allOF` won't (capital F), and it won't show any error. That's why we recommend starting with an invalid key/value in your config.yaml, so that you can be sure the schema validation finds the problem.

Read the following links to learn more about what we can validate with schemas. They are based on JSON schemas instead of YAML schemas, but the same concepts apply.
Try to install any new package using `conda install -c CHANNEL PACKAGE_NAME` (you can use `pip` if the package is only available there). Make sure your Python virtual environment is active (`conda activate YOUR_ENV`).

Uninstall packages using the same manager you used to install them: `conda remove PACKAGE_NAME` or `pip uninstall PACKAGE_NAME`.

To update all Python packages, make sure your Python virtual environment is active (`conda activate YOUR_ENV`), then run:

```bash
conda update --all
```

Updating environment.yaml

After installing or removing a package, you can use the following command in your terminal to update your environment.yaml before publishing your pipeline. Note that we ignore the package version for libgfortran and mkl to keep compatibility with Linux:

```bash
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
```

Install a new R package:

1. Run `R` to open an R interactive session
2. Run `renv::install("PACKAGE_NAME")`

Remove an R package:

1. Run `R` to open an R interactive session
2. Run `renv::remove("PACKAGE_NAME")`

Update all R packages:

1. Run `R` to open an R interactive session
2. Run `renv::update()`

Updating renv.lock

After installing or removing a package, you can use the following steps in your terminal to update your renv.lock before publishing your pipeline:

1. Run `R` to open an R interactive session
2. Run `renv::snapshot()` (renv will ask you to confirm any updates to this file)

Hint

You can add new features to any existing sensors (see list below) by adding a new provider in three steps:

1. Modify the config.yaml file
2. Create your provider script
3. Implement your features extraction code

As a tutorial, we will add a new provider for PHONE_ACCELEROMETER called VEGA that extracts feature1, feature2, and feature3 with a Python script that requires a parameter from the user called MY_PARAMETER.

An existing sensor of any device with a configuration entry in config.yaml:

Smartphone (AWARE)

Fitbit

Empatica

Modify the config.yaml file

In this step, you need to add your provider configuration section under the relevant sensor in config.yaml. See our example for our tutorial's VEGA provider for PHONE_ACCELEROMETER:
Example configuration for VEGA:

```yaml
PHONE_ACCELEROMETER:
  CONTAINER: accelerometer
  PROVIDERS:
    RAPIDS: # this is a feature provider
      COMPUTE: False
      ...

    PANDA: # this is another feature provider
      COMPUTE: False
      ...

    VEGA: # this is our new feature provider
      COMPUTE: False
      FEATURES: ["feature1", "feature2", "feature3"]
      MY_PARAMETER: a_string
      SRC_SCRIPT: src/features/phone_accelerometer/vega/main.py
```
| Key | Description |
|---|---|
| [COMPUTE] | Flag to activate/deactivate your provider |
| [FEATURES] | List of features your provider supports. Your provider code should only return the features on this list |
| [MY_PARAMETER] | An arbitrary parameter that our example provider VEGA needs. This can be a boolean, integer, float, string, or an array of any of these types. |
| [SRC_SCRIPT] | The relative path from RAPIDS' root folder to a script that computes the features for this provider. It can be implemented in R or Python. |
Create your feature Python or R script, called main.py or main.R, in the correct folder: src/features/[sensorname]/[providername]/. RAPIDS automatically loads and executes it based on the config key [SRC_SCRIPT] you added in the last step. For our example, this script is:

```
src/features/phone_accelerometer/vega/main.py
```

Every feature script (main.[py|R]) needs a [providername]_features function with specific parameters. RAPIDS calls this function with the sensor data ready to process and with other functions and arguments you will need.

Python template:

```python
def [providername]_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
    # empty for now
    return(your_features_df)
```

R template:

```r
[providername]_features <- function(sensor_data, time_segment, provider){
    # empty for now
    return(your_features_df)
}
```
| Parameter | Description |
|---|---|
| sensor_data_files | Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be called automatically for each participant in your study (in the [PIDS] array in config.yaml) |
| time_segment | The label of the time segment that should be processed. |
| provider | The parameters you configured for your provider in config.yaml will be available in this variable as a dictionary in Python or a list in R. In our example, this dictionary contains {MY_PARAMETER:"a_string"} |
| filter_data_by_segment | Python only. A function that you will use to filter your data. In R, this function is already available in the environment. |
| *args | Python only. Not used for now |
| **kwargs | Python only. Not used for now |
The next step is to implement the code that computes your behavioral features in your provider script's function. As with any other script, this function can call other auxiliary methods, but in general terms, it should have three stages:

1. Read a participant's data by loading the CSV file pointed to by sensor_data_files:

```python
acc_data = pd.read_csv(sensor_data_files["sensor_data"])
```

Note that the phone's battery, screen, and activity recognition data are given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on).

2. Filter your data to process only the rows that belong to time_segment. This step is only one line of code, but keep reading to understand why we need it:

```python
acc_data = filter_data_by_segment(acc_data, time_segment)
```
You should use the filter_data_by_segment() function to process and group the rows that belong to each of the time segments RAPIDS could be configured with.

Let's understand the filter_data_by_segment() function with an example. A RAPIDS user can extract features on any arbitrary time segment. A time segment is a period that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and weekend basis for p01. The labels are arbitrary, and the instances depend on the days a participant was monitored for:

- The daily segment could be named my_days; if p01 was monitored for 14 days, it would have 14 instances.
- The weekly segment could be named my_weeks; if p01 was monitored for 14 days, it would have 2 instances.
- The weekend segment could be named my_weekends; if p01 was monitored for 14 days, it would have 2 instances.

For this example, RAPIDS will call your provider function three times for p01: once where time_segment is my_days, once where time_segment is my_weeks, and once where time_segment is my_weekends. In this example, not every row in p01's data needs to take part in the feature computation for every segment, and the rows need to be grouped differently.

Thus filter_data_by_segment() comes in handy: it will return a data frame that contains the rows that were logged during a time segment, plus an extra column called local_segment. This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our p01's my_days, my_weeks, and my_weekends examples). After filtering, you should group the data frame by this column and compute any desired features, for example:

```python
acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
```
The reason RAPIDS does not filter the participant's data set for you is that your code might need to compute something based on a participant's complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from that number.
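As a sketch of that idea for the calls example (assuming a raw call log with the `trace` column described earlier; the helper name is illustrative):

```python
import pandas as pd

def calls_from_top_caller(call_data: pd.DataFrame, filter_data_by_segment, time_segment: str) -> pd.Series:
    # Identify the most frequent caller using the COMPLETE dataset...
    top_caller = call_data["trace"].mode().iloc[0]
    # ...then filter by segment and count calls from that caller per instance.
    segmented = filter_data_by_segment(call_data, time_segment)
    return (segmented[segmented["trace"] == top_caller]
            .groupby("local_segment")
            .size())
```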
+After filtering, grouping your data, and computing your features, your provider function should return a data frame that has:
- One row per time segment instance (e.g., 14 rows for `p01`'s `my_days` example)
- The `local_segment` column added by `filter_data_by_segment()`
- One column per feature. By convention, feature names should only contain letters or numbers (e.g., `feature1`). RAPIDS automatically adds the correct sensor and provider prefix; in our example, this prefix is `phone_accelerometer_vega_`.
.PHONE_ACCELEROMETER
Provider ExampleFor your reference, this our own provider (RAPIDS
) for PHONE_ACCELEROMETER
that computes five acceleration features
```python
import pandas as pd
import numpy as np

def rapids_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):

    acc_data = pd.read_csv(sensor_data_files["sensor_data"])
    requested_features = provider["FEATURES"]
    # name of the features this function can compute
    base_features_names = ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
    # the subset of requested features this function can compute
    features_to_compute = list(set(requested_features) & set(base_features_names))

    acc_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
    if not acc_data.empty:
        acc_data = filter_data_by_segment(acc_data, time_segment)

        if not acc_data.empty:
            acc_features = pd.DataFrame()
            # get magnitude related features: magnitude = sqrt(x^2+y^2+z^2)
            magnitude = acc_data.apply(lambda row: np.sqrt(row["double_values_0"] ** 2 + row["double_values_1"] ** 2 + row["double_values_2"] ** 2), axis=1)
            acc_data = acc_data.assign(magnitude = magnitude.values)

            if "maxmagnitude" in features_to_compute:
                acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
            if "minmagnitude" in features_to_compute:
                acc_features["minmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].min()
            if "avgmagnitude" in features_to_compute:
                acc_features["avgmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].mean()
            if "medianmagnitude" in features_to_compute:
                acc_features["medianmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].median()
            if "stdmagnitude" in features_to_compute:
                acc_features["stdmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].std()

            acc_features = acc_features.reset_index()

    return acc_features
```
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the "Existing Sensors"
list above), contact us or request it on Slack and we can add the necessary code so you can follow the instructions above.
Sensor parameters description for [EMPATICA_ACCELEROMETER]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing accelerometer data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/empatica_accelerometer_raw.csv
+- data/raw/{pid}/empatica_accelerometer_with_datetime.csv
+- data/interim/{pid}/empatica_accelerometer_features/empatica_accelerometer_{language}_{provider_key}.csv
+- data/processed/features/{pid}/empatica_accelerometer.csv
+
Parameters description for [EMPATICA_ACCELEROMETER][PROVIDERS][DBDP]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract EMPATICA_ACCELEROMETER features from the DBDP provider |
+
[FEATURES] |
+Features to be computed, see table below | +
Features description for [EMPATICA_ACCELEROMETER][PROVIDERS][DBDP]
:
Feature | +Units | +Description | +
---|---|---|
maxmagnitude | +m/s2 | +The maximum magnitude of acceleration (\(\|acceleration\| = \sqrt{x^2 + y^2 + z^2}\)). | +
minmagnitude | +m/s2 | +The minimum magnitude of acceleration. | +
avgmagnitude | +m/s2 | +The average magnitude of acceleration. | +
medianmagnitude | +m/s2 | +The median magnitude of acceleration. | +
stdmagnitude | +m/s2 | +The standard deviation of acceleration. | +
Assumptions/Observations
+Sensor parameters description for [EMPATICA_BLOOD_VOLUME_PULSE]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing blood volume pulse data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/empatica_blood_volume_pulse_raw.csv
+- data/raw/{pid}/empatica_blood_volume_pulse_with_datetime.csv
+- data/interim/{pid}/empatica_blood_volume_pulse_features/empatica_blood_volume_pulse_{language}_{provider_key}.csv
+- data/processed/features/{pid}/empatica_blood_volume_pulse.csv
+
Parameters description for [EMPATICA_BLOOD_VOLUME_PULSE][PROVIDERS][DBDP]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract EMPATICA_BLOOD_VOLUME_PULSE features from the DBDP provider |
+
[FEATURES] |
+Features to be computed from blood volume pulse intraday data, see table below | +
Features description for [EMPATICA_BLOOD_VOLUME_PULSE][PROVIDERS][DBDP]
:
Feature | +Units | +Description | +
---|---|---|
maxbvp | +- | +The maximum blood volume pulse during a time segment. | +
minbvp | +- | +The minimum blood volume pulse during a time segment. | +
avgbvp | +- | +The average blood volume pulse during a time segment. | +
medianbvp | +- | +The median of blood volume pulse during a time segment. | +
modebvp | +- | +The mode of blood volume pulse during a time segment. | +
stdbvp | +- | +The standard deviation of blood volume pulse during a time segment. | +
diffmaxmodebvp | +- | +The difference between the maximum and mode blood volume pulse during a time segment. | +
diffminmodebvp | +- | +The difference between the mode and minimum blood volume pulse during a time segment. | +
entropybvp | +nats | +Shannon’s entropy measurement based on blood volume pulse during a time segment. | +
Assumptions/Observations
+For more information about BVP read this.
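The entropy* features above (entropybvp here, with analogues for EDA, heart rate, IBI, and temperature) are reported in nats. A minimal sketch, assuming the entropy is taken over the empirical distribution of the observed values (this assumption is ours, not a statement about the DBDP source):

```python
import numpy as np
import pandas as pd

def shannon_entropy_nats(values: pd.Series) -> float:
    """Shannon entropy of a series' value distribution (natural log, hence nats)."""
    probs = values.value_counts(normalize=True)
    return float(-(probs * np.log(probs)).sum())

# Example: entropy of blood volume pulse readings within one time segment
print(shannon_entropy_nats(pd.Series([0.1, 0.1, 0.2, 0.3])))  # ~1.04 nats
```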
+Sensor parameters description for [EMPATICA_ELECTRODERMAL_ACTIVITY]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing electrodermal activity data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/empatica_electrodermal_activity_raw.csv
+- data/raw/{pid}/empatica_electrodermal_activity_with_datetime.csv
+- data/interim/{pid}/empatica_electrodermal_activity_features/empatica_electrodermal_activity_{language}_{provider_key}.csv
+- data/processed/features/{pid}/empatica_electrodermal_activity.csv
+
Parameters description for [EMPATICA_ELECTRODERMAL_ACTIVITY][PROVIDERS][DBDP]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract EMPATICA_ELECTRODERMAL_ACTIVITY features from the DBDP provider |
+
[FEATURES] |
+Features to be computed from electrodermal activity intraday data, see table below | +
Features description for [EMPATICA_ELECTRODERMAL_ACTIVITY][PROVIDERS][DBDP]
:
Feature | +Units | +Description | +
---|---|---|
maxeda | +microsiemens | +The maximum electrical conductance during a time segment. | +
mineda | +microsiemens | +The minimum electrical conductance during a time segment. | +
avgeda | +microsiemens | +The average electrical conductance during a time segment. | +
medianeda | +microsiemens | +The median of electrical conductance during a time segment. | +
modeeda | +microsiemens | +The mode of electrical conductance during a time segment. | +
stdeda | +microsiemens | +The standard deviation of electrical conductance during a time segment. | +
diffmaxmodeeda | +microsiemens | +The difference between the maximum and mode electrical conductance during a time segment. | +
diffminmodeeda | +microsiemens | +The difference between the mode and minimum electrical conductance during a time segment. | +
entropyeda | +nats | +Shannon’s entropy measurement based on electrical conductance during a time segment. | +
Assumptions/Observations
+None
+Sensor parameters description for [EMPATICA_HEARTRATE]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing heart rate data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/empatica_heartrate_raw.csv
+- data/raw/{pid}/empatica_heartrate_with_datetime.csv
+- data/interim/{pid}/empatica_heartrate_features/empatica_heartrate_{language}_{provider_key}.csv
+- data/processed/features/{pid}/empatica_heartrate.csv
+
Parameters description for [EMPATICA_HEARTRATE][PROVIDERS][DBDP]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract EMPATICA_HEARTRATE features from the DBDP provider |
+
[FEATURES] |
+Features to be computed from heart rate intraday data, see table below | +
Features description for [EMPATICA_HEARTRATE][PROVIDERS][DBDP]
:
Feature | +Units | +Description | +
---|---|---|
maxhr | +beats | +The maximum heart rate during a time segment. | +
minhr | +beats | +The minimum heart rate during a time segment. | +
avghr | +beats | +The average heart rate during a time segment. | +
medianhr | +beats | +The median of heart rate during a time segment. | +
modehr | +beats | +The mode of heart rate during a time segment. | +
stdhr | +beats | +The standard deviation of heart rate during a time segment. | +
diffmaxmodehr | +beats | +The difference between the maximum and mode heart rate during a time segment. | +
diffminmodehr | +beats | +The difference between the mode and minimum heart rate during a time segment. | +
entropyhr | +nats | +Shannon’s entropy measurement based on heart rate during a time segment. | +
Assumptions/Observations
+We extract the previous features based on the average heart rate values computed in 10-second windows.
+Sensor parameters description for [EMPATICA_INTER_BEAT_INTERVAL]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing inter beat interval data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/empatica_inter_beat_interval_raw.csv
+- data/raw/{pid}/empatica_inter_beat_interval_with_datetime.csv
+- data/interim/{pid}/empatica_inter_beat_interval_features/empatica_inter_beat_interval_{language}_{provider_key}.csv
+- data/processed/features/{pid}/empatica_inter_beat_interval.csv
+
Parameters description for [EMPATICA_INTER_BEAT_INTERVAL][PROVIDERS][DBDP]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract EMPATICA_INTER_BEAT_INTERVAL features from the DBDP provider |
+
[FEATURES] |
+Features to be computed from inter beat interval intraday data, see table below | +
Features description for [EMPATICA_INTER_BEAT_INTERVAL][PROVIDERS][DBDP]
:
Feature | +Units | +Description | +
---|---|---|
maxibi | +seconds | +The maximum inter beat interval during a time segment. | +
minibi | +seconds | +The minimum inter beat interval during a time segment. | +
avgibi | +seconds | +The average inter beat interval during a time segment. | +
medianibi | +seconds | +The median of inter beat interval during a time segment. | +
modeibi | +seconds | +The mode of inter beat interval during a time segment. | +
stdibi | +seconds | +The standard deviation of inter beat interval during a time segment. | +
diffmaxmodeibi | +seconds | +The difference between the maximum and mode inter beat interval during a time segment. | +
diffminmodeibi | +seconds | +The difference between the mode and minimum inter beat interval during a time segment. | +
entropyibi | +nats | +Shannon’s entropy measurement based on inter beat interval during a time segment. | +
Assumptions/Observations
+For more information about IBI read this.
+Sensor parameters description for [EMPATICA_TAGS]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing tags data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Note
+Sensor parameters description for [EMPATICA_TEMPERATURE]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Name of the CSV file containing temperature data that is compressed inside an Empatica zip file. Since these zip files are created automatically by Empatica, there is no need to change the value of this attribute. | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/empatica_temperature_raw.csv
+- data/raw/{pid}/empatica_temperature_with_datetime.csv
+- data/interim/{pid}/empatica_temperature_features/empatica_temperature_{language}_{provider_key}.csv
+- data/processed/features/{pid}/empatica_temperature.csv
+
Parameters description for [EMPATICA_TEMPERATURE][PROVIDERS][DBDP]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract EMPATICA_TEMPERATURE features from the DBDP provider |
+
[FEATURES] |
+Features to be computed from temperature intraday data, see table below | +
Features description for [EMPATICA_TEMPERATURE][PROVIDERS][DBDP]
:
Feature | +Units | +Description | +
---|---|---|
maxtemp | +degrees C | +The maximum temperature during a time segment. | +
mintemp | +degrees C | +The minimum temperature during a time segment. | +
avgtemp | +degrees C | +The average temperature during a time segment. | +
mediantemp | +degrees C | +The median of temperature during a time segment. | +
modetemp | +degrees C | +The mode of temperature during a time segment. | +
stdtemp | +degrees C | +The standard deviation of temperature during a time segment. | +
diffmaxmodetemp | +degrees C | +The difference between the maximum and mode temperature during a time segment. | +
diffminmodetemp | +degrees C | +The difference between the mode and minimum temperature during a time segment. | +
entropytemp | +nats | +Shannon’s entropy measurement based on temperature during a time segment. | +
Assumptions/Observations
+None
A behavioral feature is a metric computed from raw sensor data that quantifies the behavior of a participant; for example, the time spent at home computed from location data. Behavioral features are also known as digital biomarkers.
+RAPIDS’ config.yaml
has a section for each supported device/sensor (e.g., PHONE_ACCELEROMETER
, FITBIT_STEPS
, EMPATICA_HEARTRATE
). These sections follow a similar structure, and they can have one or more feature PROVIDERS
that compute one or more behavioral features. You will modify the parameters of these PROVIDERS
to obtain features from different mobile sensors. We’ll use PHONE_ACCELEROMETER
as an example to explain this further.
Hint
+/data/processed/features
on files per sensor, per participant and per study (all participants).[COMPUTE]
flag to TRUE
, configure any of its parameters, and execute RAPIDS.Each sensor section follows the same structure. Click on the numbered markers to know more.
```yaml
PHONE_ACCELEROMETER: # (1)

    CONTAINER: accelerometer # (2)

    PROVIDERS: # (3)
        RAPIDS:
            COMPUTE: False # (4)
            FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]

            SRC_SCRIPT: src/features/phone_accelerometer/rapids/main.py

        PANDA:
            COMPUTE: False
            VALID_SENSED_MINUTES: False
            FEATURES: # (5)
                exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
                nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]

            # (6)
            SRC_SCRIPT: src/features/phone_accelerometer/panda/main.py
```
Sensor section
+Each sensor (accelerometer, screen, etc.) of every supported device (smartphone, Fitbit, etc.) has a section in the config.yaml
with parameters
and feature PROVIDERS
.
Sensor Parameters.
Each sensor section has one or more parameters that affect different aspects of how the raw data is pulled and processed.
+The CONTAINER
parameter exists for every sensor, but some sensors will have extra parameters like [PHONE_LOCATIONS]
.
We explain these parameters in a table at the top of each sensor documentation page.
+Sensor Providers
+Each object in this list represents a feature PROVIDER
. Each sensor can have zero, one, or more providers.
A PROVIDER
is a script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community; community providers are named after their first author, like [PHONE_LOCATIONS][DORYAB].
In this example, there are two accelerometer feature providers: RAPIDS
and PANDA
.
PROVIDER
Parameters
Each PROVIDER
has parameters that affect the computation of the behavioral features it offers.
These parameters include at least a [COMPUTE]
flag that you switch to True
to extract a provider’s behavioral features.
We explain every provider’s parameter in a table under the Parameters description
heading on each provider documentation page.
PROVIDER
Features
Each PROVIDER
offers a set of behavioral features.
These features are grouped in an array for some providers, like those of the RAPIDS provider. For others, they are grouped in a collection of arrays, like those of the PANDA
provider.
In either case, you can delete the features you are not interested in, and they will not be included in the sensor’s output feature file.
+We explain each behavioral feature in a table under the Features description
heading on each provider documentation page.
PROVIDER
script
Each PROVIDER
has a SRC_SCRIPT
that points to the script implementing its behavioral features.
It has to be a relative path from RAPIDS’ root folder and the script’s parent folder should be named after the provider, e.g. panda
.
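If you prefer to toggle these flags programmatically (for example, across many sensors in a batch script), here is a minimal sketch assuming PyYAML is installed; this helper is ours, not part of RAPIDS:

```python
import yaml  # assumes PyYAML is installed

with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Enable the RAPIDS accelerometer provider shown in the excerpt above
config["PHONE_ACCELEROMETER"]["PROVIDERS"]["RAPIDS"]["COMPUTE"] = True

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```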
Sensor parameters description for [FITBIT_CALORIES_INTRADAY]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Container where your calories intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. | +
Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_calories_intraday_raw.csv
+- data/raw/{pid}/fitbit_calories_intraday_with_datetime.csv
+- data/interim/{pid}/fitbit_calories_intraday_features/fitbit_calories_intraday_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_calories_intraday.csv
+
Parameters description for [FITBIT_CALORIES_INTRADAY][PROVIDERS][RAPIDS]
:
| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract `FITBIT_CALORIES_INTRADAY` features from the `RAPIDS` provider. |
| `[FEATURES]` | Features to be computed from calories intraday data, see table below. |
| `[EPISODE_TYPE]` | RAPIDS will compute features for any episodes in this list. There are seven types of episodes, defined as consecutive appearances of a label. Four are based on the activity level labels provided by Fitbit: `sedentary`, `lightly active`, `fairly active`, and `very active`. One is defined by RAPIDS as moderate to vigorous physical activity (`MVPA`) episodes, based on all `fairly active` and `very active` labels. Two are defined by the user based on a threshold that divides low and high MET (metabolic equivalent) episodes. |
| `[EPISODE_TIME_THRESHOLD]` | Any consecutive rows of the same `[EPISODE_TYPE]` will be considered a single episode if the time difference between them is less than or equal to this threshold in minutes. |
| `[EPISODE_MET_THRESHOLD]` | Any 1-minute calorie data chunk with a MET value equal to or higher than this threshold will be considered a high MET episode, and a low MET episode otherwise. The default value is 3. |
| `[EPISODE_MVPA_CATEGORIES]` | The Fitbit level labels considered part of a moderate to vigorous physical activity episode. One or more of `sedentary`, `lightly active`, `fairly active`, and `very active`. The defaults are `fairly active` and `very active`. |
| `[EPISODE_REFERENCE_TIME]` | Reference time for the start/end time features. `MIDNIGHT` sets the reference time to 00:00 of each day; `START_OF_THE_SEGMENT` sets it to the start of the time segment (useful when a segment is shorter than a day or spans multiple days). |
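A hedged sketch (ours, not RAPIDS source code) of how `[EPISODE_MET_THRESHOLD]` and `[EPISODE_TIME_THRESHOLD]` could interact when grouping 1-minute rows into low/high MET episodes; the column names are hypothetical:

```python
import pandas as pd

# Hypothetical 1-minute rows; "local_date_time" and "met" are illustrative names
calories = pd.DataFrame({
    "local_date_time": pd.to_datetime(
        ["2021-02-01 10:00", "2021-02-01 10:01", "2021-02-01 10:09", "2021-02-01 10:10"]),
    "met": [2, 4, 5, 1],
})

met_threshold = 3    # [EPISODE_MET_THRESHOLD]
time_threshold = 5   # [EPISODE_TIME_THRESHOLD] in minutes

calories["episode_type"] = calories["met"].ge(met_threshold).map({True: "highmet", False: "lowmet"})
# A new episode starts when the label changes or the gap exceeds the threshold
gap = calories["local_date_time"].diff().dt.total_seconds().div(60).gt(time_threshold)
calories["episode_id"] = ((calories["episode_type"] != calories["episode_type"].shift()) | gap).cumsum()
print(calories)  # four rows, four episodes: low, high, high (new: 8-minute gap), low
```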
Features description for [FITBIT_CALORIES_INTRADAY][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
starttimefirstepisodeEPISODE_TYPE |
+minutes | +Start time of the first episode of type [EPISODE_TYPE] |
+
endtimefirstepisodeEPISODE_TYPE |
+minutes | +End time of the first episode of type [EPISODE_TYPE] |
+
starttimelastepisodeEPISODE_TYPE |
+minutes | +Start time of the last episode of type [EPISODE_TYPE] |
+
endtimelastepisodeEPISODE_TYPE |
+minutes | +End time of the last episode of type [EPISODE_TYPE] |
+
starttimelongestepisodeEPISODE_TYPE |
+minutes | +Start time of the longest episode of type [EPISODE_TYPE] |
+
endtimelongestepisodeEPISODE_TYPE |
+minutes | +End time of the longest episode of type [EPISODE_TYPE] |
+
countepisodeEPISODE_TYPE |
+episodes | +The number of episodes of type [EPISODE_TYPE] |
+
sumdurationepisodeEPISODE_TYPE |
+minutes | +The sum of the duration of episodes of type [EPISODE_TYPE] |
+
avgdurationepisodeEPISODE_TYPE |
+minutes | +The average of the duration of episodes of type [EPISODE_TYPE] |
+
maxdurationepisodeEPISODE_TYPE |
+minutes | +The maximum of the duration of episodes of type [EPISODE_TYPE] |
+
mindurationepisodeEPISODE_TYPE |
+minutes | +The minimum of the duration of episodes of type [EPISODE_TYPE] |
+
stddurationepisodeEPISODE_TYPE |
+minutes | +The standard deviation of the duration of episodes of type [EPISODE_TYPE] |
+
summetEPISODE_TYPE |
+METs | +The sum of all METs during episodes of type [EPISODE_TYPE] |
+
avgmetEPISODE_TYPE |
+METs | +The average of all METs during episodes of type [EPISODE_TYPE] |
+
maxmetEPISODE_TYPE |
+METs | +The maximum of all METs during episodes of type [EPISODE_TYPE] |
+
minmetEPISODE_TYPE |
+METs | +The minimum of all METs during episodes of type [EPISODE_TYPE] |
+
stdmetEPISODE_TYPE |
+METs | +The standard deviation of all METs during episodes of type [EPISODE_TYPE] |
+
sumcaloriesEPISODE_TYPE |
+calories | +The sum of all calories during episodes of type [EPISODE_TYPE] |
+
avgcaloriesEPISODE_TYPE |
+calories | +The average of all calories during episodes of type [EPISODE_TYPE] |
+
maxcaloriesEPISODE_TYPE |
+calories | +The maximum of all calories during episodes of type [EPISODE_TYPE] |
+
mincaloriesEPISODE_TYPE |
+calories | +The minimum of all calories during episodes of type [EPISODE_TYPE] |
+
stdcaloriesEPISODE_TYPE |
+calories | +The standard deviation of all calories during episodes of type [EPISODE_TYPE] |
+
Assumptions/Observations
We use Fitbit heart rate intraday data to extract data yield features. Fitbit data yield features can be used to remove rows (time segments) that do not contain enough Fitbit data. You should decide what your "enough" threshold is depending on how long participants were supposed to wear their Fitbit, the length of your study, and the rates of missing data your analysis can handle.
+Why is Fitbit data yield important?
+Imagine that you want to extract FITBIT_STEPS_SUMMARY
features on daily segments (00:00
to 23:59
). Let's say that on day 1 the Fitbit logged 6k as the total step count and the heart rate sensor logged 24 hours of data, and on day 2 the Fitbit logged 101 as the total step count and the heart rate sensor logged 2 hours of data. It's very likely that on day 2 you walked during the other 22 hours, so including this day in your analysis could bias your results.
Sensor parameters description for [FITBIT_DATA_YIELD]
:
Key | +Description | +
---|---|
[SENSORS] |
+The Fitbit sensor we considered for calculating the Fitbit data yield features. We only support FITBIT_HEARTRATE_INTRADAY since sleep data is commonly collected only overnight, and step counts are 0 even when not wearing the Fitbit device. |
+
Before explaining the data yield features, let’s define the following relevant concepts:
- A valid minute is any 60-second window that contains at least one row of data.
- A valid hour is any 60-minute window whose proportion of valid minutes is greater than or equal to [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS].
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
+- data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv
+- data/interim/{pid}/fitbit_data_yield_features/fitbit_data_yield_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_data_yield.csv
+
Parameters description for [FITBIT_DATA_YIELD][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract FITBIT_DATA_YIELD features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed, see table below | +
[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] |
+The proportion [0.0 ,1.0] of valid minutes in a 60-minute window necessary to flag that window as valid. |
+
Features description for [FITBIT_DATA_YIELD][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
ratiovalidyieldedminutes | +- | +The ratio between the number of valid minutes and the duration in minutes of a time segment. | +
ratiovalidyieldedhours | +- | +The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1. | +
Assumptions/Observations
+We recommend using ratiovalidyieldedminutes
on time segments that are shorter than two or three hours and ratiovalidyieldedhours
for longer segments. This is because relying on yielded minutes only can be misleading when a big chunk of those missing minutes are clustered together.
For example, let's assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur: (a) the 12 missing hours are consecutive, or (b) the missing time is evenly spread across the segment (e.g., 30 minutes missing from every hour). ratiovalidyieldedminutes would be 0.5 for both a and b (hinting that the missing circumstances are similar). However, ratiovalidyieldedhours would be 0.5 for a and 1.0 for b if [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] is between [0.0 and 0.49], hinting that the missing circumstances might be more favorable for b. In other words, sensed data for b is more evenly spread compared to a.
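A toy illustration (ours, not RAPIDS code) of the contrast between the two cases:

```python
# Toy numbers: valid minutes per hour for the two extreme cases above
threshold = 0.4  # a [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS] in [0.0, 0.49]

case_a = [60] * 12 + [0] * 12  # (a) the 12 missing hours are consecutive
case_b = [30] * 24             # (b) 30 minutes missing from every hour

for name, hours in [("a", case_a), ("b", case_b)]:
    ratio_minutes = sum(hours) / (24 * 60)
    ratio_hours = sum(1 for m in hours if m / 60 >= threshold) / 24
    print(name, ratio_minutes, ratio_hours)
# a -> 0.5 and 0.5; b -> 0.5 and 1.0
```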
We assume your Fitbit intraday data was sampled (requested from the Fitbit API) at 1-minute intervals. If the interval is longer, for example 15 minutes, you need to take into account that valid-minute and valid-hour ratios are going to be small (for example, you would have at most 4 "minutes" of data per hour because you would have four 15-minute windows), so you should adjust your thresholds to include and exclude rows accordingly. If you are in this situation, get in touch with us; we could implement this use case, but we are not sure there is enough demand for it at the moment since you can control the sampling rate of the data you request from the Fitbit API.
+Sensor parameters description for [FITBIT_HEARTRATE_INTRADAY]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Container where your heart rate intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. | +
Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
+- data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv
+- data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_heartrate_intraday.csv
+
Parameters description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract FITBIT_HEARTRATE_INTRADAY features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed from heart rate intraday data, see table below | +
Features description for [FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
maxhr | +beats/mins | +The maximum heart rate during a time segment. | +
minhr | +beats/mins | +The minimum heart rate during a time segment. | +
avghr | +beats/mins | +The average heart rate during a time segment. | +
medianhr | +beats/mins | +The median of heart rate during a time segment. | +
modehr | +beats/mins | +The mode of heart rate during a time segment. | +
stdhr | +beats/mins | +The standard deviation of heart rate during a time segment. | +
diffmaxmodehr | +beats/mins | +The difference between the maximum and mode heart rate during a time segment. | +
diffminmodehr | +beats/mins | +The difference between the mode and minimum heart rate during a time segment. | +
entropyhr | +nats | +Shannon’s entropy measurement based on heart rate during a time segment. | +
minutesonZONE | +minutes | +Number of minutes the user’s heart rate fell within each heartrate_zone during a time segment. |
+
Assumptions/Observations
There are four heart rate zones (ZONE): outofrange, fatburn, cardio, and peak. Please refer to Fitbit documentation for more information about the way they are computed.
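As an illustration of the per-zone minute counts that minutesonZONE describes, a sketch (ours; the `heartrate_zone` column name is hypothetical):

```python
import pandas as pd

# Hypothetical intraday rows: one per minute, labelled with a heart rate zone
hr_data = pd.DataFrame({
    "local_segment": ["d1", "d1", "d1", "d2"],
    "heartrate_zone": ["outofrange", "fatburn", "fatburn", "cardio"],
})

# One column per zone with the number of minutes spent in it per segment
minutes_in_zone = (hr_data.groupby(["local_segment", "heartrate_zone"])
                   .size().unstack(fill_value=0).add_prefix("minuteson"))
print(minutes_in_zone)
```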
:
Key | +Description | +
---|---|
[CONTAINER] |
+Container where your heart rate summary data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. | +
Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_heartrate_summary_raw.csv
+- data/raw/{pid}/fitbit_heartrate_summary_with_datetime.csv
+- data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_heartrate_summary.csv
+
Parameters description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract FITBIT_HEARTRATE_SUMMARY features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed from heart rate summary data, see table below | +
Features description for [FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
maxrestinghr | +beats/mins | +The maximum daily resting heart rate during a time segment. | +
minrestinghr | +beats/mins | +The minimum daily resting heart rate during a time segment. | +
avgrestinghr | +beats/mins | +The average daily resting heart rate during a time segment. | +
medianrestinghr | +beats/mins | +The median of daily resting heart rate during a time segment. | +
moderestinghr | +beats/mins | +The mode of daily resting heart rate during a time segment. | +
stdrestinghr | +beats/mins | +The standard deviation of daily resting heart rate during a time segment. | +
diffmaxmoderestinghr | +beats/mins | +The difference between the maximum and mode daily resting heart rate during a time segment. | +
diffminmoderestinghr | +beats/mins | +The difference between the mode and minimum daily resting heart rate during a time segment. | +
entropyrestinghr | +nats | +Shannon’s entropy measurement based on daily resting heart rate during a time segment. | +
sumcaloriesZONE | +cals | +The total daily calories burned within heartrate_zone during a time segment. |
+
maxcaloriesZONE | +cals | +The maximum daily calories burned within heartrate_zone during a time segment. |
+
mincaloriesZONE | +cals | +The minimum daily calories burned within heartrate_zone during a time segment. |
+
avgcaloriesZONE | +cals | +The average daily calories burned within heartrate_zone during a time segment. |
+
mediancaloriesZONE | +cals | +The median of daily calories burned within heartrate_zone during a time segment. |
+
stdcaloriesZONE | +cals | +The standard deviation of daily calories burned within heartrate_zone during a time segment. |
+
entropycaloriesZONE | +nats | +Shannon’s entropy measurement based on daily calories burned within heartrate_zone during a time segment. |
+
Assumptions/Observations
+There are four heart rate zones (ZONE): outofrange
, fatburn
, cardio
, and peak
. Please refer to Fitbit documentation for more information about the way they are computed.
Calories’ accuracy depends on the users’ Fitbit profile (weight, height, etc.).
+Sensor parameters description for [FITBIT_SLEEP_INTRADAY]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Container where your sleep intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. | +
Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_sleep_intraday_raw.csv
+- data/raw/{pid}/fitbit_sleep_intraday_with_datetime.csv
+- data/interim/{pid}/fitbit_sleep_intraday_episodes.csv
+- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv
+- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv
+- data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_sleep_intraday.csv
+
Parameters description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract FITBIT_SLEEP_INTRADAY features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed from sleep intraday data, see table below | +
[SLEEP_LEVELS] |
+Fitbit’s sleep API Version 1 only provides CLASSIC records. However, Version 1.2 provides 2 types of records: CLASSIC and STAGES . STAGES is only available in devices with a heart rate sensor and even those devices will fail to report it if the battery is low or the device is not tight enough. While CLASSIC contains 3 sleep levels (awake , restless , and asleep ), STAGES contains 4 sleep levels (wake , deep , light , rem ). To make it consistent, RAPIDS grouped them into 2 UNIFIED sleep levels: awake (CLASSIC : awake and restless ; STAGES : wake ) and asleep (CLASSIC : asleep ; STAGES : deep , light , and rem ). |
+
[SLEEP_TYPES] |
+Types of sleep to be included in the feature extraction computation. Fitbit provides 2 types of sleep: main , nap . |
+
[INCLUDE_SLEEP_LATER_THAN] |
+All resampled sleep rows (bin interval: one minute) that started after this time will be included in the feature computation. It is a number ranging from 0 (midnight) to 1439 (23:59) which denotes the number of minutes after midnight. If a segment is longer than one day, this value is for every day. | +
[REFERENCE_TIME] |
+The reference point from which the [ROUTINE] features are to be computed. Chosen from MIDNIGHT and START_OF_THE_SEGMENT , default is MIDNIGHT . If you have multiple time segments per day it might be more informative to set this flag to START_OF_THE_SEGMENT . |
+
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS][LEVELS_AND_TYPES]
:
Feature | +Units | +Description | +
---|---|---|
countepisode[LEVEL][TYPE] |
+episodes | +Number of [LEVEL][TYPE] sleep episodes. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
sumduration[LEVEL][TYPE] |
+minutes | +Total duration of all [LEVEL][TYPE] sleep episodes. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
maxduration[LEVEL][TYPE] |
+minutes | +Longest duration of any [LEVEL][TYPE] sleep episode. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
minduration[LEVEL][TYPE] |
+minutes | +Shortest duration of any [LEVEL][TYPE] sleep episode. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
avgduration[LEVEL][TYPE] |
+minutes | +Average duration of all [LEVEL][TYPE] sleep episodes. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
medianduration[LEVEL][TYPE] |
+minutes | +Median duration of all [LEVEL][TYPE] sleep episodes. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
stdduration[LEVEL][TYPE] |
+minutes | +Standard deviation duration of all [LEVEL][TYPE] sleep episodes. [LEVEL] is one of [SLEEP_LEVELS] (e.g. awake-classic or rem-stages) and [TYPE] is one of [SLEEP_TYPES] (e.g. main). Both [LEVEL] and [TYPE] can also be all when LEVELS_AND_TYPES_COMBINING_ALL is True, which ignores the levels and groups by sleep types. |
+
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]
RATIOS [ACROSS_LEVELS]
:
Feature | +Units | +Description | +
---|---|---|
ratiocount[LEVEL] |
+- | +Ratio between the count of episodes of a single sleep [LEVEL] and the count of all episodes of all levels during both main and nap sleep types. This answers the question: what percentage of all wake , deep , light , and rem episodes were rem ? (e.g., \(countepisode[remstages][all] / countepisode[all][all]\)) |
+
ratioduration[LEVEL] |
+- | +Ratio between the duration of episodes of a single sleep [LEVEL] and the duration of all episodes of all levels during both main and nap sleep types. This answers the question: what percentage of all wake , deep , light , and rem time was rem ? (e.g., \(sumduration[remstages][all] / sumduration[all][all]\)) |
+
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]
RATIOS [ACROSS_TYPES]
:
Feature | +Units | +Description | +
---|---|---|
ratiocountmain | - | Ratio of the count of all main episodes (independently of the levels inside) to the count of all main and nap episodes. This answers the question: what percentage of all sleep episodes (main and nap) were main? We do not provide the ratio for nap because it is complementary. (\(countepisode[all][main] / countepisode[all][all]\)) |
+
ratiodurationmain | - | Ratio of the duration of all main episodes (independently of the levels inside) to the duration of all main and nap episodes. This answers the question: what percentage of all sleep time (main and nap) was main? We do not provide the ratio for nap because it is complementary. (\(sumduration[all][main] / sumduration[all][all]\)) |
+
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]
RATIOS [WITHIN_LEVELS]
:
Feature | +Units | +Description | +
---|---|---|
ratiocount[TYPE] within[LEVEL] |
- | Ratio of the count of episodes of a single sleep [LEVEL] during main sleep to the count of episodes of that same [LEVEL] during main and nap. This answers the question: are rem episodes more frequent during main than nap sleep? We do not provide the ratio for nap because it is complementary. (\(countepisode[remstages][main] / countepisode[remstages][all]\)) |
+
ratioduration[TYPE] within[LEVEL] |
- | Ratio of the duration of episodes of a single sleep [LEVEL] during main sleep to the duration of episodes of that same [LEVEL] during main and nap. This answers the question: is rem time longer during main than nap sleep? We do not provide the ratio for nap because it is complementary. (\(sumduration[remstages][main] / sumduration[remstages][all]\)) |
+
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]
RATIOS [WITHIN_TYPES]
:
Feature | +Units | +Description | +
---|---|---|
ratiocount[LEVEL] within[TYPE] |
+- | +Ratio between the count of episodes of a single sleep [LEVEL] and the count of all episodes of all levels during either main or nap sleep types. This answers the question: what percentage of all wake , deep , light , and rem episodes were rem during main /nap sleep time? (e.g., \(countepisode[remstages][main] / countepisode[all][main]\)) |
+
ratioduration[LEVEL] within[TYPE] |
+- | +Ratio between the duration of episodes of a single sleep [LEVEL] and the duration of all episodes of all levels during either main or nap sleep types. This answers the question: what percentage of all wake , deep , light , and rem time was rem during main /nap sleep time? (e.g., \(sumduration[remstages][main] / sumduration[all][main]\)) |
+
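To make the RATIOS families concrete, here is a toy sketch (ours, not RAPIDS source) of the ACROSS_LEVELS count ratio, using hypothetical per-episode rows:

```python
import pandas as pd

# Hypothetical per-episode rows with a level and a type each
episodes = pd.DataFrame({
    "level": ["remstages", "lightstages", "remstages", "wakestages"],
    "type":  ["main",      "main",        "nap",       "main"],
})

# ratiocount[remstages] across levels:
# countepisode[remstages][all] / countepisode[all][all]
ratiocount_rem = (episodes["level"] == "remstages").mean()
print(ratiocount_rem)  # 0.5
```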
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS][ROUTINE]
:
Feature | +Units | +Description | +
---|---|---|
starttimefirstmainsleep | +minutes | +Start time (in minutes since REFERENCE_TIME ) of the first main sleep episode after INCLUDE_EPISODES_LATER_THAN . |
+
endtimelastmainsleep | +minutes | +End time (in minutes since REFERENCE_TIME ) of the last main sleep episode after INCLUDE_EPISODES_LATER_THAN . |
+
starttimefirstnap | +minutes | +Start time (in minutes since REFERENCE_TIME ) of the first nap episode after INCLUDE_EPISODES_LATER_THAN . |
+
endtimelastnap | +minutes | +End time (in minutes since REFERENCE_TIME ) of the last nap episode after INCLUDE_EPISODES_LATER_THAN . |
+
Assumptions/Observations
Deleting values from [SLEEP_LEVELS]
or [SLEEP_TYPES]
will only change the features you receive from [LEVELS_AND_TYPES]
. For example, if STAGES only contains [rem, light], you will not receive countepisode[wake|deep][TYPE]
or sum, max, min, avg, median, or std duration
. These values will not influence RATIOS
or ROUTINE
features.[LEVEL]
grouping is done within the elements of each class CLASSIC
, STAGES
, and UNIFIED
. That is, we never combine CLASSIC
or STAGES
types to compute features when LEVELS_AND_TYPES_COMBINING_ALL
is True or when computing RATIOS
.Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_sleep_intraday_raw.csv
+- data/raw/{pid}/fitbit_sleep_intraday_parsed.csv
+- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv
+- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv
+- data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_sleep_intraday.csv
+
Parameters description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][PRICE]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract FITBIT_SLEEP_INTRADAY features from the PRICE provider |
+
[FEATURES] |
+Features to be computed from sleep intraday data, see table below | +
[SLEEP_LEVELS] |
+Fitbit’s sleep API Version 1 only provides CLASSIC records. However, Version 1.2 provides 2 types of records: CLASSIC and STAGES . STAGES is only available in devices with a heart rate sensor and even those devices will fail to report it if the battery is low or the device is not tight enough. While CLASSIC contains 3 sleep levels (awake , restless , and asleep ), STAGES contains 4 sleep levels (wake , deep , light , rem ). To make it consistent, RAPIDS grouped them into 2 UNIFIED sleep levels: awake (CLASSIC : awake and restless ; STAGES : wake ) and asleep (CLASSIC : asleep ; STAGES : deep , light , and rem ). |
+
[DAY_TYPE] |
+The features of this provider can be computed using daily averages/standard deviations that were extracted on WEEKEND days only, WEEK days only, or ALL days |
+
[GROUP_EPISODES_WITHIN] |
+This parameter contains 2 values: [START_TIME] and [LENGTH] . Only main sleep episodes that intersect or contain the period between [START_TIME , START_TIME + LENGTH ] are taken into account to compute the features described below. Both [START_TIME] and [LENGTH] are in minutes. [START_TIME] is a number ranging from 0 (midnight) to 1439 (23:59) which denotes the number of minutes after midnight. [LENGTH] is a number smaller than 1440 (24 hours). |
+
Features description for [FITBIT_SLEEP_INTRADAY][PROVIDERS][PRICE]
:
Feature | +Units | +Description | +
---|---|---|
avgduration[LEVEL] main[DAY_TYPE] |
minutes | Average duration of daily LEVEL sleep episodes. You can include daily averages computed on weekend days, week days, or both, depending on the value of the DAY_TYPE flag. |
+
avgratioduration[LEVEL] withinmain[DAY_TYPE] |
+- | +Average ratio between daily LEVEL time and in-bed time inferred from main sleep episodes. LEVEL is one of SLEEP_LEVELS (e.g. awake-classic or rem-stages). In-bed time is the total duration of all main sleep episodes for each day. You can include daily ratios that were computed on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
avgstarttimeofepisodemain[DAY_TYPE] |
+minutes | +Average start time of the first main sleep episode of each day in a time segment. You can include daily start times from episodes detected on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
avgendtimeofepisodemain[DAY_TYPE] |
+minutes | +Average end time of the last main sleep episode of each day in a time segment. You can include daily end times from episodes detected on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
avgmidpointofepisodemain[DAY_TYPE] |
+minutes | +Average mid time between the start of the first main sleep episode and the end of the last main sleep episode of each day in a time segment. You can include episodes detected on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
stdstarttimeofepisodemain[DAY_TYPE] |
+minutes | +Standard deviation of start time of the first main sleep episode of each day in a time segment. You can include daily start times from episodes detected on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
stdendtimeofepisodemain[DAY_TYPE] |
+minutes | +Standard deviation of end time of the last main sleep episode of each day in a time segment. You can include daily end times from episodes detected on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
stdmidpointofepisodemain[DAY_TYPE] |
+minutes | +Standard deviation of mid time between the start of the first main sleep episode and the end of the last main sleep episode of each day in a time segment. You can include episodes detected on weekend days, week days or both depending on the value of the DAY_TYPE flag. |
+
socialjetlag | +minutes | +Difference in minutes between the avgmidpointofepisodemain (average mid time between bedtime and wake time) of weekends and weekdays. | +
meanssdstarttimeofepisodemain | +minutes squared | +Same as avgstarttimeofepisodemain[DAY_TYPE] but the average is computed over the squared differences of each pair of consecutive start times. |
+
meanssdendtimeofepisodemain | +minutes squared | +Same as avgendtimeofepisodemain[DAY_TYPE] but the average is computed over the squared differences of each pair of consecutive end times. |
+
meanssdmidpointofepisodemain | +minutes squared | +Same as avgmidpointofepisodemain[DAY_TYPE] but the average is computed over the squared differences of each pair of consecutive mid times. |
+
medianssdstarttimeofepisodemain | +minutes squared | +Same as avgstarttimeofepisodemain[DAY_TYPE] but the median is computed over the squared differences of each pair of consecutive start times. |
+
medianssdendtimeofepisodemain | +minutes squared | +Same as avgendtimeofepisodemain[DAY_TYPE] but the median is computed over the squared differences of each pair of consecutive end times. |
+
medianssdmidpointofepisodemain | +minutes squared | +Same as avgmidpointofepisodemain[DAY_TYPE] but the median is computed over the squared differences of each pair of consecutive mid times. |
+
Assumptions/Observations
Although Fitbit provides two types of sleep episodes (main
and nap
), only main
sleep episodes are considered.

How do we assign sleep episodes to specific dates?
+START_TIME
and LENGTH
control the dates that sleep episodes belong to. For a pair of [START_TIME]
and [LENGTH]
, sleep episodes can only be placed at the following places:
If the end time of a sleep episode is before [START_TIME]
, it will belong to the day before its start date (e.g. sleep episode #1).
If (1) the start time or the end time of a sleep episode falls between (overlaps) [START_TIME]
and [START_TIME] + [LENGTH]
or (2) the start time is before [START_TIME]
and the end time is after [START_TIME] + [LENGTH]
, it will belong to its start date (e.g. sleep episode #2, #3, #4, #5).
If the start time of a sleep episode is after [START_TIME] + [LENGTH]
, it will belong to the day after its start date (e.g. sleep episode #6).
Only main
sleep episodes that intersect or contain the period between [START_TIME]
and [START_TIME] + [LENGTH]
will be included in the feature computation. If we process the following main
sleep episodes:
episode | +start | +end | +
---|---|---|
1 | +2021-02-01 12:00 | +2021-02-01 15:00 | +
2 | +2021-02-01 21:00 | +2021-02-02 03:00 | +
3 | +2021-02-02 05:00 | +2021-02-02 08:00 | +
4 | +2021-02-02 11:00 | +2021-02-02 14:00 | +
5 | +2021-02-02 19:00 | +2021-02-03 06:00 | +
And our parameters:
+[INCLUDE_EPISODES_INTERSECTING][START_TIME]
= 1320 (today’s 22:00)
[INCLUDE_EPISODES_INTERSECTING][LENGTH]
= 720 (tomorrow’s 10:00, or 22:00 + 12 hours)
Only sleep episodes 2, 3, and 5 would be considered.
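A sketch of this selection (ours, not RAPIDS source) that reproduces the choice of episodes 2, 3, and 5; it checks the period anchored on an episode's start date and on the previous day, mirroring the assignment rules above:

```python
import pandas as pd

start_time = 1320  # [START_TIME], today's 22:00, in minutes after midnight
length = 720       # [LENGTH], 12 hours

episodes = pd.DataFrame({
    "episode": [1, 2, 3, 4, 5],
    "start": pd.to_datetime(["2021-02-01 12:00", "2021-02-01 21:00",
                             "2021-02-02 05:00", "2021-02-02 11:00",
                             "2021-02-02 19:00"]),
    "end":   pd.to_datetime(["2021-02-01 15:00", "2021-02-02 03:00",
                             "2021-02-02 08:00", "2021-02-02 14:00",
                             "2021-02-03 06:00"]),
})

def selected(row):
    # Check the period anchored on the episode's start date and on the
    # previous day (an episode ending before START_TIME belongs to the
    # day before, per the assignment rules above).
    for day_offset in (0, -1):
        day = row["start"].normalize() + pd.Timedelta(days=day_offset)
        period_start = day + pd.Timedelta(minutes=start_time)
        period_end = period_start + pd.Timedelta(minutes=length)
        if row["start"] < period_end and row["end"] > period_start:
            return True
    return False

episodes["selected"] = episodes.apply(selected, axis=1)
print(episodes)  # episodes 2, 3, and 5 are selected
```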
Time-related features represent the number of minutes between the start/end/midpoint of sleep episodes and the assigned day's midnight.
+All main
sleep episodes are chunked within the requested time segments, which need to be at least 24 hours long (1, 2, 3, 7 days, etc.). Then, daily features are extracted and averaged across the length of the time segment, for example:
The daily features extracted on 2021-02-01 will be:
+starttimeofepisodemain (bedtime) is 21 * 60
(episode 2 start time 2021-02-01 21:00)
endtimeofepisodemain (wake time) is 32 * 60
(episode 3 end time 2021-02-02 08:00 + 24)
midpointofepisodemain (midpoint sleep) is [(21 * 60) + (32 * 60)] / 2
The daily features extracted on 2021-02-02 will be:
+starttimeofepisodemain (bedtime) is 19 * 60
(episode 5 start time 2021-02-02 19:00)
endtimeofepisodemain (wake time) is 30 * 60
(episode 5 end time 2021-02-03 06:00 + 24)
midpointofepisodemain (midpoint sleep) is [(19 * 60) + (30 * 60)] / 2
And avgstarttimeofepisodemain[DAY_TYPE]
will be ([21 * 60] + [19 * 60]) / 2
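A quick arithmetic check of the worked example (plain Python; the numbers come from the episodes table above):

```python
bedtime_day1 = 21 * 60  # episode 2 starts 2021-02-01 21:00
bedtime_day2 = 19 * 60  # episode 5 starts 2021-02-02 19:00

avg_start = (bedtime_day1 + bedtime_day2) / 2
print(avg_start, avg_start / 60)  # 1200.0 minutes, i.e. an average bedtime of 20:00
```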
Sensor parameters description for [FITBIT_SLEEP_SUMMARY]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Container where your sleep summary data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. | +
Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_sleep_summary_raw.csv
+- data/raw/{pid}/fitbit_sleep_summary_with_datetime.csv
+- data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_sleep_summary.csv
+
Parameters description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract FITBIT_SLEEP_SUMMARY features from the RAPIDS provider |
+
[SLEEP_TYPES] |
+Types of sleep to be included in the feature extraction computation. Fitbit provides 3 types of sleep: main , nap , all . |
+
[FEATURES] |
+Features to be computed from sleep summary data, see table below | +
Features description for [FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
countepisodeTYPE | +episodes | +Number of sleep episodes for a certain sleep type during a time segment. | +
avgefficiencyTYPE | +scores | +Average sleep efficiency for a certain sleep type during a time segment. | +
sumdurationafterwakeupTYPE | +minutes | +Total duration the user stayed in bed after waking up for a certain sleep type during a time segment. | +
sumdurationasleepTYPE | +minutes | +Total sleep duration for a certain sleep type during a time segment. | +
sumdurationawakeTYPE | +minutes | +Total duration the user stayed awake but still in bed for a certain sleep type during a time segment. | +
sumdurationtofallasleepTYPE | +minutes | +Total duration the user spent to fall asleep for a certain sleep type during a time segment. | +
sumdurationinbedTYPE | +minutes | +Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment. | +
avgdurationafterwakeupTYPE | +minutes | +Average duration the user stayed in bed after waking up for a certain sleep type during a time segment. | +
avgdurationasleepTYPE | +minutes | +Average sleep duration for a certain sleep type during a time segment. | +
avgdurationawakeTYPE | +minutes | +Average duration the user stayed awake but still in bed for a certain sleep type during a time segment. | +
avgdurationtofallasleepTYPE | +minutes | +Average duration the user spent to fall asleep for a certain sleep type during a time segment. | +
avgdurationinbedTYPE | +minutes | +Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment. | +
Assumptions/Observations
+There are three sleep types (TYPE): main
, nap
, all
. The all
type contains both main sleep and naps.
There are two versions of Fitbit’s sleep API (version 1 and version 1.2), and each provides raw sleep data in a different format:
+v1
contains count_awake
, duration_awake
, count_awakenings
, count_restless
, and duration_restless
fields for every sleep record, but v1.2 does not.

API columns. Features are computed based on the values provided by Fitbit's API: efficiency
, minutes_after_wakeup
, minutes_asleep
, minutes_awake
, minutes_to_fall_asleep
, minutes_in_bed
, is_main_sleep
and type
.
Sensor parameters description for [FITBIT_STEPS_INTRADAY]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Container where your steps intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. | +
Available time segments
+File Sequence
+- data/raw/{pid}/fitbit_steps_intraday_raw.csv
+- data/raw/{pid}/fitbit_steps_intraday_with_datetime.csv
+- data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv
+- data/processed/features/{pid}/fitbit_steps_intraday.csv
+
Parameters description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract FITBIT_STEPS_INTRADAY features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed from steps intraday data, see table below |
| `[THRESHOLD_ACTIVE_BOUT]` | Every minute with Fitbit steps data will be labelled as `sedentary` if its step count is below this threshold, otherwise as `active` |
| `[INCLUDE_ZERO_STEP_ROWS]` | Whether or not to include time segments with a 0 step count during the whole day |
Features description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| sumsteps | steps | The total step count during a time segment. |
| maxsteps | steps | The maximum step count during a time segment. |
| minsteps | steps | The minimum step count during a time segment. |
| avgsteps | steps | The average step count during a time segment. |
| stdsteps | steps | The standard deviation of step count during a time segment. |
| countepisodesedentarybout | bouts | Number of sedentary bouts during a time segment. |
| sumdurationsedentarybout | minutes | Total duration of all sedentary bouts during a time segment. |
| maxdurationsedentarybout | minutes | The maximum duration of any sedentary bout during a time segment. |
| mindurationsedentarybout | minutes | The minimum duration of any sedentary bout during a time segment. |
| avgdurationsedentarybout | minutes | The average duration of sedentary bouts during a time segment. |
| stddurationsedentarybout | minutes | The standard deviation of the duration of sedentary bouts during a time segment. |
| countepisodeactivebout | bouts | Number of active bouts during a time segment. |
| sumdurationactivebout | minutes | Total duration of all active bouts during a time segment. |
| maxdurationactivebout | minutes | The maximum duration of any active bout during a time segment. |
| mindurationactivebout | minutes | The minimum duration of any active bout during a time segment. |
| avgdurationactivebout | minutes | The average duration of active bouts during a time segment. |
| stddurationactivebout | minutes | The standard deviation of the duration of active bouts during a time segment. |
Assumptions/Observations

- Every minute is labelled as `sedentary` if its step count is below `THRESHOLD_ACTIVE_BOUT` (default value is 10); otherwise it is labelled as `active`. Active and sedentary bouts are periods of consecutive minutes labelled as `active` or `sedentary`, respectively.
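A minimal Python sketch of this bout logic (not RAPIDS code; the per-minute step counts are hypothetical):

```python
import pandas as pd

THRESHOLD_ACTIVE_BOUT = 10  # steps per minute (default)

# Hypothetical step counts, one value per consecutive minute
steps_per_minute = pd.Series([0, 0, 25, 40, 3, 0, 0, 60])
label = (steps_per_minute >= THRESHOLD_ACTIVE_BOUT).map(
    {True: "active", False: "sedentary"})

# A new bout starts whenever the label changes from the previous minute
bout_id = (label != label.shift()).cumsum()
bouts = label.groupby(bout_id).agg(["first", "size"])
bouts.columns = ["label", "duration_minutes"]

print(bouts)
# e.g. countepisodeactivebout = number of "active" bouts
print((bouts["label"] == "active").sum())
```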
Sensor parameters description for `[FITBIT_STEPS_SUMMARY]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Container where your steps summary data is stored; depending on the data stream you are using, this can be a database table, a CSV file, etc. |

Available time segments

File Sequence

- data/raw/{pid}/fitbit_steps_summary_raw.csv
- data/raw/{pid}/fitbit_steps_summary_with_datetime.csv
- data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_steps_summary.csv
Parameters description for `[FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract FITBIT_STEPS_SUMMARY features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed from steps summary data, see table below |
Features description for `[FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| maxsumsteps | steps | The maximum daily step count during a time segment. |
| minsumsteps | steps | The minimum daily step count during a time segment. |
| avgsumsteps | steps | The average daily step count during a time segment. |
| mediansumsteps | steps | The median daily step count during a time segment. |
| stdsumsteps | steps | The standard deviation of daily step count during a time segment. |
Assumptions/Observations

NA

Sensor parameters description for `[PHONE_ACCELEROMETER]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the accelerometer data is stored |

Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_accelerometer_raw.csv
- data/raw/{pid}/phone_accelerometer_with_datetime.csv
- data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_accelerometer.csv
Parameters description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_ACCELEROMETER features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
Features description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| maxmagnitude | m/s² | The maximum magnitude of acceleration (\(\|acceleration\| = \sqrt{x^2 + y^2 + z^2}\)). |
| minmagnitude | m/s² | The minimum magnitude of acceleration. |
| avgmagnitude | m/s² | The average magnitude of acceleration. |
| medianmagnitude | m/s² | The median magnitude of acceleration. |
| stdmagnitude | m/s² | The standard deviation of acceleration. |
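A minimal NumPy sketch of the magnitude computation behind these features (not RAPIDS code; the accelerometer readings are hypothetical):

```python
import numpy as np

# Hypothetical x, y, z accelerometer readings (one row per sample)
xyz = np.array([
    [0.1, 9.7, 0.3],
    [0.4, 9.5, 0.8],
    [3.2, 8.1, 2.5],
])

# |acceleration| = sqrt(x^2 + y^2 + z^2), per row
magnitude = np.sqrt((xyz ** 2).sum(axis=1))

print({
    "maxmagnitude": magnitude.max(),
    "minmagnitude": magnitude.min(),
    "avgmagnitude": magnitude.mean(),
    "medianmagnitude": np.median(magnitude),
    "stdmagnitude": magnitude.std(),
})
```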
Assumptions/Observations

These features are based on the work by Panda et al.
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_accelerometer_raw.csv
- data/raw/{pid}/phone_accelerometer_with_datetime.csv
- data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_accelerometer.csv
Parameters description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_ACCELEROMETER features from the PANDA provider |
| `[FEATURES]` | Features to be computed for exertional and non-exertional activity episodes, see table below |
Features description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:

| Feature | Units | Description |
|---|---|---|
| sumduration | minutes | Total duration of all exertional or non-exertional activity episodes. |
| maxduration | minutes | Longest duration of any exertional or non-exertional activity episode. |
| minduration | minutes | Shortest duration of any exertional or non-exertional activity episode. |
| avgduration | minutes | Average duration of any exertional or non-exertional activity episode. |
| medianduration | minutes | Median duration of any exertional or non-exertional activity episode. |
| stdduration | minutes | Standard deviation of the duration of all exertional or non-exertional activity episodes. |
Assumptions/Observations

Sensor parameters description for `[PHONE_ACTIVITY_RECOGNITION]`:

| Key | Description |
|---|---|
| `[CONTAINER][ANDROID]` | Data stream container (database table, CSV file, etc.) where the activity data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS) |
| `[CONTAINER][IOS]` | Data stream container (database table, CSV file, etc.) where the activity data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS) |
| `[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Difference in minutes between any two rows for them to be considered part of the same activity episode (see the sketch near the end of this section) |
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_activity_recognition_raw.csv
- data/raw/{pid}/phone_activity_recognition_with_datetime.csv
- data/interim/{pid}/phone_activity_recognition_episodes.csv
- data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv
- data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv
- data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_activity_recognition.csv
Parameters description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_ACTIVITY_RECOGNITION features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
| `[ACTIVITY_CLASSES][STATIONARY]` | An array of the activity labels to be considered in the STATIONARY category; choose any of `still`, `tilting` |
| `[ACTIVITY_CLASSES][MOBILE]` | An array of the activity labels to be considered in the MOBILE category; choose any of `on_foot`, `walking`, `running`, `on_bicycle` |
| `[ACTIVITY_CLASSES][VEHICLE]` | An array of the activity labels to be considered in the VEHICLE category; choose any of `in_vehicle` |
Features description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| count | rows | Number of episodes. |
| mostcommonactivity | activity type | The most common activity type (e.g. `still`, `on_foot`, etc.). If there is a tie, the first one is chosen. |
| countuniqueactivities | activity type | Number of unique activities. |
| durationstationary | minutes | The total duration of `[ACTIVITY_CLASSES][STATIONARY]` episodes |
| durationmobile | minutes | The total duration of `[ACTIVITY_CLASSES][MOBILE]` episodes (on foot, running, and on bicycle activities) |
| durationvehicle | minutes | The total duration of `[ACTIVITY_CLASSES][VEHICLE]` episodes (in vehicle activity) |
Assumptions/Observations

iOS Activity Recognition names and types are unified with Android labels:

| iOS Activity Name | Android Activity Name | Android Activity Type |
|---|---|---|
| `walking` | `walking` | `7` |
| `running` | `running` | `8` |
| `cycling` | `on_bicycle` | `1` |
| `automotive` | `in_vehicle` | `0` |
| `stationary` | `still` | `3` |
| `unknown` | `unknown` | `4` |
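A minimal pandas sketch of how `[EPISODE_THRESHOLD_BETWEEN_ROWS]` merges rows into activity episodes (not RAPIDS code; the minutes and labels are hypothetical):

```python
import pandas as pd

EPISODE_THRESHOLD_BETWEEN_ROWS = 5  # minutes (hypothetical value)

# Hypothetical activity rows ordered by time (minutes since segment start)
rows = pd.DataFrame({
    "minute": [0, 2, 4, 30, 33, 60],
    "activity": ["still", "still", "still", "walking", "walking", "still"],
})

# A new episode starts when the gap is too big or the activity changes
gap_too_big = rows["minute"].diff() > EPISODE_THRESHOLD_BETWEEN_ROWS
label_changed = rows["activity"] != rows["activity"].shift()
episode_id = (gap_too_big | label_changed).cumsum()

episodes = rows.groupby(episode_id).agg(
    activity=("activity", "first"),
    start=("minute", "min"),
    end=("minute", "max"),
)
print(episodes)  # three episodes: still, walking, still
```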
In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables. RAPIDS automatically infers what platform each participant belongs to based on their participant file.

Sensor parameters description for `[PHONE_APPLICATIONS_CRASHES]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the applications crashes data is stored |
| `[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store |
| `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by Stachl et al in `data/external/stachl_application_genre_catalogue.csv` |
| `[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]` |
| `[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored) |
Note

No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_APPLICATIONS_CRASHES`) to improve `PHONE_DATA_YIELD`, or you can implement your own features.
Sensor parameters description for `[PHONE_APPLICATIONS_FOREGROUND]` (these parameters are used by the only provider available at the moment, RAPIDS):

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the applications foreground data is stored |
| `[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store |
| `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by Stachl et al in `data/external/stachl_application_genre_catalogue.csv` |
| `[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]` |
| `[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored) |
The app category (genre) catalogue used in these features was originally created by Stachl et al.
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_applications_foreground_raw.csv
- data/raw/{pid}/phone_applications_foreground_with_datetime.csv
- data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv
- data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_applications_foreground.csv
Parameters description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_APPLICATIONS_FOREGROUND features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
| `[SINGLE_CATEGORIES]` | An array of app categories to be included in the feature extraction computation. The special keyword `all` represents a category with all the apps from each participant. By default we use the category catalogue pointed to by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above) |
| `[MULTIPLE_CATEGORIES]` | An array of collections representing meta-categories (a group of categories). The key of each element is the name of the meta-category and the value is an array of member app categories. By default we use the category catalogue pointed to by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above) |
| `[SINGLE_APPS]` | An array of apps to be included in the feature extraction computation. Use their package name (e.g. `com.google.android.youtube`) or the reserved keyword `top1global` (the most used app by a participant over the whole monitoring study) |
| `[EXCLUDED_CATEGORIES]` | An array of app categories to be excluded from the feature extraction computation. By default we use the category catalogue pointed to by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above) |
| `[EXCLUDED_APPS]` | An array of apps to be excluded from the feature extraction computation. Use their package name, for example: `com.google.android.youtube` |
Features description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| count | apps | Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to them from another app) |
| timeoffirstuse | minutes | The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a `time_segment` |
| timeoflastuse | minutes | The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a `time_segment` |
| frequencyentropy | nats | The entropy of the used apps within a category during a `time_segment` (each app is seen as a unique event; the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app |
Assumptions/Observations

Features can be computed by app, by apps grouped under a single category (genre), and by multiple categories grouped together (meta-categories). For example, we can get features for `Facebook` (single app), for `Social Network` apps (a category including Facebook and other social media apps), or for `Social` (a meta-category formed by the `Social Network` and `Social Media Tools` categories).

Apps installed by default, like YouTube, are considered system apps on some phones. We do an exact match to exclude apps where "genre" == `EXCLUDED_CATEGORIES` or "package_name" == `EXCLUDED_APPS`.

We provide three ways of classifying an app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalogue created by Stachl et al., which we provide in RAPIDS (`data/external/stachl_application_genre_catalogue.csv`), or c) by manually creating a personalized catalogue. You can choose a, b, or c by modifying the `[APPLICATION_CATEGORIES]` keys and values (see the Sensor parameters description table above).
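A minimal sketch of the `frequencyentropy` idea (not RAPIDS code; the foreground events are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical foreground events for apps within one category in a segment
events = pd.Series([
    "com.facebook.katana", "com.twitter.android",
    "com.facebook.katana", "com.instagram.android",
    "com.facebook.katana",
])

# Shannon entropy in nats over each app's usage frequency
p = events.value_counts(normalize=True).to_numpy()
frequencyentropy = -(p * np.log(p)).sum()
print(frequencyentropy)  # higher when usage is spread across more apps
```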
Sensor parameters description for `[PHONE_APPLICATIONS_NOTIFICATIONS]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the applications notifications data is stored |
| `[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store |
| `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by Stachl et al in `data/external/stachl_application_genre_catalogue.csv` |
| `[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]` |
| `[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored) |
Note

No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_APPLICATIONS_NOTIFICATIONS`) to improve `PHONE_DATA_YIELD`, or you can implement your own features.
Sensor parameters description for `[PHONE_BATTERY]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the battery data is stored |
| `[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Difference in minutes between any two rows for them to be considered part of the same battery charge or discharge episode |
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_battery_raw.csv
- data/interim/{pid}/phone_battery_episodes.csv
- data/interim/{pid}/phone_battery_episodes_resampled.csv
- data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv
- data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_battery.csv
Parameters description for `[PHONE_BATTERY][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_BATTERY features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
Features description for `[PHONE_BATTERY][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| countdischarge | episodes | Number of discharging episodes. |
| sumdurationdischarge | minutes | The total duration of all discharging episodes. |
| countcharge | episodes | Number of battery charging episodes. |
| sumdurationcharge | minutes | The total duration of all charging episodes. |
| avgconsumptionrate | percentage/minutes | The average of all episodes' consumption rates. An episode's consumption rate is defined as the ratio between its battery delta and its duration |
| maxconsumptionrate | percentage/minutes | The highest of all episodes' consumption rates. An episode's consumption rate is defined as the ratio between its battery delta and its duration |
Assumptions/Observations

- iOS battery statuses are converted (iOS statuses `4` do not exist) to match the Android battery format: we swap status `3` for `5` and `1` for `3`.
- Any two consecutive rows are considered part of the same charge or discharge episode if they are less than `[EPISODE_THRESHOLD_BETWEEN_ROWS]` minutes from each other.
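A minimal pandas sketch of the consumption-rate features (not RAPIDS code; the episode values are hypothetical):

```python
import pandas as pd

# Hypothetical discharge episodes: battery level at start/end and duration
episodes = pd.DataFrame({
    "start_level": [90, 60],       # battery percentage
    "end_level": [60, 55],
    "duration_minutes": [120, 30],
})

# consumption rate = battery delta / episode duration
rates = (episodes["start_level"] - episodes["end_level"]) \
    / episodes["duration_minutes"]
print({"avgconsumptionrate": rates.mean(),
       "maxconsumptionrate": rates.max()})
```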
Sensor parameters description for `[PHONE_BLUETOOTH]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the bluetooth data is stored |
Warning

The features of this provider are deprecated in favor of the `DORYAB` provider (see below).
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_bluetooth_raw.csv
- data/raw/{pid}/phone_bluetooth_with_datetime.csv
- data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_bluetooth.csv
Parameters description for `[PHONE_BLUETOOTH][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_BLUETOOTH features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
Features description for `[PHONE_BLUETOOTH][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| countscans | devices | Number of scanned devices during a time segment; a device can be detected multiple times over time and these appearances are counted separately |
| uniquedevices | devices | Number of unique devices during a time segment as identified by their hardware address (`bt_address`) |
| countscansmostuniquedevice | scans | Number of scans of the most sensed device within each time segment instance |
Assumptions/Observations

As of `v0.2.0`, `countscans`, `uniquedevices`, and `countscansmostuniquedevice` were deprecated because they overlap with the respective features for `ALL` devices of the `PHONE_BLUETOOTH` `DORYAB` provider.

This provider is adapted from the work by Doryab et al.
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_bluetooth_raw.csv
- data/raw/{pid}/phone_bluetooth_with_datetime.csv
- data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_bluetooth.csv
Parameters description for `[PHONE_BLUETOOTH][PROVIDERS][DORYAB]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_BLUETOOTH features from the DORYAB provider |
| `[FEATURES]` | Features to be computed, see table below. These features are computed for three device categories: `all` devices, `own` devices and `other` devices. |
Features description for `[PHONE_BLUETOOTH][PROVIDERS][DORYAB]`:

| Feature | Units | Description |
|---|---|---|
| countscans | scans | Number of scans (rows) from the devices sensed during a time segment instance. The more scans a bluetooth device has, the longer it remained within range of the participant's phone |
| uniquedevices | devices | Number of unique bluetooth devices sensed during a time segment instance as identified by their hardware addresses (`bt_address`) |
| meanscans | scans | Mean of the scans of every sensed device within each time segment instance |
| stdscans | scans | Standard deviation of the scans of every sensed device within each time segment instance |
| countscansmostfrequentdevicewithinsegments | scans | Number of scans of the most sensed device within each time segment instance |
| countscansleastfrequentdevicewithinsegments | scans | Number of scans of the least sensed device within each time segment instance |
| countscansmostfrequentdeviceacrosssegments | scans | Number of scans of the most sensed device across time segment instances of the same type |
| countscansleastfrequentdeviceacrosssegments | scans | Number of scans of the least sensed device across time segment instances of the same type per device |
| countscansmostfrequentdeviceacrossdataset | scans | Number of scans of the most sensed device across the entire dataset of every participant |
| countscansleastfrequentdeviceacrossdataset | scans | Number of scans of the least sensed device across the entire dataset of every participant |
Assumptions/Observations

- Devices are classified as belonging to the participant (`own`) or to other people (`others`) using k-means, based on the number of times and the number of days each device was detected across each participant's dataset. See Doryab et al for more details.
- If ownership cannot be computed, all devices are labelled as `other`. Thus `all` and `other` features will be equal. The likelihood of this scenario decreases the more days of data you have.

The following is a simplified example with bluetooth data from three days and two time segments: morning and afternoon. There are two `own` devices: `55C836F5-487E-405F-8E28-21DBD40FA4FF`, detected seven times across two days, and `499A1EAF-DDF1-4657-986C-EA5032104448`, detected eight times on a single day.
```
local_date segment bt_address own_device
2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2016-11-29 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2016-11-29 morning 48872A52-68DE-420D-98DA-73339A1C4685 0
2016-11-29 afternoon 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2016-11-29 afternoon 48872A52-68DE-420D-98DA-73339A1C4685 0
2016-11-30 morning 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2016-11-30 morning 48872A52-68DE-420D-98DA-73339A1C4685 0
2016-11-30 morning 25262DC7-780C-4AD5-AD3A-D9776AEF7FC1 0
2016-11-30 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0
2016-11-30 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0
2016-11-30 afternoon 55C836F5-487E-405F-8E28-21DBD40FA4FF 1
2017-05-07 morning 5C5A9C41-2F68-4CEB-96D0-77DE3729B729 0
2017-05-07 morning 25262DC7-780C-4AD5-AD3A-D9776AEF7FC1 0
2017-05-07 morning 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0
2017-05-07 morning 6C444841-FE64-4375-BC3F-FA410CDC0AC7 0
2017-05-07 morning 4DC7A22D-9F1F-4DEF-8576-086910AABCB5 0
2017-05-07 afternoon 5B1E6981-2E50-4D9A-99D8-67AED430C5A8 0
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
2017-05-07 afternoon 499A1EAF-DDF1-4657-986C-EA5032104448 1
```
Most and least frequent `OTHER` devices (`own_device == 0`) during morning segments

The most and least frequent `ALL`|`OWN`|`OTHER` devices are computed within each time segment instance, across time segment instances of the same type, and across the entire dataset of each person. These are the most and least frequent devices for `OTHER` devices during morning segments.
```
most frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance)
least frequent device across 2016-11-29 morning: '48872A52-68DE-420D-98DA-73339A1C4685' (this device is the only one in this instance)
most frequent device across 2016-11-30 morning: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8'
least frequent device across 2016-11-30 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurrence is chosen)
most frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurrence is chosen)
least frequent device across 2017-05-07 morning: '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1' (when tied, the first occurrence is chosen)

most frequent across morning segments: '5B1E6981-2E50-4D9A-99D8-67AED430C5A8'
least frequent across morning segments: '6C444841-FE64-4375-BC3F-FA410CDC0AC7' (when tied, the first occurrence is chosen)

most frequent across dataset: '499A1EAF-DDF1-4657-986C-EA5032104448' (only taking into account "morning" segments)
least frequent across dataset: '4DC7A22D-9F1F-4DEF-8576-086910AABCB5' (when tied, the first occurrence is chosen)
```
Features for `OTHER` devices and morning segments

For brevity, we only show the following features for morning segments:

```yaml
OTHER:
  DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
  SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
```
Note that `countscansmostfrequentdeviceacrossdatasetothers` is all `0`s because `499A1EAF-DDF1-4657-986C-EA5032104448` is excluded from the count, as it is labelled as an `own` device (not `other`).
```
local_segment countscansothers uniquedevicesothers meanscansothers stdscansothers countscansmostfrequentdevicewithinsegmentsothers countscansmostfrequentdeviceacrosssegmentsothers countscansmostfrequentdeviceacrossdatasetothers
2016-11-29-morning 1 1 1.000000 NaN 1 0.0 0.0
2016-11-30-morning 4 3 1.333333 0.57735 2 2.0 2.0
2017-05-07-morning 5 5 1.000000 0.00000 1 1.0 1.0
```
Sensor parameters description for `[PHONE_CALLS]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the calls data is stored |
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_calls_raw.csv
- data/raw/{pid}/phone_calls_with_datetime.csv
- data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_calls.csv
Parameters description for `[PHONE_CALLS][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_CALLS features from the RAPIDS provider |
| `[CALL_TYPES]` | The particular `call_type` that will be analyzed. The options for this parameter are `incoming`, `outgoing` or `missed`. |
| `[FEATURES]` | Features to be computed for `outgoing`, `incoming`, and `missed` calls. Note that the same features are available for both incoming and outgoing calls, while missed calls have their own set of features. See the tables below. |
Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` incoming and outgoing calls:

| Feature | Units | Description |
|---|---|---|
| count | calls | Number of calls of a particular `call_type` that occurred during a particular `time_segment`. |
| distinctcontacts | contacts | Number of distinct contacts that are associated with a particular `call_type` for a particular `time_segment` |
| meanduration | seconds | The mean duration of all calls of a particular `call_type` during a particular `time_segment`. |
| sumduration | seconds | The sum of the duration of all calls of a particular `call_type` during a particular `time_segment`. |
| minduration | seconds | The duration of the shortest call of a particular `call_type` during a particular `time_segment`. |
| maxduration | seconds | The duration of the longest call of a particular `call_type` during a particular `time_segment`. |
| stdduration | seconds | The standard deviation of the duration of all the calls of a particular `call_type` during a particular `time_segment`. |
| modeduration | seconds | The mode of the duration of all the calls of a particular `call_type` during a particular `time_segment`. |
| entropyduration | nats | The estimate of the Shannon entropy for the duration of all the calls of a particular `call_type` during a particular `time_segment`. |
| timefirstcall | minutes | The time in minutes between 12:00am (midnight) and the first call of `call_type`. |
| timelastcall | minutes | The time in minutes between 12:00am (midnight) and the last call of `call_type`. |
| countmostfrequentcontact | calls | The number of calls of a particular `call_type` during a particular `time_segment` of the most frequent contact throughout the monitored period. |
Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` missed calls:

| Feature | Units | Description |
|---|---|---|
| count | calls | Number of missed calls that occurred during a particular `time_segment`. |
| distinctcontacts | contacts | Number of distinct contacts that are associated with missed calls for a particular `time_segment` |
| timefirstcall | minutes | The time in minutes between 12:00am (midnight) and the first missed call. |
| timelastcall | minutes | The time in minutes between 12:00am (midnight) and the last missed call. |
| countmostfrequentcontact | calls | The number of missed calls during a particular `time_segment` of the most frequent contact throughout the monitored period. |
Assumptions/Observations

- Traces for iOS calls are unique even for the same contact, which makes `countmostfrequentcontact` meaningless and `distinctcontacts` equal to the total number of traces.
- `[CALL_TYPES]` and `[FEATURES]` keys in `config.yaml` need to match. For example, the `[CALL_TYPES]` option `outgoing` matches the `[FEATURES]` key `outgoing`.
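A minimal pandas sketch of a few of these call features (not RAPIDS code; the call traces are hypothetical, and `countmostfrequentcontact` is computed within the segment here for brevity, whereas RAPIDS uses the most frequent contact across the whole monitored period):

```python
import pandas as pd

# Hypothetical outgoing calls in one time segment
calls = pd.DataFrame({
    "call_type": ["outgoing"] * 4,
    "trace": ["A", "B", "A", "C"],   # hashed contact identifiers
    "duration": [60, 300, 45, 120],  # seconds
})

outgoing = calls[calls["call_type"] == "outgoing"]
print({
    "count": len(outgoing),
    "distinctcontacts": outgoing["trace"].nunique(),
    "meanduration": outgoing["duration"].mean(),
    "countmostfrequentcontact":
        (outgoing["trace"] == outgoing["trace"].mode()[0]).sum(),
})
```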
Sensor parameters description for `[PHONE_CONVERSATION]`:

| Key | Description |
|---|---|
| `[CONTAINER][ANDROID]` | Data stream container (database table, CSV file, etc.) where the conversation data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS) |
| `[CONTAINER][IOS]` | Data stream container (database table, CSV file, etc.) where the conversation data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS) |
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_conversation_raw.csv
- data/raw/{pid}/phone_conversation_with_datetime.csv
- data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_conversation.csv
Parameters description for `[PHONE_CONVERSATION][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_CONVERSATION features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
| `[RECORDING_MINUTES]` | Minutes the plugin was recording audio (default 1 min) |
| `[PAUSED_MINUTES]` | Minutes the plugin was NOT recording audio (default 3 min) |
Features description for `[PHONE_CONVERSATION][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| minutessilence | minutes | Minutes labeled as silence |
| minutesnoise | minutes | Minutes labeled as noise |
| minutesvoice | minutes | Minutes labeled as voice |
| minutesunknown | minutes | Minutes labeled as unknown |
| sumconversationduration | minutes | Total duration of all conversations |
| maxconversationduration | minutes | Longest duration of all conversations |
| minconversationduration | minutes | Shortest duration of all conversations |
| avgconversationduration | minutes | Average duration of all conversations |
| sdconversationduration | minutes | Standard deviation of the duration of all conversations |
| timefirstconversation | minutes | Minutes since midnight when the first conversation for a time segment was detected |
| timelastconversation | minutes | Minutes since midnight when the last conversation for a time segment was detected |
| noisesumenergy | L2-norm | Sum of all energy values when the inference is noise |
| noiseavgenergy | L2-norm | Average of all energy values when the inference is noise |
| noisesdenergy | L2-norm | Standard deviation of all energy values when the inference is noise |
| noiseminenergy | L2-norm | Minimum of all energy values when the inference is noise |
| noisemaxenergy | L2-norm | Maximum of all energy values when the inference is noise |
| voicesumenergy | L2-norm | Sum of all energy values when the inference is voice |
| voiceavgenergy | L2-norm | Average of all energy values when the inference is voice |
| voicesdenergy | L2-norm | Standard deviation of all energy values when the inference is voice |
| voiceminenergy | L2-norm | Minimum of all energy values when the inference is voice |
| voicemaxenergy | L2-norm | Maximum of all energy values when the inference is voice |
| silencesensedfraction | - | Ratio between minutessilence and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| noisesensedfraction | - | Ratio between minutesnoise and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| voicesensedfraction | - | Ratio between minutesvoice and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| unknownsensedfraction | - | Ratio between minutesunknown and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| silenceexpectedfraction | - | Ratio between minutessilence and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
| noiseexpectedfraction | - | Ratio between minutesnoise and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
| voiceexpectedfraction | - | Ratio between minutesvoice and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
| unknownexpectedfraction | - | Ratio between minutesunknown and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
Assumptions/Observations
This is a combinatorial sensor, meaning that we use data from multiple sensors to extract data yield features. Data yield features can be used to remove rows (time segments) that do not contain enough data. You should decide your "enough" threshold depending on the type of sensors you collected (frequency- vs event-based, e.g., accelerometer vs calls), the length of your study, and the rates of missing data your analysis can handle.
Why is data yield important?

Imagine that you want to extract PHONE_CALLS features on daily segments (00:00 to 23:59). Let's say that on day 1 the phone logged 10 calls and 23 hours of data from other sensors, and on day 2 the phone logged 10 calls and only 2 hours of data from other sensors. It's more likely that other calls were placed during the 22 hours of data you didn't log on day 2 than during the 1 hour you didn't log on day 1, so including day 2 in your analysis could bias your results.
Sensor parameters description for `[PHONE_DATA_YIELD]`:

| Key | Description |
|---|---|
| `[SENSORS]` | One or more phone sensor config keys (e.g. `PHONE_MESSAGES`). The more keys you include, the more accurately RAPIDS can approximate the time a smartphone was sensing data. The supported phone sensors you can include in this list are outlined below (do NOT include Fitbit sensors, ONLY include phone sensors). |
Supported phone sensors for `[PHONE_DATA_YIELD][SENSORS]`

- PHONE_ACCELEROMETER
- PHONE_ACTIVITY_RECOGNITION
- PHONE_APPLICATIONS_CRASHES
- PHONE_APPLICATIONS_FOREGROUND
- PHONE_APPLICATIONS_NOTIFICATIONS
- PHONE_BATTERY
- PHONE_BLUETOOTH
- PHONE_CALLS
- PHONE_CONVERSATION
- PHONE_KEYBOARD
- PHONE_LIGHT
- PHONE_LOCATIONS
- PHONE_LOG
- PHONE_MESSAGES
- PHONE_SCREEN
- PHONE_WIFI_CONNECTED
- PHONE_WIFI_VISIBLE
Before explaining the data yield features, let's define the following relevant concepts:

- A valid minute is any 60-second window during which at least one phone sensor logged at least one row of data.
- A valid hour is any 60-minute window whose proportion of valid minutes is at or above `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]`.
The timestamps of all sensors are concatenated and then grouped per time segment. Minute and hour windows are created from the beginning of each time segment instance and these windows are marked as valid based on the definitions above. The duration of each time segment is taken into account to compute the features described below.
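A minimal NumPy sketch of this yield logic for a single 3-hour segment (not RAPIDS code; the sensed minutes are hypothetical):

```python
import numpy as np

MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS = 0.5

# True where any sensor logged at least one row in that minute
minute_has_data = np.zeros(180, dtype=bool)
minute_has_data[0:90] = True  # data sensed only in the first 1.5 hours

valid_minutes = minute_has_data.sum()
hours = minute_has_data.reshape(3, 60)  # one row per 60-minute window
valid_hours = (hours.mean(axis=1) >=
               MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS).sum()

print({
    "ratiovalidyieldedminutes": valid_minutes / 180,  # 0.5
    "ratiovalidyieldedhours": valid_hours / 3,        # 2 of 3 hours valid
})
```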
Available time segments and platforms

File Sequence

- data/raw/{pid}/{sensor}_raw.csv # one for every [PHONE_DATA_YIELD][SENSORS]
- data/interim/{pid}/phone_yielded_timestamps.csv
- data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv
- data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_data_yield.csv
Parameters description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_DATA_YIELD features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
| `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` | The proportion [0.0, 1.0] of valid minutes in a 60-minute window necessary to flag that window as valid. |
Features description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| ratiovalidyieldedminutes | - | The ratio between the number of valid minutes and the duration in minutes of a time segment. |
| ratiovalidyieldedhours | - | The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour, this feature will always be 1. |
Assumptions/Observations

We recommend using `ratiovalidyieldedminutes` on time segments that are shorter than two or three hours and `ratiovalidyieldedhours` for longer segments. This is because relying on yielded minutes alone can be misleading when a big chunk of those missing minutes are clustered together.

For example, let's assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur: a) the 12 missing hours form a single continuous block, or b) 30 minutes are missing from every hour (24 × 30 minutes = 12 hours).

`ratiovalidyieldedminutes` would be 0.5 for both a) and b) (hinting that the missing circumstances are similar). However, `ratiovalidyieldedhours` would be 0.5 for a) and 1.0 for b) if `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` is between [0.0 and 0.49] (hinting that the missing circumstances might be more favorable for b). In other words, sensed data for b) is more evenly spread compared to a).
Sensor parameters description for `[PHONE_KEYBOARD]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the keyboard data is stored |
Note

No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_KEYBOARD`) to improve `PHONE_DATA_YIELD`, or you can implement your own features.
Sensor parameters description for `[PHONE_LIGHT]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the light data is stored |
Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_light_raw.csv
- data/raw/{pid}/phone_light_with_datetime.csv
- data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_light.csv
Parameters description for `[PHONE_LIGHT][PROVIDERS][RAPIDS]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_LIGHT features from the RAPIDS provider |
| `[FEATURES]` | Features to be computed, see table below |
Features description for `[PHONE_LIGHT][PROVIDERS][RAPIDS]`:

| Feature | Units | Description |
|---|---|---|
| count | rows | Number of light sensor rows recorded. |
| maxlux | lux | The maximum ambient luminance. |
| minlux | lux | The minimum ambient luminance. |
| avglux | lux | The average ambient luminance. |
| medianlux | lux | The median ambient luminance. |
| stdlux | lux | The standard deviation of ambient luminance. |
Assumptions/Observations

NA

Sensor parameters description for `[PHONE_LOCATIONS]`:

| Key | Description |
|---|---|
| `[CONTAINER]` | Data stream container (database table, CSV file, etc.) where the location data is stored |
| `[LOCATIONS_TO_USE]` | Type of location data to use, one of `ALL`, `GPS`, `ALL_RESAMPLED` or `FUSED_RESAMPLED`. This filter is based on the `provider` column of the locations table: `ALL` includes every row, `GPS` only includes rows where the provider is gps, `ALL_RESAMPLED` includes all rows after being resampled, and `FUSED_RESAMPLED` only includes rows where the provider is fused after being resampled. |
| `[FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD]` | If `ALL_RESAMPLED` or `FUSED_RESAMPLED` is used, the original fused data has to be resampled; a location row is resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less than or equal to this threshold (in minutes). |
| `[FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION]` | If `ALL_RESAMPLED` or `FUSED_RESAMPLED` is used, the original fused data has to be resampled; a location row is resampled at most for this long (in minutes). |
Assumptions/Observations

Types of location data to use

Android and iOS clients can collect location coordinates through the phone's GPS, the network cellular towers around the phone, or Google's fused location API.

- If you want to use only the GPS provider, set `[LOCATIONS_TO_USE]` to `GPS`.
- If you want to use all providers, set `[LOCATIONS_TO_USE]` to `ALL`.
- If you collected location data from different providers, including the fused API, set `[LOCATIONS_TO_USE]` to `ALL_RESAMPLED`.
- If your location data was collected exclusively with the fused API, set `[LOCATIONS_TO_USE]` to `FUSED_RESAMPLED`.

`ALL_RESAMPLED` and `FUSED_RESAMPLED` take the original location coordinates and replicate each pair forward in time as long as the phone was sensing data, as indicated by the joined timestamps of `[PHONE_DATA_YIELD][SENSORS]`. This is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one, and because GPS and network providers can log data at variable rates.
There are two parameters associated with resampling fused location:

- `FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD` (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair. For example, if participant A's phone did not collect data between 10:30 am and 10:50 am and between 11:05 am and 11:40 am, the last known coordinate pair is replicated during the first period but not the second. In other words, we assume we can no longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes.
- `FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION` (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer than this threshold even if the phone was sensing data continuously. For example, if participant A went home at 9 pm and their phone was sensing data without gaps until 11 am the next morning, the last known location is replicated until 9 am. If you have suggestions to modify or improve this resampling, let us know.
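A minimal Python sketch of this resampling idea (not RAPIDS code; the timestamps and coordinates are hypothetical):

```python
CONSECUTIVE_THRESHOLD = 30        # minutes
TIME_SINCE_VALID_LOCATION = 720   # minutes

last_fix_minute, last_coords = 0, (40.44, -79.99)
sensed_minutes = [10, 25, 70, 800]  # minutes where the phone sensed data

resampled = []
prev_minute = last_fix_minute
for minute in sensed_minutes:
    # replicate only while the gap is small and the fix is not too old
    gap_ok = (minute - prev_minute) <= CONSECUTIVE_THRESHOLD
    age_ok = (minute - last_fix_minute) <= TIME_SINCE_VALID_LOCATION
    if gap_ok and age_ok:
        resampled.append((minute, last_coords))
        prev_minute = minute
    else:
        break  # stop replicating after a long gap or a stale fix

print(resampled)  # the pair is replicated to minutes 10 and 25 only
```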
These features are based on the original open-source implementation by Barnett et al and some features created by Canzian et al.

Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_locations_raw.csv
- data/interim/{pid}/phone_locations_processed.csv
- data/interim/{pid}/phone_locations_processed_with_datetime.csv
- data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_locations.csv
Parameters description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_LOCATIONS features from the BARNETT provider |
| `[FEATURES]` | Features to be computed, see table below |
| `[ACCURACY_LIMIT]` | An integer in meters; any location rows with an accuracy higher than this are dropped. This number means there's a 68% probability the actual location is within this radius |
| `[IF_MULTIPLE_TIMEZONES]` | Currently, `USE_MOST_COMMON` is the only value supported. If the location data for a participant belongs to multiple time zones, we select the most common because Barnett's algorithm can only handle one time zone |
| `[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes; the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough. |
Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`, adapted from Beiwe Summary Statistics:

| Feature | Units | Description |
|---|---|---|
| hometime | minutes | Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am, including any pauses within a 200-meter radius. |
| disttravelled | meters | Total distance traveled over a day (flights). |
| rog | meters | The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day, and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place. |
| maxdiam | meters | The maximum diameter is the largest distance between any two pauses. |
| maxhomedist | meters | The maximum distance from home in meters. |
| siglocsvisited | locations | The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found by iterating k from 1 to 200, stopping when the centroids of two significant locations are within 400 meters of one another. |
| avgflightlen | meters | Mean length of all flights. |
| stdflightlen | meters | Standard deviation of the length of all flights. |
| avgflightdur | seconds | Mean duration of all flights. |
| stdflightdur | seconds | Standard deviation of the duration of all flights. |
| probpause | - | The fraction of a day spent in a pause (as opposed to a flight) |
| siglocentropy | nats | Shannon's entropy measurement, based on the proportion of time spent at each significant location visited during a day. |
| circdnrtn | - | A continuous metric quantifying a person's circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine identical to every other sensed day. |
| wkenddayrtn | - | Same as circdnrtn but computed separately for weekends and weekdays. |
Assumptions/Observations

Multi-day segment features

Barnett's features are only available on time segments that span entire days (00:00:00 to 23:59:59). Such segments can be one day long (daily) or multi-day (weekly, for example). Multi-day segment features are computed based on daily features summarized the following way:

- sum for `hometime`, `disttravelled`, `siglocsvisited`, and `minutes_data_used`
- max for `maxdiam` and `maxhomedist`
- mean for `rog`, `avgflightlen`, `stdflightlen`, `avgflightdur`, `stdflightdur`, `probpause`, `siglocentropy`, `circdnrtn`, `wkenddayrtn`, and `minsmissing`
Computation speed

The process to extract these features can be slow compared to other sensors and providers due to the required simulation.

How are these features computed?

These features are based on a Pause-Flight model. A pause is defined as a mobility trace (location pings) within a certain duration and distance (by default, 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See Barnett et al for more information. In RAPIDS, we only expose one parameter for these features (the accuracy limit). You can change other parameters in `src/features/phone_locations/barnett/library/MobilityFeatures.R`.

Significant Locations

Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters of each other. After this, pauses within a certain range of a cluster (200 meters by default) count as a visit to that significant location. This description was adapted from the Supplementary Materials of Barnett et al.

The Circadian Calculation

For a detailed description of how this is calculated, see Canzian et al.
These features are based on the original implementation by Doryab et al.

Available time segments and platforms

File Sequence

- data/raw/{pid}/phone_locations_raw.csv
- data/interim/{pid}/phone_locations_processed.csv
- data/interim/{pid}/phone_locations_processed_with_datetime.csv
- data/interim/{pid}/phone_locations_processed_with_datetime_with_home.csv
- data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_locations.csv
Parameters description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:

| Key | Description |
|---|---|
| `[COMPUTE]` | Set to `True` to extract PHONE_LOCATIONS features from the DORYAB provider |
| `[FEATURES]` | Features to be computed, see table below |
| `[ACCURACY_LIMIT]` | An integer in meters; any location rows with an accuracy higher than this will be dropped. This number means there's a 68% probability the true location is within this radius |
| `[DBSCAN_EPS]` | The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function. |
| `[DBSCAN_MINSAMPLES]` | The number of samples (or total weight) in a neighborhood for a point to be considered a core point of a cluster. This includes the point itself. |
| `[THRESHOLD_STATIC]` | The threshold value in km/hr that labels a row as Static or Moving. |
| `[MAXIMUM_ROW_GAP]` | The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the phone was not sensing. |
| `[MAXIMUM_ROW_DURATION]` | The time difference between any two consecutive rows `A` and `B` is considered as the time a participant spent in `A`. If this difference is bigger than `MAXIMUM_ROW_GAP`, we substitute it with `MAXIMUM_ROW_DURATION`. |
| `[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes; the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough. |
| `[SAMPLING_FREQUENCY]` | Expected time difference between any two location rows in minutes. If set to `0`, the sampling frequency will be inferred automatically as the median of all the differences between two consecutive row timestamps (recommended if you are using `FUSED_RESAMPLED` data). This parameter impacts all the time calculations. |
| `[CLUSTER_ON]` | Set this flag to `PARTICIPANT_DATASET` to create clusters based on the entire participant's dataset, or to `TIME_SEGMENT` to create clusters based on all the instances of the corresponding time segment (e.g. all mornings). |
| `[CLUSTERING_ALGORITHM]` | The original Doryab et al. implementation uses `DBSCAN`; `OPTICS` is also available with similar (but not identical) clustering results and lower memory consumption. |
| `[RADIUS_FOR_HOME]` | All location coordinates within this distance (meters) from the home location coordinates are considered a homestay (see the `timeathome` feature). |
Features description for [PHONE_LOCATIONS][PROVIDERS][DORYAB]
:
Feature | +Units | +Description | +
---|---|---|
locationvariance | +\(meters^2\) | +The sum of the variances of the latitude and longitude columns. | +
loglocationvariance | +- | +Log of the sum of the variances of the latitude and longitude columns. | +
totaldistance | +meters | +Total distance traveled in a time segment using the haversine formula. | +
averagespeed | +km/hr | +Average speed in a time segment considering only the instances labeled as Moving. | +
varspeed | +km/hr | +Speed variance in a time segment considering only the instances labeled as Moving. | +
- | +Not suggested for use now; see Observations below. “It encodes the extent to which a person’s location patterns follow a 24-hour circadian cycle." Doryab et al.. | +|
numberofsignificantplaces | +places | +Number of significant locations visited. It is calculated using the DBSCAN/OPTICS clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place. | +
numberlocationtransitions | +transitions | +Number of movements between any two clusters in a time segment. | +
radiusgyration | +meters | +Quantifies the area covered by a participant | +
timeattop1location | +minutes | +Time spent at the most significant location. | +
timeattop2location | +minutes | +Time spent at the 2nd most significant location. | +
timeattop3location | +minutes | +Time spent at the 3rd most significant location. | +
movingtostaticratio | +- | +Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labeled as stationary if its speed (distance/time) to the next coordinate pair is less than 1km/hr. A higher value represents a more stationary routine. These times are computed using timeInSeconds feature. | +
outlierstimepercent | +- | +Ratio between the time spent in non-significant clusters divided by the time spent in all clusters (total location sensed time). A higher value represents more time spent in non-significant clusters. These times are computed using timeInSeconds feature. | +
maxlengthstayatclusters | +minutes | +Maximum time spent in a cluster (significant location). | +
minlengthstayatclusters | +minutes | +Minimum time spent in a cluster (significant location). | +
meanlengthstayatclusters | +minutes | +Average time spent in a cluster (significant location). | +
stdlengthstayatclusters | +minutes | +Standard deviation of time spent in a cluster (significant location). | +
locationentropy | +nats | +Shannon entropy computed over the row count of each cluster (significant location); it is higher the more evenly a participant’s time is spread across multiple significant locations (see the sketch after this table). | +
normalizedlocationentropy | +nats | +Shannon entropy computed over the row count of each cluster (significant location), divided by the number of clusters. | +
timeathome | +minutes | +Time spent at home (see Observations below for a description on how we compute home). | +
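As a reference for the two entropy features above, here is a minimal sketch (not RAPIDS’ actual code) of Shannon entropy over cluster row counts, assuming a pandas Series with one cluster label per location row:

```python
import numpy as np
import pandas as pd

def location_entropy(cluster_labels: pd.Series):
    """Return (locationentropy, normalizedlocationentropy) in nats."""
    counts = cluster_labels.value_counts()
    p = counts / counts.sum()                # share of rows (time) per cluster
    entropy = float(-(p * np.log(p)).sum())  # natural log -> nats
    return entropy, entropy / len(counts)    # normalized by the number of clusters
```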
Assumptions/Observations
+Significant Locations Identified +Significant locations are determined using DBSCAN clustering on the locations that a participant visits over the course of the data collection period.
+Circadian Movement Calculation +Note (Feb 3, 2021): the implementation of this feature appears to be incorrect; we suggest not using it until a fix is in place. For a detailed description of how it should be calculated, see Saeb et al.
+Fine-Tuning Clustering Parameters
+Based on an experiment in which we collected fused location data for 7 days with a mean accuracy of 86 and an SD of 350.874635, we determined that EPS/MAX_EPS=100 produced the clustering results closest to reality. Higher values (>100) missed some significant places, like a short grocery visit, while lower values (<100) picked up traffic lights and stop signs while driving as significant locations. We recommend setting EPS based on your location data’s accuracy: the more accurate your data, the lower you should be able to set EPS.
Duration Calculation +To calculate the time duration component for our features, we compute the difference between consecutive rows’ timestamps to take into account sampling rate variability. If this time difference is larger than a threshold (300 seconds by default), we replace it with a maximum duration (60 seconds by default); i.e., we assume a participant spent at least 60 seconds in their last known location.
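A minimal sketch of this duration rule (not RAPIDS’ actual code), assuming a pandas Series of row timestamps in milliseconds and the default thresholds:

```python
import pandas as pd

MAX_GAP_SECONDS = 300      # above this, the gap to the next row is not trusted
ASSUMED_STAY_SECONDS = 60  # minimum assumed stay at the last known location

def row_durations(timestamps_ms: pd.Series) -> pd.Series:
    """Duration (seconds) attributed to each row: the gap to the next row,
    or ASSUMED_STAY_SECONDS when that gap is missing or exceeds MAX_GAP_SECONDS."""
    gaps = timestamps_ms.sort_values().diff().shift(-1) / 1000  # seconds to next row
    return gaps.where(gaps <= MAX_GAP_SECONDS, ASSUMED_STAY_SECONDS)
```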
+Home location
+Home is calculated by taking all of a participant’s location data between 12 am and 6 am, applying a clustering algorithm (DBSCAN or OPTICS), and considering the center of the biggest cluster as that participant’s home.
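RAPIDS implements this in the provider’s feature script; the following is only an illustrative sketch using scikit-learn, assuming columns double_latitude, double_longitude, and local_hour (as in RAPIDS’ location data) and at least one night-time cluster:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6371000  # used to convert an eps in meters to radians

def home_coordinates(locations: pd.DataFrame, eps_meters: float = 100):
    """Center (lat, lon) of the biggest cluster of 12 am - 6 am locations."""
    night = locations[locations["local_hour"] < 6]
    coords = night[["double_latitude", "double_longitude"]].to_numpy()
    labels = DBSCAN(eps=eps_meters / EARTH_RADIUS_M, min_samples=5,  # min_samples is an assumption
                    metric="haversine").fit(np.radians(coords)).labels_
    biggest = pd.Series(labels[labels != -1]).value_counts().idxmax()  # ignore noise (-1)
    return tuple(coords[labels == biggest].mean(axis=0))
```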
Sensor parameters description for [PHONE_LOG]
:
Key | +Description | +
---|---|
[CONTAINER][ANDROID] |
+Data stream container (database table, CSV file, etc.) where a data log is stored for Android devices | +
[CONTAINER][IOS] |
+Data stream container (database table, CSV file, etc.) where a data log is stored for iOS devices | +
Note
+No feature providers have been implemented for this sensor yet, however you can use its key (PHONE_LOG
) to improve PHONE_DATA_YIELD
or you can implement your own features.
Sensor parameters description for [PHONE_MESSAGES]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Data stream container (database table, CSV file, etc.) where the messages data is stored | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/phone_messages_raw.csv
+- data/raw/{pid}/phone_messages_with_datetime.csv
+- data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv
+- data/processed/features/{pid}/phone_messages.csv
+
Parameters description for [PHONE_MESSAGES][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract PHONE_MESSAGES features from the RAPIDS provider |
+
[MESSAGES_TYPES] |
+The messages_type that will be analyzed. The options for this parameter are received or sent . |
+
[FEATURES] |
+Features to be computed, see table below for [MESSAGES_TYPES] received and sent |
+
Features description for [PHONE_MESSAGES][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
count | +messages | +Number of messages of type messages_type that occurred during a particular time_segment . |
+
distinctcontacts | +contacts | +Number of distinct contacts that are associated with a particular messages_type during a particular time_segment . |
+
timefirstmessages | +minutes | +Number of minutes between 12:00am (midnight) and the first message of a particular messages_type during a particular time_segment . |
+
timelastmessages | +minutes | +Number of minutes between 12:00am (midnight) and the last message of a particular messages_type during a particular time_segment . |
+
countmostfrequentcontact | +messages | +Number of messages from the contact with the most messages of messages_type during a time_segment throughout the whole dataset of each participant. |
+
Assumptions/Observations
+The [MESSAGES_TYPES] and [FEATURES] keys in config.yaml need to match. For example, the [MESSAGES_TYPES] value sent matches the [FEATURES] key sent (see the example below).
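For reference, an illustrative config.yaml snippet where the [MESSAGES_TYPES] values and [FEATURES] keys match (the container name is an assumption; use your own):

```yaml
PHONE_MESSAGES:
  CONTAINER: messages
  PROVIDERS:
    RAPIDS:
      COMPUTE: True
      MESSAGES_TYPES: [received, sent]
      FEATURES:
        received: [count, distinctcontacts, timefirstmessages, timelastmessages, countmostfrequentcontact]
        sent: [count, distinctcontacts, timefirstmessages, timelastmessages, countmostfrequentcontact]
```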
Sensor parameters description for [PHONE_SCREEN]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Data stream container (database table, CSV file, etc.) where the screen data is stored | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/phone_screen_raw.csv
+- data/raw/{pid}/phone_screen_with_datetime.csv
+- data/interim/{pid}/phone_screen_episodes.csv
+- data/interim/{pid}/phone_screen_episodes_resampled.csv
+- data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv
+- data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv
+- data/processed/features/{pid}/phone_screen.csv
+
Parameters description for [PHONE_SCREEN][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract PHONE_SCREEN features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed, see table below | +
[REFERENCE_HOUR_FIRST_USE] |
+The reference point from which firstuseafter is to be computed; the default is midnight |
+
[IGNORE_EPISODES_SHORTER_THAN] |
+Ignore episodes that are shorter than this threshold (minutes). Set to 0 to disable this filter. | +
[IGNORE_EPISODES_LONGER_THAN] |
+Ignore episodes that are longer than this threshold (minutes). Set to 0 to disable this filter. | +
[EPISODE_TYPES] |
+Currently we only support unlock episodes (from when the phone is unlocked until the screen is off) |
+
Features description for [PHONE_SCREEN][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
sumduration | +minutes | +Total duration of all unlock episodes. | +
maxduration | +minutes | +Longest duration of any unlock episode. | +
minduration | +minutes | +Shortest duration of any unlock episode. | +
avgduration | +minutes | +Average duration of all unlock episodes. | +
stdduration | +minutes | +Standard deviation duration of all unlock episodes. | +
countepisode | +episodes | +Number of all unlock episodes. | +
firstuseafter | +minutes | +Minutes until the first unlock episode. | +
Assumptions/Observations
+In Android, lock events can happen right after an off event, after a few seconds of an off event, or never happen, depending on the phone’s settings; therefore, an unlock episode is defined as the time between an unlock and an off event. In iOS, on and off events do not exist, so an unlock episode is defined as the time between an unlock and a lock event.
Events in iOS are recorded reliably, albeit with some duplicated lock events within milliseconds of each other, so we only keep consecutive unlock/lock pairs. In Android you can find multiple consecutive unlock or lock events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected; this happens because ACTION_SCREEN_OFF and ACTION_SCREEN_ON are sent when the device becomes non-interactive, which may have nothing to do with the screen turning off. In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an unlock event, as well as the total screen time (i.e., ON to OFF), but these are not implemented at the moment.
We transform iOS screen events to match Android’s format: we replace lock events with off events (2 with 0) in iOS. However, as mentioned above, this still computes unlock to lock episodes.
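Putting the parameters above together, this is an illustrative config.yaml snippet for this provider (the container name and threshold values are assumptions; adjust them to your study):

```yaml
PHONE_SCREEN:
  CONTAINER: screen
  PROVIDERS:
    RAPIDS:
      COMPUTE: True
      REFERENCE_HOUR_FIRST_USE: 0       # midnight
      IGNORE_EPISODES_SHORTER_THAN: 0   # minutes; 0 disables this filter
      IGNORE_EPISODES_LONGER_THAN: 0    # minutes; 0 disables this filter
      EPISODE_TYPES: [unlock]
      FEATURES: [countepisode, sumduration, maxduration, minduration, avgduration, stdduration, firstuseafter]
```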
Sensor parameters description for [PHONE_WIFI_CONNECTED]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Data stream container (database table, CSV file, etc.) where the wifi (connected) data is stored | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/phone_wifi_connected_raw.csv
+- data/raw/{pid}/phone_wifi_connected_with_datetime.csv
+- data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv
+- data/processed/features/{pid}/phone_wifi_connected.csv
+
Parameters description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract PHONE_WIFI_CONNECTED features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed, see table below | +
Features description for [PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
countscans | +devices | +Number of scans of WiFi access points the phone was connected to during a time_segment; an access point can be detected multiple times over time, and these appearances are counted separately. | +
uniquedevices | +devices | +Number of unique access points during a time_segment, as identified by their hardware address. | +
countscansmostuniquedevice | +scans | +Number of scans of the most scanned access point during a time_segment, across the whole monitoring period. | +
Assumptions/Observations
+Connected WiFi data is collected by AWARE and stored in the sensor_wifi table.
Sensor parameters description for [PHONE_WIFI_VISIBLE]
:
Key | +Description | +
---|---|
[CONTAINER] |
+Data stream container (database table, CSV file, etc.) where the wifi (visible) data is stored | +
Available time segments and platforms
+File Sequence
+- data/raw/{pid}/phone_wifi_visible_raw.csv
+- data/raw/{pid}/phone_wifi_visible_with_datetime.csv
+- data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv
+- data/processed/features/{pid}/phone_wifi_visible.csv
+
Parameters description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS]
:
Key | +Description | +
---|---|
[COMPUTE] |
+Set to True to extract PHONE_WIFI_VISIBLE features from the RAPIDS provider |
+
[FEATURES] |
+Features to be computed, see table below | +
Features description for [PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS]
:
Feature | +Units | +Description | +
---|---|---|
countscans | +devices | +Number of scans of visible WiFi access points during a time_segment; an access point can be detected multiple times over time, and these appearances are counted separately. | +
uniquedevices | +devices | +Number of unique access points during a time_segment, as identified by their hardware address. | +
countscansmostuniquedevice | +scans | +Number of scans of the most scanned access point during a time_segment, across the whole monitoring period. | +
Assumptions/Observations
+Visible WiFi data is collected by AWARE and stored in the wifi table.
Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to extract and create behavioral features (a.k.a. digital biomarkers), visualize mobile sensor data, and structure your analysis into reproducible workflows.
+RAPIDS is open source, documented, modular, tested, and reproducible. At the moment, we support data streams logged by smartphones, Fitbit wearables, and, in collaboration with the DBDP, Empatica wearables (but you can add your own too).
+If you want to know more, head over to Overview.
+Tip
+Questions or feedback can be posted on the #rapids channel in AWARE Framework's slack.
+Bugs and feature requests should be posted on Github.
+Join our discussions on our algorithms and assumptions for feature processing.
Are you upgrading from RAPIDS 0.4.x or older? Follow this guide.
Ready? Go to Overview.
+There are four actions that you need to take if you were using RAPIDS 0.4.3
or older (before Feb 9th, 2021):
Check the new Overview page. Hopefully, it is a better overview of RAPIDS and provides answers to Frequently Asked Questions.
+Run snakemake -j1 renv_install (for Ubuntu, take advantage of the platform-specific R renv instructions)
Run snakemake -j1 renv_restore (for Ubuntu, take advantage of the platform-specific R renv instructions)
Move your participant files pxx.yaml to the new folder
Move your .env file to the new folder
Migrate your .env file to the new credentials.yaml format
The .env file is not used anymore; the same credential groups are now stored in credentials.yaml. Migrate your .env file by running:
+
python tools/update_format_env.py
+
config.yaml
Reconfigure your config.yaml
file by hand (don’t copy and paste the old one). Some keys and values changed, but the defaults should be compatible with what you know from RAPIDS 0.x (see below).
The most relevant changes to RAPIDS that you need to know about are:
+RAPIDS abstracts sensor data logged by different devices, platforms and stored in different data containers as data streams.
+The default data stream for PHONE
is aware_mysql
, and the default for FITBIT
is fitbitjson_mysql
. This is compatible with the old functionality (AWARE and JSON Fitbit data stored in MySQL). These values are set in [PHONE_DATA_STREAMS][USE]
and [FITBIT_DATA_STREAMS][USE]
.
You can add new data stream formats (sensing apps) and containers (database engines, file types, etc.).
+If you were processing your Fitbit data either in JSON or plain text (parsed) format, and it was stored in MySQL or CSV files, the changes that you made to your raw data will be compatible. Just choose fitbitjson_mysql
, fitbitparsed_mysql
, fitbitjson_csv
, fitbitparsed_csv
accordingly and set it in [FITBIT_DATA_STREAMS][USE]
.
In the future, you will not have to change your raw data; you will be able to just change column mappings/values in the data stream’s format.yaml
file.
You can now process data from participants that visited multiple time zones. The default is still a single time zone (America/New_York). See how to handle multiple time zones
+multiple
is now infer
When processing data from smartphones, RAPIDS allows you to infer the OS of a smartphone by using the keyword multiple in the [PLATFORM] key of participant files. RAPIDS now uses infer instead of multiple; nonetheless, multiple still works for backward compatibility.
DATABASE_GROUP
does not exist anymoreThere is no global DATABASE_GROUP
anymore. Each data stream that needs credentials to connect to a database has its own DATABASE_GROUP
config key. The groups are defined in credentials.yaml
instead of the .env
.
[DEVICE_SENSOR][TABLE]
is now [DEVICE_SENSOR][CONTAINER]
We renamed the keys [DEVICE_SENSOR][TABLE]
to [DEVICE_SENSOR][CONTAINER]
to reflect that, with the introduction of data streams, they can point to a database table, file, or any other data container.
In previous versions of RAPIDS, you could create participant files automatically using the aware_device
table. We deprecated this option but you can still achieve the same results if you export the output of the following SQL query as a CSV file and follow the instructions to create participant files from CSV files:
SELECT device_id, device_id as fitbit_id, CONCAT("p", _id) as empatica_id, CONCAT("p", _id) as pid, if(brand = "iPhone", "ios", "android") as platform, CONCAT("p", _id) as label, DATE_FORMAT(FROM_UNIXTIME((timestamp/1000)- 86400), "%Y-%m-%d") as start_date, CURRENT_DATE as end_date from aware_device order by _id;
+
SCR_SCRIPT
and SRC_LANGUAGE
are replaced by SRC_SCRIPT
The attributes SCR_SCRIPT
and SRC_LANGUAGE
of every sensor PROVIDER
are replaced by SRC_SCRIPT
. SRC_SCRIPT
is a relative path from the RAPIDS root folder to that provider’s feature script. We did this to simplify and clarify where feature scripts are stored.
There are no actions to take unless you created your own feature provider; update it with your feature script path.
+If you were relying on the old docs, and the most recent version of RAPIDS you are working with is from or before Oct 13, 2020, you are using the beta version of RAPIDS.
You can start using RAPIDS 0.1.0 right away; just take into account the following:
Make the RAPIDS script executable: chmod +x rapids
Database credentials are now stored in the .env file
You can migrate your old participant files to the new YAML format: +
python tools/update_format_participant_files.py
+
Follow the new Configuration guide
+Get familiar with the new way of Executing RAPIDS
+You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features.
+When you are done with this configuration, go to executing RAPIDS.
+Hint
+Every time you see config["KEY"]
or [KEY]
in these docs we are referring to the corresponding key in the config.yaml
file.
A data stream refers to sensor data collected using a specific type of device with a specific format and stored in a specific container. For example, the aware_mysql
data stream handles smartphone data (device) collected with the AWARE Framework (format) stored in a MySQL database (container).
Check the table in introduction to data streams to know what data streams we support. If your data stream is supported, continue to the next configuration section, you will use its label later in this guide (e.g. aware_mysql
). If your stream is not supported but you want to implement it, follow the tutorial to add support for new data streams, and get in touch by email or in Slack if you have any questions.
Participant files link together multiple devices (smartphones and wearables) to specific participants and identify them throughout RAPIDS. You can create these files manually or automatically. Participant files are stored in data/external/participant_files/pxx.yaml
and follow a unified structure.
config.yaml
file with your PIDSThe list PIDS
in config.yaml
needs to have the participant file names of the people you want to process. For example, if you created p01.yaml
, p02.yaml
and p03.yaml
files in /data/external/participant_files/
, then PIDS
should be:
+
PIDS: [p01, p02, p03]
+
If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in data/external/participant_files/
python tools/update_format_participant_files.py
+
In this example, the participant used an Android phone, an iOS phone, a Fitbit device, and an Empatica device throughout the study, between Apr 23rd 2020 and Oct 28th 2020.
+If your participants didn’t use a [PHONE]
, [FITBIT]
or [EMPATICA]
device, it is not necessary to include that section in their participant file. In other words, you can analyse data from 1 or more devices per participant.
PHONE:
+ DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524, dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43]
+ PLATFORMS: [android,ios]
+ LABEL: test01
+ START_DATE: 2020-04-23
+ END_DATE: 2020-10-28
+FITBIT:
+ DEVICE_IDS: [fitbit1]
+ LABEL: test01
+ START_DATE: 2020-04-23
+ END_DATE: 2020-10-28
+EMPATICA:
+ DEVICE_IDS: [empatica1]
+ LABEL: test01
+ START_DATE: 2020-04-23
+ END_DATE: 2020-10-28
+
Key | +Description | +
---|---|
[DEVICE_IDS] |
+An array of the strings that uniquely identify each smartphone, you can have more than one for when participants changed phones in the middle of the study. | +
[PLATFORMS] |
+An array that specifies the OS of each smartphone in [DEVICE_IDS] ; use a combination of android or ios (we support participants who changed platforms in the middle of your study!). You can set [PLATFORMS]: [infer] and RAPIDS will infer them automatically (each phone data stream infers this differently; e.g. aware_mysql uses the aware_device table). | +
+
[LABEL] |
+A string that is used in reports and visualizations. | +
[START_DATE] |
+A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . Only data collected after this date time will be included in the analysis. By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
[END_DATE] |
+A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . Only data collected before this date time will be included in the analysis. By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
Key | +Description | +
---|---|
[DEVICE_IDS] |
+An array of the strings that uniquely identify each Fitbit, you can have more than one in case the participant changed devices in the middle of the study. | +
[LABEL] |
+A string that is used in reports and visualizations. | +
[START_DATE] |
+A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . Only data collected after this date time will be included in the analysis. By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
[END_DATE] |
+A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . Only data collected before this date time will be included in the analysis. By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
Key | +Description | +
---|---|
[DEVICE_IDS] |
+An array of the strings that uniquely identify each Empatica device used by this participant. Since the most common use case involves having multiple zip files from a single device for each person, set this device id to an arbitrary string (we usually use their pid ) |
+
[LABEL] |
+A string that is used in reports and visualizations. | +
[START_DATE] |
+A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . Only data collected after this date time will be included in the analysis. By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
[END_DATE] |
+A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . Only data collected before this date time will be included in the analysis. By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
You can use a CSV file with a row per participant to automatically create participant files.
+AWARE_DEVICE_TABLE
was deprecatedIn previous versions of RAPIDS, you could create participant files automatically using the aware_device
table. We deprecated this option but you can still achieve the same results if you export the output of the following SQL query as a CSV file and follow the instructions below:
SELECT device_id, device_id as fitbit_id, CONCAT("p", _id) as empatica_id, CONCAT("p", _id) as pid, if(brand = "iPhone", "ios", "android") as platform, CONCAT("p", _id) as label, DATE_FORMAT(FROM_UNIXTIME((timestamp/1000)- 86400), "%Y-%m-%d") as start_date, CURRENT_DATE as end_date from aware_device order by _id;
+
In your config.yaml
:
CSV_FILE_PATH
to a CSV file path that complies with the specs described belowPHONE
, FITBIT
, EMPATICA
) [ADD]
flag to TRUE
depending on what devices you used in your study.CREATE_PARTICIPANT_FILES:
+ CSV_FILE_PATH: "your_path/to_your.csv"
+ PHONE_SECTION:
+ ADD: TRUE # or FALSE
+ IGNORED_DEVICE_IDS: []
+ FITBIT_SECTION:
+ ADD: TRUE # or FALSE
+ IGNORED_DEVICE_IDS: []
+ EMPATICA_SECTION:
+ ADD: TRUE # or FALSE
+ IGNORED_DEVICE_IDS: []
+
Your CSV file ([CSV_FILE_PATH]
) should have the following columns (headers) but the values within each column can be empty:
Column | +Description | +
---|---|
device_id | +Phone device id. Separate multiple ids with ; |
+
fitbit_id | +Fitbit device id. Separate multiple ids with ; |
+
empatica_id | +Empatica device id. Since the most common use case involves having multiple zip files from a single device for each person, set this device id to an arbitrary string (we usually use their pid ) |
+
pid | +Unique identifiers with the format pXXX (your participant files will be named with this string) | +
platform | +Use android , ios or infer as explained above, separate values with ; |
+
label | +A human readable string that is used in reports and visualizations. | +
start_date | +A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
end_date | +A string with format YYYY-MM-DD or YYYY-MM-DD HH:MM:SS . By default, YYYY-MM-DD is interpreted as YYYY-MM-DD 00:00:00 . |
+
Example
+We added white spaces to this example to make it easier to read, but you don’t have to.
+device_id ,fitbit_id, empatica_id ,pid ,label ,platform ,start_date ,end_date
+a748ee1a-1d0b-4ae9-9074-279a2b6ba524;dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43 ,fitbit1 , p01 ,p01 ,julio ,android;ios ,2020-01-01 ,2021-01-01
+4c4cf7a1-0340-44bc-be0f-d5053bf7390c ,fitbit2 , p02 ,p02 ,meng ,ios ,2021-01-01 ,2022-01-01
+
Then run
+snakemake -j1 create_participants_files
+
Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: frequency (short time windows every day), periodic (arbitrary time windows on any day), and event (arbitrary time windows around events of interest). See also our examples.
+These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your config.yaml
TIME_SEGMENTS: &time_segments
+ TYPE: FREQUENCY
+ FILE: "data/external/your_frequency_segments.csv"
+ INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
+
The file pointed by [TIME_SEGMENTS][FILE]
should have the following format and can only have 1 row.
Column | +Description | +
---|---|
label | +A string that is used as a prefix in the name of your time segments | +
length | +An integer representing the duration of your time segments in minutes | +
Example
+label,length
+thirtyminutes,30
+
This configuration will compute 48 time segments for every day when any data from any participant was sensed. For example:
+start_time,length,label
+00:00,30,thirtyminutes0000
+00:30,30,thirtyminutes0001
+01:00,30,thirtyminutes0002
+01:30,30,thirtyminutes0003
+...
+
These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your config.yaml
.
TIME_SEGMENTS: &time_segments
+ TYPE: PERIODIC
+ FILE: "data/external/your_periodic_segments.csv"
+ INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
+
If [INCLUDE_PAST_PERIODIC_SEGMENTS] is set to TRUE, RAPIDS will consider instances of your segments far enough in the past to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1st if [INCLUDE_PAST_PERIODIC_SEGMENTS] is TRUE, or on Sunday March 8th if FALSE.
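The date arithmetic behind this example, as a sketch (not RAPIDS’ code): roll the first data row’s date back to the previous anchor weekday.

```python
from datetime import date, timedelta

def previous_weekday(d: date, weekday: int) -> date:
    """Date of the latest `weekday` (0=Monday ... 6=Sunday) on or before `d`."""
    return d - timedelta(days=(d.weekday() - weekday) % 7)

# First row of data on Saturday March 7th 2020, weekly segments anchored on Sundays:
print(previous_weekday(date(2020, 3, 7), 6))  # -> 2020-03-01 (Sunday March 1st)
```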
The file pointed by [TIME_SEGMENTS][FILE]
should have the following format and can have multiple rows.
Column | +Description | +
---|---|
label | +A string that is used as a prefix in the name of your time segments. It has to be unique between rows | +
start_time | +A string with format HH:MM:SS representing the starting time of this segment on any day |
+
length | +A string representing the length of this segment. It can have one or more of the following strings XXD XXH XXM XXS to represent days, hours, minutes, and seconds. For example 7D 23H 59M 59S | +
+
repeats_on | +One of the following options: every_day , wday , qday , mday , and yday . The last four represent a week, quarter, month, and year day. | +
+
repeats_value | +An integer complementing repeats_on . If you set repeats_on to every_day set this to 0 , otherwise 1-7 represent a wday starting from Mondays, 1-31 represent a mday , 1-91 represent a qday , and 1-366 represent a yday |
+
Example
+label,start_time,length,repeats_on,repeats_value
+daily,00:00:00,23H 59M 59S,every_day,0
+morning,06:00:00,5H 59M 59S,every_day,0
+afternoon,12:00:00,5H 59M 59S,every_day,0
+evening,18:00:00,5H 59M 59S,every_day,0
+night,00:00:00,5H 59M 59S,every_day,0
+
This configuration will create five segment instances (daily, morning, afternoon, evening, night) on any given day (every_day set to 0). The daily segment will start at midnight and last 23:59:59; the other four segments will start at 6am, 12pm, 6pm, and 12am respectively, and last for 05:59:59.
These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your config.yaml
.
TIME_SEGMENTS: &time_segments
+ TYPE: EVENT
+ FILE: "data/external/your_event_segments.csv"
+ INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
+
The file pointed by [TIME_SEGMENTS][FILE]
should have the following format and can have multiple rows.
Column | +Description | +
---|---|
label | +A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the most frequent contact for calls (the most frequent contact will be computed across all these segments). There cannot be two overlapping event segments with the same label (RAPIDS will throw an error). | +
+
event_timestamp | +A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using length , shift , and shift_direction |
+
length | +A string representing the length of this segment. It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes, and seconds. For example 7D 23H 59M 59S |
+
shift | +A string representing the time shift from event_timestamp . It can have one or more of the following keys XXD XXH XXM XXS to represent a number of days, hours, minutes and seconds. For example 7D 23H 59M 59S . Use this value to change the start of a segment with respect to its event_timestamp . For example, set this variable to 1H to create a segment that starts 1 hour from an event of interest (shift_direction determines if it’s before or after). |
+
shift_direction | +An integer representing whether the shift is before (-1 ) or after (1 ) an event_timestamp |
+
device_id | +The device id (smartphone or Fitbit) to which this segment belongs. You have to create a line in this event segment file for each event of a participant that you want to analyse. If you have participants with multiple device ids, you can choose any of them. | +
Example
+label,event_timestamp,length,shift,shift_direction,device_id
+stress1,1587661220000,1H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress2,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress3,1587906020000,3H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress4,1584291600000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress5,1588172420000,9H,5M,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+mood,1587661220000,1H,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+mood,1587747620000,1D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
This example will create eight segments for a single participant (a748ee1a...): five independent stressX segments with various lengths (1, 4, 3, 7, and 9 hours). Segments stress1, stress3, and stress5 are shifted forwards by 5 minutes, while stress2 and stress4 are shifted backwards by 4 hours (that is, since the stress4 event happened on March 15th at 1pm EST (1584291600000), its time segment will start on that day at 9am and end at 4pm).
The three mood segments are 1 hour, 1 day, and 7 days long and have no shift. In addition, these mood segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some of the information needed to compute a few of those features will be extracted from all three segments, for example the phone contact that called a participant the most, or the location clusters visited by a participant.
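A sketch (not RAPIDS’ code) of how an event segment’s window follows from these columns: the start is event_timestamp shifted by shift in shift_direction, and the end is the start plus length.

```python
from datetime import timedelta

def event_segment_window(event_timestamp_ms, length, shift, shift_direction):
    """Return (start_ms, end_ms) for an event segment."""
    start = event_timestamp_ms + shift_direction * int(shift.total_seconds() * 1000)
    return start, start + int(length.total_seconds() * 1000)

# stress4: event at 1584291600000 (March 15th, 1pm), length 7H, shifted 4H backwards:
start, end = event_segment_window(1584291600000, timedelta(hours=7), timedelta(hours=4), -1)
# start corresponds to 9am that day and end to 4pm, as in the explanation above
```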
In the final feature file, you will find a row per event segment. The local_segment
column of each row has a label
, a start date-time string, and an end date-time string.
weeklysurvey2060#2020-09-12 01:00:00,2020-09-18 23:59:59
+
All sensor data is always segmented based on timestamps, and the date-time strings are attached for informative purposes. For example, you can plot your features based on these strings.
+When you configure RAPIDS to work with a single time zone, that time zone code will be used to convert start/end timestamps (the ones you typed in the event segments file) into start/end date-time strings. However, when you configure RAPIDS to work with multiple time zones, RAPIDS will use the most common time zone across all of a participant’s devices to do the conversion. The most common time zone is the one in which a participant spent the most time.
+In practical terms, this means that the date-time strings of event segments that happened in uncommon time zones will have shifted start/end date-time labels. However, the data within each segment was correctly filtered based on timestamps.
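If you post-process feature files, the local_segment value is easy to split; a sketch under the format shown above:

```python
def parse_local_segment(local_segment: str):
    """Split 'label#start,end' into its three parts.

    Example: 'weeklysurvey2060#2020-09-12 01:00:00,2020-09-18 23:59:59'
    """
    label, dates = local_segment.split("#", 1)
    start, end = dates.split(",", 1)
    return label, start, end
```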
+Use the following Frequency
segment file to create 288 (24 hours * 12 segments per hour) 5-minute segments starting from midnight of every day in your study
+
label,length
+fiveminutes,5
+
Use the following Periodic
segment file to create daily segments starting from midnight of every day in your study
+
label,start_time,length,repeats_on,repeats_value
+daily,00:00:00,23H 59M 59S,every_day,0
+
Use the following Periodic
segment file to create morning segments starting at 06:00:00 and ending at 11:59:59 of every day in your study
+
label,start_time,length,repeats_on,repeats_value
+morning,06:00:00,5H 59M 59S,every_day,0
+
Use the following Periodic
segment file to create overnight segments starting at 20:00:00 and ending at 07:59:59 (next day) of every day in your study
+
label,start_time,length,repeats_on,repeats_value
+overnight,20:00:00,11H 59M 59S,every_day,0
+
Use the following Periodic
segment file to create non-overlapping weekly segments starting at midnight of every Monday in your study
+
label,start_time,length,repeats_on,repeats_value
+weekly,00:00:00,6D 23H 59M 59S,wday,1
+
Use the following Periodic segment file to create overlapping weekly segments starting at midnight of every day in your study
+label,start_time,length,repeats_on,repeats_value
+weekly,00:00:00,6D 23H 59M 59S,every_day,0
+
Use the following Periodic
segment file to create weekend segments starting at midnight of every Saturday in your study
+
label,start_time,length,repeats_on,repeats_value
+weekend,00:00:00,1D 23H 59M 59S,wday,6
+
Use the following Event
segment file to create two 2-hour segments that start 1 hour before surveys answered by 3 participants
+
label,event_timestamp,length,shift,shift_direction,device_id
+survey1,1587661220000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+survey2,1587747620000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+survey1,1587906020000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr
+survey2,1584291600000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr
+survey1,1588172420000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3
+survey2,1584291600000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3
+
If your study only happened in a single time zone, or you want to ignore short trips of your participants to different time zones, select the appropriate code from this list and change the following config key. Double-check your time zone code choice; for example, US Eastern Time is America/New_York, not EST
TIMEZONE:
+ TYPE: SINGLE
+ TZCODE: America/New_York
+
If your participants lived in different time zones or they traveled across time zones, and you know when participants’ devices were in a specific time zone, RAPIDS can use this data to process your data streams with the correct date-time. You need to provide RAPIDS with the time zone data in a CSV file ([TZCODES_FILE]
) in the format described below.
TIMEZONE:
+ TYPE: MULTIPLE
+ SINGLE:
+ TZCODE: America/New_York
+ MULTIPLE:
+ TZCODES_FILE: path_to/time_zones_csv.file
+ IF_MISSING_TZCODE: STOP
+ DEFAULT_TZCODE: America/New_York
+ FITBIT:
+ ALLOW_MULTIPLE_TZ_PER_DEVICE: False
+ INFER_FROM_SMARTPHONE_TZ: False
+
Parameters for [TIMEZONE]
Parameter | +Description | +
---|---|
[TYPE] |
+Either SINGLE or MULTIPLE as explained above |
+
[SINGLE][TZCODE] |
+The time zone code from this list to be used across all devices | +
[MULTIPLE][TZCODES_FILE] |
+A CSV file containing the time zones in which participants’ devices sensed data (see the required format below). Multiple devices can be linked to the same person, read more in Participants Files | +
[MULTIPLE][IF_MISSING_TZCODE] |
+When a device is missing from [TZCODES_FILE] , set this flag to STOP to stop RAPIDS’ execution and show an error, or to USE_DEFAULT to assign the time zone specified in [DEFAULT_TZCODE] to any such devices | +
+
[MULTIPLE][FITBIT][ALLOW_MULTIPLE_TZ_PER_DEVICE] |
+You only need to care about this flag if one or more Fitbit devices sensed data in one or more time zones, and you want RAPIDS to take into account this in its feature computation. Read more in “How does RAPIDS handle Fitbit devices?” below. | +
[MULTIPLE][FITBIT][INFER_FROM_SMARTPHONE_TZ] |
+You only need to care about this flag if one or more Fitbit devices sensed data in one or more time zones, and you want RAPIDS to take into account this in its feature computation. Read more in “How does RAPIDS handle Fitbit devices?” below. | +
TZCODES_FILE
TZCODES_FILE
has three columns and a row for each time zone a device visited (a device can be a smartphone or wearable (Fitbit/Empatica)):
Column | +Description | +
---|---|
device_id |
+A string that uniquely identifies a smartphone or wearable | +
tzcode |
+A string with the appropriate code from this list that represents the time zone where the device sensed data |
+
timestamp |
+A UNIX timestamp indicating the first time this device_id sensed data in tzcode |
+
device_id, tzcode, timestamp
+13dbc8a3-dae3-4834-823a-4bc96a7d459d, America/New_York, 1587500000000
+13dbc8a3-dae3-4834-823a-4bc96a7d459d, America/Mexico_City, 1587600000000
+13dbc8a3-dae3-4834-823a-4bc96a7d459d, America/Los_Angeles, 1587700000000
+65sa66a5-2d2d-4524-946v-44ascbv4sad7, Europe/Amsterdam, 1587100000000
+65sa66a5-2d2d-4524-946v-44ascbv4sad7, Europe/Berlin, 1587200000000
+65sa66a5-2d2d-4524-946v-44ascbv4sad7, Europe/Amsterdam, 1587300000000
+
Using this file, RAPIDS will create time zone intervals per device, for example for 13dbc8a3-dae3-4834-823a-4bc96a7d459d
:
[1587500000000, 1587599999999]
for America/New_York
[1587600000000, 1587699999999]
for America/Mexico_City
[1587700000000, now]
for America/Los_Angeles
Any sensor data row from a device will be assigned a time zone if it falls within one of these intervals (see the sketch after this list); for example:
+1587533333333
will be assigned to America/New_York
because it falls within Interval 1
+1587400000000
will be discarded because it was logged outside any interval.
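A minimal sketch (not RAPIDS’ code) of this interval lookup, using the example rows above:

```python
from bisect import bisect_right
from typing import Optional

# (first timestamp, tzcode) rows for one device, sorted by timestamp
INTERVALS = [
    (1587500000000, "America/New_York"),
    (1587600000000, "America/Mexico_City"),
    (1587700000000, "America/Los_Angeles"),
]

def tz_for(timestamp: int) -> Optional[str]:
    """Time zone whose interval contains `timestamp`; None if it precedes them all."""
    i = bisect_right([start for start, _ in INTERVALS], timestamp) - 1
    return INTERVALS[i][1] if i >= 0 else None

assert tz_for(1587533333333) == "America/New_York"  # falls within Interval 1
assert tz_for(1587400000000) is None                # logged outside any interval
```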
Can I get the TZCODES_FILE from the time zone table collected automatically by the AWARE app?
Sure. You can put your timezone table (timezone.csv
) collected by the AWARE app under data/external
folder and run:
+
python tools/create_multi_timezones_file.py
+
TZCODES_FILE
will be saved as data/external/multiple_timezones.csv
file.
+Add a row per participant and set timestamp to 0
:
+
device_id, tzcode, timestamp
+13dbc8a3-dae3-4834-823a-4bc96a7d459d, America/Los_Angeles, 0
+65sa66a5-2d2d-4524-946v-44ascbv4sad7, Europe/Amsterdam, 0
+
It depends on [IF_MISSING_TZCODE]
.
If [IF_MISSING_TZCODE]
is set to STOP
, RAPIDS will stop its execution and show you an error message.
If [IF_MISSING_TZCODE]
is set to USE_DEFAULT
, it will assign the time zone specified in [DEFAULT_TZCODE]
to any devices with missing time zone information in [TZCODES_FILE]
. This is helpful if only a few of your participants had multiple timezones and you don’t want to specify the same time zone for the rest.
Fitbit devices are not time zone aware and they always log data with a local date-time string.
+When none of the Fitbit devices in your study changed time zones (e.g., p01
was always in New York and p02
was always in Amsterdam), you can set a single time zone per Fitbit device id along with a timestamp 0 (you can still assign multiple time zones to smartphone device ids)
+
device_id, tzcode, timestamp
+fitbit123, America/New_York, 0
+fitbit999, Europe/Amsterdam, 0
+
On the other hand, when at least one of your Fitbit devices changed time zones AND you want RAPIDS to take into account these changes, you need to set [ALLOW_MULTIPLE_TZ_PER_DEVICE]
to True
. You have to manually allow this option because you need to be aware it can produce inaccurate features around the times when time zones changed. This is because we cannot know exactly when the Fitbit device detected and processed the time zone change.
If you want to ALLOW_MULTIPLE_TZ_PER_DEVICE, you will need to add any time zone changes per device in the TZCODES_FILE as explained above. You could obtain this data by hand, but if your participants also used a smartphone during your study, you can use their time zone logs. Recall that in RAPIDS every participant is represented with a participant file pXX.yaml; this file links together multiple devices, and we will use it to know what smartphone time zone data should be applied to Fitbit devices. Thus, set INFER_FROM_SMARTPHONE_TZ to TRUE if you have included smartphone time zone data in your TZCODES_FILE and you want to make a participant’s Fitbit data time zone aware with their respective smartphone data.
Modify the following keys in your config.yaml
depending on the data stream you want to process.
Set [PHONE_DATA_STREAMS][USE] to the smartphone data stream you want to process (e.g. aware_mysql) and configure its parameters (e.g. [DATABASE_GROUP]). Ignore the parameters of streams you are not using (e.g. [FOLDER] of aware_csv).
PHONE_DATA_STREAMS:
+ USE: aware_mysql
+
+ # AVAILABLE:
+ aware_mysql:
+ DATABASE_GROUP: MY_GROUP
+
+ aware_csv:
+ FOLDER: data/external/aware_csv
+
Key | +Description | +
---|---|
[DATABASE_GROUP] |
+A database credentials group. Read the instructions below to set it up | +
If you haven’t done so, create an empty file called credentials.yaml
in your RAPIDS root directory:
Add the following lines to credentials.yaml
and replace your database-specific credentials (user, password, host, and database):
MY_GROUP:
+ database: MY_DATABASE
+ host: MY_HOST
+ password: MY_PASSWORD
+ port: 3306
+ user: MY_USER
+
Notes
+The label [MY_GROUP]
is arbitrary but it has to match the [DATABASE_GROUP]
attribute of the data stream you choose to use.
Indentation matters
+You can have more than one credentials group in credentials.yaml
./.env
from RAPIDS 0.xIn RAPIDS versions 0.x, database credentials were stored in a ./.env
file. If you are migrating from that type of file, you have two options:
Migrate your credentials by hand:
+[MY_GROUP]
+user=MY_USER
+password=MY_PASSWORD
+host=MY_HOST
+port=3306
+database=MY_DATABASE
+
MY_GROUP:
+ user: MY_USER
+ password: MY_PASSWORD
+ host: MY_HOST
+ port: 3306
+ database: MY_DATABASE
+
Use the migration script we provide (make sure your conda environment is active):
+python tools/update_format_env.py
+
If you are using RAPIDS’ docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using host.docker.internal
instead of 127.0.0.1
or localhost
. In a Linux host, you need to run our docker container using docker run --network="host" -d moshiresearch/rapids:latest
and then 127.0.0.1
will point to your host machine.
Key | +Description | +
---|---|
[FOLDER] |
+Folder where you have to place a CSV file per phone sensor. Each file has to contain all the data from every participant you want to process. | +
Set [FITBIT_DATA_STREAMS][USE] to the Fitbit data stream you want to process (e.g. fitbitjson_mysql) and configure its parameters (e.g. [DATABASE_GROUP]). Ignore the parameters of the streams you are not using (e.g. [FOLDER] of fitbitjson_csv).
Warning
+You will probably have to tell RAPIDS the name of the columns where you stored your Fitbit data. To do this, modify your chosen stream’s format.yaml
column mappings to match your raw data column names.
FITBIT_DATA_STREAMS:
+ USE: fitbitjson_mysql
+
+ # AVAILABLE:
+ fitbitjson_mysql:
+ DATABASE_GROUP: MY_GROUP
+ SLEEP_SUMMARY_EPISODE_DAY_ANCHOR: False
+
+ fitbitjson_csv:
+ FOLDER: data/external/fitbit_csv
+ SLEEP_SUMMARY_EPISODE_DAY_ANCHOR: False
+
+ fitbitparsed_mysql:
+ DATABASE_GROUP: MY_GROUP
+ SLEEP_SUMMARY_EPISODE_DAY_ANCHOR: False
+
+ fitbitparsed_csv:
+ FOLDER: data/external/fitbit_csv
+ SLEEP_SUMMARY_EPISODE_DAY_ANCHOR: False
+
This data stream processes Fitbit data inside a JSON column, as obtained from the Fitbit API and stored in a MySQL database. Read more about its column mappings and mutations in fitbitjson_mysql
.
Key | +Description | +
---|---|
[DATABASE_GROUP] |
+A database credentials group. Read the instructions below to set it up | +
[SLEEP_SUMMARY_EPISODE_DAY_ANCHOR] |
+One of start or end . Summary sleep episodes are considered as events based on either the start timestamp or end timestamp (they will belong to the day where they start or end). |
+
If you haven’t done so, create an empty file called credentials.yaml
in your RAPIDS root directory:
Add the following lines to credentials.yaml
and replace your database-specific credentials (user, password, host, and database):
MY_GROUP:
+ database: MY_DATABASE
+ host: MY_HOST
+ password: MY_PASSWORD
+ port: 3306
+ user: MY_USER
+
Notes
+The label [MY_GROUP]
is arbitrary but it has to match the [DATABASE_GROUP]
attribute of the data stream you choose to use.
Indentation matters
+You can have more than one credentials group in credentials.yaml
./.env
from RAPIDS 0.xIn RAPIDS versions 0.x, database credentials were stored in a ./.env
file. If you are migrating from that type of file, you have two options:
Migrate your credentials by hand:
+[MY_GROUP]
+user=MY_USER
+password=MY_PASSWORD
+host=MY_HOST
+port=3306
+database=MY_DATABASE
+
MY_GROUP:
+ user: MY_USER
+ password: MY_PASSWORD
+ host: MY_HOST
+ port: 3306
+ database: MY_DATABASE
+
Use the migration script we provide (make sure your conda environment is active):
+python tools/update_format_env.py
+
If you are using RAPIDS’ docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using host.docker.internal
instead of 127.0.0.1
or localhost
. In a Linux host, you need to run our docker container using docker run --network="host" -d moshiresearch/rapids:latest
and then 127.0.0.1
will point to your host machine.
This data stream processes Fitbit data inside a JSON column, as obtained from the Fitbit API and stored in a CSV file. Read more about its column mappings and mutations in fitbitjson_csv
.
Key | +Description | +
---|---|
[FOLDER] |
+Folder where you have to place a CSV file per Fitbit sensor. Each file has to contain all the data from every participant you want to process. | +
[SLEEP_SUMMARY_EPISODE_DAY_ANCHOR] |
+One of start or end . Summary sleep episodes are considered as events based on either the start timestamp or end timestamp (they will belong to the day where they start or end). |
+
This data stream processes Fitbit data stored in multiple columns after being parsed from the JSON column returned by the Fitbit API and stored in a MySQL database. Read more about its column mappings and mutations in fitbitparsed_mysql
.
Key | +Description | +
---|---|
[DATABASE_GROUP] |
+A database credentials group. Read the instructions below to set it up | +
[SLEEP_SUMMARY_EPISODE_DAY_ANCHOR] |
+One of start or end . Summary sleep episodes are considered as events based on either the start timestamp or end timestamp (they will belong to the day where they start or end). |
+
If you haven’t done so, create an empty file called credentials.yaml
in your RAPIDS root directory:
Add the following lines to credentials.yaml
and replace your database-specific credentials (user, password, host, and database):
MY_GROUP:
+ database: MY_DATABASE
+ host: MY_HOST
+ password: MY_PASSWORD
+ port: 3306
+ user: MY_USER
+
Notes
+The label [MY_GROUP]
is arbitrary but it has to match the [DATABASE_GROUP]
attribute of the data stream you choose to use.
Indentation matters
+You can have more than one credentials group in credentials.yaml
./.env
from RAPIDS 0.xIn RAPIDS versions 0.x, database credentials were stored in a ./.env
file. If you are migrating from that type of file, you have two options:
Migrate your credentials by hand:
+[MY_GROUP]
+user=MY_USER
+password=MY_PASSWORD
+host=MY_HOST
+port=3306
+database=MY_DATABASE
+
MY_GROUP:
+ user: MY_USER
+ password: MY_PASSWORD
+ host: MY_HOST
+ port: 3306
+ database: MY_DATABASE
+
Use the migration script we provide (make sure your conda environment is active):
+python tools/update_format_env.py
+
If you are using RAPIDS’ docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using host.docker.internal
instead of 127.0.0.1
or localhost
. In a Linux host, you need to run our docker container using docker run --network="host" -d moshiresearch/rapids:latest
and then 127.0.0.1
will point to your host machine.
This data stream processes Fitbit data stored in multiple columns (plain text) after being parsed from the JSON column returned by the Fitbit API and stored in a CSV file. Read more about its column mappings and mutations in fitbitparsed_csv
.
Key | +Description | +
---|---|
[FOLDER] |
+Folder where you have to place a CSV file per Fitbit sensor. Each file has to contain all the data from every participant you want to process. | +
[SLEEP_SUMMARY_EPISODE_DAY_ANCHOR] |
+One of start or end . Summary sleep episodes are considered as events based on either the start timestamp or end timestamp (they will belong to the day where they start or end). |
+
Set [USE]
to the Empatica data stream you want to use; see the table in introduction to data streams. Configure any parameters as indicated below.
EMPATICA_DATA_STREAMS:
+ USE: empatica_zip
+
+ # AVAILABLE:
+ empatica_zip:
+ FOLDER: data/external/empatica
+
Key | +Description | +
---|---|
[FOLDER] |
+The relative path to a folder containing one subfolder per participant. The name of a participant folder should match their device_id assigned in their participant file. Each participant folder can have one or more zip files with any name; in other words, the sensor data in those zip files belong to a single participant. The zip files are automatically generated by Empatica and have a CSV file per sensor (ACC , HR , TEMP , EDA , BVP , TAGS ). All CSV files of the same type contained in one or more zip files are uncompressed, parsed, sorted by timestamp, and joined together. |
+
In the file tree below, we want to process three participants’ data: p01, p02, and p03. p01 has two zip files, p02 has only one zip file, and p03 has three zip files. Each zip file has a CSV file per sensor; all files for the same sensor are joined together and processed by RAPIDS.
data/ # this folder exists in the root RAPIDS folder
+ external/
+ empatica/
+ p01/
+ file1.zip
+ file2.zip
+ p02/
+ aaaa.zip
+ p03/
+ t1.zip
+ t2.zip
+ t3.zip
+
Finally, you need to modify the config.yaml section of the sensors you want to extract behavioral features from. All sensors follow the same naming convention (DEVICE_SENSOR) and parameter structure, which we explain in the Behavioral Features Introduction.
Done
+Head over to Execution to learn how to execute RAPIDS.
+After you have installed and configured RAPIDS, use the following command to execute it.
+./rapids -j1
+
Ready to extract behavioral features
+If you are ready to extract features head over to the Behavioral Features Introduction
+We wrap Snakemake
+The script ./rapids
is a wrapper around Snakemake so you can pass any parameters that Snakemake accepts (e.g. -j1
).
Updating RAPIDS output after modifying config.yaml
Any changes to the config.yaml
file will be applied automatically and only the relevant files will be updated. This means that after modifying the features list for PHONE_MESSAGE
for example, RAPIDS will execute the script that computes MESSAGES
features and update its output file.
Multi-core
+You can run RAPIDS over multiple cores by modifying the -j
argument (e.g. use -j8
to use 8 cores). However, take into account that this means multiple sensor datasets for different participants will be loaded in memory at the same time. If RAPIDS crashes because it ran out of memory, reduce the number of cores and try again.
As a reference, we have run RAPIDS on 12 cores and 32 GB of RAM without problems for a study with 200 participants, each with 14 days of low-frequency smartphone data (no accelerometer, gyroscope, or magnetometer).
+Deleting RAPIDS output
+If you want to delete all the output files RAPIDS produces, you can execute the following command:
+./rapids -j1 --delete-all-output
+
Forcing a complete rerun or updating your raw data in RAPIDS
+If you want to update your raw data or rerun the whole pipeline from scratch, run the following commands:
+./rapids -j1 --delete-all-output
+./rapids -j1
+
You can install RAPIDS using Docker (the fastest option) or following the native instructions for macOS and Linux (Ubuntu). Windows is supported through Docker or WSL.
+Install Docker
+Pull our RAPIDS container +
docker pull moshiresearch/rapids:latest
+
Run RAPIDS' container (after this step is done, you should see a prompt in the main RAPIDS folder with its Python environment active)
+docker run -it moshiresearch/rapids:latest
+
Pull the latest version of RAPIDS
+git pull
+
Make RAPIDS script executable +
chmod +x rapids
+
Check that RAPIDS is working +
./rapids -j1
+
Optional: you can edit RAPIDS files with vim, but we recommend using Visual Studio Code and its Remote Containers extension
Remote Explorer
panel on the left hand sidebarContainers
moshiresearch/rapids
container in theCONTAINERS
treeWe tested these instructions in Catalina and Big Sur
+RAPIDS can run on M1 Macs, the only changes as of Feb 21, 2021 are:
+snakemake -j1 renv_install & snakemake -j1 renv_restore
). TZ_DIR
in renv/activate.R
(line #19) Sys.setenv("TZDIR" = file.path(R.home(), "share", "zoneinfo"))
(RAPIDS does this automatically).Install brew
+Install MySQL
+brew install mysql
+brew services start mysql
+
Install R 4.0, pandoc and rmarkdown. If you have other instances of R, we recommend uninstalling them
+brew install r
+brew install pandoc
+Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'
+
Install miniconda (restart your terminal afterwards)
+brew cask install miniconda
+conda init zsh # (or conda init bash)
+
Clone our repo
+git clone https://github.com/carissalow/rapids
+
Create a python virtual environment
+cd rapids
+conda env create -f environment.yml -n rapids
+conda activate rapids
+
Install R packages and virtual environment:
+snakemake -j1 renv_install
+snakemake -j1 renv_restore
+
Note
This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
+ +Make RAPIDS script executable +
chmod +x rapids
+
Check that RAPIDS is working +
./rapids -j1
+
We tested RAPIDS on Ubuntu 18.04 and 20.04. Note that the necessary Python and R packages are available in other Linux distributions, so if you decide to give it a try, let us know and we can update these docs.
+Install dependencies
+sudo apt install libcurl4-openssl-dev
+sudo apt install libssl-dev
+sudo apt install libxml2-dev
+sudo apt install libglpk40
+
Install MySQL
+sudo apt install libmysqlclient-dev
+sudo apt install mysql-server
+
Add key for R’s repository.
+sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
+
Add R’s repository
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' # Ubuntu 18.04 (bionic)
+
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' # Ubuntu 20.04 (focal)
+
Install R 4.0. If you have other instances of R, we recommend uninstalling them
+sudo apt update
+sudo apt install r-base
+
Install Pandoc and rmarkdown
+sudo apt install pandoc
+Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'
+
Install git
+sudo apt install git
+
Install miniconda
+Restart your current shell
+Clone our repo:
+git clone https://github.com/carissalow/rapids
+
Create a python virtual environment:
+cd rapids
+conda env create -f environment.yml -n MY_ENV_NAME
+conda activate MY_ENV_NAME
+
Install the R virtual environment management package (renv)
+snakemake -j1 renv_install
+
Restore the R virtual environment
On Ubuntu 18.04 (bionic), run the following command to restore the R virtual environment using RSPM binaries +
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/bionic/latest"))'
+
On Ubuntu 20.04 (focal), run the following command to restore the R virtual environment using RSPM binaries +
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
+
If the fast installation command failed for some reason, you can restore the R virtual environment from source: +
R -e 'renv::restore()'
+
Note
This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
Make the RAPIDS script executable
chmod +x rapids
+
Check that RAPIDS is working +
./rapids -j1
+
There are several installation options, varying in complexity.
+Let’s review some key concepts we use throughout these docs:
| Definition | Description |
|---|---|
| Data Stream | Set of sensor data collected using a specific type of device with a specific format and stored in a specific container. For example, smartphone (device) data collected with the AWARE Framework (format) and stored in a MySQL database (container). |
| Device | A mobile or wearable device, like smartphones, Fitbit wrist bands, Oura Rings, etc. |
| Sensor | A physical or digital module built into a device that produces a data stream. For example, a smartphone's accelerometer or screen. |
| Format | A file in RAPIDS that describes how sensor data from a device matches RAPIDS' internal data representation. |
| Container | An electronic repository of data; it can be a database, a file, a web API, etc. RAPIDS connects to containers through container scripts. |
| Participant | A person who took part in a monitoring study. |
| Behavioral feature | A metric computed from raw sensor data that quantifies the behavior of a participant. For example, time spent at home computed from location data. These are also known as digital biomarkers. |
| Time segment | Time segments (or epochs) are the time windows on which RAPIDS extracts behavioral features. For example, you might want to compute participants' time at home every morning or only during weekends. You define time segments in a CSV file that RAPIDS processes. |
| Time zone | A string code like America/New_York that represents a time zone where a device logged data. You can process data collected in single or multiple time zones. |
| Provider | A script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community and are named after their first author, like [PHONE_LOCATIONS][DORYAB]. |
| config.yaml | A YAML file where you can modify parameters to process data streams and behavioral features. This is the heart of RAPIDS and the file you will modify the most. |
| credentials.yaml | A YAML file where you can define credential groups (user, password, host, etc.) if your data stream needs to connect to a database or web API. |
| Participant file(s) | A YAML file that links one or more smartphone or wearable devices that a single participant used. RAPIDS needs one file per participant. |
What can I do with RAPIDS?
+You can do one or more of these things with RAPIDS:
+Hint
+In order to follow any of the previous tutorials, you will have to Install, Configure, and learn how to Execute RAPIDS.
+We recommend you follow the Minimal Example tutorial to get familiar with RAPIDS
+Email us, leave a comment in these docs, create a Github issue or text us in Slack if you have any questions
+RAPIDS is a group of configuration files and R and Python scripts that are executed by Snakemake. You can get a copy of RAPIDS by cloning our Github repository.
+RAPIDS is not a web application or server; all the processing is done in your laptop, server, or computer cluster.
Most of the time, you will only have to modify configuration files in YAML format (config.yaml, credentials.yaml, and participant files pxx.yaml) and in CSV format (time zones and time segments).
RAPIDS pulls data from different data containers and processes it in steps. The input/output of each step is saved as a CSV file for inspection; you can check the files that are created for each sensor on its documentation page.
+All data is stored in data/
, and all processing Python and R scripts are stored in src/
.
In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the Snakefile
file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.).
In the figure below, we represent the flow of data in RAPIDS. In broad terms, smartphone and wearable devices log data streams with a certain format to a data container (database, file, etc.).
+RAPIDS can connect to these containers if it has a format.yaml
and a container.[R|py]
script used to pull the correct data and mutate it to comply with RAPIDS’ internal data representation. Once the data stream is in RAPIDS, it goes through some basic transformations (scripts), one that assigns a time segment and a time zone to each data row, and another one that creates “episodes” of data for some sensors that need it (like screen, battery, activity recognition, and sleep intraday data).
After this, RAPIDS executes the requested PROVIDER
script that computes behavioral features per time segment instance. After all features are computed, they are joined per sensor, per participant, and per study. Visualizations are built from raw data or from computed features.
Absolutely, you are processing your data with your own copy of RAPIDS in your laptop, server, or computer cluster, so neither we nor anyone else can have access to your datasets.
+If you want to extract the behavioral features or visualizations that RAPIDS offers out of the box, the answer is no. However, you need to be comfortable running commands in your terminal and familiar with editing YAML files and CSV files.
+If you want to add support for new data streams or behavioral features, you need to be familiar with R or Python.
+Yes, RAPIDS is both open-source and free.
+Please refer to our Citation guide; depending on what parts of RAPIDS you used, we also ask you to cite the work of other authors that shared their work.
+Yes, we use Snakemake under the hood, so you can automatically distribute RAPIDS execution over multiple cores or clusters. RAPIDS processes data per sensor and participant, so it can take advantage of this parallel processing.
+We believe RAPIDS can benefit your analysis in several ways:
+Yes, but you need to add a new data stream to RAPIDS (a new format.yaml
and container script in R or Python). Follow this tutorial. Email us, create a Github issue or text us in Slack if you have any questions.
If you do so, let us know so we can integrate your work into RAPIDS.
+The only wearables we support at the moment are Empatica and Fitbit. However, get in touch if you need to process data from a different wearable. We have limited resources so we add support for different devices on an as-needed basis, but we would be happy to collaborate with you to add new wearables. Email us, create a Github issue or text us in Slack if you have any questions.
+Yes, but you need to add a new data stream to RAPIDS (a new format.yaml
and container script in R or Python). Follow this tutorial. If you are processing data streams we already support, like AWARE, Fitbit, or Empatica, and are just connecting to a different container, you can reuse their format.yaml
and only implement a new container script. Email us, create a Github issue or text us in Slack if you have any questions.
If you do so, let us know so we can integrate your work into RAPIDS.
+Yes, RAPIDS can handle single or multiple timezones per participant. You can use time zone data collected by smartphones or collected by hand.
+Yes, you can link more than one smartphone or wearable device to a single participant. RAPIDS will merge them and sort them automatically.
+Yes, data from multiple smartphones can be linked to a single participant. All iOS data is converted to Android data before merging it.
+Yes, you can implement your own features in R or Python following this tutorial
+Yes, we do this for PHONE_DATA_YIELD
(combines all phone sensors), PHONE_LOCATIONS
(combines location and data yield data), PHONE_APPLICATIONS_BACKGROUND
(combines screen and app usage data), and FITBIT_INTRADAY_STEPS
(combines Fitbit sleep and step data).
However, we haven’t come up with a user-friendly way to configure this, and currently, we join sensors on a case-by-case basis. This is mainly because not enough users have needed this functionality so far. Get in touch, and we can set it up together; the more use cases we are aware of, the easier it will be to integrate this into RAPIDS.
+Yes, you don’t need to write any code to use RAPIDS out of the box. If you need to add support for new data streams or behavioral features you can use scripts in either language.
+Yes, you can add them as a [MUTATION][SCRIPT]
in the format.yaml
of the data stream you are using. You will add a main
function that will receive a data frame with the raw data for that sensor that in turn will be used to compute behavioral features.
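For illustration, here is a minimal Python sketch of such a mutation script. The main function receiving a raw-data data frame is what the docs describe above; the stream_parameters argument and the cleaning logic are assumptions for this example, so check the data stream tutorial for the exact signature.

```python
import pandas as pd

# Hypothetical mutation script referenced from [MUTATION][SCRIPT] in format.yaml.
# RAPIDS calls main() with a data frame of the sensor's raw data; the returned
# data frame is what RAPIDS uses to compute behavioral features.
def main(data: pd.DataFrame, stream_parameters: dict) -> pd.DataFrame:
    # Example cleaning step (assumed): drop rows without a timestamp and
    # convert second-based timestamps to milliseconds.
    data = data.dropna(subset=["timestamp"])
    seconds_mask = data["timestamp"] < 9999999999
    data.loc[seconds_mask, "timestamp"] = data.loc[seconds_mask, "timestamp"] * 1000
    return data
```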
If you collected sensor data with the vanilla (original) AWARE mobile clients, you shouldn’t need to modify this format (described below).
+Remember that a format maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs.
+The yaml file that describes the format of this data stream is at: +
src/data/streams/aware_csv/format.yaml
+
For some sensors, we need to transform iOS data into Android format; you can refer to the OS complex mapping section to learn how this works.
+Hint
The RAPIDS and Stream column mappings in this stream share the same names because AWARE was the first stream RAPIDS supported; RAPIDS treats AWARE column names as the default.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_VALUES_0 | +double_values_0 | +
DOUBLE_VALUES_1 | +double_values_1 | +
DOUBLE_VALUES_2 | +double_values_2 | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
ACTIVITY_NAME | +activity_name | +
ACTIVITY_TYPE | +activity_type | +
CONFIDENCE | +confidence | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
ACTIVITY_NAME | +FLAG_TO_MUTATE | +
ACTIVITY_TYPE | +FLAG_TO_MUTATE | +
CONFIDENCE | +FLAG_TO_MUTATE | +
MUTATION
+Script column | +Stream column | +
---|---|
ACTIVITIES | +activities | +
CONFIDENCE | +confidence | +
src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R
+
Note
For the RAPIDS columns ACTIVITY_NAME and ACTIVITY_TYPE:

- if the activities field is automotive, set ACTIVITY_NAME = in_vehicle and ACTIVITY_TYPE = 0
- if the activities field is cycling, set ACTIVITY_NAME = on_bicycle and ACTIVITY_TYPE = 1
- if the activities field is walking, set ACTIVITY_NAME = walking and ACTIVITY_TYPE = 7
- if the activities field is running, set ACTIVITY_NAME = running and ACTIVITY_TYPE = 8
- if the activities field is stationary, set ACTIVITY_NAME = still and ACTIVITY_TYPE = 3
- if the activities field is unknown, set ACTIVITY_NAME = unknown and ACTIVITY_TYPE = 4

For the RAPIDS CONFIDENCE column:

- if the confidence field is 0, set CONFIDENCE = 0
- if the confidence field is 1, set CONFIDENCE = 50
- if the confidence field is 2, set CONFIDENCE = 100

These rules are restated in the Python sketch below.
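A Python restatement of the mapping rules above, for clarity only; the actual mutation is the R script listed earlier:

```python
# Map the iOS `activities` field to Android-style
# ACTIVITY_NAME / ACTIVITY_TYPE pairs (rules listed above).
IOS_TO_ANDROID_ACTIVITY = {
    "automotive": ("in_vehicle", 0),
    "cycling":    ("on_bicycle", 1),
    "walking":    ("walking", 7),
    "running":    ("running", 8),
    "stationary": ("still", 3),
    "unknown":    ("unknown", 4),
}

# Map the iOS `confidence` field (0-2) to Android's 0-100 scale.
IOS_TO_ANDROID_CONFIDENCE = {0: 0, 1: 50, 2: 100}

def unify_ios_row(activities: str, confidence: int) -> dict:
    name, type_ = IOS_TO_ANDROID_ACTIVITY[activities]
    return {"ACTIVITY_NAME": name,
            "ACTIVITY_TYPE": type_,
            "CONFIDENCE": IOS_TO_ANDROID_CONFIDENCE[confidence]}
```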
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
APPLICATION_NAME | +application_name | +
APPLICATION_VERSION | +application_version | +
ERROR_SHORT | +error_short | +
ERROR_LONG | +error_long | +
ERROR_CONDITION | +error_condition | +
IS_SYSTEM_APP | +is_system_app | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
APPLICATION_NAME | +application_name | +
IS_SYSTEM_APP | +is_system_app | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
APPLICATION_NAME | +application_name | +
TEXT | +text | +
SOUND | +sound | +
VIBRATE | +vibrate | +
DEFAULTS | +defaults | +
FLAGS | +flags | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BATTERY_STATUS | +battery_status | +
BATTERY_LEVEL | +battery_level | +
BATTERY_SCALE | +battery_scale | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BATTERY_STATUS | +FLAG_TO_MUTATE | +
BATTERY_LEVEL | +battery_level | +
BATTERY_SCALE | +battery_scale | +
MUTATION
+Script column | +Stream column | +
---|---|
BATTERY_STATUS | +battery_status | +
src/data/streams/mutations/phone/aware/battery_ios_unification.R
+
Note
For the RAPIDS BATTERY_STATUS column:

- if the battery_status field is 3, set BATTERY_STATUS = 5 (full status)
- if the battery_status field is 1, set BATTERY_STATUS = 3 (discharge)

Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
BT_ADDRESS | +bt_address | +
BT_NAME | +bt_name | +
BT_RSSI | +bt_rssi | +
MUTATION
+Only old iOS versions supported this sensor (same mapping as Android).
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
CALL_TYPE | +call_type | +
CALL_DURATION | +call_duration | +
TRACE | +trace | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
CALL_TYPE | +FLAG_TO_MUTATE | +
CALL_DURATION | +call_duration | +
TRACE | +trace | +
MUTATION
+Script column | +Stream column | +
---|---|
CALL_TYPE | +call_type | +
src/data/streams/mutations/phone/aware/calls_ios_unification.R
+
Note
+We transform iOS call logs into Android’s format. iOS stores call status: 1=incoming, 2=connected, 3=dialing, 4=disconnected, as opposed to Android’s events: 1=incoming, 2=outgoing, 3=missed.
+We follow this algorithm to convert iOS call data (there are some inaccuracies in the way we handle sequences, see new rules below):
+Tested with an Android (OnePlus 7T) and an iPhone XR
+Call type | +Android (duration) | +iOS (duration) | +New Rule | +
---|---|---|---|
Outgoing missed ended by me | +2 (0) | +3,4 (0,X) | +3,4 is converted to 2 with duration 0 | +
Outgoing missed ended by them | +2(0) | +3,2,4 (0,X,X2) | +3,2,4 is converted to 2 with duration X2* | +
Incoming missed ended by me | +NA** | +1,4 (0,X) | +1,4 is converted to 3 with duration 0 | +
Incoming missed ended by them | +3(0) | +1,4 (0,X) | +1,4 is converted to 3 with duration 0 | +
Outgoing answered | +2(X excluding dialing time) | +3,2,4 (0,X,X2) | +3,2,4 is converted to 2 with duration X2 | +
Incoming answered | +1(X excluding dialing time) | +1,2,4 (0,X,X2) | +1,2,4 is converted to 1 with duration X2 | +
* There is no way to differentiate an outgoing missed call ended by them from an outgoing answered call, because the phone goes directly to voicemail and the voicemail time counts as call time (essentially, the voicemail answered).

** Android does not record incoming missed calls ended by the participant, only those ended by the caller or ignored by the participant.
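A simplified Python sketch of the sequence rules in the "New Rule" column above; the real conversion lives in the R script listed earlier and handles edge cases this sketch ignores:

```python
# Collapse a chronological list of iOS call events for one call into a
# single Android-style event. Each event is a (status, duration) tuple;
# iOS statuses: 1=incoming, 2=connected, 3=dialing, 4=disconnected.
def ios_sequence_to_android(events):
    statuses = [status for status, _ in events]
    if statuses == [3, 4]:        # outgoing, never connected
        return (2, 0)             # Android outgoing, duration 0
    if statuses == [3, 2, 4]:     # outgoing, connected (or voicemail*)
        return (2, events[-1][1]) # Android outgoing, duration X2
    if statuses == [1, 4]:        # incoming, never connected
        return (3, 0)             # Android missed, duration 0
    if statuses == [1, 2, 4]:     # incoming, answered
        return (1, events[-1][1]) # Android incoming, duration X2
    return None                   # unexpected sequence
```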
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_ENERGY | +double_energy | +
INFERENCE | +inference | +
DOUBLE_CONVO_START | +double_convo_start | +
DOUBLE_CONVO_END | +double_convo_end | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_ENERGY | +double_energy | +
INFERENCE | +inference | +
DOUBLE_CONVO_START | +FLAG_TO_MUTATE | +
DOUBLE_CONVO_END | +FLAG_TO_MUTATE | +
MUTATION
+Script column | +Stream column | +
---|---|
DOUBLE_CONVO_START | +double_convo_start | +
DOUBLE_CONVO_END | +double_convo_end | +
src/data/streams/mutations/phone/aware/conversation_ios_timestamp.R
+
Note
For the RAPIDS columns DOUBLE_CONVO_START and DOUBLE_CONVO_END:

- if the double_convo_start field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_START = 1000 * double_convo_start.
- if the double_convo_end field is smaller than 9999999999, it is in seconds instead of milliseconds; set DOUBLE_CONVO_END = 1000 * double_convo_end.

This heuristic is restated in the Python sketch below.
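A one-function Python restatement of this seconds-versus-milliseconds heuristic (the actual mutation is the R script listed above):

```python
# Unix times below this bound cannot be millisecond timestamps for recent
# dates, so such values are assumed to be in seconds.
MAX_PLAUSIBLE_SECONDS = 9999999999

def to_milliseconds(t: float) -> float:
    return t * 1000 if t < MAX_PLAUSIBLE_SECONDS else t
```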
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
PACKAGE_NAME | +package_name | +
BEFORE_TEXT | +before_text | +
CURRENT_TEXT | +current_text | +
IS_PASSWORD | +is_password | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_LIGHT_LUX | +double_light_lux | +
ACCURACY | +accuracy | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
DOUBLE_LATITUDE | +double_latitude | +
DOUBLE_LONGITUDE | +double_longitude | +
DOUBLE_BEARING | +double_bearing | +
DOUBLE_SPEED | +double_speed | +
DOUBLE_ALTITUDE | +double_altitude | +
PROVIDER | +provider | +
ACCURACY | +accuracy | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
LOG_MESSAGE | +log_message | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
MESSAGE_TYPE | +message_type | +
TRACE | +trace | +
MUTATION
+This sensor is not supported by iOS devices.
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
SCREEN_STATUS | +screen_status | +
MUTATION
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
SCREEN_STATUS | +FLAG_TO_MUTATE | +
MUTATION
+Script column | +Stream column | +
---|---|
SCREEN_STATUS | +screen_status | +
src/data/streams/mutations/phone/aware/screen_ios_unification.R
+
Note
For the RAPIDS SCREEN_STATUS column:

- if the screen_status field is 2 (lock episode), set SCREEN_STATUS = 0 (off episode).

RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
MAC_ADDRESS | +mac_address | +
SSID | +ssid | +
BSSID | +bssid | +
MUTATION
+Same as ANDROID
+RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +timestamp | +
DEVICE_ID | +device_id | +
SSID | +ssid | +
BSSID | +bssid | +
SECURITY | +security | +
FREQUENCY | +frequency | +
RSSI | +rssi | +
MUTATION
+Only old iOS versions supported this sensor (same mapping as Android).
+If you haven’t done so, create an empty file called credentials.yaml
in your RAPIDS root directory.
Add the following lines to credentials.yaml
and replace your database-specific credentials (user, password, host, and database):
MY_GROUP:
+ database: MY_DATABASE
+ host: MY_HOST
+ password: MY_PASSWORD
+ port: 3306
+ user: MY_USER
+
Notes
+The label [MY_GROUP]
is arbitrary but it has to match the [DATABASE_GROUP]
attribute of the data stream you choose to use.
Indentation matters
+You can have more than one credentials group in credentials.yaml
./.env
from RAPIDS 0.xIn RAPIDS versions 0.x, database credentials were stored in a ./.env
file. If you are migrating from that type of file, you have two options:
Migrate your credentials by hand:
+[MY_GROUP]
+user=MY_USER
+password=MY_PASSWORD
+host=MY_HOST
+port=3306
+database=MY_DATABASE
+
MY_GROUP:
+ user: MY_USER
+ password: MY_PASSWORD
+ host: MY_HOST
+ port: 3306
+ database: MY_DATABASE
+
Use the migration script we provide (make sure your conda environment is active):
+python tools/update_format_env.py
+
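For reference, a hypothetical Python sketch of what such a migration does; the provided tools/update_format_env.py may differ in its details:

```python
# Hypothetical converter from the old ./.env (INI-style) credentials file
# to the new credentials.yaml format shown above.
import configparser
import yaml

ini = configparser.ConfigParser()
ini.read(".env")

# Each INI section, e.g. [MY_GROUP], becomes a YAML credentials group.
groups = {group: dict(ini[group]) for group in ini.sections()}
with open("credentials.yaml", "w") as f:
    yaml.safe_dump(groups, f, default_flow_style=False)
```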
If you are using RAPIDS' docker container and Docker for Mac or Docker for Windows 18.03+, you can connect to a MySQL database on your host machine using host.docker.internal instead of 127.0.0.1 or localhost. On a Linux host, you need to run our docker container using docker run --network="host" -d moshiresearch/rapids:latest; then 127.0.0.1 will point to your host machine.
Sensor section
+Each sensor (accelerometer, screen, etc.) of every supported device (smartphone, Fitbit, etc.) has a section in the config.yaml
with parameters
and feature PROVIDERS
.
Sensor Parameters.
Each sensor section has one or more parameters that affect different aspects of how the raw data is pulled and processed.
+The CONTAINER
parameter exists for every sensor, but some sensors, like [PHONE_LOCATIONS], have extra parameters.
We explain these parameters in a table at the top of each sensor documentation page.
+Sensor Providers
+Each object in this list represents a feature PROVIDER
. Each sensor can have zero, one, or more providers.
A PROVIDER
is a script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community and are named after their first author, like [PHONE_LOCATIONS][DORYAB].
In this example, there are two accelerometer feature providers RAPIDS
and PANDA
.
PROVIDER
Parameters
Each PROVIDER
has parameters that affect the computation of the behavioral features it offers.
These parameters include at least a [COMPUTE]
flag that you switch to True
to extract a provider’s behavioral features.
We explain every provider’s parameter in a table under the Parameters description
heading on each provider documentation page.
PROVIDER
Features
Each PROVIDER
offers a set of behavioral features.
These features are grouped in an array for some providers, like those of the RAPIDS provider. For others, they are grouped in a collection of arrays, like those of the PANDA provider.
In either case, you can delete the features you are not interested in, and they will not be included in the sensor’s output feature file.
+We explain each behavioral feature in a table under the Features description
heading on each provider documentation page.
PROVIDER
script
Each PROVIDER
has a SRC_SCRIPT
that points to the script implementing its behavioral features.
It has to be a relative path from RAPIDS’ root folder and the script’s parent folder should be named after the provider, e.g. panda
.
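Putting these pieces together, a provider can also be inspected or toggled programmatically. A minimal Python sketch assuming the key structure described above (the two feature names are only examples):

```python
import yaml

# Load the pipeline configuration (assumes you run this from RAPIDS' root).
with open("config.yaml") as f:
    config = yaml.safe_load(f)

# Enable the RAPIDS accelerometer provider and keep only two of its features.
provider = config["PHONE_ACCELEROMETER"]["PROVIDERS"]["RAPIDS"]
provider["COMPUTE"] = True
provider["FEATURES"] = ["maxmagnitude", "minmagnitude"]

with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```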
The format.yaml
maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Fitbit sensors. This file is at:
src/data/streams/fitbitjson_csv/format.yaml
+
If you want RAPIDS to process Fitbit sensor data using this stream, you will need to map DEVICE_ID
and JSON_FITBIT_COLUMN
to your own raw data columns inside each sensor section in format.yaml
.
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
HEARTRATE_DAILY_RESTINGHR | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESOUTOFRANGE | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESFATBURN | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESCARDIO | +FLAG_TO_MUTATE | +
HEARTRATE_DAILY_CALORIESPEAK | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_heartrate_summary_json.py
+
Note
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
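To make the mapping above concrete, here is a simplified Python sketch of how one row's JSON maps onto the RAPIDS columns; the actual logic lives in parse_heartrate_summary_json.py and may differ:

```python
import json

# Parse one row of the raw data above into the RAPIDS columns listed in
# RAPIDS_COLUMN_MAPPINGS (simplified; the real script handles many rows).
def parse_heartrate_summary(fitbit_data: str) -> dict:
    day = json.loads(fitbit_data)["activities-heart"][0]
    calories = {zone["name"]: zone["caloriesOut"]
                for zone in day["value"]["heartRateZones"]}
    return {
        "LOCAL_DATE_TIME": day["dateTime"],
        "HEARTRATE_DAILY_RESTINGHR": day["value"].get("restingHeartRate"),
        "HEARTRATE_DAILY_CALORIESOUTOFRANGE": calories.get("Out of Range"),
        "HEARTRATE_DAILY_CALORIESFATBURN": calories.get("Fat Burn"),
        "HEARTRATE_DAILY_CALORIESCARDIO": calories.get("Cardio"),
        "HEARTRATE_DAILY_CALORIESPEAK": calories.get("Peak"),
    }
```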
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
HEARTRATE | +FLAG_TO_MUTATE | +
HEARTRATE_ZONE | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_heartrate_intraday_json.py
+
Note
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-07”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1200.6102,”max”:88,”min”:31,”minutes”:1058,”name”:”Out of Range”},{“caloriesOut”:760.3020,”max”:120,”min”:86,”minutes”:366,”name”:”Fat Burn”},{“caloriesOut”:15.2048,”max”:146,”min”:120,”minutes”:2,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:72}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:68},{“time”:”00:01:00”,”value”:67},{“time”:”00:02:00”,”value”:67},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-08”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:1100.1120,”max”:89,”min”:30,”minutes”:921,”name”:”Out of Range”},{“caloriesOut”:660.0012,”max”:118,”min”:82,”minutes”:361,”name”:”Fat Burn”},{“caloriesOut”:23.7088,”max”:142,”min”:108,”minutes”:3,”name”:”Cardio”},{“caloriesOut”:0,”max”:221,”min”:148,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:70}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:77},{“time”:”00:01:00”,”value”:75},{“time”:”00:02:00”,”value”:73},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“activities-heart”:[{“dateTime”:”2020-10-09”,”value”:{“customHeartRateZones”:[],”heartRateZones”:[{“caloriesOut”:750.3615,”max”:77,”min”:30,”minutes”:851,”name”:”Out of Range”},{“caloriesOut”:734.1516,”max”:107,”min”:77,”minutes”:550,”name”:”Fat Burn”},{“caloriesOut”:131.8579,”max”:130,”min”:107,”minutes”:29,”name”:”Cardio”},{“caloriesOut”:0,”max”:220,”min”:130,”minutes”:0,”name”:”Peak”}],”restingHeartRate”:69}}],”activities-heart-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:90},{“time”:”00:01:00”,”value”:89},{“time”:”00:02:00”,”value”:88},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
LOCAL_START_DATE_TIME | +FLAG_TO_MUTATE | +
LOCAL_END_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
EFFICIENCY | +FLAG_TO_MUTATE | +
MINUTES_AFTER_WAKEUP | +FLAG_TO_MUTATE | +
MINUTES_ASLEEP | +FLAG_TO_MUTATE | +
MINUTES_AWAKE | +FLAG_TO_MUTATE | +
MINUTES_TO_FALL_ASLEEP | +FLAG_TO_MUTATE | +
MINUTES_IN_BED | +FLAG_TO_MUTATE | +
IS_MAIN_SLEEP | +FLAG_TO_MUTATE | +
TYPE | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_sleep_summary_json.py
+
Note
The Fitbit API has two versions of sleep data, v1 and v1.2. We support both but ignore v1's count_awake, duration_awake, count_awakenings, count_restless, and duration_restless columns.
All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-10”,”duration”:3600000,”efficiency”:92,”endTime”:”2020-10-10T16:37:00.000”,”infoCode”:2,”isMainSleep”:false,”levels”:{“data”:[{“dateTime”:”2020-10-10T15:36:30.000”,”level”:”restless”,”seconds”:60},{“dateTime”:”2020-10-10T15:37:30.000”,”level”:”asleep”,”seconds”:660},{“dateTime”:”2020-10-10T15:48:30.000”,”level”:”restless”,”seconds”:60},…], “summary”:{“asleep”:{“count”:0,”minutes”:56},”awake”:{“count”:0,”minutes”:0},”restless”:{“count”:3,”minutes”:4}}},”logId”:26315914306,”minutesAfterWakeup”:0,”minutesAsleep”:55,”minutesAwake”:5,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T15:36:30.000”,”timeInBed”:60,”type”:”classic”},{“dateOfSleep”:”2020-10-10”,”duration”:22980000,”efficiency”:88,”endTime”:”2020-10-10T08:10:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-10T01:46:30.000”,”level”:”light”,”seconds”:420},{“dateTime”:”2020-10-10T01:53:30.000”,”level”:”deep”,”seconds”:1230},{“dateTime”:”2020-10-10T02:14:00.000”,”level”:”light”,”seconds”:360},…], “summary”:{“deep”:{“count”:3,”minutes”:92,”thirtyDayAvgMinutes”:0},”light”:{“count”:29,”minutes”:193,”thirtyDayAvgMinutes”:0},”rem”:{“count”:4,”minutes”:33,”thirtyDayAvgMinutes”:0},”wake”:{“count”:28,”minutes”:65,”thirtyDayAvgMinutes”:0}}},”logId”:26311786557,”minutesAfterWakeup”:0,”minutesAsleep”:318,”minutesAwake”:65,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T01:46:30.000”,”timeInBed”:383,”type”:”stages”}],”summary”:{“stages”:{“deep”:92,”light”:193,”rem”:33,”wake”:65},”totalMinutesAsleep”:373,”totalSleepRecords”:2,”totalTimeInBed”:443}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-11”,”duration”:41640000,”efficiency”:89,”endTime”:”2020-10-11T11:47:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-11T00:12:30.000”,”level”:”wake”,”seconds”:450},{“dateTime”:”2020-10-11T00:20:00.000”,”level”:”light”,”seconds”:870},{“dateTime”:”2020-10-11T00:34:30.000”,”level”:”wake”,”seconds”:780},…], “summary”:{“deep”:{“count”:4,”minutes”:52,”thirtyDayAvgMinutes”:62},”light”:{“count”:32,”minutes”:442,”thirtyDayAvgMinutes”:364},”rem”:{“count”:6,”minutes”:68,”thirtyDayAvgMinutes”:58},”wake”:{“count”:29,”minutes”:132,”thirtyDayAvgMinutes”:94}}},”logId”:26589710670,”minutesAfterWakeup”:1,”minutesAsleep”:562,”minutesAwake”:132,”minutesToFallAsleep”:0,”startTime”:”2020-10-11T00:12:30.000”,”timeInBed”:694,”type”:”stages”}],”summary”:{“stages”:{“deep”:52,”light”:442,”rem”:68,”wake”:132},”totalMinutesAsleep”:562,”totalSleepRecords”:1,”totalTimeInBed”:694}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-12”,”duration”:28980000,”efficiency”:93,”endTime”:”2020-10-12T09:34:30.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-12T01:31:00.000”,”level”:”wake”,”seconds”:600},{“dateTime”:”2020-10-12T01:41:00.000”,”level”:”light”,”seconds”:60},{“dateTime”:”2020-10-12T01:42:00.000”,”level”:”deep”,”seconds”:2340},…], “summary”:{“deep”:{“count”:4,”minutes”:63,”thirtyDayAvgMinutes”:59},”light”:{“count”:27,”minutes”:257,”thirtyDayAvgMinutes”:364},”rem”:{“count”:5,”minutes”:94,”thirtyDayAvgMinutes”:58},”wake”:{“count”:24,”minutes”:69,”thirtyDayAvgMinutes”:95}}},”logId”:26589710673,”minutesAfterWakeup”:0,”minutesAsleep”:415,”minutesAwake”:68,”minutesToFallAsleep”:0,”startTime”:”2020-10-12T01:31:00.000”,”timeInBed”:483,”type”:”stages”}],”summary”:{“stages”:{“deep”:63,”light”:257,”rem”:94,”wake”:69},”totalMinutesAsleep”:415,”totalSleepRecords”:1,”totalTimeInBed”:483}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
TYPE_EPISODE_ID | +FLAG_TO_MUTATE | +
DURATION | +FLAG_TO_MUTATE | +
IS_MAIN_SLEEP | +FLAG_TO_MUTATE | +
TYPE | +FLAG_TO_MUTATE | +
LEVEL | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_sleep_intraday_json.py
+
Note
The Fitbit API has two versions of sleep data, v1 and v1.2; we support both.
+All columns except DEVICE_ID
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-10”,”duration”:3600000,”efficiency”:92,”endTime”:”2020-10-10T16:37:00.000”,”infoCode”:2,”isMainSleep”:false,”levels”:{“data”:[{“dateTime”:”2020-10-10T15:36:30.000”,”level”:”restless”,”seconds”:60},{“dateTime”:”2020-10-10T15:37:30.000”,”level”:”asleep”,”seconds”:660},{“dateTime”:”2020-10-10T15:48:30.000”,”level”:”restless”,”seconds”:60},…], “summary”:{“asleep”:{“count”:0,”minutes”:56},”awake”:{“count”:0,”minutes”:0},”restless”:{“count”:3,”minutes”:4}}},”logId”:26315914306,”minutesAfterWakeup”:0,”minutesAsleep”:55,”minutesAwake”:5,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T15:36:30.000”,”timeInBed”:60,”type”:”classic”},{“dateOfSleep”:”2020-10-10”,”duration”:22980000,”efficiency”:88,”endTime”:”2020-10-10T08:10:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-10T01:46:30.000”,”level”:”light”,”seconds”:420},{“dateTime”:”2020-10-10T01:53:30.000”,”level”:”deep”,”seconds”:1230},{“dateTime”:”2020-10-10T02:14:00.000”,”level”:”light”,”seconds”:360},…], “summary”:{“deep”:{“count”:3,”minutes”:92,”thirtyDayAvgMinutes”:0},”light”:{“count”:29,”minutes”:193,”thirtyDayAvgMinutes”:0},”rem”:{“count”:4,”minutes”:33,”thirtyDayAvgMinutes”:0},”wake”:{“count”:28,”minutes”:65,”thirtyDayAvgMinutes”:0}}},”logId”:26311786557,”minutesAfterWakeup”:0,”minutesAsleep”:318,”minutesAwake”:65,”minutesToFallAsleep”:0,”startTime”:”2020-10-10T01:46:30.000”,”timeInBed”:383,”type”:”stages”}],”summary”:{“stages”:{“deep”:92,”light”:193,”rem”:33,”wake”:65},”totalMinutesAsleep”:373,”totalSleepRecords”:2,”totalTimeInBed”:443}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-11”,”duration”:41640000,”efficiency”:89,”endTime”:”2020-10-11T11:47:00.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-11T00:12:30.000”,”level”:”wake”,”seconds”:450},{“dateTime”:”2020-10-11T00:20:00.000”,”level”:”light”,”seconds”:870},{“dateTime”:”2020-10-11T00:34:30.000”,”level”:”wake”,”seconds”:780},…], “summary”:{“deep”:{“count”:4,”minutes”:52,”thirtyDayAvgMinutes”:62},”light”:{“count”:32,”minutes”:442,”thirtyDayAvgMinutes”:364},”rem”:{“count”:6,”minutes”:68,”thirtyDayAvgMinutes”:58},”wake”:{“count”:29,”minutes”:132,”thirtyDayAvgMinutes”:94}}},”logId”:26589710670,”minutesAfterWakeup”:1,”minutesAsleep”:562,”minutesAwake”:132,”minutesToFallAsleep”:0,”startTime”:”2020-10-11T00:12:30.000”,”timeInBed”:694,”type”:”stages”}],”summary”:{“stages”:{“deep”:52,”light”:442,”rem”:68,”wake”:132},”totalMinutesAsleep”:562,”totalSleepRecords”:1,”totalTimeInBed”:694}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +{“sleep”:[{“dateOfSleep”:”2020-10-12”,”duration”:28980000,”efficiency”:93,”endTime”:”2020-10-12T09:34:30.000”,”infoCode”:0,”isMainSleep”:true,”levels”:{“data”:[{“dateTime”:”2020-10-12T01:31:00.000”,”level”:”wake”,”seconds”:600},{“dateTime”:”2020-10-12T01:41:00.000”,”level”:”light”,”seconds”:60},{“dateTime”:”2020-10-12T01:42:00.000”,”level”:”deep”,”seconds”:2340},…], “summary”:{“deep”:{“count”:4,”minutes”:63,”thirtyDayAvgMinutes”:59},”light”:{“count”:27,”minutes”:257,”thirtyDayAvgMinutes”:364},”rem”:{“count”:5,”minutes”:94,”thirtyDayAvgMinutes”:58},”wake”:{“count”:24,”minutes”:69,”thirtyDayAvgMinutes”:95}}},”logId”:26589710673,”minutesAfterWakeup”:0,”minutesAsleep”:415,”minutesAwake”:68,”minutesToFallAsleep”:0,”startTime”:”2020-10-12T01:31:00.000”,”timeInBed”:483,”type”:”stages”}],”summary”:{“stages”:{“deep”:63,”light”:257,”rem”:94,”wake”:69},”totalMinutesAsleep”:415,”totalSleepRecords”:1,”totalTimeInBed”:483}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
STEPS | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_steps_summary_json.py
+
Note
TIMESTAMP
, LOCAL_DATE_TIME
, and STEPS
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-07”,”value”:”1775”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:5},{“time”:”00:01:00”,”value”:3},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-08”,”value”:”3201”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:14},{“time”:”00:01:00”,”value”:11},{“time”:”00:02:00”,”value”:10},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-09”,”value”:”998”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:0},{“time”:”00:01:00”,”value”:0},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +FLAG_TO_MUTATE | +
STEPS | +FLAG_TO_MUTATE | +
MUTATION
+COLUMN_MAPPINGS
+Script column | +Stream column | +
---|---|
JSON_FITBIT_COLUMN | +fitbit_data | +
SCRIPTS
+src/data/streams/mutations/fitbit/parse_steps_intraday_json.py
+
Note
TIMESTAMP
, LOCAL_DATE_TIME
, and STEPS
are parsed from JSON_FITBIT_COLUMN
. JSON_FITBIT_COLUMN
is a string column containing the JSON objects returned by Fitbit’s API. See an example of the raw data RAPIDS expects for this data stream:
device_id | +fitbit_data | +
---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-07”,”value”:”1775”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:5},{“time”:”00:01:00”,”value”:3},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-08”,”value”:”3201”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:14},{“time”:”00:01:00”,”value”:11},{“time”:”00:02:00”,”value”:10},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +“activities-steps”:[{“dateTime”:”2020-10-09”,”value”:”998”}],”activities-steps-intraday”:{“dataset”:[{“time”:”00:00:00”,”value”:0},{“time”:”00:01:00”,”value”:0},{“time”:”00:02:00”,”value”:0},…],”datasetInterval”:1,”datasetType”:”minute”}} | +
The format.yaml
maps and transforms columns in your raw data stream to the mandatory columns RAPIDS needs for Fitbit sensors. This file is at:
src/data/streams/fitbitparsed_mysql/format.yaml
+
If you want to use this stream with your data, modify every sensor in format.yaml
to map all columns except TIMESTAMP
in [RAPIDS_COLUMN_MAPPINGS]
to your raw data column names.
All columns are mandatory; however, all except device_id
and local_date_time
can be empty if you don't have that data. Just keep in mind that some features will be empty if some of these columns are empty.
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
DEVICE_ID | +device_id | +
HEARTRATE_DAILY_RESTINGHR | +heartrate_daily_restinghr | +
HEARTRATE_DAILY_CALORIESOUTOFRANGE | +heartrate_daily_caloriesoutofrange | +
HEARTRATE_DAILY_CALORIESFATBURN | +heartrate_daily_caloriesfatburn | +
HEARTRATE_DAILY_CALORIESCARDIO | +heartrate_daily_caloriescardio | +
HEARTRATE_DAILY_CALORIESPEAK | +heartrate_daily_caloriespeak | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
add_zero_timestamp adds an all-zero column called timestamp that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking single or multiple time zones into account. (A Python sketch of this mutation follows the example rows below.)
device_id | +local_date_time | +heartrate_daily_restinghr | +heartrate_daily_caloriesoutofrange | +heartrate_daily_caloriesfatburn | +heartrate_daily_caloriescardio | +heartrate_daily_caloriespeak | +
---|---|---|---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 | +72 | +1200.6102 | +760.3020 | +15.2048 | +0 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-08 | +70 | +1100.1120 | +660.0012 | +23.7088 | +0 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-09 | +69 | +750.3615 | +734.1516 | +131.8579 | +0 | +
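The mutation itself is tiny; a Python sketch of what add_zero_timestamp does (the argument names are assumptions, and the actual script may differ in details):

```python
import pandas as pd

def main(data: pd.DataFrame, stream_parameters: dict) -> pd.DataFrame:
    # Placeholder column: readable_time.R later replaces these zeros with
    # Unix timestamps derived from LOCAL_DATE_TIME and the time zone(s).
    data["timestamp"] = 0
    return data
```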
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
DEVICE_ID | +device_id | +
HEARTRATE | +heartrate | +
HEARTRATE_ZONE | +heartrate_zone | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
add_zero_timestamp adds an all-zero column called timestamp that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking single or multiple time zones into account.
device_id | +local_date_time | +heartrate | +heartrate_zone | +
---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:00:00 | +68 | +outofrange | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:01:00 | +67 | +outofrange | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:02:00 | +67 | +outofrange | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
LOCAL_START_DATE_TIME | +local_start_date_time | +
LOCAL_END_DATE_TIME | +local_end_date_time | +
DEVICE_ID | +device_id | +
EFFICIENCY | +efficiency | +
MINUTES_AFTER_WAKEUP | +minutes_after_wakeup | +
MINUTES_ASLEEP | +minutes_asleep | +
MINUTES_AWAKE | +minutes_awake | +
MINUTES_TO_FALL_ASLEEP | +minutes_to_fall_asleep | +
MINUTES_IN_BED | +minutes_in_bed | +
IS_MAIN_SLEEP | +is_main_sleep | +
TYPE | +type | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
add_zero_timestamp adds an all-zero column called timestamp that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking single or multiple time zones into account.
The Fitbit API has two versions of sleep data, v1 and v1.2. We support both but ignore v1's count_awake, duration_awake, count_awakenings, count_restless, and duration_restless columns.
device_id | +local_start_date_time | +local_end_date_time | +efficiency | +minutes_after_wakeup | +minutes_asleep | +minutes_awake | +minutes_to_fall_asleep | +minutes_in_bed | +is_main_sleep | +type | +
---|---|---|---|---|---|---|---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-10 15:36:30 | +2020-10-10 16:37:00 | +92 | +0 | +55 | +5 | +0 | +60 | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-10 01:46:30 | +2020-10-10 08:10:00 | +88 | +0 | +318 | +65 | +0 | +383 | +1 | +stages | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-11 00:12:30 | +2020-10-11 11:47:00 | +89 | +1 | +562 | +132 | +0 | +694 | +1 | +stages | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-12 01:31:00 | +2020-10-12 09:34:30 | +93 | +0 | +415 | +68 | +0 | +483 | +1 | +stages | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
LOCAL_DATE_TIME | +local_date_time | +
DEVICE_ID | +device_id | +
TYPE_EPISODE_ID | +type_episode_id | +
DURATION | +duration | +
IS_MAIN_SLEEP | +is_main_sleep | +
TYPE | +type | +
LEVEL | +level | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
add_zero_timestamp adds an all-zero column called timestamp that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking single or multiple time zones into account.
The Fitbit API has two versions of sleep data, v1 and v1.2; we support both.
+device_id | +type_episode_id | +local_date_time | +duration | +level | +is_main_sleep | +type | +
---|---|---|---|---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +0 | +2020-10-10 15:36:30 | +60 | +restless | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +0 | +2020-10-10 15:37:30 | +660 | +asleep | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +0 | +2020-10-10 15:48:30 | +60 | +restless | +0 | +classic | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +… | +… | +… | +… | +… | +… | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +1 | +2020-10-10 01:46:30 | +420 | +light | +1 | +stages | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +1 | +2020-10-10 01:53:30 | +1230 | +deep | +1 | +stages | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +local_date_time | +
STEPS | +steps | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
add_zero_timestamp adds an all-zero column called timestamp that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking single or multiple time zones into account.
device_id | +local_date_time | +steps | +
---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 | +1775 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-08 | +3201 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-09 | +998 | +
RAPIDS_COLUMN_MAPPINGS
+RAPIDS column | +Stream column | +
---|---|
TIMESTAMP | +FLAG_TO_MUTATE | +
DEVICE_ID | +device_id | +
LOCAL_DATE_TIME | +local_date_time | +
STEPS | +steps | +
MUTATION
+COLUMN_MAPPINGS (None)
+SCRIPTS
+src/data/streams/mutations/fitbit/add_zero_timestamp.py
+
Note
add_zero_timestamp adds an all-zero column called timestamp that is filled in later in the pipeline by readable_time.R, which converts LOCAL_DATE_TIME to a Unix timestamp taking single or multiple time zones into account.
device_id | +local_date_time | +steps | +
---|---|---|
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:00:00 | +5 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:01:00 | +3 | +
a748ee1a-1d0b-4ae9-9074-279a2b6ba524 | +2020-10-07 00:02:00 | +0 | +
If you are interested in contributing feel free to submit a pull request or contact us.
+Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data.
+Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field.
+Abhineeth Reddy Kunta is a Senior Software Engineer with the Mobile Sensing + Health Institute. He is experienced in software development and specializes in building solutions using machine learning. Abhineeth likes exploring ways to leverage technology in advancing medicine and education. Previously he worked as a Computer Programmer at Georgia Department of Public Health. He has a master’s degree in Computer Science from George Mason University.
Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Science in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full-stack developer and looks forward to applying this knowledge to big data analysis.
Echhit Joshi is a Master's student at the School of Computing and Information at the University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics.
Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for an MS in Computer Science with a focus on Intelligent Systems.
Nik is a graduate student at the University of Pittsburgh pursuing a Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology in India. He is a data enthusiast, passionate about finding meaning in raw data. His long-term goal is to create a breakthrough in Data Science and Deep Learning.
Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science. He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science. His research interests include ML applications in medicine.
+ +Carnegie Mellon University
+University of Virginia
+We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (example01 and example02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night).
+Note
+Time segments (e.g. daily
, morning
, etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.)
RAPIDS provides two histograms that show the number of time segment instances that had a certain ratio of valid yielded minutes and hours, respectively. A valid yielded minute has at least 1 row of data from any smartphone sensor and a valid yielded hour contains at least M valid minutes.
+These plots can be used as a rough indication of the smartphone monitoring coverage during a study aggregated across all participants. For example, the figure below shows a valid yielded minutes histogram for daily segments and we can infer that the monitoring coverage was very good since almost all segments contain at least 90 to 100% of the expected sensed minutes.
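For intuition, a Python sketch of the ratio behind these histograms for a single time segment instance; this is illustrative only, since RAPIDS computes it in its data yield provider:

```python
import pandas as pd

def valid_yielded_minutes_ratio(sensor_rows: pd.DataFrame,
                                segment_start: pd.Timestamp,
                                segment_end: pd.Timestamp) -> float:
    # A minute is valid yielded if at least one row from any phone sensor
    # falls inside it; the ratio divides by the segment's expected minutes.
    stamps = pd.to_datetime(sensor_rows["timestamp"], unit="ms")
    stamps = stamps[(stamps >= segment_start) & (stamps < segment_end)]
    valid_minutes = stamps.dt.floor("min").nunique()
    expected_minutes = (segment_end - segment_start).total_seconds() / 60
    return valid_minutes / expected_minutes
```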
+Example
+Click here to see an example of these interactive visualizations in HTML format
These heatmaps are a breakdown of Visualization 1 per time segment and per participant. The heatmap's rows represent participants, columns represent time segment instances, and the cells' color represents the valid yielded minute or hour ratio for a participant during a time segment instance.

As different participants might join a study on different dates, and time segments can be of any length and start on any day, the x-axis can be labelled either with the absolute start time of each time segment instance or with the time elapsed between the start of each instance and the start of the first one. These plots provide a quick study overview of the monitoring coverage per person and per time segment.

The figure below shows the heatmap of the valid yielded minute ratio for participants example01 and example02 on daily segments; as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicates that both phones sensed data without interruptions for most days (except the first and last ones).
+Example
+Click here to see an example of these interactive visualizations in HTML format
+Example
+Click here to see an example of these interactive visualizations in HTML format
+In these heatmaps rows represent time segment instances, columns represent minutes since the start of a time segment instance, and cells’ color shows the number of phone sensors that logged at least one row of data during those 1-minute windows.
+RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes).
+The figure below shows this heatmap for phone sensors collected by participant example01 in daily time segments from Apr 23rd 2020 to May 4th 2020. We can infer that for most of the monitoring time, the participant’s phone logged data from at least 7 sensors each minute.
+Example
+Click here to see an example of these interactive visualizations in HTML format
These heatmaps are a per-sensor breakdown of Visualization 1 and Visualization 2. Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant (bottom) row of the screenshot in Visualization 2.
In these heatmaps, rows represent phone or Fitbit sensors, columns represent time segment instances, and the cells' color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant, and they can be used to judge missing data on a per-participant and per-sensor basis.
+The figure below shows data for 14 phone sensors (including data yield) of example01’s daily segments. From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
+Example
+Click here to see an example of these interactive visualizations in HTML format
Columns and rows are the behavioral features computed in RAPIDS; the cells' color represents the correlation coefficient between all days of data for every pair of features across all participants.
+The user can specify a minimum number of observations (time segment instances) required to compute the correlation between two features using the MIN_ROWS_RATIO
parameter (0.5 by default) and the correlation method (Pearson, Spearman or Kendall) with the CORR_METHOD
parameter. In addition, this plot can be configured to only display correlation coefficients above a threshold using the CORR_THRESHOLD
parameter (0.1 by default).
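A pandas sketch of the computation these parameters control (plotting omitted; the exact semantics of MIN_ROWS_RATIO in RAPIDS may differ from this interpretation):

```python
import pandas as pd

def correlation_matrix(features: pd.DataFrame,
                       min_rows_ratio: float = 0.5,
                       corr_method: str = "pearson",
                       corr_threshold: float = 0.1) -> pd.DataFrame:
    # Require a minimum number of paired observations per feature pair,
    # interpreted here as a ratio of the available time segment instances
    # (pandas supports min_periods for the pearson and spearman methods).
    min_periods = int(min_rows_ratio * len(features))
    corr = features.corr(method=corr_method, min_periods=min_periods)
    # Hide correlation coefficients below the threshold (absolute value).
    return corr.where(corr.abs() >= corr_threshold)
```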
Example
+Click here to see an example of these interactive visualizations in HTML format
+TL;DR
Even though the bulk of RAPIDS' current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced on other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS.
We clarify that to create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries, or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS; then they can use Jupyter notebooks to explore feature cleaning approaches, and once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models, and once a model is finished, it too can be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is.
+In the following sections we share an example of how we structured an analysis workflow in RAPIDS.
+To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files and scripts are grouped into Snakemake rules that are described in smk
files in the rules folder (we point the reader to the relevant rule(s) of each step).
Researchers can use these rules and scripts as a guide to create their own, since every modeling project is expected to have different requirements, data and goals, but ultimately most follow a similar chained pattern.
+Hint
+The example’s config file is example_profile/example_config.yaml
and its Snakefile is in example_profile/Snakefile
. The config file is already configured to process the sensor data as explained in Analysis workflow modules.
Our example is based on a hypothetical study that recruited two participants who underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client on their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10, which we summed into a daily symptom burden score.
+The goal of this workflow is to find out whether we can predict the daily symptom burden score of a participant. We framed this question as a binary classification problem with two classes, high and low symptom burden, defined by whether a day's score is above or below that participant's average. We also want to compare the performance of individual (personalized) models against a population model.
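+As a concrete sketch of this labeling, assuming a data frame with pid and symptom_score columns (our names, not necessarily the workflow's):
+import pandas as pd
+
+scores = pd.DataFrame({
+    "pid": ["p01", "p01", "p01", "p02", "p02"],
+    "symptom_score": [34, 51, 47, 20, 28],
+})
+
+# A day is "high" if its summed symptom burden exceeds that participant's
+# own average, and "low" otherwise.
+participant_mean = scores.groupby("pid")["symptom_score"].transform("mean")
+scores["target"] = (scores["symptom_score"] > participant_mean).map(
+    {True: "high", False: "low"})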
+In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share files with test data in an Open Science Framework repository.
+ + +data/external/example_workflow/*.csv
./rapids -j1 create_example_participant_files
+
./rapids -j1 --profile example_profile
+
Note that you will see many warning messages; you can safely ignore them, as they occur because we run the ML algorithms on a small synthetic dataset.
+We extract daily behavioral features for: data yield; received and sent messages; missed, incoming and outgoing calls; fused location data resampled with the Doryab provider; activity recognition; battery; Bluetooth; screen; light; applications foreground; conversations; Wi-Fi connected; Wi-Fi visible; Fitbit heart rate summary and intraday data; Fitbit sleep summary data; and Fitbit step summary and intraday data (without excluding sleep periods, with an active bout threshold of 10 steps). In total, we obtained 237 daily sensor features over 12 days per participant.
+It is common to have demographic data in addition to mobile and target (ground truth) data. In this example we include participants’ age, gender and the number of days they spent in hospital after their surgery as features in our model. We extract these three columns from the data/external/example_workflow/participant_info.csv
file. As these three features remain constant within each participant, they are used only in the population model. Refer to the demographic_features
rule in rules/models.smk
.
The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the data/external/example_workflow/participant_target.csv
file. A new rule/script can be created if further manipulation is necessary. Refer to the parse_targets
rule in rules/models.smk
.
+These daily features are stored in a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case, each column represents a feature and each row represents a day). Refer to the merge_sensor_features_for_individual_participants
and merge_sensor_features_for_all_participants
rules in rules/features.smk
.
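+Conceptually, this merge is a sequence of outer joins on the time segment instance. A simplified pandas sketch with stand-in data frames (local_segment is the column RAPIDS uses to identify segment instances; the feature names are ours):
+import functools
+import pandas as pd
+
+# Stand-ins for two per-sensor feature files of one participant
+calls = pd.DataFrame({"local_segment": ["2020-04-23", "2020-04-24"],
+                      "calls_missed": [1, 0]})
+screen = pd.DataFrame({"local_segment": ["2020-04-24", "2020-04-25"],
+                       "screen_unlocks": [40, 35]})
+
+# Outer-join all sensors on the segment instance so each row is a day and
+# each column a feature, keeping days where some sensors have no data.
+merged = functools.reduce(
+    lambda left, right: pd.merge(left, right, on="local_segment", how="outer"),
+    [calls, screen])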
At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to rules/reports.smk
to find the rules that generate these plots.
In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in example_profile/example_config.yaml
.
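+A simplified pandas sketch of these four steps on a stand-in feature table; the data yield column name below follows RAPIDS' sensor_provider_feature naming but is an assumption:
+import numpy as np
+import pandas as pd
+
+# Stand-in for the merged sensor feature file: one row per day.
+features = pd.DataFrame({
+    "phone_data_yield_rapids_ratiovalidyieldedhours": [0.9, 0.5, 0.8],
+    "calls_missed": [1.0, np.nan, 2.0],
+    "constant_feature": [0.0, 0.0, 0.0],
+})
+
+# 1. Keep days with a data yield hour ratio above 0.75 (>= 18 valid hours).
+features = features[features["phone_data_yield_rapids_ratiovalidyieldedhours"] > 0.75]
+
+# 2. Drop columns (features) with more than 30% missing rows.
+features = features.loc[:, features.isna().mean() <= 0.30]
+
+# 3. Drop columns with zero variance.
+features = features.loc[:, features.nunique(dropna=True) > 1]
+
+# 4. Drop rows (days) with more than 30% missing columns (features).
+features = features[features.isna().mean(axis=1) <= 0.30]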
+After this step, we kept 161 features over 11 days for the individual model of p01, 101 features over 12 days for the individual model of p02, and 107 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stop researchers from collecting the same number of sensors as on Android phones.
+Feature cleaning for the individual models is done in the clean_sensor_features_for_individual_participants
rule and for the population model in the clean_sensor_features_for_all_participants
rule in rules/models.smk
.
In this step we merge the cleaned features and target labels for our individual models in the merge_features_and_targets_for_individual_model
rule in rules/models.smk
. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the merge_features_and_targets_for_population_model
rule in rules/models.smk
. These two merged files are the input for our individual and population models.
This stage has three phases: model building, training and evaluation.
+In the building phase we impute, normalize and oversample our dataset. Missing numeric values in each column are imputed with their mean, and missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, or scikit-learn's robust scaler) and one-hot encode each categorical feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a random over-sampler from the imbalanced-learn package. All these parameters are exposed in example_profile/example_config.yaml
.
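+A sketch of one configuration of this phase with scikit-learn and imbalanced-learn (the column names and toy data are ours; the workflow's scripts wire these steps through the config instead):
+import numpy as np
+import pandas as pd
+from imblearn.over_sampling import RandomOverSampler  # SMOTE is the alternative
+from sklearn.compose import ColumnTransformer
+from sklearn.impute import SimpleImputer
+from sklearn.pipeline import Pipeline
+from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
+
+X = pd.DataFrame({"calls_missed": [0.0, 3.0, np.nan, 1.0, 2.0, 4.0],
+                  "gender": ["f", "m", "f", np.nan, "m", "f"]})
+y = ["high", "low", "low", "low", "high", "low"]
+
+preprocess = ColumnTransformer([
+    # Numeric columns: impute with the mean, then min-max normalize
+    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
+                      ("scale", MinMaxScaler())]), ["calls_missed"]),
+    # Categorical columns: impute with the mode, then one-hot encode
+    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
+                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
+     ["gender"]),
+])
+
+# Oversample the minority class after preprocessing (on training folds only).
+X_res, y_res = RandomOverSampler(random_state=0).fit_resample(
+    preprocess.fit_transform(X), y)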
In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle that tunes hyper-parameters based on the Macro F1 score and an outer cycle that predicts the test set using the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model, such as weights and learning rates, are exposed in example_profile/example_config.yaml
.
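+A minimal sketch of this nested leave-one-out scheme for one of the eight models, using scikit-learn with synthetic data and a stand-in hyper-parameter grid:
+from sklearn.datasets import make_classification
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_predict
+
+X, y = make_classification(n_samples=20, n_features=5, random_state=0)
+
+# Inner cycle: tune hyper-parameters by Macro F1 with leave-one-out folds.
+inner = GridSearchCV(LogisticRegression(max_iter=1000),
+                     param_grid={"C": [0.1, 1.0, 10.0]},
+                     scoring="f1_macro", cv=LeaveOneOut())
+
+# Outer cycle: each sample is predicted once while held out, by the model
+# refit with the best hyper-parameters found on the remaining samples.
+y_pred = cross_val_predict(inner, X, y, cv=LeaveOneOut())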
Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle.
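+These metrics map directly onto scikit-learn functions; a sketch with hypothetical pooled outer-fold predictions:
+from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
+                             precision_recall_fscore_support, roc_auc_score)
+
+# Hypothetical outer-fold labels, predictions and class-1 probabilities
+y_true = [1, 0, 1, 1, 0, 0]
+y_pred = [1, 0, 0, 1, 0, 1]
+y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6]
+
+accuracy = accuracy_score(y_true, y_pred)
+macro_f1 = f1_score(y_true, y_pred, average="macro")
+kappa = cohen_kappa_score(y_true, y_pred)
+auc = roc_auc_score(y_true, y_prob)  # area under the ROC curve
+# Per-class precision, recall and F1 score
+precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred)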
+Refer to the modelling_for_individual_participants
rule for the individual modeling and to the modelling_for_all_participants
rule for the population modeling, both in rules/models.smk
.
We create three baselines to evaluate our classification models.
+First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation by sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features in the individual models, this baseline is only available for the population model.
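+The three baselines correspond closely to off-the-shelf scikit-learn estimators; a sketch with toy data (DummyClassifier covers the first two):
+from sklearn.dummy import DummyClassifier
+from sklearn.tree import DecisionTreeClassifier
+
+X_train, y_train = [[0], [1], [2], [3]], ["low", "low", "high", "low"]
+X_test = [[4], [5]]
+
+# 1. Majority classifier: always predicts the most frequent training class.
+majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
+
+# 2. Random weighted classifier: samples each prediction from the training
+#    label distribution (a binomial draw in the binary case).
+weighted = DummyClassifier(strategy="stratified", random_state=0).fit(X_train, y_train)
+
+# 3. Decision tree on demographic features only (population model only).
+demographic_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
+
+for model in (majority, weighted, demographic_tree):
+    print(model.predict(X_test))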
+Our baseline metrics (accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the baselines_for_individual_model
rule for the individual model baselines and to the baselines_for_population_model
rule for population model baselines, both in rules/models.smk
.
This is a quick guide for creating and running a simple pipeline to extract missed, outgoing, and incoming call
features for 24 hr
(00:00:00
to 23:59:59
) and night
(00:00:00
to 05:59:59
) time segments of every day of data of one participant that was monitored on the US East coast with an Android smartphone.
Make sure your conda
environment is active (see Installation)
Store the test calls CSV file as data/external/aware_csv/calls.csv
Make the changes listed below for the corresponding Configuration step (we provide an example of what the relevant sections in your config.yaml
will look like after you are done)
Supported data streams.
+Based on the docs, we decided to use the aware_csv
data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.
Create your participants file.
+Since we are processing data from a single participant, you only need to create a single participant file called p01.yaml
in data/external/participant_files
. This participant file only has a PHONE
section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this automatically with a CSV file
Add p01
to [PIDS]
in config.yaml
Create a file in data/external/participant_files/p01.yaml
with the following content:
PHONE:
+ DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
+ PLATFORMS: [android] # or ios
+ LABEL: MyTestP01 # any string
+ START_DATE: 2020-01-01 # this can also be empty
+ END_DATE: 2021-01-01 # this can also be empty
+
Select what time segments you want to extract features on.
+Set [TIME_SEGMENTS][FILE]
to data/external/timesegments_periodic.csv
Create a file in data/external/timesegments_periodic.csv
with the following content
label,start_time,length,repeats_on,repeats_value
+daily,00:00:00,23H 59M 59S,every_day,0
+night,00:00:00,5H 59M 59S,every_day,0
+
Choose the timezone of your study.
+We will use the default time zone settings since this example is processing data collected on the US East Coast (America/New_York
)
TIMEZONE:
+ TYPE: SINGLE
+ SINGLE:
+ TZCODE: America/New_York
+
Modify your device data stream configuration
+Set [PHONE_DATA_STREAMS][USE]
to aware_csv
.
We will use the default value for [PHONE_DATA_STREAMS][aware_csv][FOLDER]
since we already stored the test calls CSV file there.
Select what sensors and features you want to process.
+Set [PHONE_CALLS][CONTAINER]
to calls.csv
in the config.yaml
file.
Set [PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]
to True
in the config.yaml
file.
Example of the config.yaml
sections after the changes outlined above
This will be your config.yaml
after following the instructions above. Click on the numbered markers to know more.
PIDS: [p01] # (1)
+
+TIMEZONE:
+ TYPE: SINGLE # (2)
+ SINGLE:
+ TZCODE: America/New_York
+
+# ... other irrelevant sections
+
+TIME_SEGMENTS: &time_segments
+ TYPE: PERIODIC # (3)
+ FILE: "data/external/timesegments_periodic.csv" # (4)
+ INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
+
+PHONE_DATA_STREAMS:
+ USE: aware_csv # (5)
+
+ aware_csv:
+ FOLDER: data/external/aware_csv # (6)
+
+# ... other irrelevant sections
+
+############## PHONE ###########################################################
+################################################################################
+
+# ... other irrelevant sections
+
+# Communication call features config, TYPES and FEATURES keys need to match
+PHONE_CALLS:
+ CONTAINER: calls.csv # (7)
+ PROVIDERS:
+ RAPIDS:
+ COMPUTE: True # (8)
+ CALL_TYPES: ...
+
We added p01
to PIDS after creating the participant file:
+
data/external/participant_files/p01.yaml
+
With the following content: +
PHONE:
+ DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
+ PLATFORMS: [android] # or ios
+ LABEL: MyTestP01 # any string
+ START_DATE: 2020-01-01 # this can also be empty
+ END_DATE: 2021-01-01 # this can also be empty
+
We use the default SINGLE
time zone.
We use the default PERIODIC
time segment [TYPE]
We created this time segments file with these lines:
+label,start_time,length,repeats_on,repeats_value
+daily,00:00:00,23H 59M 59S,every_day,0
+night,00:00:00,5H 59M 59S,every_day,0
+
We set [USE]
to aware_csv
to tell RAPIDS to process sensor data collected with the AWARE Framework stored in CSV files.
We used the default [FOLDER]
for aware_csv
since we already stored our test calls.csv
file there.
We changed [CONTAINER]
to calls.csv
to process our test call data.
We flipped [COMPUTE]
to True
to extract call behavioral features using the RAPIDS
feature provider.
Run RAPIDS +
./rapids -j1
+
data/processed/features/all_participants/all_sensor_features.csv
+
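+If the run succeeds, you can sanity-check the merged output with pandas; a quick sketch (local_segment and the phone_calls_rapids_ feature prefix follow RAPIDS' naming conventions, which we assume here):
+import pandas as pd
+
+features = pd.read_csv(
+    "data/processed/features/all_participants/all_sensor_features.csv")
+
+# One row per time segment instance (daily and night) with one column per
+# extracted call feature.
+call_columns = [c for c in features.columns
+                if c.startswith("phone_calls_rapids_")]
+print(features[["local_segment"] + call_columns].head())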