Merge pull request #103 from carissalow/day_segments

Release v0.1.0
pull/104/head
MoSHI 2020-12-07 15:20:40 -05:00 committed by GitHub
commit 0102c77963
676 changed files with 30759 additions and 8544 deletions

.github/ISSUE_TEMPLATE/bug_report.md

@@ -12,10 +12,10 @@ A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
1. Enable ... feature provider
2. Setup ... sensor parameters
3. Run RAPIDS
4. etc ...
**Expected behavior**
A clear and concise description of what you expected to happen.
@@ -25,7 +25,8 @@ If applicable, add screenshots to help explain your problem.
**Please complete the following information:**
- OS: [e.g. MacOS]
- Version [e.g. 22]
- RAPIDS current commit, paste the output of `git rev-parse --short HEAD`
- A link to your `config.yaml`
- Type of mobile data you are dealing with (Android/iOS)

23
.github/workflows/docs.yaml vendored 100644

@@ -0,0 +1,23 @@
name: docs
on:
push:
branches:
- day_segments
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- uses: actions/setup-python@v2
with:
python-version: 3.x
- run: pip install git+https://${GH_TOKEN}@github.com/carissalow/mkdocs-material-insiders.git
- run: pip install mike
- run: |
git config user.name github-actions
git config user.email github-actions@github.com
mike deploy --push --update-aliases 0.1 latest
env:
GH_TOKEN: ${{ secrets.GH_TOKEN }}
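
The workflow above publishes the versioned documentation with mike, which keeps multiple MkDocs versions on the gh-pages branch. Assuming the standard mike CLI (and that mkdocs.yml sits at the repository root, with the docs theme from the pip install step also available locally), the same deployment can be reproduced or previewed locally with:

pip install mike
mike deploy --update-aliases 0.1 latest   # build the 0.1 docs and point the "latest" alias at them
mike serve                                # preview the versioned site from the local gh-pages branch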

5
.gitignore vendored

@@ -95,6 +95,7 @@ packrat/*
data/external/*
!/data/external/.gitkeep
!/data/external/stachl_application_genre_catalogue.csv
!/data/external/timesegments*.csv
data/raw/*
!/data/raw/.gitkeep
data/interim/*
@@ -107,5 +108,7 @@ reports/
.RData
.Rhistory
sn_profile_*/
!sn_profile_rapids
settings.dcf
tests/fakedata_generation/
tests/fakedata_generation/
site/

.travis.yml

@@ -39,38 +39,7 @@ jobs:
- "$HOME/.local/share/renv"
- "$TRAVIS_BUILD_DIR/renv/library"
script:
- bash tests/scripts/run_tests.sh
- name: Python 3.7 on macOS
os: osx
osx_image: xcode11.3
language: generic
before_install:
- HOMEBREW_NO_AUTO_UPDATE=1 brew install gcc@9
- HOMEBREW_NO_AUTO_UPDATE=1 brew install https://github.com/Homebrew/homebrew-core/raw/218998d/Formula/r.rb
- R --version
- HOMEBREW_NO_AUTO_UPDATE=1 brew install mysql
- HOMEBREW_NO_AUTO_UPDATE=1 brew services start mysql
- HOMEBREW_NO_AUTO_UPDATE=1 brew cask install miniconda
- eval "$(/opt/miniconda3/condabin/conda shell.bash hook)"
- eval "$(conda shell.bash hook)"
install:
- conda init bash
- conda update -q --all --yes conda
- conda env create -q -n test-environment python=$TRAVIS_PYTHON_VERSION --file
environment.yml
- conda activate test-environment
- snakemake -j1 renv_install
- R -e 'renv::settings$use.cache(FALSE)'
- snakemake -j1 renv_restore
env:
- RENV_PATHS_ROOT="$HOME/renv/cache"
cache:
directories:
- "/usr/local/lib/R"
- "$RENV_PATHS_ROOT"
- "$TRAVIS_BUILD_DIR/renv/library"
script:
- bash tests/scripts/run_tests.sh
- bash tests/scripts/run_tests.sh all test
- stage: deploy
name: Python 3.7 on Xenial Linux Docker
os: linux
@@ -83,6 +52,7 @@ jobs:
branches:
only:
- master
- time_segment
stages:
- name: deploy

README.md

@@ -1,6 +1,7 @@
[![Snakemake](https://img.shields.io/badge/snakemake-≥5.7.1-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io)
[![Documentation Status](https://readthedocs.org/projects/rapidspitt/badge/?version=latest)](https://rapidspitt.readthedocs.io/en/latest/?badge=latest)
[![Documentation Status](https://github.com/carissalow/rapids/workflows/docs/badge.svg)](https://www.rapids.science/)
[![Build Status](https://travis-ci.com/carissalow/rapids.svg?branch=master)](https://travis-ci.com/carissalow/rapids)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](code_of_conduct.md)
# RAPIDS

340
Snakefile

@@ -12,164 +12,238 @@ files_to_compute = []
if len(config["PIDS"]) == 0:
raise ValueError("Add participants IDs to PIDS in config.yaml. Remember to create their participant files in data/external")
if config["PHONE_VALID_SENSED_BINS"]["COMPUTE"] or config["PHONE_VALID_SENSED_DAYS"]["COMPUTE"]: # valid sensed bins is necessary for sensed days, so we add these files anyways if sensed days are requested
if len(config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]) == 0:
raise ValueError("If you want to compute PHONE_VALID_SENSED_BINS or PHONE_VALID_SENSED_DAYS, you need to add at least one table to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml")
for provider in config["PHONE_DATA_YIELD"]["PROVIDERS"].keys():
if config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=map(str.lower, config["PHONE_DATA_YIELD"]["SENSORS"])))
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
tables_android = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["IOS"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]] # for android, discard any ios tables that may exist
tables_ios = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"]]] # for ios, discard any android tables that may exist
for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
if config["PHONE_MESSAGES"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_messages_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_messages_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_MESSAGES"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_messages.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for pids,table in zip([pids_android, pids_ios], [tables_android, tables_ios]):
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
for provider in config["PHONE_CALLS"]["PROVIDERS"].keys():
if config["PHONE_CALLS"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_CALLS"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_calls.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["PHONE_VALID_SENSED_DAYS"]["COMPUTE"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_valid_sensed_days_{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins.csv",
pid=config["PIDS"],
min_valid_hours_per_day=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
for provider in config["PHONE_BLUETOOTH"]["PROVIDERS"].keys():
if config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_bluetooth.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["MESSAGES"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/messages_{messages_type}_{day_segment}.csv", pid=config["PIDS"], messages_type = config["MESSAGES"]["TYPES"], day_segment = config["MESSAGES"]["DAY_SEGMENTS"]))
for provider in config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"].keys():
if config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_activity_recognition.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["CALLS"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/calls_{call_type}_{day_segment}.csv", pid=config["PIDS"], call_type=config["CALLS"]["TYPES"], day_segment = config["CALLS"]["DAY_SEGMENTS"]))
for provider in config["PHONE_BATTERY"]["PROVIDERS"].keys():
if config["PHONE_BATTERY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_battery_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_BATTERY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_battery.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["BARNETT_LOCATION"]["COMPUTE"]:
if config["BARNETT_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
if config["BARNETT_LOCATION"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_resampled.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
else:
raise ValueError("Error: Add your locations table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/location_barnett_{day_segment}.csv", pid=config["PIDS"], day_segment = config["BARNETT_LOCATION"]["DAY_SEGMENTS"]))
for provider in config["PHONE_SCREEN"]["PROVIDERS"].keys():
if config["PHONE_SCREEN"]["PROVIDERS"][provider]["COMPUTE"]:
# if "PHONE_SCREEN" in config["PHONE_DATA_YIELD"]["SENSORS"]:# not used for now because we took episodepersensedminutes out of the list of supported features
# files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
# else:
# raise ValueError("Error: Add PHONE_SCREEN (and as many PHONE_SENSORS as you have in your database) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data)")
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_SCREEN"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_screen.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["BLUETOOTH"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BLUETOOTH"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BLUETOOTH"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/bluetooth_{day_segment}.csv", pid=config["PIDS"], day_segment = config["BLUETOOTH"]["DAY_SEGMENTS"]))
for provider in config["PHONE_LIGHT"]["PROVIDERS"].keys():
if config["PHONE_LIGHT"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_light_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_light_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_LIGHT"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_light.csv", pid=config["PIDS"],))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["ACTIVITY_RECOGNITION"]["COMPUTE"]:
pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
for pids,table in zip([pids_android, pids_ios], [config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]):
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/processed/{pid}/{sensor}_deltas.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/processed/{pid}/activity_recognition_{day_segment}.csv",pid=config["PIDS"], day_segment = config["ACTIVITY_RECOGNITION"]["DAY_SEGMENTS"]))
for provider in config["PHONE_ACCELEROMETER"]["PROVIDERS"].keys():
if config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_accelerometer.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["BATTERY"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/battery_deltas.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/{pid}/battery_{day_segment}.csv", pid = config["PIDS"], day_segment = config["BATTERY"]["DAY_SEGMENTS"]))
for provider in config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"].keys():
if config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_foreground.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["SCREEN"]["COMPUTE"]:
if config["SCREEN"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
else:
raise ValueError("Error: Add your screen table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/screen_deltas.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/{pid}/screen_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SCREEN"]["DAY_SEGMENTS"]))
for provider in config["PHONE_WIFI_VISIBLE"]["PROVIDERS"].keys():
if config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_visible.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["LIGHT"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["LIGHT"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["LIGHT"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/light_{day_segment}.csv", pid = config["PIDS"], day_segment = config["LIGHT"]["DAY_SEGMENTS"]))
for provider in config["PHONE_WIFI_CONNECTED"]["PROVIDERS"].keys():
if config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_connected.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["ACCELEROMETER"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["ACCELEROMETER"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["ACCELEROMETER"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/accelerometer_{day_segment}.csv", pid = config["PIDS"], day_segment = config["ACCELEROMETER"]["DAY_SEGMENTS"]))
for provider in config["PHONE_CONVERSATION"]["PROVIDERS"].keys():
if config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_conversation.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["APPLICATIONS_FOREGROUND"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
files_to_compute.extend(expand("data/interim/{pid}/{sensor}_with_datetime_with_genre.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/applications_foreground_{day_segment}.csv", pid = config["PIDS"], day_segment = config["APPLICATIONS_FOREGROUND"]["DAY_SEGMENTS"]))
for provider in config["PHONE_LOCATIONS"]["PROVIDERS"].keys():
if config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["COMPUTE"]:
if config["PHONE_LOCATIONS"]["LOCATIONS_TO_USE"] == "FUSED_RESAMPLED":
if "PHONE_LOCATIONS" in config["PHONE_DATA_YIELD"]["SENSORS"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
else:
raise ValueError("Error: Add PHONE_LOCATIONS (and as many PHONE_SENSORS as you have) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
if config["WIFI"]["COMPUTE"]:
if len(config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]) > 0:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/processed/{pid}/wifi_{day_segment}.csv", pid = config["PIDS"], day_segment = config["WIFI"]["DAY_SEGMENTS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_locations_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_locations.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if len(config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]) > 0:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/processed/{pid}/wifi_{day_segment}.csv", pid = config["PIDS"], day_segment = config["WIFI"]["DAY_SEGMENTS"]))
for provider in config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["HEARTRATE"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["HEARTRATE"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
files_to_compute.extend(expand("data/processed/{pid}/fitbit_heartrate_{day_segment}.csv", pid = config["PIDS"], day_segment = config["HEARTRATE"]["DAY_SEGMENTS"]))
for provider in config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["STEP"]["COMPUTE"]:
if config["STEP"]["EXCLUDE_SLEEP"]["EXCLUDE"] == True and config["STEP"]["EXCLUDE_SLEEP"]["TYPE"] == "FITBIT_BASED":
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["STEP"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_step_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["intraday"]))
files_to_compute.extend(expand("data/processed/{pid}/fitbit_step_{day_segment}.csv", pid = config["PIDS"], day_segment = config["STEP"]["DAY_SEGMENTS"]))
for provider in config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_sleep_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["SLEEP"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SLEEP"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["intraday", "summary"]))
files_to_compute.extend(expand("data/processed/{pid}/fitbit_sleep_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SLEEP"]["DAY_SEGMENTS"]))
# for provider in config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"].keys():
# if config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_raw.csv", pid=config["PIDS"]))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_parsed.csv", pid=config["PIDS"]))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
if config["CONVERSATION"]["COMPUTE"]:
pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
for provider in config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for pids,table in zip([pids_android, pids_ios], [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["CONVERSATION"]["DB_TABLE"]["IOS"]]):
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/processed/{pid}/conversation_{day_segment}.csv",pid=config["PIDS"], day_segment = config["CONVERSATION"]["DAY_SEGMENTS"]))
for provider in config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["DORYAB_LOCATION"]["COMPUTE"]:
if config["DORYAB_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
if config["DORYAB_LOCATION"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_resampled.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
else:
raise ValueError("Error: Add your locations table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/location_doryab_{segment}.csv", pid=config["PIDS"], segment = config["DORYAB_LOCATION"]["DAY_SEGMENTS"]))
# for provider in config["FITBIT_CALORIES"]["PROVIDERS"].keys():
# if config["FITBIT_CALORIES"]["PROVIDERS"][provider]["COMPUTE"]:
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_{fitbit_data_type}_raw.csv", pid=config["PIDS"], fitbit_data_type=(["json"] if config["FITBIT_CALORIES"]["TABLE_FORMAT"] == "JSON" else ["summary", "intraday"])))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_{fitbit_data_type}_parsed.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_{fitbit_data_type}_parsed_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# visualization for data exploration
if config["HEATMAP_FEATURES_CORRELATIONS"]["PLOT"]:
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/heatmap_features_correlations.html", min_valid_hours_per_day=config["HEATMAP_FEATURES_CORRELATIONS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
if config["HISTOGRAM_VALID_SENSED_HOURS"]["PLOT"]:
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/histogram_valid_sensed_hours.html", min_valid_hours_per_day=config["HISTOGRAM_VALID_SENSED_HOURS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
# Visualization for Data Exploration
if config["HISTOGRAM_PHONE_DATA_YIELD"]["PLOT"]:
files_to_compute.append("reports/data_exploration/histogram_phone_data_yield.html")
if config["HEATMAP_DAYS_BY_SENSORS"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{pid}/heatmap_days_by_sensors.html", pid=config["PIDS"], min_valid_hours_per_day=config["HEATMAP_DAYS_BY_SENSORS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/heatmap_days_by_sensors_all_participants.html", min_valid_hours_per_day=config["HEATMAP_DAYS_BY_SENSORS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
if config["HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/{pid}/heatmap_sensors_per_minute_per_time_segment.html", pid=config["PIDS"]))
files_to_compute.append("reports/data_exploration/heatmap_sensors_per_minute_per_time_segment.html")
if config["HEATMAP_SENSED_BINS"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/heatmap_sensed_bins/{pid}/heatmap_sensed_bins.html", pid=config["PIDS"]))
files_to_compute.extend(["reports/data_exploration/heatmap_sensed_bins_all_participants.html"])
if config["HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/{pid}/heatmap_sensor_row_count_per_time_segment.html", pid=config["PIDS"]))
files_to_compute.append("reports/data_exploration/heatmap_sensor_row_count_per_time_segment.html")
if config["OVERALL_COMPLIANCE_HEATMAP"]["PLOT"]:
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/overall_compliance_heatmap.html", min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
if config["HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.append("reports/data_exploration/heatmap_phone_data_yield_per_participant_per_time_segment.html")
if config["HEATMAP_FEATURE_CORRELATION_MATRIX"]["PLOT"]:
files_to_compute.append("reports/data_exploration/heatmap_feature_correlation_matrix.html")
rule all:
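
Throughout the new Snakefile, the old per-sensor COMPUTE flags (MESSAGES, CALLS, SCREEN, and so on) are replaced by a uniform loop over each sensor's feature providers. A condensed sketch of that pattern, using PHONE_CALLS as the example (paths and keys are taken from the diff above; this illustrates the structure rather than reproducing the full rule):

for provider, settings in config["PHONE_CALLS"]["PROVIDERS"].items():
    if settings["COMPUTE"]:
        # request the raw, datetime-annotated, and per-provider feature files for every participant
        files_to_compute.extend(expand("data/raw/{pid}/phone_calls_raw.csv", pid=config["PIDS"]))
        files_to_compute.extend(expand("data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv",
                                       pid=config["PIDS"], language=settings["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
        files_to_compute.extend(expand("data/processed/features/{pid}/phone_calls.csv", pid=config["PIDS"]))
        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")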

134
code_of_conduct.md 100644

@@ -0,0 +1,134 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
moshi@pitt.edu.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available
at [https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

config.yaml

@@ -1,244 +1,374 @@
# Participants to include in the analysis
# You must create a file for each participant named pXXX containing their device_id. This can be done manually or automatically
PIDS: [test01]
# Global var with common day segments
DAY_SEGMENTS: &day_segments
[daily, morning, afternoon, evening, night]
# Global timezone
# Use codes from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
# Double check your code, for example EST is not US Eastern Time.
TIMEZONE: &timezone
America/New_York
# See https://www.rapids.science/latest/setup/configuration/#database-credentials
DATABASE_GROUP: &database_group
MY_GROUP
DOWNLOAD_PARTICIPANTS:
IGNORED_DEVICE_IDS: [] # for example "5a1dd68c-6cd1-48fe-ae1e-14344ac5215f"
GROUP: *database_group
# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
TIMEZONE: &timezone
America/New_York
# Download data config
DOWNLOAD_DATASET:
GROUP: *database_group
# See https://www.rapids.science/latest/setup/configuration/#participant-files
PIDS: [test01]
# Readable datetime config
READABLE_DATETIME:
FIXED_TIMEZONE: *timezone
# See https://www.rapids.science/latest/setup/configuration/#automatic-creation-of-participant-files
CREATE_PARTICIPANT_FILES:
SOURCE:
TYPE: AWARE_DEVICE_TABLE #AWARE_DEVICE_TABLE or CSV_FILE
DATABASE_GROUP: *database_group
CSV_FILE_PATH: "data/external/example_participants.csv" # see docs for required format
TIMEZONE: *timezone
PHONE_SECTION:
ADD: TRUE
DEVICE_ID_COLUMN: device_id # column name
IGNORED_DEVICE_IDS: []
FITBIT_SECTION:
ADD: TRUE
DEVICE_ID_COLUMN: device_id # column name
IGNORED_DEVICE_IDS: []
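When SOURCE TYPE is CSV_FILE, the file at CSV_FILE_PATH lists one row per participant. A hypothetical sketch of data/external/example_participants.csv (the column names and values below are illustrative only; see the configuration docs linked above for the required format):

device_id,pid,label,platform,start_date,end_date
a748ee1a-1d0b-4ae9-9074-279a2b6ba524,p01,test01,android,2020-01-01,2020-03-01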
PHONE_VALID_SENSED_BINS:
COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
BIN_SIZE: &bin_size 5 # (in minutes)
# Add as many sensor tables as you have, they all improve the computation of PHONE_VALID_SENSED_BINS and PHONE_VALID_SENSED_DAYS.
# If you are extracting screen or Barnett's location features, screen and locations tables are mandatory.
DB_TABLES: []
# See https://www.rapids.science/latest/setup/configuration/#time-segments
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC # FREQUENCY, PERIODIC, EVENT
FILE: "data/external/timesegments_periodic.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # Only relevant if TYPE=PERIODIC, see docs
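Time segments replace the old DAY_SEGMENTS list and are read from the CSV named in FILE. A minimal sketch of a PERIODIC timesegments_periodic.csv (the column names and values below are assumed from the time-segments documentation, not shown in this diff):

label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
morning,06:00:00,5H 59M 59S,every_day,0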
PHONE_VALID_SENSED_DAYS:
COMPUTE: False
MIN_VALID_HOURS_PER_DAY: &min_valid_hours_per_day [16] # (out of 24) MIN_HOURS_PER_DAY
MIN_VALID_BINS_PER_HOUR: &min_valid_bins_per_hour [6] # (out of 60min/BIN_SIZE bins)
# Communication SMS features config, TYPES and FEATURES keys need to match
MESSAGES:
COMPUTE: False
DB_TABLE: messages
TYPES : [received, sent]
FEATURES:
received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
DAY_SEGMENTS: *day_segments
# Communication call features config, TYPES and FEATURES keys need to match
CALLS:
COMPUTE: False
DB_TABLE: calls
TYPES: [missed, incoming, outgoing]
FEATURES:
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
DAY_SEGMENTS: *day_segments
########################################################################################################################
# PHONE #
########################################################################################################################
APPLICATION_GENRES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scrapped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: false # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_GENRES: false # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
# See https://www.rapids.science/latest/setup/configuration/#device-data-source-configuration
PHONE_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE
VALUE: *timezone
RESAMPLE_FUSED_LOCATION:
CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
TIMEZONE: *timezone
# Sensors ------
BARNETT_LOCATION:
COMPUTE: False
DB_TABLE: locations
DAY_SEGMENTS: [daily] # These features are only available on a daily basis
FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
LOCATIONS_TO_USE: ALL # ALL, ALL_EXCEPT_FUSED OR RESAMPLE_FUSED
ACCURACY_LIMIT: 51 # meters, drops location coordinates with an accuracy higher than this. This number means there's a 68% probability the true location is within this radius
TIMEZONE: *timezone
MINUTES_DATA_USED: False # Use this for quality control purposes, how many minutes of data (location coordinates gruped by minute) were used to compute features
# https://www.rapids.science/latest/features/phone-accelerometer/
PHONE_ACCELEROMETER:
TABLE: accelerometer
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
PANDA:
COMPUTE: False
VALID_SENSED_MINUTES: False
FEATURES:
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
DORYAB_LOCATION:
COMPUTE: False
DB_TABLE: locations
DAY_SEGMENTS: *day_segments
FEATURES: ["locationvariance","loglocationvariance","totaldistance","averagespeed","varspeed","circadianmovement","numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","meanlengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy"]
LOCATIONS_TO_USE: ALL # ALL, ALL_EXCEPT_FUSED OR RESAMPLE_FUSED
DBSCAN_EPS: 10 # meters
DBSCAN_MINSAMPLES: 5
THRESHOLD_STATIC : 1 # km/h
MAXIMUM_GAP_ALLOWED: 300
MINUTES_DATA_USED: False
SAMPLING_FREQUENCY: 0
BLUETOOTH:
COMPUTE: False
DB_TABLE: bluetooth
DAY_SEGMENTS: *day_segments
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
ACTIVITY_RECOGNITION:
COMPUTE: False
DB_TABLE:
# See https://www.rapids.science/latest/features/phone-activity-recognition/
PHONE_ACTIVITY_RECOGNITION:
TABLE:
ANDROID: plugin_google_activity_recognition
IOS: plugin_ios_activity_recognition
DAY_SEGMENTS: *day_segments
FEATURES: ["count","mostcommonactivity","countuniqueactivities","activitychangecount","sumstationary","summobile","sumvehicle"]
EPISODE_THRESHOLD_BETWEEN_ROWS: 5 # minutes. Max time difference for two consecutive rows to be considered within the same activity episode.
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["count", "mostcommonactivity", "countuniqueactivities", "durationstationary", "durationmobile", "durationvehicle"]
ACTIVITY_CLASSES:
STATIONARY: ["still", "tilting"]
MOBILE: ["on_foot", "walking", "running", "on_bicycle"]
VEHICLE: ["in_vehicle"]
SRC_FOLDER: "rapids" # inside src/features/phone_activity_recognition
SRC_LANGUAGE: "python"
BATTERY:
COMPUTE: False
DB_TABLE: battery
DAY_SEGMENTS: *day_segments
FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
# See https://www.rapids.science/latest/features/phone-applications-foreground/
PHONE_APPLICATIONS_FOREGROUND:
TABLE: applications_foreground
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS:
RAPIDS:
COMPUTE: False
SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES:
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: []
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"]
FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
SRC_FOLDER: "rapids" # inside src/features/phone_applications_foreground
SRC_LANGUAGE: "python"
SCREEN:
COMPUTE: False
DB_TABLE: screen
DAY_SEGMENTS: *day_segments
REFERENCE_HOUR_FIRST_USE: 0
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
IGNORE_EPISODES_LONGER_THAN: 0 # in minutes, set to 0 to disable
FEATURES_DELTAS: ["countepisode", "episodepersensedminutes", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"]
EPISODE_TYPES: ["unlock"]
# See https://www.rapids.science/latest/features/phone-battery/
PHONE_BATTERY:
TABLE: battery
EPISODE_THRESHOLD_BETWEEN_ROWS: 30 # minutes. Max time difference for two consecutive rows to be considered within the same battery episode.
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
SRC_FOLDER: "rapids" # inside src/features/phone_battery
SRC_LANGUAGE: "python"
LIGHT:
COMPUTE: False
DB_TABLE: light
DAY_SEGMENTS: *day_segments
FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
# See https://www.rapids.science/latest/features/phone-bluetooth/
PHONE_BLUETOOTH:
TABLE: bluetooth
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_bluetooth
SRC_LANGUAGE: "r"
ACCELEROMETER:
COMPUTE: False
DB_TABLE: accelerometer
DAY_SEGMENTS: *day_segments
FEATURES:
MAGNITUDE: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
EXERTIONAL_ACTIVITY_EPISODE: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
NONEXERTIONAL_ACTIVITY_EPISODE: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
VALID_SENSED_MINUTES: False
# See https://www.rapids.science/latest/features/phone-calls/
PHONE_CALLS:
TABLE: calls
PROVIDERS:
RAPIDS:
COMPUTE: False
CALL_TYPES: [missed, incoming, outgoing]
FEATURES:
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_calls
APPLICATIONS_FOREGROUND:
COMPUTE: False
DB_TABLE: applications_foreground
DAY_SEGMENTS: *day_segments
SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES:
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: ["system_apps"]
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"]
FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
HEARTRATE:
COMPUTE: False
DB_TABLE: fitbit_data
DAY_SEGMENTS: *day_segments
SUMMARY_FEATURES: ["restinghr"] # calories features' accuracy depend on the accuracy of the participants fitbit profile (e.g. heigh, weight) use with care: ["caloriesoutofrange", "caloriesfatburn", "caloriescardio", "caloriespeak"]
INTRADAY_FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
STEP:
COMPUTE: False
DB_TABLE: fitbit_data
DAY_SEGMENTS: *day_segments
EXCLUDE_SLEEP:
EXCLUDE: False
TYPE: FIXED # FIXED OR FITBIT_BASED (CONFIGURE FITBIT's SLEEP DB_TABLE)
FIXED:
START: "23:00"
END: "07:00"
FEATURES:
ALL_STEPS: ["sumallsteps", "maxallsteps", "minallsteps", "avgallsteps", "stdallsteps"]
SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
THRESHOLD_ACTIVE_BOUT: 10 # steps
INCLUDE_ZERO_STEP_ROWS: False
SLEEP:
COMPUTE: False
DB_TABLE: fitbit_data
DAY_SEGMENTS: *day_segments
SLEEP_TYPES: ["main", "nap", "all"]
SUMMARY_FEATURES: ["sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgefficiency", "countepisode"]
WIFI:
COMPUTE: False
DB_TABLE:
VISIBLE_ACCESS_POINTS: "wifi" # if you only have a CONNECTED_ACCESS_POINTS table, set this value to ""
CONNECTED_ACCESS_POINTS: "sensor_wifi" # if you only have a VISIBLE_ACCESS_POINTS table, set this value to ""
DAY_SEGMENTS: *day_segments
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
CONVERSATION:
COMPUTE: False
DB_TABLE:
# See https://www.rapids.science/latest/features/phone-conversation/
PHONE_CONVERSATION:
TABLE:
ANDROID: plugin_studentlife_audio_android
IOS: plugin_studentlife_audio
DAY_SEGMENTS: *day_segments
FEATURES: ["minutessilence", "minutesnoise", "minutesvoice", "minutesunknown","sumconversationduration","avgconversationduration",
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["minutessilence", "minutesnoise", "minutesvoice", "minutesunknown","sumconversationduration","avgconversationduration",
"sdconversationduration","minconversationduration","maxconversationduration","timefirstconversation","timelastconversation","noisesumenergy",
"noiseavgenergy","noisesdenergy","noiseminenergy","noisemaxenergy","voicesumenergy",
"voiceavgenergy","voicesdenergy","voiceminenergy","voicemaxenergy","silencesensedfraction","noisesensedfraction",
"voicesensedfraction","unknownsensedfraction","silenceexpectedfraction","noiseexpectedfraction","voiceexpectedfraction",
"unknownexpectedfraction","countconversation"]
RECORDINGMINUTES: 1
PAUSEDMINUTES : 3
RECORDING_MINUTES: 1
PAUSED_MINUTES : 3
SRC_FOLDER: "rapids" # inside src/features/phone_conversation
SRC_LANGUAGE: "python"
### Visualizations ################################################################
HEATMAP_FEATURES_CORRELATIONS:
# See https://www.rapids.science/latest/features/phone-data-yield/
PHONE_DATA_YIELD:
SENSORS: []
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1 representing the number of minutes with at least
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_data_yield
# See https://www.rapids.science/latest/features/phone-light/
PHONE_LIGHT:
TABLE: light
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
SRC_FOLDER: "rapids" # inside src/features/phone_light
SRC_LANGUAGE: "python"
# See https://www.rapids.science/latest/features/phone-locations/
PHONE_LOCATIONS:
TABLE: locations
LOCATIONS_TO_USE: FUSED_RESAMPLED # ALL, GPS OR FUSED_RESAMPLED
FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
PROVIDERS:
DORYAB:
COMPUTE: False
FEATURES: ["locationvariance","loglocationvariance","totaldistance","averagespeed","varspeed","circadianmovement","numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","meanlengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy"]
DBSCAN_EPS: 10 # meters
DBSCAN_MINSAMPLES: 5
THRESHOLD_STATIC : 1 # km/h
MAXIMUM_GAP_ALLOWED: 300
MINUTES_DATA_USED: False
SAMPLING_FREQUENCY: 0
SRC_FOLDER: "doryab" # inside src/features/phone_locations
SRC_LANGUAGE: "python"
BARNETT:
COMPUTE: False
FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
ACCURACY_LIMIT: 51 # meters, drops location coordinates with an accuracy higher than this. This number means there's a 68% probability the true location is within this radius
TIMEZONE: *timezone
MINUTES_DATA_USED: False # Use this for quality control purposes; it reports how many minutes of data (location coordinates grouped by minute) were used to compute features
SRC_FOLDER: "barnett" # inside src/features/phone_locations
SRC_LANGUAGE: "r"
# See https://www.rapids.science/latest/features/phone-messages/
PHONE_MESSAGES:
TABLE: messages
PROVIDERS:
RAPIDS:
COMPUTE: False
MESSAGES_TYPES : [received, sent]
FEATURES:
received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_messages
# See https://www.rapids.science/latest/features/phone-screen/
PHONE_SCREEN:
TABLE: screen
PROVIDERS:
RAPIDS:
COMPUTE: False
REFERENCE_HOUR_FIRST_USE: 0
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
IGNORE_EPISODES_LONGER_THAN: 0 # in minutes, set to 0 to disable
FEATURES: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"] # "episodepersensedminutes" needs to be added later
EPISODE_TYPES: ["unlock"]
SRC_FOLDER: "rapids" # inside src/features/phone_screen
SRC_LANGUAGE: "python"
# See https://www.rapids.science/latest/features/phone-wifi-connected/
PHONE_WIFI_CONNECTED:
TABLE: "sensor_wifi"
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_wifi_connected
SRC_LANGUAGE: "r"
# See https://www.rapids.science/latest/features/phone-wifi-visible/
PHONE_WIFI_VISIBLE:
TABLE: "wifi"
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_wifi_visible
SRC_LANGUAGE: "r"
########################################################################################################################
# FITBIT #
########################################################################################################################
# See https://www.rapids.science/latest/setup/configuration/#device-data-source-configuration
FITBIT_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE # DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute with a table name or a file path accordingly)
COLUMN_FORMAT: JSON # JSON or PLAIN_TEXT
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE # Fitbit devices don't support time zones so we read this data in the timezone indicated by VALUE
VALUE: *timezone
# Sensors ------
# See https://www.rapids.science/latest/features/fitbit-heartrate-summary/
FITBIT_HEARTRATE_SUMMARY:
TABLE: heartrate_summary
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxrestinghr", "minrestinghr", "avgrestinghr", "medianrestinghr", "moderestinghr", "stdrestinghr", "diffmaxmoderestinghr", "diffminmoderestinghr", "entropyrestinghr"] # calories features' accuracy depend on the accuracy of the participants fitbit profile (e.g. height, weight) use these with care: ["sumcaloriesoutofrange", "maxcaloriesoutofrange", "mincaloriesoutofrange", "avgcaloriesoutofrange", "mediancaloriesoutofrange", "stdcaloriesoutofrange", "entropycaloriesoutofrange", "sumcaloriesfatburn", "maxcaloriesfatburn", "mincaloriesfatburn", "avgcaloriesfatburn", "mediancaloriesfatburn", "stdcaloriesfatburn", "entropycaloriesfatburn", "sumcaloriescardio", "maxcaloriescardio", "mincaloriescardio", "avgcaloriescardio", "mediancaloriescardio", "stdcaloriescardio", "entropycaloriescardio", "sumcaloriespeak", "maxcaloriespeak", "mincaloriespeak", "avgcaloriespeak", "mediancaloriespeak", "stdcaloriespeak", "entropycaloriespeak"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_heartrate_summary
SRC_LANGUAGE: "python"
# See https://www.rapids.science/latest/features/fitbit-heartrate-intraday/
FITBIT_HEARTRATE_INTRADAY:
TABLE: heartrate_intraday
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_heartrate_intraday
SRC_LANGUAGE: "python"
# See https://www.rapids.science/latest/features/fitbit-sleep-summary/
FITBIT_SLEEP_SUMMARY:
TABLE: sleep_summary
SLEEP_EPISODE_TIMESTAMP: end # summary sleep episodes are considered as events based on either the start timestamp or end timestamp.
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countepisode", "avgefficiency", "sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgdurationafterwakeup", "avgdurationasleep", "avgdurationawake", "avgdurationtofallasleep", "avgdurationinbed"]
SLEEP_TYPES: ["main", "nap", "all"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_sleep_summary
SRC_LANGUAGE: "python"
# See https://www.rapids.science/latest/features/fitbit-steps-summary/
FITBIT_STEPS_SUMMARY:
TABLE: steps_summary
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxsumsteps", "minsumsteps", "avgsumsteps", "mediansumsteps", "stdsumsteps"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_steps_summary
SRC_LANGUAGE: "python"
# See https://www.rapids.science/latest/features/fitbit-steps-intraday/
FITBIT_STEPS_INTRADAY:
TABLE: steps_intraday
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES:
STEPS: ["sum", "max", "min", "avg", "std"]
SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
THRESHOLD_ACTIVE_BOUT: 10 # steps
INCLUDE_ZERO_STEP_ROWS: False
SRC_FOLDER: "rapids" # inside src/features/fitbit_steps_intraday
SRC_LANGUAGE: "python"
# FITBIT_CALORIES:
# TABLE_FORMAT: JSON # JSON or CSV. If your JSON or CSV data are files change [DEVICE_DATA][FITBIT][SOURCE][TYPE] to FILES
# TABLE:
# JSON: fitbit_calories
# CSV:
# SUMMARY: calories_summary
# INTRADAY: calories_intraday
# PROVIDERS:
# RAPIDS:
# COMPUTE: False
# FEATURES: []
########################################################################################################################
# PLOTS #
########################################################################################################################
# Data quality
HISTOGRAM_PHONE_DATA_YIELD:
PLOT: False
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
PLOT: False
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
PLOT: False
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
PLOT: False
SENSORS: [PHONE_ACCELEROMETER, PHONE_ACTIVITY_RECOGNITION, PHONE_APPLICATIONS_FOREGROUND, PHONE_BATTERY, PHONE_BLUETOOTH, PHONE_CALLS, PHONE_CONVERSATION, PHONE_LIGHT, PHONE_LOCATIONS, PHONE_MESSAGES, PHONE_SCREEN, PHONE_WIFI_CONNECTED, PHONE_WIFI_VISIBLE]
# Features
HEATMAP_FEATURE_CORRELATION_MATRIX:
PLOT: False
MIN_ROWS_RATIO: 0.5
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
PHONE_FEATURES: [accelerometer, activity_recognition, applications_foreground, battery, calls_incoming, calls_missed, calls_outgoing, conversation, light, location_doryab, messages_received, messages_sent, screen]
FITBIT_FEATURES: [fitbit_heartrate, fitbit_step, fitbit_sleep]
CORR_THRESHOLD: 0.1
CORR_METHOD: "pearson" # choose from {"pearson", "kendall", "spearman"}
HISTOGRAM_VALID_SENSED_HOURS:
PLOT: False
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
HEATMAP_DAYS_BY_SENSORS:
PLOT: False
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
EXPECTED_NUM_OF_DAYS: -1
DB_TABLES: [accelerometer, applications_foreground, battery, bluetooth, calls, light, locations, messages, screen, wifi, sensor_wifi, plugin_google_activity_recognition, plugin_ios_activity_recognition, plugin_studentlife_audio_android, plugin_studentlife_audio]
HEATMAP_SENSED_BINS:
PLOT: False
BIN_SIZE: *bin_size
OVERALL_COMPLIANCE_HEATMAP:
PLOT: False
ONLY_SHOW_VALID_DAYS: False
EXPECTED_NUM_OF_DAYS: -1
BIN_SIZE: *bin_size
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
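
Each sensor in this file pairs a TABLE with a PROVIDERS map whose entries expose a COMPUTE switch, a FEATURES list, and SRC_FOLDER/SRC_LANGUAGE pointers to the feature script. As a rough illustration only (not part of RAPIDS; it assumes PyYAML is installed, that the file is read from the repository root, and that anchors such as *timezone are defined earlier in the file), a sketch like the following could list which providers are switched on:

```python
# Hedged sketch: list every SENSOR/PROVIDER pair whose COMPUTE flag is True.
# Illustrative only; this script is not part of the RAPIDS pipeline.
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

enabled = []
for sensor, settings in config.items():
    if not isinstance(settings, dict):
        continue
    providers = settings.get("PROVIDERS") or {}
    for provider, params in providers.items():
        if isinstance(params, dict) and params.get("COMPUTE"):
            enabled.append(f"{sensor}.{provider}")

print("Providers set to compute:", ", ".join(enabled) or "none")
```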

View File

@ -0,0 +1,2 @@
label,start_time,length
daily,00:00:00,"23H 59M 59S"

View File

@ -0,0 +1,9 @@
label,event_timestamp,length,shift,shift_direction,device_id
stress,1587661220000,1H,0M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1587906020000,3H,0M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1588003220000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1588172420000,9H,0,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587661220000,1H,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587747620000,1D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
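
Each row above defines a window anchored on event_timestamp (a Unix timestamp in milliseconds): length is the window duration, shift is an offset from the anchor, and shift_direction indicates whether that offset is applied forward (1), backward (-1), or not at all (0). A rough sketch of that interpretation (illustrative only; the file name and parsing details are assumptions, not RAPIDS code):

```python
# Hedged sketch (not RAPIDS code): turn each event segment row into a concrete
# [start, end] window under the interpretation described above.
import pandas as pd

def to_timedelta(value):
    # Durations appear as "1H", "4H", "7D", "0M" or a bare "0"; lowercase the
    # unit letters so pandas parses them unambiguously.
    value = str(value)
    return pd.Timedelta(value.lower()) if any(c.isalpha() for c in value) else pd.Timedelta(0)

segments = pd.read_csv("data/external/timesegments_event.csv")  # hypothetical file name
segments["anchor"] = pd.to_datetime(segments["event_timestamp"], unit="ms", utc=True)
offsets = pd.Series(
    [to_timedelta(s) * int(d) for s, d in zip(segments["shift"], segments["shift_direction"])],
    index=segments.index,
)
segments["start"] = segments["anchor"] + offsets
segments["end"] = segments["start"] + segments["length"].map(to_timedelta)
print(segments[["label", "start", "end"]])
```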

View File

@ -0,0 +1,2 @@
label,length
thirtyminutes,30
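
A frequency segment is just a bin width in minutes; the single row above is assumed to split every day into consecutive 30-minute windows (48 per day). A tiny illustration of that reading (not RAPIDS code; the label numbering is an assumption):

```python
# Hedged sketch (not RAPIDS code): expand a 30-minute frequency segment into
# the 48 windows of one example day.
import pandas as pd

label, minutes = "thirtyminutes", 30
starts = pd.date_range("2020-04-23", periods=(24 * 60) // minutes, freq=f"{minutes}min")
for i, start in enumerate(starts[:3]):  # show only the first three windows
    end = start + pd.Timedelta(minutes=minutes)
    print(f"{label}{i:04d}: {start.time()} -> {end.time()}")
```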

View File

@ -0,0 +1,6 @@
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
morning,06:00:00,5H 59M 59S,every_day,0
afternoon,12:00:00,5H 59M 59S,every_day,0
evening,18:00:00,5H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
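
Periodic segments repeat on a schedule; with repeats_on set to every_day, each row above is assumed to define one window per day starting at start_time and lasting length. A short illustration (not RAPIDS code; the file name is an assumption):

```python
# Hedged sketch (not RAPIDS code): expand the periodic segments above into
# concrete windows for one example day.
import pandas as pd

periodic = pd.read_csv("data/external/timesegments_periodic.csv")  # hypothetical file name
day = pd.Timestamp("2020-04-23")

for _, row in periodic[periodic["repeats_on"] == "every_day"].iterrows():
    start = day + pd.Timedelta(row["start_time"])
    # Lowercase and de-space the duration (e.g. "23H 59M 59S") so pandas parses it
    end = start + pd.Timedelta(str(row["length"]).lower().replace(" ", ""))
    print(f"{row['label']:>10}: {start} -> {end}")
```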

1
docs/CNAME 100644
View File

@ -0,0 +1 @@
www.rapids.science

View File

@ -1,153 +0,0 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
-rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/moshi-aware.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/moshi-aware.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/moshi-aware"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/moshi-aware"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."

15
docs/change-log.md 100644
View File

@ -0,0 +1,15 @@
# Change Log
## v0.1.0
- New and more consistent docs (this website). The [previous docs](https://rapidspitt.readthedocs.io/en/latest/) are marked as beta
- Consolidate [configuration](../setup/configuration) instructions
- Flexible [time segments](../setup/configuration#time-segments)
- Simplify Fitbit behavioral feature extraction and [documentation](../features/fitbit-heartrate-summary)
- Sensor's configuration and output is more consistent
- Update [visualizations](../visualizations/data-quality-visualizations) to handle flexible time segments
- Create a RAPIDS [execution](../setup/execution) script that allows re-computation of the pipeline after configuration changes
- Add [citation](../citation) guide
- Update [virtual environment](../developers/virtual-environments) guide
- Update analysis workflow [example](../workflow-examples/analysis)
- Add a [Code of Conduct](../code_of_conduct)
- Update [Team](../team) page

49
docs/citation.md 100644
View File

@ -0,0 +1,49 @@
# Cite RAPIDS and providers
!!! done "RAPIDS and the community"
RAPIDS is a community effort and, as such, we want to continue recognizing the contributions from other researchers. Besides citing RAPIDS, we ask you to cite any of the authors listed below if you used those sensor providers in your analysis. Thank you!
## RAPIDS
If you used RAPIDS, please cite [this paper](https://preprints.jmir.org/preprint/23246).
!!! cite "RAPIDS et al. citation"
Vega J, Li M, Aguillera K, Goel N, Joshi E, Durica KC, Kunta AR, Low CA
RAPIDS: Reproducible Analysis Pipeline for Data Streams Collected with Mobile Devices
JMIR Preprints. 18/08/2020:23246
DOI: 10.2196/preprints.23246
URL: https://preprints.jmir.org/preprint/23246
## Panda (accelerometer)
If you computed accelerometer features using the provider `[PHONE_ACCELEROMETER][PANDA]`, cite [this paper](https://pubmed.ncbi.nlm.nih.gov/31657854/) in addition to RAPIDS.
!!! cite "Panda et al. citation"
Panda N, Solsky I, Huang EJ, Lipsitz S, Pradarelli JC, Delisle M, Cusack JC, Gadd MA, Lubitz CC, Mullen JT, Qadan M, Smith BL, Specht M, Stephen AE, Tanabe KK, Gawande AA, Onnela JP, Haynes AB. Using Smartphones to Capture Novel Recovery Metrics After Cancer Surgery. JAMA Surg. 2020 Feb 1;155(2):123-129. doi: 10.1001/jamasurg.2019.4702. PMID: 31657854; PMCID: PMC6820047.
## Stachl (applications foreground)
If you computed applications foreground features using the app category (genre) catalogue in `[PHONE_APPLICATIONS_FOREGROUND][RAPIDS]` cite [this paper](https://www.pnas.org/content/117/30/17680) in addition to RAPIDS.
!!! cite "Stachl et al. citation"
Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres Völkel, Tobias Schuwerk, Michelle Oldemeier, Theresa Ullmann, Heinrich Hussmann, Bernd Bischl, Markus Bühner. Proceedings of the National Academy of Sciences Jul 2020, 117 (30) 17680-17687; DOI: 10.1073/pnas.1920484117
## Barnett (locations)
If you computed locations features using the provider `[PHONE_LOCATIONS][BARNETT]` cite [this paper](https://doi.org/10.1093/biostatistics/kxy059) and [this paper](https://doi.org/10.1145/2750858.2805845) in addition to RAPIDS.
!!! cite "Barnett et al. citation"
Ian Barnett, Jukka-Pekka Onnela, Inferring mobility measures from GPS traces with missing data, Biostatistics, Volume 21, Issue 2, April 2020, Pages e98-e112, https://doi.org/10.1093/biostatistics/kxy059
!!! cite "Canzian et al. citation"
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293-1304. DOI: https://doi.org/10.1145/2750858.2805845
## Doryab (locations)
If you computed locations features using the provider `[PHONE_LOCATIONS][DORYAB]` cite [this paper](https://arxiv.org/abs/1812.10394) and [this paper](https://doi.org/10.1145/2750858.2805845) in addition to RAPIDS.
!!! cite "Doryab et al. citation"
Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
!!! cite "Canzian et al. citation"
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293-1304. DOI: https://doi.org/10.1145/2750858.2805845

View File

@ -0,0 +1,134 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
moshi@pitt.edu.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available
at [https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

View File

@ -1,244 +0,0 @@
# -*- coding: utf-8 -*-
#
# RAPIDS documentation build configuration file, created by
# sphinx-quickstart.
#
# This file is execfile()d with the current directory set to its containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import os
import sys
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# sys.path.insert(0, os.path.abspath('.'))
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = []
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
# source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'RAPIDS'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.1'
# The full version, including alpha/beta/rc tags.
release = '0.1'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all documents.
# default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
# If false, no index is generated.
# html_use_index = True
# If true, the index is split into individual pages for each letter.
# html_split_index = False
# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = 'rapidsdoc'
# -- Options for LaTeX output --------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
# 'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
('index',
'rapids.tex',
u'RAPIDS Documentation',
u"RAPIDS", 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
# -- Options for manual page output --------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
('index', 'RAPIDS', u'RAPIDS Documentation',
[u"RAPIDS"], 1)
]
# If true, show URL addresses after external links.
# man_show_urls = False
# -- Options for Texinfo output ------------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
('index', 'RAPIDS', u'RAPIDS Documentation',
u"RAPIDS", 'RAPIDS',
'Reproducible Analysis Pipeline for Data Streams', 'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'

View File

@ -1,83 +0,0 @@
RAPIDS Contributors
====================
Currently, RAPIDS is being developed by the Mobile Sensing + Health Institute (MoSHI), but if you are interested in contributing, feel free to submit a pull request or contact us.
Julio Vega, PhD
""""""""""""""""""
**Postdoctoral Associate**
vegaju@upmc.edu
Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data. In the long term, his goal is to explore how we can enable patients to inform, amend, and evaluate their health tracking algorithms to improve disease self-management.
`Julio Vega Personal Website`_
Meng Li, MS
"""""""""""""
**Data Scientist**
lim11@upmc.edu
Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field.
`Meng Li Linkedin Profile`_
`Meng Li Github Profile`_
Kwesi Aguillera, BS
""""""""""""""""""""
**Intern**
Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Science in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full-stack developer and looks forward to applying this knowledge to big data analysis.
`Kwesi Aguillera Linkedin Profile`_
Echhit Joshi, BS
"""""""""""""""""
**Intern**
Echhit Joshi is a Masters student at the School of Computing and Information at University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics.
`Echhit Joshi Linkedin Profile`_
Nicolas Leo, BS
""""""""""""""""
**Intern**
Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for a MS in Computer Science with a focus on Intelligent Systems.
Nikunj Goel, BS
""""""""""""""""
**Intern**
Nik is a graduate student at the University of Pittsburgh pursuing a Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology from India. He is a data enthusiast, passionate about finding meaning in raw data. In the long term, his goal is to create breakthroughs in Data Science and Deep Learning.
`Nikunj Goel Linkedin Profile`_
Agam Kumar, BS
""""""""""""""""
**Research Assistant at CMU**
Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science. He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science. His research interests include ML applications in medicine.
`Agam Kumar Linkedin Profile`_
`Agam Kumar Github Profile`_
.. _`Julio Vega Personal Website`: https://juliovega.info/
.. _`Meng Li Linkedin Profile`: https://www.linkedin.com/in/meng-li-57238414a
.. _`Meng Li Github Profile`: https://github.com/Meng6
.. _`Kwesi Aguillera Linkedin Profile`: https://www.linkedin.com/in/kwesi-aguillera-29529823
.. _`Echhit Joshi Linkedin Profile`: https://www.linkedin.com/in/echhitjoshi/
.. _`Nikunj Goel Linkedin Profile`: https://www.linkedin.com/in/nikunjgoel95/
.. _`Agam Kumar Linkedin Profile`: https://www.linkedin.com/in/agam-kumar
.. _`Agam Kumar Github Profile`: https://github.com/agam-kumar

View File

@ -1,237 +0,0 @@
How to Edit Documentation
============================
The following is a basic guide for editing the documentation for this project. The documentation is rendered using the Sphinx_ documentation builder.
Quick start up
----------------------------------
#. Install Sphinx on macOS with ``brew install sphinx-doc`` or on Linux (Ubuntu) with ``apt-get install python3-sphinx``
#. Go to the docs folder ``cd docs``
#. Change any ``.rst`` file you need to modify
#. To visualise the results locally, run ``make dirhtml`` and check the HTML files in the ``_build/dirhtml`` directory
#. When you are done, push your changes to the git repo.
Sphinx Workspace Structure
----------------------------
All of the files concerned with documentation can be found in the ``docs`` directory. At the top level there are the ``conf.py`` file and an ``index.rst`` file, among others. There should be no need to change the ``conf.py`` file. The ``index.rst`` file is known as the master document and defines the document structure of the documentation (i.e. the menu or table of contents structure). It contains the root of the "table of contents" tree (or toctree) that connects the multiple files into a single hierarchy of documents. The TOC is defined using the ``toctree`` directive, which is used as follows::
.. toctree::
:maxdepth: 2
:caption: Getting Started
usage/introduction
usage/installation
The ``toctree`` directive inserts a TOC tree at the current location using the individual TOCs of the documents given in the directive body. In other words, if there are ``toctree`` directives in the files listed in the above example, they will also be applied to the resulting TOC. Relative document names (not beginning with a slash) are relative to the document the directive occurs in; absolute names are relative to the source directory. Thus, in the example above, the ``usage`` directory is relative to the ``index.rst`` page. The ``:maxdepth:`` parameter defines the depth of the tree for that particular menu. The ``caption`` parameter gives a caption to that menu tree at that level. Note that the titles for the links of the menu items under that header are taken from the titles of the referenced documents. For example, the menu item title for ``usage/introduction`` is taken from the main header of the ``introduction.rst`` document in the ``usage`` directory. Also note that the document name does not include the extension (i.e. .rst).
Thus the directory structure for the above example is shown below::
├── index.rst
└── usage
├── introduction.rst
└── installation.rst
Basic reStructuredText Syntax
-------------------------------
Now we will look at some basic reStructuredText syntax necessary to start editing the .rst files that are used to generate documentation.
Headers
""""""""
**Section Header**
The following was used to make the header at the top of this page:
::
How to Edit Documentation
==========================
**Subsection Header**
The following was used to create the secondary header (e.g. the Sphinx Workspace Structure section header)
::
Sphinx Workspace structure
----------------------------
.....
Lists
""""""
**Bullets List**
::
- This is a bullet
- This is a bullet
Will produce the following:
- This is a bullet
- This is a bullet
**Numbered List**
::
#. This is a numbered list item
#. This is a numbered list item
Will produce the following:
#. This is a numbered list item
#. This is a numbered list item
.....
Inline Markup
""""""""""""""
**Emphasis/Italics**
::
*This is for emphasis*
Will produce the following
*This is for emphasis*
**Bold**
::
**This is bold text**
Will produce the following
**This is bold text**
.....
**Code Sample**
::
``Backquotes = code sample``
Will produce the following:
``Backquotes = code sample``
**Apostrophes in Text**
::
`don't know`
Will produce the following
`don't know`
**Literal blocks**
Literal code blocks are introduced by ending a paragraph with the special marker ``::``. The literal block must be indented (and, like all paragraphs, separated from the surrounding ones by blank lines)::
This is a normal text paragraph. The next paragraph is a code sample::
It is not processed in any way, except
that the indentation is removed.
It can span multiple lines.
This is a normal text paragraph again.
The following is produced:
.....
This is a normal text paragraph. The next paragraph is a code sample::
It is not processed in any way, except
that the indentation is removed.
It can span multiple lines.
This is a normal text paragraph again.
.....
**Doctest blocks**
Doctest blocks are interactive Python sessions cut-and-pasted into docstrings. They do not require the literal block syntax. The doctest block must end with a blank line and should not end with an unused prompt:
>>> 1 + 1
2
**External links**
Use ```Link text <https://domain.invalid/>`_`` for inline web links `Link text <https://domain.invalid/>`_. If the link text should be the web address, you don't need special markup at all; the parser finds links and mail addresses in ordinary text. *Important:* There must be a space between the link text and the opening ``<`` for the URL.
You can also separate the link and the target definition, like this
::
This is a paragraph that contains `a link`_.
.. _a link: https://domain.invalid/
Will produce the following:
This is a paragraph that contains `a link`_.
.. _a link: https://domain.invalid/
**Internal links**
Internal linking is done via a special reST role provided by Sphinx to cross-reference arbitrary locations. For this to work, label names must be unique throughout the entire documentation. There are two ways in which you can refer to labels:
- If you place a label directly before a section title, you can reference it with ``:ref:`label-name```. For example::
.. _my-reference-label:
Section to cross-reference
--------------------------
This is the text of the section.
It refers to the section itself, see :ref:`my-reference-label`.
The ``:ref:`` role would then generate a link to the section, with the link title being “Section to cross-reference”. This works just as well when section and reference are in different source files. The above produces the following:
.....
.. _my-reference-label:
Section to cross-reference
"""""""""""""""""""""""""""
This is the text of the section.
It refers to the section itself, see :ref:`my-reference-label`.
.....
- Labels that aren't placed before a section title can still be referenced, but you must give the link an explicit title, using this syntax: ``:ref:`Link title <label-name>```.
**Comments**
Every explicit markup block which isn't a valid markup construct is regarded as a comment. For example::
.. This is a comment.
Go to Sphinx_ for more documentation.
.. _Sphinx: https://www.sphinx-doc.org
.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html

View File

@ -1,18 +0,0 @@
Manage virtual environments
=============================
**Add new packages**
Try to install any new package using `conda install my_package`. If a package is not available in one of conda's channels, you can install it with pip, but make sure your virtual environment is active.
**Update your conda environment.yaml**
After installing a new package, you can use the following command in your terminal to update your ``environment.yml`` before publishing your pipeline. Note that we ignore the package version for ``libgfortran`` to keep compatibility with Linux:
``conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' > environment.yml``
**Update and prune your conda environment from a environment.yaml file**
Execute the following command in your terminal. See https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#updating-an-environment
``conda env update --prefix ./env --file environment.yml --prune``

View File

@ -1,28 +0,0 @@
Add new features to RAPIDS
============================
Take accelerometer features as an example.
#. Add your script to accelerometer_ folder
- Copy the signature of the base_accelerometer_features() function_ for your own feature function (see the sketch after this list)
#. Add any parameters you need for your function
- Add your parameters to the settings_ of accelerometer sensor in config file
- Add your parameters to the params_ of accelerometer_features rule in features.snakefile
#. Merge your new features with the existing features
- Call the function you just created below this line (LINK_) of accelerometer_features.py script
#. Update config file
- Add your new feature names to the ``FEATURES`` list for accelerometer in the config_ file
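For illustration only, a minimal sketch of what such a feature function could look like; the function name, parameters, and column names below are hypothetical, so copy the real base_accelerometer_features() signature instead::
    # Hedged sketch: a hypothetical accelerometer feature function, not the
    # actual RAPIDS base function. Assumes acc_data is already filtered to one
    # day segment and contains the AWARE axes double_values_0/1/2.
    import pandas as pd
    def range_magnitude_features(acc_data, day_segment, requested_features):
        features = pd.DataFrame()
        axes = acc_data[["double_values_0", "double_values_1", "double_values_2"]]
        magnitude = (axes ** 2).sum(axis=1) ** 0.5
        if "rangemagnitude" in requested_features:
            features.loc[0, "acc_" + day_segment + "_rangemagnitude"] = magnitude.max() - magnitude.min()
        return features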
.. _accelerometer: https://github.com/carissalow/rapids/tree/master/src/features/accelerometer
.. _function: https://github.com/carissalow/rapids/blob/master/src/features/accelerometer/accelerometer_base.py#L35
.. _settings: https://github.com/carissalow/rapids/blob/master/config.yaml#L100
.. _params: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile#L146
.. _LINK: https://github.com/carissalow/rapids/blob/master/src/features/accelerometer_features.py#L10
.. _config: https://github.com/carissalow/rapids/blob/master/config.yaml#L102

View File

@ -1,16 +0,0 @@
Remote Support
======================================
We use the Live Share extension of Visual Studio Code to debug bugs when sharing data or database credentials is not possible.
#. Install `Visual Studio Code <https://code.visualstudio.com/>`_
#. Open your rapids folder in a new VSCode window
#. Open a new Terminal ``Terminal > New terminal``
#. Install the `Live Share extension pack <https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare-pack>`_
#. Press ``Ctrl+P``/``Cmd+P`` and run this command ``>live share: start collaboration session``
#. Follow the instructions and share the session link you receive

View File

@ -1,110 +0,0 @@
.. _test-cases:
Test Cases
-----------
Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a Work In Progress, this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed, a description of the data used for testing (test cases) is outlined. Currently, for all intents and purposes, ``tests/data/raw/test01/`` contains all the test data files for testing android data formats and ``tests/data/raw/test02/`` contains all the test data files for testing iOS data formats. It follows that the expected (verified) output files are contained in the ``tests/data/processed/test01/`` and ``tests/data/processed/test02/`` folders for Android and iOS respectively. ``tests/data/raw/test03/`` and ``tests/data/raw/test04/`` contain data files for testing empty raw data files for android and iOS respectively.
List of Sensor with Tests
^^^^^^^^^^^^^^^^^^^^^^^^^^
The following is a list of the sensors for which testing is currently available.
Messages (SMS)
"""""""""""""""
- The raw message data file contains data for 2 separate days.
- The data for the first day contains 5 records for every ``epoch``.
- The second day's data contains 6 records for each of only 2 ``epoch`` (currently ``morning`` and ``evening``)
- The raw message data contains records for both ``message_types`` (i.e. ``recieved`` and ``sent``) in both days in all epochs. The number of records with each ``message_types`` per epoch is randomly distributed. There is at least one record with each ``message_types`` per epoch.
- There is one raw message data file each, as described above, for testing both iOS and Android data.
- There is also an additional empty data file for both android and iOS for testing empty data files
Calls
"""""""
Due to the difference in the format of the raw call data for iOS and Android (see the **Assumptions/Observations** section of :ref:`Calls<call-sensor-doc>`) the following describes the expected results in ``calls_with_datetime_unified.csv``. This gives a better idea of the use cases being tested since ``calls_with_datetime_unified.csv`` makes both the iOS and Android data comparable.
- The call data would contain data for 2 days.
- The data for the first day contains 6 records for every ``epoch``.
- The second day's data contains 6 records for each of only 2 ``epoch`` (currently ``morning`` and ``evening``)
- The call data contains records for all ``call_types`` (i.e. ``incoming``, ``outgoing`` and ``missed``) in both days in all epochs. The number of records with each of the ``call_types`` per epoch is randomly distributed. There is at least one record with each ``call_types`` per epoch.
- There is one call data file each, as described above, for testing both iOS and Android data.
- There is also an additional empty data file for both android and iOS for testing empty data files
Screen
""""""""
Due to the difference in the format of the raw screen data for iOS and Android (see the **Assumptions/Observations** section of :ref:`Screen<screen-sensor-doc>`) the following describes the expected results in ``screen_deltas.csv``. This gives a better idea of the use cases being tested since ``screen_deltas.csv`` makes both the iOS and Android data comparable. These files are used to calculate the features for the screen sensor.
- The screen delta data file contains data for 1 day.
- The screen delta data contains 1 record to represent an ``unlock`` episode that falls within an ``epoch`` for every ``epoch``.
- The screen delta data contains 1 record to represent an ``unlock`` episode that falls across the boundary of 2 epochs. Namely the ``unlock`` episode starts in one epoch and ends in the next, thus there is a record for ``unlock`` episodes that fall across ``night`` to ``morning``, ``morning`` to ``afternoon`` and finally ``afternoon`` to ``night``
- The testing is done for ``unlock`` episode_type.
- There is one screen data file each for testing both iOS and Android data formats.
- There is also an additional empty data file for both android and iOS for testing empty data files
Battery
"""""""""
Due to the difference in the format of the raw battery data for iOS and Android as well as versions of iOS (see the **Assumptions/Observations** section of :ref:`Battery<battery-sensor-doc>`) the following describes the expected results in ``battery_deltas.csv``. This gives a better idea of the use cases being tested since ``battery_deltas.csv`` makes both the iOS and Android data comparable. These files are used to calculate the features for the battery sensor.
- The battery delta data file contains data for 1 day.
- The battery delta data contains 1 record each for a ``charging`` and ``discharging`` episode that falls within an ``epoch`` for every ``epoch``. Thus, for the ``daily`` epoch there would be multiple ``charging`` and ``discharging`` episodes
- Since either a ``charging`` episode or a ``discharging`` episode and not both can occur across epochs, in order to test episodes that occur across epochs alternating episodes of ``charging`` and ``discharging`` episodes that fall across ``night`` to ``morning``, ``morning`` to ``afternoon`` and finally ``afternoon`` to ``night`` are present in the battery delta data. This starts with a ``discharging`` episode that begins in ``night`` and end in ``morning``.
- There is one battery data file each, for testing both iOS and Android data formats.
- There is also an additional empty data file for both android and iOS for testing empty data files
Bluetooth
""""""""""
- The raw Bluetooth data file contains data for 1 day.
- The raw Bluetooth data contains at least 2 records for each ``epoch``. Each ``epoch`` has a record with a ``timestamp`` for the beginning boundary for that ``epoch`` and a record with a ``timestamp`` for the ending boundary for that ``epoch``. (e.g. For the ``morning`` epoch there is a record with a ``timestamp`` for ``6:00AM`` and another record with a ``timestamp`` for ``11:59:59AM``. These are to test edge cases)
- A set of 5 Bluetooth devices is randomly distributed throughout the data records.
- There is one raw Bluetooth data file each, for testing both iOS and Android data formats.
- There is also an additional empty data file for both android and iOS for testing empty data files.
WIFI
"""""
- There are 2 data files (``wifi_raw.csv`` and ``sensor_wifi_raw.csv``) for each fake participant for each phone platform. (see the **Assumptions/Observations** section of :ref:`WIFI<wifi-sensor-doc>`)
- The raw WIFI data files contain data for 1 day.
- The ``sensor_wifi_raw.csv`` data contains at least 2 records for each ``epoch``. Each ``epoch`` has a record with a ``timestamp`` for the beginning boundary for that ``epoch`` and a record with a ``timestamp`` for the ending boundary for that ``epoch``. (e.g. For the ``morning`` epoch there is a record with a ``timestamp`` for ``6:00AM`` and another record with a ``timestamp`` for ``11:59:59AM``. These are to test edge cases)
- The ``wifi_raw.csv`` data contains 3 records with random timestamps for each ``epoch`` to represent visible broadcasting WIFI network. This file is empty for the iOS phone testing data.
- A set of 10 access point devices is randomly distributed throughout the data records, 5 each for ``sensor_wifi_raw.csv`` and ``wifi_raw.csv``.
- There are data files for testing both iOS and Android data formats.
- There are also additional empty data files for both android and iOS for testing empty data files.
Light
"""""""
- The raw light data file contains data for 1 day.
- The raw light data contains 3 or 4 rows of data for each ``epoch`` except ``night``. The single row of data for ``night`` is for testing features for single-value inputs (for example, testing the standard deviation of one input value).
- Since light is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files.
Application Foreground
"""""""""""""""""""""""
- The raw application foreground data file contains data for 1 day.
- The raw application foreground data contains 7 - 9 rows of data for each ``epoch``. The records for each ``epoch`` contains apps that are randomly selected from a list of apps that are from the ``MULTIPLE_CATEGORIES`` and ``SINGLE_CATEGORIES`` (See `testing_config.yaml`_). There are also records in each epoch that have apps randomly selected from a list of apps that are from the ``EXCLUDED_CATEGORIES`` and ``EXCLUDED_APPS``. This is to test that these apps are actually being excluded from the calculations of features. There are also records to test ``SINGLE_APPS`` calculations.
- Since application foreground is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files.
Activity Recognition
""""""""""""""""""""""
- The raw Activity Recognition data file contains data for 1 day.
- The raw Activity Recognition data for each ``epoch`` period contains rows that record 2 - 5 different ``activity_types``. This is such that durations of activities can be tested. Additionally, there are records that mimic the duration of an activity over the time boundary of neighboring epochs. (For example, there is a set of records that mimic the participant ``in_vehicle`` from ``afternoon`` into ``evening``)
- There is one file each with raw Activity Recognition data for testing both iOS and Android data formats. (plugin_google_activity_recognition_raw.csv for android and plugin_ios_activity_recognition_raw.csv for iOS)
- There is also an additional empty data file for both android and iOS for testing empty data files.
Conversation
"""""""""""""
- The raw conversation data file contains data for 2 days.
- The raw conversation data contains records with a sample of both ``datatypes`` (i.e. ``voice/noise`` = ``0``, and ``conversation`` = ``2``) as well as rows with samples of each of the ``inference`` values (i.e. ``silence`` = ``0``, ``noise`` = ``1``, ``voice`` = ``2``, and ``unknown`` = ``3``) for each ``epoch``. The different ``datatype`` and ``inference`` records are randomly distributed throughout the ``epoch``.
- Additionally there are 2 - 5 records for conversations (``datatype`` = 2, and ``inference`` = -1) in each ``epoch`` and for each ``epoch`` except night, there is a conversation record that has a ``double_convo_start`` ``timestamp`` that is from the previous ``epoch``. This is to test the calculations of features across ``epochs``.
- There is a raw conversation data file for both android and iOS platforms (``plugin_studentlife_audio_android_raw.csv`` and ``plugin_studentlife_audio_raw.csv`` respectively).
- Finally, there are also additional empty data files for both android and iOS for testing empty data files
.. _`testing_config.yaml`: https://github.com/carissalow/rapids/blob/c498b8d2dfd7cc29d1e4d53e978d30cff6cdf3f2/tests/settings/testing_config.yaml#L70

View File

@ -1,67 +0,0 @@
Testing
==========
The following is a simple guide to testing RAPIDS. All files necessary for testing are stored in the ``tests`` directory:
::
├── tests
│ ├── data <- Replica of the project root data directory for testing.
│ │ ├── external <- Contains the fake testing participant files.
│ │ ├── interim <- The expected intermediate data that has been transformed.
│ │ ├── processed <- The expected final data, canonical data sets for modeling used to test/validate feature calculations.
│ │ └── raw <- The specially created raw input datasets (fake data) that will be used for testing.
│ │
│ ├── scripts <- Scripts for testing. Add test scripts in this directory.
│ │ ├── run_tests.sh <- The shell script that runs the RAPIDS pipeline on test data and tests the results
│ │ ├── test_sensor_features.py <- The default test script for testing RAPIDS built-in sensor features.
│ │ └── utils.py <- Contains any helper functions and methods.
│ │
│ ├── settings <- The directory contains the config and settings files for testing snakemake.
│ │ ├── config.yaml <- Defines the testing profile configurations for running snakemake.
│ │ └── testing_config.yaml <- Contains the actual snakemake configuration settings for testing.
│ │
│ └── Snakefile <- The Snakefile for testing only. It contains the rules that you would be testing.
Steps for Testing
""""""""""""""""""
#. To begin testing RAPIDS place the fake raw input data ``csv`` files in ``tests/data/raw/``. The fake participant files should be placed in ``tests/data/external/``. The expected output files of RAPIDS after processing the input data should be placed in ``tests/data/processed/``.
#. The Snakemake rule(s) that are to be tested must be placed in the ``tests/Snakemake`` file. The current ``tests/Snakemake`` is a good example of how to define them. (At the time of writing this documentation the snakefile contains rules for messages (SMS), calls and screen)
#. Edit the ``tests/settings/config.yaml``. Add and/or remove the rules to be run for testing from the ``forcerun`` list.
#. Edit the ``tests/settings/testing_config.yaml`` with the necessary configuration settings for running the rules to be tested.
#. Add any additional testscripts in ``tests/scripts``.
#. Uncomment or comment out lines in the testing shell script ``tests/scripts/run_tests.sh``.
#. Run the testing shell script.
::
$ tests/scripts/run_tests.sh
The following is a snippet of the output you should see after running your test.
::
test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... ok
test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... FAIL
======================================================================
FAIL: test_sensors_features_calculations (test_sensor_features.TestSensorFeatures)
----------------------------------------------------------------------
The results above show that the first test ``test_sensors_files_exist`` passed while ``test_sensors_features_calculations`` failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see `Unittest Documentation`_
Testing of the RAPIDS sensors and features is a work-in-progress. Please see :ref:`test-cases` for a list of sensors and features that have testing currently available.
Currently the repository is set up to test a number of sensors out of the box by simply running the ``tests/scripts/run_tests.sh`` command once the RAPIDS python environment is active.
.. _`Unittest Documentation`: https://docs.python.org/3.7/library/unittest.html#command-line-interface

View File

@ -0,0 +1,41 @@
# Documentation
We use [mkdocs](https://www.mkdocs.org/) with the [material theme](https://squidfunk.github.io/mkdocs-material/) to write these docs. Whenever you make any changes, just push them back to the repo and the documentation will be deployed automatically.
## Set up development environment
1. Make sure your conda environment is active
2. `pip install mkdocs`
3. `pip install mkdocs-material`
## Preview
Run the following command in RAPIDS root folder and go to [http://127.0.0.1:8000](http://127.0.0.1:8000):
```bash
mkdocs serve
```
## File Structure
The documentation config file is `/mkdocs.yml`. If you are adding new `.md` files to the docs, modify the `nav` attribute at the bottom of that file. You can use the hierarchy there to find all the files that appear in the documentation.
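For instance, registering a new page under a hypothetical `developers/` folder would look roughly like the sketch below (the file names are only placeholders):
```yaml
nav:
  - Home: 'index.md'
  - Developers:
      - Documentation: 'developers/documentation.md'
      - My New Page: 'developers/my-new-page.md'
```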
## Reference
Check this [page](https://squidfunk.github.io/mkdocs-material/reference/abbreviations/) to get familiar with the different visual elements we can use in the docs (admonitions, code blocks, tables, etc.) You can also refer to `/docs/setup/installation.md` and `/docs/setup/configuration.md` to see practical examples of these elements.
!!! hint
Any links to internal pages should be relative to the current page. For example, any link from this page (documentation) which is inside `./developers` should begin with `../` to go one folder level up like:
```md
[mylink](../setup/installation.md)
```
## Extras
You can insert [emojis](https://facelessuser.github.io/pymdown-extensions/extensions/emoji/) using the syntax `:[SOURCE]-[ICON_NAME]:`, where the icons come from the following sources:
- https://materialdesignicons.com/
- https://fontawesome.com/icons/tasks?style=solid
- https://primer.style/octicons/
You can use this [page](https://www.tablesgenerator.com/markdown_tables) to create markdown tables more easily
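For example, a line like the following renders icons inline (whether a given icon name is available depends on the icon sets bundled with the theme version you installed):
```md
Run RAPIDS :material-check: and review your tasks :fontawesome-solid-tasks:
```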

View File

@ -0,0 +1,14 @@
# Remote Support
We use the Live Share extension of Visual Studio Code to debug bugs when sharing data or database credentials is not possible.
1. Install [Visual Studio Code](https://code.visualstudio.com/)
2. Open your RAPIDS root folder in a new VSCode window
3. Open a new Terminal `Terminal > New terminal`
4. Install the [Live Share extension pack](https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare-pack)
5. Press ++ctrl+p++ or ++cmd+p++ and run this command:
```bash
>live share: start collaboration session
```
6. Follow the instructions and share the session link you receive

View File

@ -0,0 +1,185 @@
# Test Cases
Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a Work In Progress, this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed, a description of the data used for testing (test cases) is outlined. Currently, for all intents and purposes, `tests/data/raw/test01/` contains all the test data files for testing android data formats and `tests/data/raw/test02/` contains all the test data files for testing iOS data formats. It follows that the expected (verified) output files are contained in `tests/data/processed/test01/` and `tests/data/processed/test02/` for Android and iOS respectively. `tests/data/raw/test03/` and `tests/data/raw/test04/` contain data files for testing empty raw data files for android and iOS respectively.
The following is a list of the sensors for which testing is currently available.
## Messages (SMS)
- The raw message data file contains data for 2 separate days.
- The data for the first day contains 5 records for every `epoch`.
- The second day's data contains 6 records for each of only 2
`epoch` (currently `morning` and `evening`)
- The raw message data contains records for both `message_types`
(i.e. `recieved` and `sent`) in both days in all epochs. The
number of records with each `message_types` per epoch is randomly
distributed. There is at least one record with each
`message_types` per epoch.
- There is one raw message data file each, as described above, for
testing both iOS and Android data.
- There is also an additional empty data file for both android and
iOS for testing empty data files
## Calls
Due to the difference in the format of the raw call data for iOS and Android, the following describes the expected results in `calls_with_datetime_unified.csv`. This gives a better idea of the use cases being tested since `calls_with_datetime_unified.csv` makes both the iOS and Android data comparable.
- The call data would contain data for 2 days.
- The data for the first day contains 6 records for every `epoch`.
- The second day's data contains 6 records for each of only 2
`epoch` (currently `morning` and `evening`)
- The call data contains records for all `call_types` (i.e.
`incoming`, `outgoing` and `missed`) in both days in all epochs.
The number of records with each of the `call_types` per epoch is
randomly distributed. There is at least one record with each
`call_types` per epoch.
- There is one call data file each, as described above, for testing
both iOS and Android data.
- There is also an additional empty data file for both android and
iOS for testing empty data files
## Screen
Due to the difference in the format of the raw screen data for iOS and Android, the following describes the expected results in `screen_deltas.csv`. This gives a better idea of the use cases being tested since `screen_deltas.csv` makes both the iOS and Android data comparable. These files are used to calculate the features for the screen sensor.
- The screen delta data file contains data for 1 day.
- The screen delta data contains 1 record to represent an `unlock`
episode that falls within an `epoch` for every `epoch`.
- The screen delta data contains 1 record to represent an `unlock`
episode that falls across the boundary of 2 epochs. Namely the
`unlock` episode starts in one epoch and ends in the next, thus
there is a record for `unlock` episodes that fall across `night`
to `morning`, `morning` to `afternoon` and finally `afternoon` to
`night`
- The testing is done for `unlock` episode_type.
- There is one screen data file each for testing both iOS and
Android data formats.
- There is also an additional empty data file for both android and
iOS for testing empty data files
## Battery
Due to the difference in the format of the raw battery data for iOS and Android, as well as versions of iOS, the following describes the expected results in `battery_deltas.csv`. This gives a better idea of the use cases being tested since `battery_deltas.csv` makes both the iOS and Android data comparable. These files are used to calculate the features for the battery sensor.
- The battery delta data file contains data for 1 day.
- The battery delta data contains 1 record each for a `charging` and
`discharging` episode that falls within an `epoch` for every
`epoch`. Thus, for the `daily` epoch there would be multiple
`charging` and `discharging` episodes
- Since either a `charging` episode or a `discharging` episode and
not both can occur across epochs, in order to test episodes that
occur across epochs alternating episodes of `charging` and
`discharging` episodes that fall across `night` to `morning`,
`morning` to `afternoon` and finally `afternoon` to `night` are
present in the battery delta data. This starts with a
`discharging` episode that begins in `night` and end in `morning`.
- There is one battery data file each, for testing both iOS and
Android data formats.
- There is also an additional empty data file for both android and
iOS for testing empty data files
## Bluetooth
- The raw Bluetooth data file contains data for 1 day.
- The raw Bluetooth data contains at least 2 records for each
`epoch`. Each `epoch` has a record with a `timestamp` for the
beginning boundary for that `epoch` and a record with a
`timestamp` for the ending boundary for that `epoch`. (e.g. For
the `morning` epoch there is a record with a `timestamp` for
`6:00AM` and another record with a `timestamp` for `11:59:59AM`.
These are to test edge cases)
- A set of 5 Bluetooth devices is randomly distributed
throughout the data records.
- There is one raw Bluetooth data file each, for testing both iOS
and Android data formats.
- There is also an additional empty data file for both android and
iOS for testing empty data files.
## WIFI
- There are 2 data files (`wifi_raw.csv` and `sensor_wifi_raw.csv`)
for each fake participant for each phone platform.
- The raw WIFI data files contain data for 1 day.
- The `sensor_wifi_raw.csv` data contains at least 2 records for
each `epoch`. Each `epoch` has a record with a `timestamp` for the
beginning boundary for that `epoch` and a record with a
`timestamp` for the ending boundary for that `epoch`. (e.g. For
the `morning` epoch there is a record with a `timestamp` for
`6:00AM` and another record with a `timestamp` for `11:59:59AM`.
These are to test edge cases)
- The `wifi_raw.csv` data contains 3 records with random timestamps
for each `epoch` to represent visible broadcasting WIFI network.
This file is empty for the iOS phone testing data.
- A set of 10 access point devices is randomly distributed
throughout the data records, 5 each for `sensor_wifi_raw.csv` and
`wifi_raw.csv`.
- There are data files for testing both iOS and Android data formats.
- There are also additional empty data files for both android and
iOS for testing empty data files.
## Light
- The raw light data file contains data for 1 day.
- The raw light data contains 3 or 4 rows of data for each `epoch`
except `night`. The single row of data for `night` is for testing
features for single-value inputs (for example, testing the standard
deviation of one input value).
- Since light is only available for Android there is only one file
that contains data for Android. All other files (i.e. for iPhone)
are empty data files.
## Application Foreground
- The raw application foreground data file contains data for 1 day.
- The raw application foreground data contains 7 - 9 rows of data
for each `epoch`. The records for each `epoch` contain apps that
are randomly selected from a list of apps that are from the
`MULTIPLE_CATEGORIES` and `SINGLE_CATEGORIES` (See
[testing_config.yaml](https://github.com/carissalow/rapids/blob/c498b8d2dfd7cc29d1e4d53e978d30cff6cdf3f2/tests/settings/testing_config.yaml#L70)). There are also records in each epoch
that have apps randomly selected from a list of apps that are from
the `EXCLUDED_CATEGORIES` and `EXCLUDED_APPS`. This is to test
that these apps are actually being excluded from the calculations
of features. There are also records to test `SINGLE_APPS`
calculations.
- Since application foreground is only available for Android there
is only one file that contains data for Android. All other files
(i.e. for iPhone) are empty data files.
## Activity Recognition
- The raw Activity Recognition data file contains data for 1 day.
- The raw Activity Recognition data for each `epoch` period contains
rows that record 2 - 5 different `activity_types`. This is such
that durations of activities can be tested. Additionally, there
are records that mimic the duration of an activity over the time
boundary of neighboring epochs. (For example, there is a set of
records that mimic the participant `in_vehicle` from `afternoon`
into `evening`)
- There is one file each with raw Activity Recognition data for
testing both iOS and Android data formats.
(`plugin_google_activity_recognition_raw.csv` for android and
`plugin_ios_activity_recognition_raw.csv` for iOS)
- There is also an additional empty data file for both android and
iOS for testing empty data files.
## Conversation
- The raw conversation data file contains data for 2 days.
- The raw conversation data contains records with a sample of both
`datatypes` (i.e. `voice/noise` = `0`, and `conversation` = `2` )
as well as rows with samples of each of the `inference` values
(i.e. `silence` = `0`, `noise` = `1`, `voice` = `2`, and `unknown`
= `3`) for each `epoch`. The different `datatype` and `inference`
records are randomly distributed throughout the `epoch`.
- Additionally there are 2 - 5 records for conversations (`datatype`
= 2, and `inference` = -1) in each `epoch` and for each `epoch`
except night, there is a conversation record that has a
`double_convo_start` `timestamp` that is from the previous
`epoch`. This is to test the calculations of features across
`epochs`.
- There is a raw conversation data file for both android and iOS
platforms (`plugin_studentlife_audio_android_raw.csv` and
`plugin_studentlife_audio_raw.csv` respectively).
- Finally, there are also additional empty data files for both
android and iOS for testing empty data files

View File

@ -0,0 +1,45 @@
# Testing
The following is a simple guide to testing RAPIDS. All files necessary for testing are stored in the `/tests` directory
## Steps for Testing
1. To begin testing RAPIDS place the fake raw input data `csv` files in
`tests/data/raw/`. The fake participant files should be placed in
`tests/data/external/`. The expected output files of RAPIDS after
processing the input data should be placed in
`tests/data/processed/`.
2. The Snakemake rule(s) that are to be tested must be placed in the
`tests/Snakemake` file. The current `tests/Snakemake` is a good
example of how to define them. (At the time of writing this
documentation the snakefile contains rules for messages (SMS), calls and
screen)
3. Edit the `tests/settings/config.yaml`. Add and/or remove the rules
to be run for testing from the `forcerun` list.
4. Edit the `tests/settings/testing_config.yaml` with the necessary
configuration settings for running the rules to be tested.
5. Add any additional testscripts in `tests/scripts`.
6. Uncomment or comment out lines in the testing shell script
`tests/scripts/run_tests.sh`.
7. Run the testing shell script.
```bash
tests/scripts/run_tests.sh
```
The following is a snippet of the output you should see after running your test.
```bash
test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... ok
test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... FAIL
======================================================================
FAIL: test_sensors_features_calculations (test_sensor_features.TestSensorFeatures)
----------------------------------------------------------------------
```
The results above show that the first test `test_sensors_files_exist` passed while `test_sensors_features_calculations` failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see [Unittest Documentation](https://docs.python.org/3.7/library/unittest.html#command-line-interface)
Testing of the RAPIDS sensors and features is a work-in-progress. Please see the Test Cases page for a list of sensors and features that have testing currently available.
Currently the repository is set up to test a number of sensors out of the box by simply running the `tests/scripts/run_tests.sh` command once the RAPIDS python environment is active.

View File

@ -0,0 +1,35 @@
## Python Virtual Environment
### Add new packages
Try to install any new package using `conda install -c CHANNEL PACKAGE_NAME` (you can use `pip` if the package is only available there). Make sure your Python virtual environment is active (`conda activate YOUR_ENV`).
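For example, a hypothetical session that adds `seaborn` (just a placeholder package and environment name) might look like this:
```bash
conda activate rapids                  # use the name of your own environment
conda install -c conda-forge seaborn
# only if the package is not available through any conda channel:
pip install seaborn
```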
### Remove packages
Uninstall packages using the same manager you used to install them: `conda remove PACKAGE_NAME` or `pip uninstall PACKAGE_NAME`.
### Update your conda `environment.yaml`
After installing or removing a package you can use the following command in your terminal to update your `environment.yaml` before publishing your pipeline. Note that we ignore the package version for `libgfortran` to keep compatibility with Linux:
```bash
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' > environment.yml
```
## R Virtual Environment
### Add new packages
1. Open your terminal and navigate to RAPIDS' root folder
2. Run `R` to open an R interactive session
3. Run `renv::install("PACKAGE_NAME")`
### Remove packages
1. Open your terminal and navigate to RAPIDS' root folder
2. Run `R` to open an R interactive session
3. Run `renv::remove("PACKAGE_NAME")`
### Update your R `renv.lock`
After installing or removing a package you can use the following command in your terminal to update your `renv.lock` before publishing your pipeline.
1. Open your terminal and navigate to RAPIDS' root folder
2. Run `R` to open an R interactive session
3. Run `renv::snapshot()` (renv will ask you to confirm any updates to this file)
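Putting the steps in this section together, a typical interactive R session (using `ggplot2` only as a placeholder package) could look like:
```r
# run from an R session started in RAPIDS' root folder
renv::install("ggplot2")   # add a package to the project library
renv::remove("ggplot2")    # remove it again if it is no longer needed
renv::snapshot()           # record the current state of the library in renv.lock
```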

195
docs/faq.md 100644
View File

@ -0,0 +1,195 @@
# Frequently Asked Questions
## Cannot connect to your MySQL server
???+ failure "Problem"
```bash
Error in .local(drv, ...) : Failed to connect to database: Error:
Can't initialize character set unknown (path: compiled_in) :
Calls: dbConnect -> dbConnect -> .local -> .Call
Execution halted
[Tue Mar 10 19:40:15 2020]
Error in rule download_dataset:
jobid: 531
output: data/raw/p60/locations_raw.csv
RuleException:
CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
```
???+ done "Solution"
Please make sure the `DATABASE_GROUP` in `config.yaml` matches your DB credentials group in `.env`.
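As a reference, the group label has to be identical in both files; a minimal sketch (group name and credentials are placeholders) could look like:
```yaml
# config.yaml
DATABASE_GROUP: &database_group MY_GROUP
```
```ini
# .env
[MY_GROUP]
user=rapids_user
password=rapids_password
host=127.0.0.1
port=3306
database=rapids_db
```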
---
## Cannot start mysql in linux via `brew services start mysql`
???+ failure "Problem"
Cannot start mysql in linux via `brew services start mysql`
???+ done "Solution"
Use `mysql.server start`
---
## Every time I force the download_dataset rule all rules are executed
???+ failure "Problem"
When running `snakemake -j1 -R download_phone_data` or `./rapids -j1 -R download_phone_data` all the rules and files are re-computed
???+ done "Solution"
This is expected behavior. The advantage of using `snakemake` under the hood is that every time a file containing data is modified every rule that depends on that file will be re-executed to update their results. In this case, since `download_dataset` updates all the raw data, and you are forcing the rule with the flag `-R` every single rule that depends on those raw files will be executed.
---
## Error `Table XXX doesn't exist` while running the `download_phone_data` or `download_fitbit_data` rule.
???+ failure "Problem"
```bash
Error in .local(conn, statement, ...) :
could not run statement: Table 'db_name.table_name' doesn't exist
Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted
```
???+ done "Solution"
Please make sure the sensors listed in `[PHONE_VALID_SENSED_BINS][PHONE_SENSORS]` and the `[TABLE]` of each sensor you activated in `config.yaml` match your database tables.
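For example, with the default configuration keys the relevant entries could look like this (the table names are placeholders and must match your own database):
```yaml
PHONE_VALID_SENSED_BINS:
  PHONE_SENSORS: [messages, calls]   # every entry must be an existing table
PHONE_CALLS:
  TABLE: calls                       # must match the calls table in your database
```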
---
## How do I install RAPIDS on Ubuntu 16.04
???+ done "Solution"
1. Install dependencies (Homebrew - if not installed):
- `sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev`
- Install [brew](https://docs.brew.sh/Homebrew-on-Linux) for linux and add the following line to `~/.bashrc`: `export PATH=$HOME/.linuxbrew/bin:$PATH`
- `source ~/.bashrc`
1. Install MySQL
- `brew install mysql`
- `brew services start mysql`
2. Install R, pandoc and rmarkdown:
- `brew install r`
- `brew install gcc@6` (needed due to this [bug](https://github.com/Homebrew/linuxbrew-core/issues/17812))
- `HOMEBREW_CC=gcc-6 brew install pandoc`
3. Install miniconda using these [instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
4. Clone our repo:
- `git clone https://github.com/carissalow/rapids`
5. Create a python virtual environment:
- `cd rapids`
- `conda env create -f environment.yml -n MY_ENV_NAME`
- `conda activate MY_ENV_NAME`
6. Install R packages and virtual environment:
- `snakemake renv_install`
- `snakemake renv_init`
- `snakemake renv_restore`
This step could take several minutes to complete. Please be patient and let it run until completion.
---
## `mysql.h` cannot be found
???+ failure "Problem"
```bash
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: mysql.h: No such file or directory
compilation terminated.
-----------------------------------------------------------------------
ERROR: configuration failed for package 'RMySQL'
```
???+ done "Solution"
```bash
sudo apt install libmariadbclient-dev
```
---
## No package `libcurl` found
???+ failure "Problem"
`libcurl` cannot be found
???+ done "Solution"
Install `libcurl`
```bash
sudo apt install libcurl4-openssl-dev
```
---
## Configuration failed because `openssl` was not found.
???+ failure "Problem"
`openssl` cannot be found
???+ done "Solution"
Install `openssl`
```bash
sudo apt install libssl-dev
```
---
## Configuration failed because `libxml-2.0` was not found
???+ failure "Problem"
`libxml-2.0` cannot be found
???+ done "Solution"
Install `libxml-2.0`
```bash
sudo apt install libxml2-dev
```
---
## SSL connection error when running RAPIDS
???+ failure "Problem"
You are getting the following error message when running RAPIDS:
```bash
Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol.
```
???+ done "Solution"
This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option `--ssl-mode=DISABLED` but we can't do this from the R connector.
If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace `mysql-client` and `libmysqlclient-dev` with `mariadb-client` and `libmariadbclient-dev` and reinstall renv. More info about this issue [here](https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1872541)
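A sketch of that client swap on Ubuntu, followed by rebuilding the R environment so `RMySQL` is compiled against MariaDB, could look like:
```bash
sudo apt remove mysql-client libmysqlclient-dev
sudo apt install mariadb-client libmariadbclient-dev
# reinstall the R virtual environment so the database connector is rebuilt
snakemake -j1 renv_install
snakemake -j1 renv_restore
```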
---
## `DB_TABLES` key not found
???+ failure "Problem"
If you get the following error `KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS'`, it means that the indentation of the key `[PHONE_SENSORS]` is not matching the other child elements of `PHONE_VALID_SENSED_BINS`
???+ done "Solution"
You need to add or remove any leading whitespaces as needed on that line.
```yaml
PHONE_VALID_SENSED_BINS:
COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
BIN_SIZE: &bin_size 5 # (in minutes)
PHONE_SENSORS: []
```
---
## Error while updating your conda environment in Ubuntu
???+ failure "Problem"
You get the following error:
```bash
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
appears to be corrupted. The path 'include/mysqlStubs.h'
specified in the package manifest cannot be found.
ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
path: 'lib/libiomp5.so'
```
???+ done "Solution"
Reinstall conda
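A minimal sketch of a clean Miniconda reinstall on Linux (paths, installer name and environment name assume a default Miniconda 3 setup) might be:
```bash
rm -rf ~/miniconda3                              # remove the corrupted installation
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda env create -f environment.yml -n rapids    # recreate RAPIDS' environment
```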

View File

@ -0,0 +1,191 @@
# Add New Features
!!! hint
We recommend reading the [Behavioral Features Introduction](../feature-introduction/) before reading this page
!!! hint
You won't have to deal with time zones, dates, times, data cleaning or preprocessing. The data that RAPIDS pipes to your feature extraction code is ready to process.
## New Features for Existing Sensors
You can add new features to any existing sensors (see list below) by adding a new provider in three steps:
1. [Modify](#modify-the-configyaml-file) the `config.yaml` file
2. [Create](#create-a-provider-folder-script-and-function) a provider folder, script and function
3. [Implement](#implement-your-feature-extraction-code) your features extraction code
As a tutorial, we will add a new provider for `PHONE_ACCELEROMETER` called `VEGA` that extracts `feature1`, `feature2`, `feature3` in Python and that requires a parameter from the user called `MY_PARAMETER`.
??? info "Existing Sensors"
An existing sensor is any of the phone or Fitbit sensors with a configuration entry in `config.yaml`:
- Phone Accelerometer
- Phone Activity Recognition
- Phone Applications Foreground
- Phone Battery
- Phone Bluetooth
- Phone Calls
- Phone Conversation
- Phone Data Yield
- Phone Light
- Phone Locations
- Phone Messages
- Phone Screen
- Phone WiFi Connected
- Phone WiFi Visible
- Fitbit Heart Rate Summary
- Fitbit Heart Rate Intraday
- Fitbit Sleep Summary
- Fitbit Steps Summary
- Fitbit Steps Intraday
### Modify the `config.yaml` file
In this step you need to add your provider configuration section under the relevant sensor in `config.yaml`. See our example for our tutorial's `VEGA` provider for `PHONE_ACCELEROMETER`:
??? example "Example configuration for a new accelerometer provider `VEGA`"
```yaml
PHONE_ACCELEROMETER:
TABLE: accelerometer
PROVIDERS:
RAPIDS:
COMPUTE: False
...
PANDA:
COMPUTE: False
...
VEGA:
COMPUTE: False
FEATURES: ["feature1", "feature2", "feature3"]
MY_PARAMETER: a_string
SRC_FOLDER: "vega"
SRC_LANGUAGE: "python"
```
| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description
|---|---|
|`[COMPUTE]`| Flag to activate/deactivate your provider
|`[FEATURES]`| List of features your provider supports. Your provider code should only return the features on this list
|`[MY_PARAMETER]`| An arbitrary parameter that our example provider `VEGA` needs. This can be a boolean, integer, float, string or an array of any of such types.
|`[SRC_LANGUAGE]`| The programming language of your provider script, it can be `python` or `r`, in our example `python`
|`[SRC_FOLDER]`| The name of your provider in lower case, in our example `vega` (this will be the name of your folder in the next step)
### Create a provider folder, script and function
In this step you need to add a folder, script and function for your provider.
1. Create your provider **folder** under `src/features/DEVICE_SENSOR/YOUR_PROVIDER`, in our example `src/features/phone_accelerometer/vega` (same as `[SRC_FOLDER]` in the step above; see the sketch after this list).
2. Create your provider **script** inside your provider folder; it can be a Python file called `main.py` or an R file called `main.R`.
3. Add your provider **function** in your provider script. The name of such function should be `[providername]_features`, in our example `vega_features`
!!! info "Python function"
```python
def [providername]_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
```
!!! info "R function"
```r
[providername]_features <- function(sensor_data, time_segment, provider)
```
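Putting the three steps above together, the new files for our hypothetical `vega` provider could be created like this (a sketch only; the function body comes in the next section):
```bash
mkdir -p src/features/phone_accelerometer/vega
touch src/features/phone_accelerometer/vega/main.py   # must define vega_features()
```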
### Implement your feature extraction code
The provider function that you created in the step above will receive the following parameters:
| Parameter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description
|---|---|
|`sensor_data_files`| Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the `[PIDS]` array in `config.yaml`)
|`time_segment`| The label of the time segment that should be processed.
|`provider`| The parameters you configured for your provider in `config.yaml` will be available in this variable as a dictionary in Python or a list in R. In our example this dictionary contains `{MY_PARAMETER:"a_string"}`
|`filter_data_by_segment`| Python only. A function that you will use to filter your data. In R this function is already available in the environment.
|`*args`| Python only. Not used for now
|`**kwargs`| Python only. Not used for now
The code to extract your behavioral features should be implemented in your provider function and in general terms it will have three stages:
??? info "1. Read a participant's data by loading the CSV data stored in the file pointed by `sensor_data_files`"
``` python
acc_data = pd.read_csv(sensor_data_files["sensor_data"])
```
Note that the phone's battery, screen, and activity recognition data are given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on)
??? info "2. Filter your data to process only those rows that belong to `time_segment`"
This step is only one line of code, but to understand why we need it, keep reading.
```python
acc_data = filter_data_by_segment(acc_data, time_segment)
```
You should use the `filter_data_by_segment()` function to process and group those rows that belong to each of the [time segments RAPIDS could be configured with](../../setup/configuration/#time-segments).
Let's understand the `filter_data_by_segment()` function with an example. A RAPIDS user can extract features on any arbitrary [time segment](../../setup/configuration/#time-segments). A time segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for `p01`. The labels are arbitrary and the instances depend on the days a participant was monitored for:
- the daily segment could be named `my_days` and if `p01` was monitored for 14 days, it would have 14 instances
- the weekly segment could be named `my_weeks` and if `p01` was monitored for 14 days, it would have 2 instances.
- the weekend segment could be named `my_weekends` and if `p01` was monitored for 14 days, it would have 2 instances.
For this example, RAPIDS will call your provider function three times for `p01`, once where `time_segment` is `my_days`, once where `time_segment` is `my_weeks` and once where `time_segment` is `my_weekends`. In this example not every row in `p01`'s data needs to take part in the feature computation for either segment **and** the rows need to be grouped differently.
Thus `filter_data_by_segment()` comes in handy: it will return a data frame that contains the rows that were logged during a time segment plus an extra column called `local_segment`. This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our `p01`'s `my_days`, `my_weeks`, and `my_weekends` examples). After filtering, **you should group the data frame by this column and compute any desired features**, for example:
```python
acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
```
The reason RAPIDS does not filter the participant's data set for you is because your code might need to compute something based on a participant's complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from this number.
??? info "3. Return a data frame with your features"
After filtering, grouping your data, and computing your features, your provider function should return a data frame that has:
- One row per time segment instance (e.g. 14 for our `p01`'s `my_days` example)
- The `local_segment` column added by `filter_data_by_segment()`
- One column per feature. By convention the name of your features should only contain letters or numbers (`feature1`). RAPIDS will automatically add the right sensor and provider prefix (`phone_accelerometer_vega_`)
??? example "`PHONE_ACCELEROMETER` Provider Example"
For your reference, this is a short example of our own provider (`RAPIDS`) for `PHONE_ACCELEROMETER` that computes five acceleration features
```python
def rapids_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
acc_data = pd.read_csv(sensor_data_files["sensor_data"])
requested_features = provider["FEATURES"]
# name of the features this function can compute
base_features_names = ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
# the subset of requested features this function can compute
features_to_compute = list(set(requested_features) & set(base_features_names))
acc_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
if not acc_data.empty:
acc_data = filter_data_by_segment(acc_data, time_segment)
if not acc_data.empty:
acc_features = pd.DataFrame()
# get magnitude related features: magnitude = sqrt(x^2+y^2+z^2)
magnitude = acc_data.apply(lambda row: np.sqrt(row["double_values_0"] ** 2 + row["double_values_1"] ** 2 + row["double_values_2"] ** 2), axis=1)
acc_data = acc_data.assign(magnitude = magnitude.values)
if "maxmagnitude" in features_to_compute:
acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
if "minmagnitude" in features_to_compute:
acc_features["minmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].min()
if "avgmagnitude" in features_to_compute:
acc_features["avgmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].mean()
if "medianmagnitude" in features_to_compute:
acc_features["medianmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].median()
if "stdmagnitude" in features_to_compute:
acc_features["stdmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].std()
acc_features = acc_features.reset_index()
return acc_features
```
## New Features for Non-Existing Sensors
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [contact us](../../team) or request it on [Slack](http://awareframework.com:3000/) and we can add the necessary code so you can follow the instructions above.

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,57 @@
# Behavioral Features Introduction
Every phone or Fitbit sensor has a corresponding config section in `config.yaml`. These sections follow a similar structure, and we'll use `PHONE_ACCELEROMETER` as an example to explain it.
!!! hint
- We recommend reading this page if you are using RAPIDS for the first time
- All computed sensor features are stored under `/data/processed/features` on files per sensor, per participant and per study (all participants).
- Every time you change any sensor parameters, provider parameters or provider features, all the necessary files will be updated as soon as you execute RAPIDS.
!!! example "Config section example for `PHONE_ACCELEROMETER`"
```yaml
# 1) Config section
PHONE_ACCELEROMETER:
# 2) Parameters for PHONE_ACCELEROMETER
TABLE: accelerometer
# 3) Providers for PHONE_ACCELEROMETER
PROVIDERS:
# 4) RAPIDS provider
RAPIDS:
# 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER
COMPUTE: False
# 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
# 5) PANDA provider
PANDA:
# 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER
COMPUTE: False
VALID_SENSED_MINUTES: False
# 5.2) Features of PANDA provider of PHONE_ACCELEROMETER
FEATURES:
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
```
## Sensor Parameters
Each sensor configuration section has a "parameters" subsection (see `#2` in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The `TABLE` parameter exists for every sensor, but some sensors will have extra parameters like [`[PHONE_LOCATIONS]`](../phone-locations/). We explain these parameters in a table at the top of each sensor documentation page.
## Sensor Providers
Each sensor configuration section can have zero, one or more behavioral feature **providers** (see `#3` in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see `#4`) and PANDA (see `#5`).
### Provider Parameters
Each provider has parameters that affect the computation of the behavioral features it offers (see `#4.1` or `#5.1` in the example). These parameters will include at least a `[COMPUTE]` flag that you switch to `True` to extract a provider's behavioral features.
We explain every provider's parameter in a table under the `Parameters description` heading on each provider documentation page.
### Provider Features
Each provider offers a set of behavioral features (see `#4.2` or `#5.2` in the example). For some providers these features are grouped in an array (like those for the `RAPIDS` provider in `#4.2`) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for the `PANDA` provider in `#5.2`). In either case, you can delete the features you are not interested in and they will not be included in the sensor's output feature file.
We explain each behavioral feature in a table under the `Features description` heading on each provider documentation page.
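For example, keeping only two of the `RAPIDS` accelerometer features could look like this (other keys omitted for brevity):
```yaml
PHONE_ACCELEROMETER:
  PROVIDERS:
    RAPIDS:
      COMPUTE: True
      FEATURES: ["maxmagnitude", "stdmagnitude"]
```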

View File

@ -0,0 +1,72 @@
# Fitbit Heart Rate Intraday
Sensor parameters description for `[FITBIT_HEARTRATE_INTRADAY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the heart rate intraday data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects; note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-07","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1200.6102,"max":88,"min":31,"minutes":1058,"name":"Out of Range"},{"caloriesOut":760.3020,"max":120,"min":86,"minutes":366,"name":"Fat Burn"},{"caloriesOut":15.2048,"max":146,"min":120,"minutes":2,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":72}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":68},{"time":"00:01:00","value":67},{"time":"00:02:00","value":67},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-08","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1100.1120,"max":89,"min":30,"minutes":921,"name":"Out of Range"},{"caloriesOut":660.0012,"max":118,"min":82,"minutes":361,"name":"Fat Burn"},{"caloriesOut":23.7088,"max":142,"min":108,"minutes":3,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":70}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":77},{"time":"00:01:00","value":75},{"time":"00:02:00","value":73},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-09","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":750.3615,"max":77,"min":30,"minutes":851,"name":"Out of Range"},{"caloriesOut":734.1516,"max":107,"min":77,"minutes":550,"name":"Fat Burn"},{"caloriesOut":131.8579,"max":130,"min":107,"minutes":29,"name":"Cardio"},{"caloriesOut":0,"max":220,"min":130,"minutes":0,"name":"Peak"}],"restingHeartRate":69}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":90},{"time":"00:01:00","value":89},{"time":"00:02:00","value":88},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |heartrate |heartrate_zone |
|-------------------------------------- |---------------------- |--------- |--------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:00:00 |68 |outofrange |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:01:00 |67 |outofrange |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:02:00 |67 |outofrange |
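The snippet below is a minimal sketch (not RAPIDS' actual parser) of how one row of the `JSON` example above could be flattened into the `PLAIN_TEXT` columns; assigning each minute to a zone by comparing its value against the `heartRateZones` ranges is an assumption made for illustration only.

```python
import json
import pandas as pd

# Minimal sketch (not RAPIDS' parser): flatten one row of the JSON example above into the
# PLAIN_TEXT columns (device_id, local_date_time, heartrate, heartrate_zone).
ZONE_LABELS = {"Out of Range": "outofrange", "Fat Burn": "fatburn", "Cardio": "cardio", "Peak": "peak"}

def flatten_heartrate_intraday(device_id, fitbit_data):
    record = json.loads(fitbit_data)
    date = record["activities-heart"][0]["dateTime"]
    zones = record["activities-heart"][0]["value"]["heartRateZones"]
    rows = []
    for point in record["activities-heart-intraday"]["dataset"]:
        value = point["value"]
        # Assumption: assign the first zone whose [min, max) range contains this minute's value
        zone = next((ZONE_LABELS[z["name"]] for z in zones if z["min"] <= value < z["max"]), None)
        rows.append({"device_id": device_id,
                     "local_date_time": f"{date} {point['time']}",
                     "heartrate": value,
                     "heartrate_zone": zone})
    return pd.DataFrame(rows)

example = '{"activities-heart":[{"dateTime":"2020-10-07","value":{"heartRateZones":[{"max":88,"min":31,"name":"Out of Range"},{"max":120,"min":86,"name":"Fat Burn"},{"max":146,"min":120,"name":"Cardio"},{"max":221,"min":148,"name":"Peak"}]}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":68},{"time":"00:01:00","value":67}]}}'
print(flatten_heartrate_intraday("a748ee1a-1d0b-4ae9-9074-279a2b6ba524", example))
```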
## RAPIDS provider
!!! info "Available time segments"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
- data/raw/{pid}/fitbit_heartrate_intraday_parsed.csv
- data/raw/{pid}/fitbit_heartrate_intraday_parsed_with_datetime.csv
- data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_heartrate_intraday.csv
```
Parameters description for `[FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_HEARTRATE_INTRADAY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from heart rate intraday data, see table below |
Features description for `[FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |-------------- |---------------------------|
|maxhr                      |beats/min      |The maximum heart rate during a time segment.
|minhr                      |beats/min      |The minimum heart rate during a time segment.
|avghr                      |beats/min      |The average heart rate during a time segment.
|medianhr                   |beats/min      |The median of heart rate during a time segment.
|modehr                     |beats/min      |The mode of heart rate during a time segment.
|stdhr                      |beats/min      |The standard deviation of heart rate during a time segment.
|diffmaxmodehr              |beats/min      |The difference between the maximum and mode heart rate during a time segment.
|diffminmodehr              |beats/min      |The difference between the mode and minimum heart rate during a time segment.
|entropyhr                  |nats           |Shannon's entropy measurement based on heart rate during a time segment.
|minutesonZONE              |minutes        |Number of minutes the user's heart rate fell within each `heartrate_zone` during a time segment.
!!! note "Assumptions/Observations"
1. There are four heart rate zones (ZONE): ``outofrange``, ``fatburn``, ``cardio``, and ``peak``. Please refer to [Fitbit documentation](https://help.fitbit.com/articles/en_US/Help_article/1565.htm) for more information about the way they are computed.
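As a rough illustration of how these features relate to the parsed intraday rows, the snippet below computes a few of them for a single time segment with pandas; it is a simplified sketch, not RAPIDS' implementation, and assumes one row per minute as in the `PLAIN_TEXT` example above.

```python
import pandas as pd
from scipy.stats import entropy

# Minimal sketch (not RAPIDS' code): intraday heart rate features for one time segment,
# assuming `segment` holds the parsed rows shown above (one row per minute).
segment = pd.DataFrame({
    "heartrate": [68, 67, 67, 90, 120, 121],
    "heartrate_zone": ["outofrange", "outofrange", "outofrange", "fatburn", "cardio", "cardio"],
})

features = {
    "maxhr": segment["heartrate"].max(),
    "minhr": segment["heartrate"].min(),
    "avghr": segment["heartrate"].mean(),
    "medianhr": segment["heartrate"].median(),
    "modehr": segment["heartrate"].mode().iloc[0],
    "stdhr": segment["heartrate"].std(),
    # Shannon entropy (in nats) of the distribution of observed heart rate values
    "entropyhr": entropy(segment["heartrate"].value_counts()),
}
features["diffmaxmodehr"] = features["maxhr"] - features["modehr"]
features["diffminmodehr"] = features["modehr"] - features["minhr"]
# One row per minute, so minutes in each zone is the per-zone row count
for zone, minutes in segment["heartrate_zone"].value_counts().items():
    features[f"minuteson{zone}"] = minutes
print(features)
```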

View File

@ -0,0 +1,80 @@
# Fitbit Heart Rate Summary
Sensor parameters description for `[FITBIT_HEARTRATE_SUMMARY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the heart rate summary data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects; note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-07","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1200.6102,"max":88,"min":31,"minutes":1058,"name":"Out of Range"},{"caloriesOut":760.3020,"max":120,"min":86,"minutes":366,"name":"Fat Burn"},{"caloriesOut":15.2048,"max":146,"min":120,"minutes":2,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":72}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":68},{"time":"00:01:00","value":67},{"time":"00:02:00","value":67},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-08","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1100.1120,"max":89,"min":30,"minutes":921,"name":"Out of Range"},{"caloriesOut":660.0012,"max":118,"min":82,"minutes":361,"name":"Fat Burn"},{"caloriesOut":23.7088,"max":142,"min":108,"minutes":3,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":70}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":77},{"time":"00:01:00","value":75},{"time":"00:02:00","value":73},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-09","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":750.3615,"max":77,"min":30,"minutes":851,"name":"Out of Range"},{"caloriesOut":734.1516,"max":107,"min":77,"minutes":550,"name":"Fat Burn"},{"caloriesOut":131.8579,"max":130,"min":107,"minutes":29,"name":"Cardio"},{"caloriesOut":0,"max":220,"min":130,"minutes":0,"name":"Peak"}],"restingHeartRate":69}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":90},{"time":"00:01:00","value":89},{"time":"00:02:00","value":88},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |heartrate_daily_restinghr |heartrate_daily_caloriesoutofrange |heartrate_daily_caloriesfatburn |heartrate_daily_caloriescardio |heartrate_daily_caloriespeak |
|-------------------------------------- |----------------- |------- |-------------- |------------- |------------ |-------|
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 |72 |1200.6102 |760.3020 |15.2048 |0 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-08 |70 |1100.1120 |660.0012 |23.7088 |0 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-09 |69 |750.3615 |734.1516 |131.8579 |0 |
## RAPIDS provider
!!! info "Available time segments"
- Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_heartrate_summary_raw.csv
- data/raw/{pid}/fitbit_heartrate_summary_parsed.csv
- data/raw/{pid}/fitbit_heartrate_summary_parsed_with_datetime.csv
- data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_heartrate_summary.csv
```
Parameters description for `[FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_HEARTRATE_SUMMARY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from heart rate summary data, see table below |
Features description for `[FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|maxrestinghr               |beats/min  |The maximum daily resting heart rate during a time segment.
|minrestinghr               |beats/min  |The minimum daily resting heart rate during a time segment.
|avgrestinghr               |beats/min  |The average daily resting heart rate during a time segment.
|medianrestinghr            |beats/min  |The median of daily resting heart rate during a time segment.
|moderestinghr              |beats/min  |The mode of daily resting heart rate during a time segment.
|stdrestinghr               |beats/min  |The standard deviation of daily resting heart rate during a time segment.
|diffmaxmoderestinghr       |beats/min  |The difference between the maximum and mode daily resting heart rate during a time segment.
|diffminmoderestinghr       |beats/min  |The difference between the mode and minimum daily resting heart rate during a time segment.
|entropyrestinghr           |nats       |Shannon's entropy measurement based on daily resting heart rate during a time segment.
|sumcaloriesZONE |cals |The total daily calories burned within `heartrate_zone` during a time segment.
|maxcaloriesZONE |cals |The maximum daily calories burned within `heartrate_zone` during a time segment.
|mincaloriesZONE |cals |The minimum daily calories burned within `heartrate_zone` during a time segment.
|avgcaloriesZONE |cals |The average daily calories burned within `heartrate_zone` during a time segment.
|mediancaloriesZONE |cals |The median of daily calories burned within `heartrate_zone` during a time segment.
|stdcaloriesZONE |cals |The standard deviation of daily calories burned within `heartrate_zone` during a time segment.
|entropycaloriesZONE        |nats       |Shannon's entropy measurement based on daily calories burned within `heartrate_zone` during a time segment.
!!! note "Assumptions/Observations"
1. There are four heart rate zones (ZONE): ``outofrange``, ``fatburn``, ``cardio``, and ``peak``. Please refer to [Fitbit documentation](https://help.fitbit.com/articles/en_US/Help_article/1565.htm) for more information about the way they are computed.
2. Calories' accuracy depends on the user's Fitbit profile (weight, height, etc.).

View File

@ -0,0 +1,102 @@
# Fitbit Sleep Summary
Sensor parameters description for `[FITBIT_SLEEP_SUMMARY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the sleep summary data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects; note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data with Fitbit's sleep API Version 1"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep": [{"awakeCount": 2, "awakeDuration": 3, "awakeningsCount": 10, "dateOfSleep": "2020-10-07", "duration": 8100000, "efficiency": 91, "endTime": "2020-10-07T18:10:00.000", "isMainSleep": true, "logId": 14147921940, "minuteData": [{"dateTime": "15:55:00", "value": "3"}, {"dateTime": "15:56:00", "value": "3"}, {"dateTime": "15:57:00", "value": "2"},...], "minutesAfterWakeup": 0, "minutesAsleep": 123, "minutesAwake": 12, "minutesToFallAsleep": 0, "restlessCount": 8, "restlessDuration": 9, "startTime": "2020-10-07T15:55:00.000", "timeInBed": 135}, {"awakeCount": 0, "awakeDuration": 0, "awakeningsCount": 1, "dateOfSleep": "2020-10-07", "duration": 3780000, "efficiency": 100, "endTime": "2020-10-07T10:52:30.000", "isMainSleep": false, "logId": 14144903977, "minuteData": [{"dateTime": "09:49:00", "value": "1"}, {"dateTime": "09:50:00", "value": "1"}, {"dateTime": "09:51:00", "value": "1"},...], "minutesAfterWakeup": 1, "minutesAsleep": 62, "minutesAwake": 0, "minutesToFallAsleep": 0, "restlessCount": 1, "restlessDuration": 1, "startTime": "2020-10-07T09:49:00.000", "timeInBed": 63}], "summary": {"totalMinutesAsleep": 185, "totalSleepRecords": 2, "totalTimeInBed": 198}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep": [{"awakeCount": 3, "awakeDuration": 21, "awakeningsCount": 16, "dateOfSleep": "2020-10-08", "duration": 19260000, "efficiency": 89, "endTime": "2020-10-08T06:01:30.000", "isMainSleep": true, "logId": 14150613895, "minuteData": [{"dateTime": "00:40:00", "value": "3"}, {"dateTime": "00:41:00", "value": "3"}, {"dateTime": "00:42:00", "value": "3"},...], "minutesAfterWakeup": 0, "minutesAsleep": 275, "minutesAwake": 33, "minutesToFallAsleep": 0, "restlessCount": 13, "restlessDuration": 25, "startTime": "2020-10-08T00:40:00.000", "timeInBed": 321}], "summary": {"totalMinutesAsleep": 275, "totalSleepRecords": 1, "totalTimeInBed": 321}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep": [{"awakeCount": 1, "awakeDuration": 3, "awakeningsCount": 8, "dateOfSleep": "2020-10-09", "duration": 19320000, "efficiency": 96, "endTime": "2020-10-09T05:57:30.000", "isMainSleep": true, "logId": 14161136803, "minuteData": [{"dateTime": "00:35:30", "value": "2"}, {"dateTime": "00:36:30", "value": "1"}, {"dateTime": "00:37:30", "value": "1"},...], "minutesAfterWakeup": 0, "minutesAsleep": 309, "minutesAwake": 13, "minutesToFallAsleep": 0, "restlessCount": 7, "restlessDuration": 10, "startTime": "2020-10-09T00:35:30.000", "timeInBed": 322}], "summary": {"totalMinutesAsleep": 309, "totalSleepRecords": 1, "totalTimeInBed": 322}}
=== "PLAIN_TEXT"
|device_id |local_start_date_time |local_end_date_time |efficiency |minutes_after_wakeup |minutes_asleep |minutes_awake |minutes_to_fall_asleep |minutes_in_bed |is_main_sleep |type |count_awake |duration_awake |count_awakenings |count_restless |duration_restless |
|-------------------------------------- |---------------------- |---------------------- |----------- |--------------------- |--------------- |-------------- |----------------------- |--------------- |-------------- |-------- |----------- |--------------- |----------------- |--------------- |------------------ |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 15:55:00 |2020-10-07 18:10:00 |91 |0 |123 |12 |0 |135 |1 |classic |2 |3 |10 |8 |9 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 09:49:00 |2020-10-07 10:52:30 |100 |1 |62 |0 |0 |63 |0 |classic |0 |0 |1 |1 |1 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-08 00:40:00 |2020-10-08 06:01:30 |89 |0 |275 |33 |0 |321 |1 |classic |3 |21 |16 |13 |25 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-09 00:35:30 |2020-10-09 05:57:30 |96 |0 |309 |13 |0 |322 |1 |classic |1 |3 |8 |7 |10 |
??? example "Example of the structure of source data with Fitbit's sleep API Version 1.2"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep":[{"dateOfSleep":"2020-10-10","duration":3600000,"efficiency":92,"endTime":"2020-10-10T16:37:00.000","infoCode":2,"isMainSleep":false,"levels":{"data":[{"dateTime":"2020-10-10T15:36:30.000","level":"restless","seconds":60},{"dateTime":"2020-10-10T15:37:30.000","level":"asleep","seconds":660},{"dateTime":"2020-10-10T15:48:30.000","level":"restless","seconds":60},...], "summary":{"asleep":{"count":0,"minutes":56},"awake":{"count":0,"minutes":0},"restless":{"count":3,"minutes":4}}},"logId":26315914306,"minutesAfterWakeup":0,"minutesAsleep":55,"minutesAwake":5,"minutesToFallAsleep":0,"startTime":"2020-10-10T15:36:30.000","timeInBed":60,"type":"classic"},{"dateOfSleep":"2020-10-10","duration":22980000,"efficiency":88,"endTime":"2020-10-10T08:10:00.000","infoCode":0,"isMainSleep":true,"levels":{"data":[{"dateTime":"2020-10-10T01:46:30.000","level":"light","seconds":420},{"dateTime":"2020-10-10T01:53:30.000","level":"deep","seconds":1230},{"dateTime":"2020-10-10T02:14:00.000","level":"light","seconds":360},...], "summary":{"deep":{"count":3,"minutes":92,"thirtyDayAvgMinutes":0},"light":{"count":29,"minutes":193,"thirtyDayAvgMinutes":0},"rem":{"count":4,"minutes":33,"thirtyDayAvgMinutes":0},"wake":{"count":28,"minutes":65,"thirtyDayAvgMinutes":0}}},"logId":26311786557,"minutesAfterWakeup":0,"minutesAsleep":318,"minutesAwake":65,"minutesToFallAsleep":0,"startTime":"2020-10-10T01:46:30.000","timeInBed":383,"type":"stages"}],"summary":{"stages":{"deep":92,"light":193,"rem":33,"wake":65},"totalMinutesAsleep":373,"totalSleepRecords":2,"totalTimeInBed":443}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep":[{"dateOfSleep":"2020-10-11","duration":41640000,"efficiency":89,"endTime":"2020-10-11T11:47:00.000","infoCode":0,"isMainSleep":true,"levels":{"data":[{"dateTime":"2020-10-11T00:12:30.000","level":"wake","seconds":450},{"dateTime":"2020-10-11T00:20:00.000","level":"light","seconds":870},{"dateTime":"2020-10-11T00:34:30.000","level":"wake","seconds":780},...], "summary":{"deep":{"count":4,"minutes":52,"thirtyDayAvgMinutes":62},"light":{"count":32,"minutes":442,"thirtyDayAvgMinutes":364},"rem":{"count":6,"minutes":68,"thirtyDayAvgMinutes":58},"wake":{"count":29,"minutes":132,"thirtyDayAvgMinutes":94}}},"logId":26589710670,"minutesAfterWakeup":1,"minutesAsleep":562,"minutesAwake":132,"minutesToFallAsleep":0,"startTime":"2020-10-11T00:12:30.000","timeInBed":694,"type":"stages"}],"summary":{"stages":{"deep":52,"light":442,"rem":68,"wake":132},"totalMinutesAsleep":562,"totalSleepRecords":1,"totalTimeInBed":694}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep":[{"dateOfSleep":"2020-10-12","duration":28980000,"efficiency":93,"endTime":"2020-10-12T09:34:30.000","infoCode":0,"isMainSleep":true,"levels":{"data":[{"dateTime":"2020-10-12T01:31:00.000","level":"wake","seconds":600},{"dateTime":"2020-10-12T01:41:00.000","level":"light","seconds":60},{"dateTime":"2020-10-12T01:42:00.000","level":"deep","seconds":2340},...], "summary":{"deep":{"count":4,"minutes":63,"thirtyDayAvgMinutes":59},"light":{"count":27,"minutes":257,"thirtyDayAvgMinutes":364},"rem":{"count":5,"minutes":94,"thirtyDayAvgMinutes":58},"wake":{"count":24,"minutes":69,"thirtyDayAvgMinutes":95}}},"logId":26589710673,"minutesAfterWakeup":0,"minutesAsleep":415,"minutesAwake":68,"minutesToFallAsleep":0,"startTime":"2020-10-12T01:31:00.000","timeInBed":483,"type":"stages"}],"summary":{"stages":{"deep":63,"light":257,"rem":94,"wake":69},"totalMinutesAsleep":415,"totalSleepRecords":1,"totalTimeInBed":483}}
=== "PLAIN_TEXT"
|device_id |local_start_date_time |local_end_date_time |efficiency |minutes_after_wakeup |minutes_asleep |minutes_awake |minutes_to_fall_asleep |minutes_in_bed |is_main_sleep |type |
|-------------------------------------- |---------------------- |---------------------- |----------- |--------------------- |--------------- |-------------- |----------------------- |--------------- |-------------- |-------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-10 15:36:30 |2020-10-10 16:37:00 |92 |0 |55 |5 |0 |60 |0 |classic |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-10 01:46:30 |2020-10-10 08:10:00 |88 |0 |318 |65 |0 |383 |1 |stages |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-11 00:12:30 |2020-10-11 11:47:00 |89 |1 |562 |132 |0 |694 |1 |stages |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-12 01:31:00 |2020-10-12 09:34:30 |93 |0 |415 |68 |0 |483 |1 |stages |
## RAPIDS provider
!!! info "Available time segments"
- Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_sleep_summary_raw.csv
- data/raw/{pid}/fitbit_sleep_summary_parsed.csv
- data/raw/{pid}/fitbit_sleep_summary_parsed_with_datetime.csv
- data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_sleep_summary.csv
```
Parameters description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_SLEEP_SUMMARY` features from the `RAPIDS` provider |
|`[SLEEP_TYPES]` | Types of sleep to be included in the feature extraction computation. There are 3 types of sleep: `main`, `nap`, and `all` (main sleep and naps combined). |
|`[FEATURES]` | Features to be computed from sleep summary data, see table below |
Features description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description |
|------------------------------ |---------- |-------------------------------------------- |
|countepisodeTYPE |episodes |Number of sleep episodes for a certain sleep type during a time segment.
|avgefficiencyTYPE |scores |Average sleep efficiency for a certain sleep type during a time segment.
|sumdurationafterwakeupTYPE |minutes |Total duration the user stayed in bed after waking up for a certain sleep type during a time segment.
|sumdurationasleepTYPE |minutes |Total sleep duration for a certain sleep type during a time segment.
|sumdurationawakeTYPE |minutes |Total duration the user stayed awake but still in bed for a certain sleep type during a time segment.
|sumdurationtofallasleepTYPE    |minutes    |Total duration the user took to fall asleep for a certain sleep type during a time segment.
|sumdurationinbedTYPE |minutes |Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
|avgdurationafterwakeupTYPE |minutes |Average duration the user stayed in bed after waking up for a certain sleep type during a time segment.
|avgdurationasleepTYPE |minutes |Average sleep duration for a certain sleep type during a time segment.
|avgdurationawakeTYPE |minutes |Average duration the user stayed awake but still in bed for a certain sleep type during a time segment.
|avgdurationtofallasleepTYPE    |minutes    |Average duration the user took to fall asleep for a certain sleep type during a time segment.
|avgdurationinbedTYPE |minutes |Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
!!! note "Assumptions/Observations"
1. There are three sleep types (TYPE): `main`, `nap`, `all`. The `all` type contains both main sleep and naps.
2. There are two versions of Fitbit's sleep API ([version 1](https://dev.fitbit.com/build/reference/web-api/sleep-v1/) and [version 1.2](https://dev.fitbit.com/build/reference/web-api/sleep/)), and each provides raw sleep data in a different format:
- _Count & duration summaries_. `v1` contains `count_awake`, `duration_awake`, `count_awakenings`, `count_restless`, and `duration_restless` fields for every sleep record but `v1.2` does not.
3. _API columns_. Features are computed based on the values provided by Fitbit's API: `efficiency`, `minutes_after_wakeup`, `minutes_asleep`, `minutes_awake`, `minutes_to_fall_asleep`, `minutes_in_bed`, `is_main_sleep` and `type`.
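The snippet below is a simplified sketch (not RAPIDS' implementation) of how the per-type features can be derived from the parsed summary rows, assuming `is_main_sleep` distinguishes `main` episodes from `nap` episodes and `all` combines both.

```python
import pandas as pd

# Minimal sketch (not RAPIDS' code): sleep summary features for one time segment, per sleep
# type, assuming `sleep` holds the parsed PLAIN_TEXT rows shown above (one row per episode).
sleep = pd.DataFrame({
    "is_main_sleep": [1, 0, 1],
    "efficiency": [91, 100, 89],
    "minutes_asleep": [123, 62, 275],
    "minutes_awake": [12, 0, 33],
    "minutes_to_fall_asleep": [0, 0, 0],
    "minutes_after_wakeup": [0, 1, 0],
    "minutes_in_bed": [135, 63, 321],
})

features = {}
for sleep_type, episodes in {"main": sleep[sleep["is_main_sleep"] == 1],
                             "nap": sleep[sleep["is_main_sleep"] == 0],
                             "all": sleep}.items():
    features[f"countepisode{sleep_type}"] = len(episodes)
    features[f"avgefficiency{sleep_type}"] = episodes["efficiency"].mean()
    features[f"sumdurationasleep{sleep_type}"] = episodes["minutes_asleep"].sum()
    # minutes_in_bed already equals to-fall-asleep + awake + asleep + after-wakeup per episode
    features[f"sumdurationinbed{sleep_type}"] = episodes["minutes_in_bed"].sum()
print(features)
```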

View File

@ -0,0 +1,82 @@
# Fitbit Steps Intraday
Sensor parameters description for `[FITBIT_STEPS_INTRADAY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the steps intraday data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects; note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-07","value":"1775"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":5},{"time":"00:01:00","value":3},{"time":"00:02:00","value":0},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-08","value":"3201"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":14},{"time":"00:01:00","value":11},{"time":"00:02:00","value":10},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-09","value":"998"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":0},{"time":"00:01:00","value":0},{"time":"00:02:00","value":0},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |steps |
|-------------------------------------- |---------------------- |--------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:00:00 |5 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:01:00 |3 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:02:00 |0 |
## RAPIDS provider
!!! info "Available time segments"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_steps_intraday_raw.csv
- data/raw/{pid}/fitbit_steps_intraday_parsed.csv
- data/raw/{pid}/fitbit_steps_intraday_parsed_with_datetime.csv
- data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_steps_intraday.csv
```
Parameters description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_STEPS_INTRADAY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from steps intraday data, see table below |
|`[THRESHOLD_ACTIVE_BOUT]` | Every minute with Fitbit steps data will be labelled as `sedentary` if its step count is below this threshold, otherwise `active`. |
|`[INCLUDE_ZERO_STEP_ROWS]` | Whether or not to include time segments with a 0 step count during the whole day. |
Features description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description |
|-------------------------- |-------------- |-------------------------------------------------------------|
|sumsteps |steps |The total step count during a time segment.
|maxsteps |steps |The maximum step count during a time segment.
|minsteps |steps |The minimum step count during a time segment.
|avgsteps |steps |The average step count during a time segment.
|stdsteps |steps |The standard deviation of step count during a time segment.
|countepisodesedentarybout |bouts |Number of sedentary bouts during a time segment.
|sumdurationsedentarybout |minutes |Total duration of all sedentary bouts during a time segment.
|maxdurationsedentarybout |minutes |The maximum duration of any sedentary bout during a time segment.
|mindurationsedentarybout |minutes |The minimum duration of any sedentary bout during a time segment.
|avgdurationsedentarybout |minutes |The average duration of sedentary bouts during a time segment.
|stddurationsedentarybout |minutes |The standard deviation of the duration of sedentary bouts during a time segment.
|countepisodeactivebout |bouts |Number of active bouts during a time segment.
|sumdurationactivebout |minutes |Total duration of all active bouts during a time segment.
|maxdurationactivebout |minutes |The maximum duration of any active bout during a time segment.
|mindurationactivebout |minutes |The minimum duration of any active bout during a time segment.
|avgdurationactivebout |minutes |The average duration of active bouts during a time segment.
|stddurationactivebout |minutes |The standard deviation of the duration of active bouts during a time segment.
!!! note "Assumptions/Observations"
1. _Active and sedentary bouts_. If the step count per minute is smaller than `THRESHOLD_ACTIVE_BOUT` (default value is 10), that minute is labelled as sedentary, otherwise it is labelled as active. Active and sedentary bouts are periods of consecutive minutes labelled as `active` or `sedentary` (see the sketch below).
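A minimal sketch of this labelling and bout-grouping logic, assuming one step count per minute; it is illustrative only, not RAPIDS' implementation.

```python
import pandas as pd

# Minimal sketch (not RAPIDS' code) of the bout logic described above: label each minute using
# THRESHOLD_ACTIVE_BOUT, then group consecutive minutes with the same label into bouts.
THRESHOLD_ACTIVE_BOUT = 10
steps_per_minute = pd.Series([0, 0, 3, 25, 40, 12, 0, 0])  # toy data, one value per minute

label = (steps_per_minute >= THRESHOLD_ACTIVE_BOUT).map({True: "active", False: "sedentary"})
bout_id = (label != label.shift()).cumsum()  # new id whenever the label changes
bouts = pd.DataFrame({"label": label, "bout_id": bout_id}).groupby("bout_id").agg(
    kind=("label", "first"), minutes=("label", "size"))  # one row per bout: label + length

for kind in ["sedentary", "active"]:
    durations = bouts.loc[bouts["kind"] == kind, "minutes"]
    print(f"countepisode{kind}bout", len(durations),
          f"sumduration{kind}bout", int(durations.sum()),
          f"maxduration{kind}bout", int(durations.max()) if len(durations) else 0)
```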

View File

@ -0,0 +1,67 @@
# Fitbit Steps Summary
Sensor parameters description for `[FITBIT_STEPS_SUMMARY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the steps summary data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects; note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-07","value":"1775"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":5},{"time":"00:01:00","value":3},{"time":"00:02:00","value":0},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-08","value":"3201"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":14},{"time":"00:01:00","value":11},{"time":"00:02:00","value":10},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-09","value":"998"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":0},{"time":"00:01:00","value":0},{"time":"00:02:00","value":0},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |steps |
|-------------------------------------- |---------------------- |--------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 |1775 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-08 |3201 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-09 |998 |
## RAPIDS provider
!!! info "Available time segments"
- Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_steps_summary_raw.csv
- data/raw/{pid}/fitbit_steps_summary_parsed.csv
- data/raw/{pid}/fitbit_steps_summary_parsed_with_datetime.csv
- data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_steps_summary.csv
```
Parameters description for `[FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_STEPS_SUMMARY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from steps summary data, see table below |
Features description for `[FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description |
|-------------------------- |---------- |-------------------------------------------- |
|maxsumsteps |steps |The maximum daily step count during a time segment.
|minsumsteps |steps |The minimum daily step count during a time segment.
|avgsumsteps |steps |The average daily step count during a time segment.
|mediansumsteps |steps |The median of daily step count during a time segment.
|stdsumsteps |steps |The standard deviation of daily step count during a time segment.
!!! note "Assumptions/Observations"
NA

View File

@ -0,0 +1,83 @@
# Phone Accelerometer
Sensor parameters description for `[PHONE_ACCELEROMETER]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the accelerometer data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_accelerometer_raw.csv
- data/raw/{pid}/phone_accelerometer_with_datetime.csv
- data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_accelerometer.csv
```
Parameters description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_ACCELEROMETER` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|maxmagnitude |m/s^2^ |The maximum magnitude of acceleration ($\|acceleration\| = \sqrt{x^2 + y^2 + z^2}$).
|minmagnitude |m/s^2^ |The minimum magnitude of acceleration.
|avgmagnitude |m/s^2^ |The average magnitude of acceleration.
|medianmagnitude |m/s^2^ |The median magnitude of acceleration.
|stdmagnitude |m/s^2^ |The standard deviation of acceleration.
!!! note "Assumptions/Observations"
1. Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes, it is likely because the accelerometer dataset for a participant is too big to fit in memory. We are considering different alternatives to overcome this problem.
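For reference, the magnitude formula above can be illustrated with a short pandas sketch (not RAPIDS' code); treating `double_values_0/1/2` as the raw x/y/z axes is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Minimal sketch (not RAPIDS' code): magnitude features from raw accelerometer samples,
# assuming one row per sample with double_values_0/1/2 as the x/y/z axes (m/s^2).
acc = pd.DataFrame({
    "double_values_0": [0.1, 0.0, 9.7],
    "double_values_1": [9.8, 9.6, 0.2],
    "double_values_2": [0.3, 1.1, 1.0],
})
magnitude = np.sqrt(acc["double_values_0"] ** 2 + acc["double_values_1"] ** 2 + acc["double_values_2"] ** 2)
features = {
    "maxmagnitude": magnitude.max(),
    "minmagnitude": magnitude.min(),
    "avgmagnitude": magnitude.mean(),
    "medianmagnitude": magnitude.median(),
    "stdmagnitude": magnitude.std(),
}
print(features)
```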
## PANDA provider
These features are based on the work by [Panda et al](../../citation#panda-accelerometer).
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_accelerometer_raw.csv
- data/raw/{pid}/phone_accelerometer_with_datetime.csv
- data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_accelerometer.csv
```
Parameters description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_ACCELEROMETER` features from the `PANDA` provider|
|`[FEATURES]` | Features to be computed for exertional and non-exertional activity episodes, see table below
Features description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
| sumduration | minutes | Total duration of all exertional or non-exertional activity episodes. |
| maxduration | minutes | Longest duration of any exertional or non-exertional activity episode. |
| minduration | minutes | Shortest duration of any exertional or non-exertional activity episode. |
| avgduration | minutes | Average duration of any exertional or non-exertional activity episode. |
| medianduration | minutes | Median duration of any exertional or non-exertional activity episode. |
| stdduration | minutes | Standard deviation of the duration of all exertional or non-exertional activity episodes. |
!!! note "Assumptions/Observations"
1. Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes, it is likely because the accelerometer dataset for a participant is too big to fit in memory. We are considering different alternatives to overcome this problem.
2. See [Panda et al](../../citation#panda-accelerometer) for a definition of exertional and non-exertional activity episodes

View File

@ -0,0 +1,64 @@
# Phone Activity Recognition
Sensor parameters description for `[PHONE_ACTIVITY_RECOGNITION]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE][ANDROID]`| Database table where the activity data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS)
|`[TABLE][IOS]`| Database table where the activity data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)
|`[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Difference in minutes between any two rows for them to be considered part of the same activity episode
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_activity_recognition_raw.csv
- data/raw/{pid}/phone_activity_recognition_with_datetime.csv
- data/raw/{pid}/phone_activity_recognition_with_datetime_unified.csv
- data/interim/{pid}/phone_activity_recognition_episodes.csv
- data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv
- data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv
- data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_activity_recognition.csv
```
Parameters description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_ACTIVITY_RECOGNITION` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[ACTIVITY_CLASSES][STATIONARY]` | An array of the activity labels to be considered in the `STATIONARY` category; choose any of `still`, `tilting`
|`[ACTIVITY_CLASSES][MOBILE]` | An array of the activity labels to be considered in the `MOBILE` category; choose any of `on_foot`, `walking`, `running`, `on_bicycle`
|`[ACTIVITY_CLASSES][VEHICLE]` | An array of the activity labels to be considered in the `VEHICLE` category; choose any of `in_vehicle`
Features description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|count |rows | Number of episodes.
|mostcommonactivity |activity type | The most common activity type (e.g. `still`, `on_foot`, etc.). If there is a tie, the first one is chosen.
|countuniqueactivities |activity type | Number of unique activities.
|durationstationary |minutes | The total duration of `[ACTIVITY_CLASSES][STATIONARY]` episodes
|durationmobile |minutes | The total duration of `[ACTIVITY_CLASSES][MOBILE]` episodes of on foot, running, and on bicycle activities
|durationvehicle |minutes | The total duration of `[ACTIVITY_CLASSES][VEHICLE]` episodes of on vehicle activity
!!! note "Assumptions/Observations"
1. iOS Activity Recognition names and types are unified with Android labels:
| iOS Activity Name | Android Activity Name | Android Activity Type |
|----|----|----|
|`walking`| `walking` | `7`
|`running`| `running` | `8`
|`cycling`| `on_bicycle` | `1`
|`automotive`| `in_vehicle` | `0`
|`stationary`| `still` | `3`
|`unknown`| `unknown` | `4`
2. In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables; RAPIDS automatically infers what platform each participant belongs to based on their [participant file](../../setup/configuration/#participant-files).
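A minimal sketch of this label unification and of the duration features per `ACTIVITY_CLASSES` category (illustrative only, not RAPIDS' implementation; episode durations are assumed to be precomputed):

```python
import pandas as pd

# Minimal sketch (not RAPIDS' code): unify iOS activity names with the Android labels from the
# table above, then total the episode minutes that fall into each ACTIVITY_CLASSES category.
IOS_TO_ANDROID = {"walking": "walking", "running": "running", "cycling": "on_bicycle",
                  "automotive": "in_vehicle", "stationary": "still", "unknown": "unknown"}
ACTIVITY_CLASSES = {"STATIONARY": ["still", "tilting"],
                    "MOBILE": ["on_foot", "walking", "running", "on_bicycle"],
                    "VEHICLE": ["in_vehicle"]}

episodes = pd.DataFrame({  # toy episodes: label as reported by the phone + duration in minutes
    "platform": ["ios", "android", "ios"],
    "activity_name": ["stationary", "walking", "automotive"],
    "duration": [42.0, 15.5, 30.0],
})
episodes["activity_name"] = episodes.apply(
    lambda row: IOS_TO_ANDROID.get(row["activity_name"], row["activity_name"])
    if row["platform"] == "ios" else row["activity_name"], axis=1)

for category, labels in ACTIVITY_CLASSES.items():
    total = episodes.loc[episodes["activity_name"].isin(labels), "duration"].sum()
    print(f"duration{category.lower()}", total)
```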

View File

@ -0,0 +1,57 @@
# Phone Applications Foreground
Sensor parameters description for `[PHONE_APPLICATIONS_FOREGROUND]` (these parameters are used by the only provider available at the moment, RAPIDS):
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the applications foreground data is stored
|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-foreground) in `data/external/stachl_application_genre_catalogue.csv`
|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
## RAPIDS provider
The app category (genre) catalogue used in these features was originally created by [Stachl et al](../../citation#stachl-applications-foreground).
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android only
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_applications_foreground_raw.csv
- data/raw/{pid}/phone_applications_foreground_with_datetime.csv
- data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv
- data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_applications_foreground.csv
```
Parameters description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_APPLICATIONS_FOREGROUND` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[SINGLE_CATEGORIES]` | An array of app categories to be *included* in the feature extraction computation. The special keyword `all` represents a category with all the apps from each participant. By default we use the category catalogue pointed by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
|`[MULTIPLE_CATEGORIES]` | An array of collections representing meta-categories (a group of categories). The key of each element is the name of the `meta-category` and the value is an array of member app categories. By default we use the category catalogue pointed by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
|`[SINGLE_APPS]` | An array of apps to be *included* in the feature extraction computation. Use their package name (e.g. `com.google.android.youtube`) or the reserved keyword `top1global` (the most used app by a participant over the whole monitoring study)
|`[EXCLUDED_CATEGORIES]` | An array of app categories to be *excluded* from the feature extraction computation. By default we use the category catalogue pointed by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
|`[EXCLUDED_APPS]` | An array of apps to be excluded from the feature extraction computation. Use their package name, for example: `com.google.android.youtube`
Features description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|count |apps | Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to it from another app)
|timeoffirstuse |minutes | The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a `time_segment`
|timeoflastuse |minutes | The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a `time_segment`
|frequencyentropy |nats | The entropy of the used apps within a category during a `time_segment` (each app is seen as a unique event, the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app
!!! note "Assumptions/Observations"
Features can be computed by app, by apps grouped under a single category (genre) and by multiple categories grouped together (meta-categories). For example, we can get features for `Facebook` (single app), for `Social Network` apps (a category including Facebook and other social media apps) or for `Social` (a meta-category formed by `Social Network` and `Social Media Tools` categories).
Apps installed by default like YouTube are considered system apps on some phones. We do an exact match to exclude apps where "genre" == `EXCLUDED_CATEGORIES` or "package_name" == `EXCLUDED_APPS`.
We provide three ways of classifying an app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalogue created by Stachl et al. which we provide in RAPIDS (`data/external/stachl_application_genre_catalogue.csv`), or c) by manually creating a personalized catalogue. You can choose a, b or c by modifying `[APPLICATION_CATEGORIES]` keys and values (see the Sensor parameters description table above).
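As an illustration of the exclusion and grouping rules above, here is a minimal sketch (not RAPIDS' code); the catalogue contents, category labels, and package names are made up for the example.

```python
import pandas as pd

# Minimal sketch (not RAPIDS' code) of the exact-match filtering and grouping described above.
# All category labels and package names below are illustrative only.
EXCLUDED_CATEGORIES = ["system_apps"]
EXCLUDED_APPS = ["com.example.systemlauncher"]
MULTIPLE_CATEGORIES = {"social": ["Social Network", "Social Media Tools"]}

apps = pd.DataFrame({  # toy foreground events, one row per app launch with its catalogue genre
    "package_name": ["com.facebook.katana", "com.google.android.youtube", "com.example.systemlauncher"],
    "genre": ["Social Network", "system_apps", "Tools"],
})
# Exact-match exclusion on genre or package name
apps = apps[~apps["genre"].isin(EXCLUDED_CATEGORIES) & ~apps["package_name"].isin(EXCLUDED_APPS)]

# Count of foreground events per single category (genre) ...
print(apps.groupby("genre").size().rename(lambda genre: f"count{genre}"))
# ... and per meta-category (a group of genres)
for meta, members in MULTIPLE_CATEGORIES.items():
    print(f"count{meta}", apps["genre"].isin(members).sum())
```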

View File

@ -0,0 +1,48 @@
# Phone Battery
Sensor parameters description for `[PHONE_BATTERY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the battery data is stored
|`[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Difference in minutes between any two rows for them to be considered part of the same battery charge or discharge episode
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_battery_raw.csv
- data/interim/{pid}/phone_battery_episodes.csv
- data/interim/{pid}/phone_battery_episodes_resampled.csv
- data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv
- data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_battery.csv
```
Parameters description for `[PHONE_BATTERY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_BATTERY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[PHONE_BATTERY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|countdischarge |episodes | Number of discharging episodes.
|sumdurationdischarge |minutes | The total duration of all discharging episodes.
|countcharge |episodes | Number of battery charging episodes.
|sumdurationcharge |minutes | The total duration of all charging episodes.
|avgconsumptionrate |episodes/minutes | The average of all episodes' consumption rates. An episode's consumption rate is defined as the ratio between its battery delta and duration
|maxconsumptionrate |episodes/minutes | The highest of all episodes' consumption rates. An episode's consumption rate is defined as the ratio between its battery delta and duration
!!! note "Assumptions/Observations"
1. We convert battery data collected with iOS client v1 (autodetected because battery status `4` does not exist) to match the Android battery format: we swap status `3` for `5` and `1` for `3`
2. We group battery data into discharge or charge episodes considering any contiguous rows with consecutive reductions or increases of the battery level if they are logged within `[EPISODE_THRESHOLD_BETWEEN_ROWS]` minutes from each other.
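A minimal sketch of this episode grouping and of the consumption rate definition (illustrative only, not RAPIDS' implementation; timestamps are simplified to minutes since midnight):

```python
import pandas as pd

# Minimal sketch (not RAPIDS' code) of the grouping described above: contiguous rows with
# decreasing (discharge) or increasing (charge) battery level, split whenever the gap between two
# rows exceeds EPISODE_THRESHOLD_BETWEEN_ROWS minutes; consumption rate = battery delta / duration.
EPISODE_THRESHOLD_BETWEEN_ROWS = 30

battery = pd.DataFrame({  # toy data: timestamp in minutes since midnight + battery level (%)
    "minute": [0, 10, 20, 200, 210, 220],
    "level": [90, 88, 85, 60, 70, 80],
})
direction = battery["level"].diff().fillna(0).apply(lambda d: "charge" if d > 0 else "discharge")
gap_too_long = battery["minute"].diff().fillna(0) > EPISODE_THRESHOLD_BETWEEN_ROWS
episode_id = ((direction != direction.shift()) | gap_too_long).cumsum()

for _, episode in battery.groupby(episode_id):
    if len(episode) < 2:
        continue  # a single row does not form an episode in this toy example
    duration = episode["minute"].iloc[-1] - episode["minute"].iloc[0]
    delta = abs(episode["level"].iloc[-1] - episode["level"].iloc[0])
    print(f"{duration} min episode, delta {delta}%, consumption rate {delta / duration:.2f} per minute")
```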

View File

@ -0,0 +1,41 @@
# Phone Bluetooth
Sensor parameters description for `[PHONE_BLUETOOTH]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the bluetooth data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android only
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_bluetooth_raw.csv
- data/raw/{pid}/phone_bluetooth_with_datetime.csv
- data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_bluetooth.csv"
```
Parameters description for `[PHONE_BLUETOOTH][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_BLUETOOTH` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[PHONE_BLUETOOTH][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
| countscans | devices | Number of scanned devices during a `time_segment`; a device can be detected multiple times over time and these appearances are counted separately |
| uniquedevices | devices | Number of unique devices during a `time_segment` as identified by their hardware address (`bt_address`) |
| countscansmostuniquedevice | scans | Number of scans of the most scanned device during a `time_segment` across the whole monitoring period |
!!! note "Assumptions/Observations"
NA
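For reference, a minimal sketch (not RAPIDS' code) of how these three features relate to the scanned rows, assuming one row per detected device appearance; in this toy example the same frame stands in for both the time segment and the whole monitoring period.

```python
import pandas as pd

# Minimal sketch (not RAPIDS' code): Bluetooth features for one time segment, assuming
# `scans` holds one row per scanned device appearance identified by its hardware address.
scans = pd.DataFrame({
    "bt_address": ["AA:BB:CC:00:11:22", "AA:BB:CC:00:11:22", "FF:EE:DD:33:44:55",
                   "AA:BB:CC:00:11:22", "11:22:33:44:55:66"],
})
# Most scanned device; in RAPIDS this is determined across the whole monitoring period
most_scanned_overall = scans["bt_address"].value_counts().idxmax()

features = {
    "countscans": len(scans),
    "uniquedevices": scans["bt_address"].nunique(),
    "countscansmostuniquedevice": int((scans["bt_address"] == most_scanned_overall).sum()),
}
print(features)
```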

View File

@ -0,0 +1,64 @@
# Phone Calls
Sensor parameters description for `[PHONE_CALLS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the calls data is stored
## RAPIDS Provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_calls_raw.csv
- data/raw/{pid}/phone_calls_with_datetime.csv
- data/raw/{pid}/phone_calls_with_datetime_unified.csv
- data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_calls.csv
```
Parameters description for `[PHONE_CALLS][PROVIDERS][RAPIDS]`:
| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|`[COMPUTE]`| Set to `True` to extract `PHONE_CALLS` features from the `RAPIDS` provider|
| `[CALL_TYPES]` | The particular `call_type` that will be analyzed. The options for this parameter are `incoming`, `outgoing` or `missed`. |
| `[FEATURES]` | Features to be computed for `outgoing`, `incoming`, and `missed` calls. Note that the same features are available for both incoming and outgoing calls, while missed calls have their own set of features. See the tables below. |
Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` incoming and outgoing calls:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|count |calls |Number of calls of a particular `call_type` that occurred during a particular `time_segment`.
|distinctcontacts |contacts |Number of distinct contacts that are associated with a particular `call_type` for a particular `time_segment`
|meanduration |seconds |The mean duration of all calls of a particular `call_type` during a particular `time_segment`.
|sumduration |seconds |The sum of the duration of all calls of a particular `call_type` during a particular `time_segment`.
|minduration |seconds |The duration of the shortest call of a particular `call_type` during a particular `time_segment`.
|maxduration |seconds |The duration of the longest call of a particular `call_type` during a particular `time_segment`.
|stdduration |seconds |The standard deviation of the duration of all the calls of a particular `call_type` during a particular `time_segment`.
|modeduration |seconds |The mode of the duration of all the calls of a particular `call_type` during a particular `time_segment`.
|entropyduration |nats |The estimate of the Shannon entropy for the duration of all the calls of a particular `call_type` during a particular `time_segment`.
|timefirstcall |minutes |The time in minutes between 12:00am (midnight) and the first call of `call_type`.
|timelastcall |minutes |The time in minutes between 12:00am (midnight) and the last call of `call_type`.
|countmostfrequentcontact |calls |The number of calls of a particular `call_type` during a particular `time_segment` of the most frequent contact throughout the monitored period.
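A minimal pandas sketch of several of the duration and timing features above (column names such as `local_segment`, `trace`, `call_duration`, and `local_time` are assumptions for illustration, not RAPIDS' internal code):
```python
import pandas as pd
from scipy.stats import entropy

# Hypothetical columns: local_segment, trace (contact id), call_duration (seconds)
# and local_time ("HH:MM:SS"); these names are assumptions for illustration only.
def call_duration_features(calls: pd.DataFrame) -> pd.DataFrame:
    minutes = pd.to_timedelta(calls["local_time"]).dt.total_seconds() / 60
    grouped = calls.assign(minutes=minutes).groupby("local_segment")
    features = grouped.agg(
        count=("trace", "size"),
        distinctcontacts=("trace", "nunique"),
        meanduration=("call_duration", "mean"),
        sumduration=("call_duration", "sum"),
        minduration=("call_duration", "min"),
        maxduration=("call_duration", "max"),
        stdduration=("call_duration", "std"),
        timefirstcall=("minutes", "min"),   # minutes since midnight
        timelastcall=("minutes", "max"),
    )
    # Shannon entropy (nats) of the duration distribution per segment
    features["entropyduration"] = grouped["call_duration"].apply(
        lambda d: entropy(d.value_counts())
    )
    return features
```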
Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` missed calls:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|count |calls |Number of `missed` calls that occurred during a particular `time_segment`.
|distinctcontacts |contacts |Number of distinct contacts that are associated with `missed` calls for a particular `time_segment`
|timefirstcall |minutes |The time in minutes from 12:00am (midnight) at which the first `missed` call occurred.
|timelastcall |minutes |The time in minutes from 12:00am (midnight) at which the last `missed` call occurred.
|countmostfrequentcontact |calls |The number of `missed` calls during a particular `time_segment` of the most frequent contact throughout the monitored period.
!!! note "Assumptions/Observations"
1. Traces for iOS calls are unique even for the same contact calling a participant more than once, which renders `countmostfrequentcontact` meaningless and `distinctcontacts` equal to the total number of traces.
2. `[CALL_TYPES]` and `[FEATURES]` keys in `config.yaml` need to match. For example, `[CALL_TYPES]` `outgoing` matches the `[FEATURES]` key `outgoing`
3. iOS calls data is transformed to match Android calls data format. See our [algorithm](algorithms/phone-algorithms.md#phone-calls)

View File

@ -0,0 +1,71 @@
# Phone Conversation
Sensor parameters description for `[PHONE_CONVERSATION]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE][ANDROID]`| Database table where the conversation data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS)
|`[TABLE][IOS]`| Database table where the conversation data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_conversation_raw.csv
- data/raw/{pid}/phone_conversation_with_datetime.csv
- data/raw/{pid}/phone_conversation_with_datetime_unified.csv
- data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_conversation.csv
```
Parameters description for `[PHONE_CONVERSATION][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_CONVERSATION` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[RECORDING_MINUTES]` | Minutes the plugin was recording audio (default 1 min)
|`[PAUSED_MINUTES]` | Minutes the plugin was NOT recording audio (default 3 min)
Features description for `[PHONE_CONVERSATION][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
| minutessilence | minutes | Minutes labeled as silence |
| minutesnoise | minutes | Minutes labeled as noise |
| minutesvoice | minutes | Minutes labeled as voice |
| minutesunknown | minutes | Minutes labeled as unknown |
| sumconversationduration | minutes | Total duration of all conversations |
| maxconversationduration | minutes | Longest duration of all conversations |
| minconversationduration | minutes | Shortest duration of all conversations |
| avgconversationduration | minutes | Average duration of all conversations |
| sdconversationduration | minutes | Standard Deviation of the duration of all conversations |
| timefirstconversation | minutes | Minutes since midnight when the first conversation for a time segment was detected |
| timelastconversation | minutes | Minutes since midnight when the last conversation for a time segment was detected |
| noisesumenergy | L2-norm | Sum of all energy values when inference is noise |
| noiseavgenergy | L2-norm | Average of all energy values when inference is noise |
| noisesdenergy | L2-norm | Standard Deviation of all energy values when inference is noise |
| noiseminenergy | L2-norm | Minimum of all energy values when inference is noise |
| noisemaxenergy | L2-norm | Maximum of all energy values when inference is noise |
| voicesumenergy | L2-norm | Sum of all energy values when inference is voice |
| voiceavgenergy | L2-norm | Average of all energy values when inference is voice |
| voicesdenergy | L2-norm | Standard Deviation of all energy values when inference is voice |
| voiceminenergy | L2-norm | Minimum of all energy values when inference is voice |
| voicemaxenergy | L2-norm | Maximum of all energy values when inference is voice |
| silencesensedfraction | - | Ratio between minutessilence and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| noisesensedfraction | - | Ratio between minutesnoise and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| voicesensedfraction | - | Ratio between minutesvoice and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| unknownsensedfraction | - | Ratio between minutesunknown and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown) |
| silenceexpectedfraction | - | Ratio between minutessilence and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
| noiseexpectedfraction | - | Ratio between minutesnoise and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
| voiceexpectedfraction | - | Ratio between minutesvoice and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
| unknownexpectedfraction | - | Ratio between minutesunknown and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
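To make the expected-fraction arithmetic concrete, here is a small hedged sketch for a daily (1440-minute) segment (the function and variable names are illustrative assumptions, not RAPIDS' internal code):
```python
# Hedged sketch of the "expected fraction" features for a daily segment, under
# the reading that the plugin records `recording_minutes` and then pauses for
# `paused_minutes` in a repeating cycle (names and defaults are assumptions).
def expected_fraction(sensed_minutes: float,
                      recording_minutes: float = 1,
                      paused_minutes: float = 3,
                      segment_minutes: float = 1440) -> float:
    # Minutes that in theory should have been sensed over the segment
    expected_minutes = segment_minutes / (recording_minutes + paused_minutes)
    return sensed_minutes / expected_minutes

# Example: 200 minutes labeled as silence with a 1 min on / 3 min off cycle
# -> 1440 / (1 + 3) = 360 expected minutes, so silenceexpectedfraction ≈ 0.56
print(expected_fraction(200))
```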
!!! note "Assumptions/Observations"
1. The timestamp of conversation rows in iOS is in seconds so we convert it to milliseconds to match Android's format

View File

@ -0,0 +1,81 @@
# Phone Data Yield
This is a combinatorial sensor, which means we use data from multiple sensors to extract data yield features. Data yield features can be used to remove rows ([time segments](../../setup/configuration/#time-segments)) that do not contain enough data. You should decide what your "enough" threshold is depending on the type of sensors you collected (frequency vs. event based, e.g. accelerometer vs. calls), the length of your study, and the rates of missing data your analysis can handle.
!!! hint "Why is data yield important?"
Imagine that you want to extract `PHONE_CALL` features on daily segments (`00:00` to `23:59`). Let's say that on day 1 the phone logged 10 calls and 23 hours of data from other sensors and on day 2 the phone logged 10 calls and only 2 hours of data from other sensors. It's more likely that other calls were placed on the 22 hours of data that you didn't log on day 2 than on the 1 hour of data you didn't log on day 1, and so including day 2 in your analysis could bias your results.
Sensor parameters description for `[PHONE_DATA_YIELD]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[SENSORS]`| One or more phone sensor config keys (e.g. `PHONE_MESSAGES`). The more keys you include, the more accurately RAPIDS can approximate the time a smartphone was sensing data. The supported phone sensors you can include in this list are outlined below (**do NOT include Fitbit sensors**).
!!! info "Supported phone sensors for `[PHONE_DATA_YIELD][SENSORS]`"
```yaml
PHONE_ACCELEROMETER
PHONE_ACTIVITY_RECOGNITION
PHONE_APPLICATIONS_FOREGROUND
PHONE_BATTERY
PHONE_BLUETOOTH
PHONE_CALLS
PHONE_CONVERSATION
PHONE_MESSAGES
PHONE_LIGHT
PHONE_LOCATIONS
PHONE_SCREEN
PHONE_WIFI_VISIBLE
PHONE_WIFI_CONNECTED
```
## RAPIDS provider
Before explaining the data yield features, let's define the following relevant concepts:
- A valid minute is any 60 second window when any phone sensor logged at least 1 row of data
- A valid hour is any 60 minute window with at least X valid minutes. The X or threshold is given by `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]`
The timestamps of all sensors are concatenated and then grouped per time segment. Minute and hour windows are created from the beginning of each time segment instance and these windows are marked as valid based on the definitions above. The duration of each time segment is taken into account to compute the features described below.
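A rough sketch of these two features under the definitions above (variable names and the exact windowing are assumptions; RAPIDS' implementation lives in its provider scripts):
```python
import pandas as pd

# Rough sketch of both features for one time segment instance; `timestamps` is a
# Series of epoch milliseconds pooled from all [PHONE_DATA_YIELD][SENSORS]
# (variable names and the exact windowing are assumptions, not RAPIDS' code).
def data_yield(timestamps: pd.Series, segment_start_ms: int,
               segment_minutes: int, minute_ratio_threshold: float = 0.5):
    minute_index = ((timestamps - segment_start_ms) // 60_000).astype(int)
    valid_minutes = minute_index.unique()              # minutes with >= 1 row
    ratiovalidyieldedminutes = len(valid_minutes) / segment_minutes

    minutes_per_hour = pd.Series(valid_minutes // 60).value_counts()
    valid_hours = (minutes_per_hour / 60 >= minute_ratio_threshold).sum()
    segment_hours = max(segment_minutes / 60, 1)       # segments < 1 hour -> 1
    ratiovalidyieldedhours = min(valid_hours / segment_hours, 1.0)
    return ratiovalidyieldedminutes, ratiovalidyieldedhours
```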
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/{sensor}_raw.csv # one for every [PHONE_DATA_YIELD][SENSORS]
- data/interim/{pid}/phone_yielded_timestamps.csv
- data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv
- data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_data_yield.csv
```
Parameters description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_DATA_YIELD` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` | The proportion `[0.0, 1.0]` of valid minutes in a 60-minute window necessary to flag that window as valid.
Features description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|ratiovalidyieldedminutes |- | The ratio between the number of valid minutes and the duration in minutes of a time segment.
|ratiovalidyieldedhours |- | The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1.
!!! note "Assumptions/Observations"
1. We recommend using `ratiovalidyieldedminutes` on time segments that are shorter than two or three hours and `ratiovalidyieldedhours` for longer segments. This is because relying on yielded minutes alone can be misleading when a big chunk of those missing minutes is clustered together.
For example, let's assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur:
<ol type="A">
<li>the 12 missing hours are from the beginning of the segment or </li>
<li>30 minutes could be missing from every hour (24 * 30 minutes = 12 hours).</li>
</ol>
`ratiovalidyieldedminutes` would be 0.5 for both `A` and `B` (hinting that the missing circumstances are similar). However, `ratiovalidyieldedhours` would be 0.5 for `A` and 1.0 for `B` if `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` is between 0.0 and 0.49 (hinting that the missing circumstances might be more favorable for `B`; in other words, the sensed data for `B` is more evenly spread than for `A`).

View File

@ -0,0 +1,44 @@
# Phone Light
Sensor parameters description for `[PHONE_LIGHT]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the light data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android only
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_light_raw.csv
- data/raw/{pid}/phone_light_with_datetime.csv
- data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_light.csv
```
Parameters description for `[PHONE_LIGHT][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_LIGHT` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[PHONE_LIGHT][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|count |rows | Number of light sensor rows recorded.
|maxlux |lux | The maximum ambient luminance.
|minlux |lux | The minimum ambient luminance.
|avglux |lux | The average ambient luminance.
|medianlux |lux | The median ambient luminance.
|stdlux |lux | The standard deviation of ambient luminance.
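These features are plain aggregations of the lux readings per time segment; a minimal pandas sketch (column names are assumptions for illustration, not RAPIDS' internal code):
```python
import pandas as pd

# Minimal sketch: the light features are plain aggregations of the lux column
# per time segment (column names are assumptions for illustration).
def light_features(light: pd.DataFrame) -> pd.DataFrame:
    return light.groupby("local_segment")["double_light_lux"].agg(
        count="size", maxlux="max", minlux="min",
        avglux="mean", medianlux="median", stdlux="std",
    )
```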
!!! note "Assumptions/Observations"
NA

View File

@ -0,0 +1,142 @@
# Phone Locations
Sensor parameters description for `[PHONE_LOCATIONS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the location data is stored
|`[LOCATIONS_TO_USE]`| Type of location data to use, one of `ALL`, `GPS` or `FUSED_RESAMPLED`. This filter is based on the `provider` column of the AWARE locations table, `ALL` includes every row, `GPS` only includes rows where provider is gps, and `FUSED_RESAMPLED` only includes rows where provider is fused after being resampled.
|`[FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD]`| If `FUSED_RESAMPLED` is used, the original fused data has to be resampled; a location row will be resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less than or equal to this threshold (in minutes).
|`[FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION]`| If `FUSED_RESAMPLED` is used, the original fused data has to be resampled; a location row will be resampled for at most this long (in minutes)
!!! note "Assumptions/Observations"
**Types of location data to use**
AWARE Android and iOS clients can collect location coordinates through the phone's GPS, the cellular towers around the phone, or Google's fused location API. If you want to use only the GPS provider, set `[LOCATIONS_TO_USE]` to `GPS`; if you want to use all providers (not recommended due to the difference in accuracy), set it to `ALL`; if your AWARE client was configured to use fused location only, or you want to focus only on this provider, set it to `FUSED_RESAMPLED`. `FUSED_RESAMPLED` takes the original fused location coordinates and replicates each pair forward in time for as long as the phone was sensing data, as indicated by the joined timestamps of [`[PHONE_DATA_YIELD][SENSORS]`](../phone-data-yield/). This is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one.
There are two parameters associated with resampling fused location. `FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD` (in minutes, default 30) controls the maximum gap over which the last known coordinate pair is replicated. For example, if participant A's phone did not collect data between 10:30am and 10:50am and between 11:05am and 11:40am, the last known coordinate pair will be replicated during the first gap but not the second; in other words, we assume we can no longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes. `FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION` (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated for longer than this threshold even if the phone was sensing data continuously. For example, if participant A went home at 9pm and their phone sensed data without gaps until 11am the next morning, the last known location would only be replicated until 9am. If you have suggestions to modify or improve this resampling, let us know.
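As a rough illustration of this resampling idea (not RAPIDS' actual implementation; column names, millisecond timestamps, and the omission of the consecutive-gap rule are assumptions):
```python
import pandas as pd

# Hedged sketch: attach the last known fused coordinate to every timestamp at
# which the phone was sensing data, as long as that coordinate is not older than
# FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION. Column names, millisecond
# timestamps, and the omission of the consecutive-gap rule are assumptions.
def resample_fused(locations: pd.DataFrame, yielded_timestamps: pd.Series,
                   time_since_valid_min: int = 720) -> pd.DataFrame:
    sensed = pd.DataFrame({"timestamp": sorted(yielded_timestamps.unique())})
    known = locations.sort_values("timestamp")[
        ["timestamp", "double_latitude", "double_longitude"]
    ]
    return pd.merge_asof(
        sensed, known, on="timestamp", direction="backward",
        tolerance=time_since_valid_min * 60_000,  # milliseconds
    )
```
A fuller version would also stop replicating across sensing gaps longer than `FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD`.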
## BARNETT provider
These features are based on the original open-source implementation by [Barnett et al](../../citation#barnett-locations) and some features created by [Canzian et al](../../citation#barnett-locations).
!!! info "Available time segments and platforms"
- Available only for segments that start at 00:00:00 and end at 23:59:59 of the same day (daily segments)
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_locations_raw.csv
- data/interim/{pid}/phone_locations_processed.csv
- data/interim/{pid}/phone_locations_processed_with_datetime.csv
- data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_locations.csv
```
Parameters description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `BARNETT` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[ACCURACY_LIMIT]` | An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there's a 68% probability the true location is within this radius
|`[TIMEZONE]` | Timezone where the location data was collected. By default points to the one defined in the [Configuration](../../setup/configuration#timezone-of-your-study)
|`[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]` adapted from [Beiwe Summary Statistics](http://wiki.beiwe.org/wiki/Summary_Statistics):
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|hometime |minutes | Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am including any pauses within a 200-meter radius.
|disttravelled |meters | Total distance travelled over a day (flights).
|rog |meters | The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place.
|maxdiam |meters | The maximum diameter is the largest distance between any two pauses.
|maxhomedist |meters | The maximum distance from home in meters.
|siglocsvisited |locations | The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found by iterating k from 1 to 200, stopping when the centroids of two significant locations are within 400 meters of one another.
|avgflightlen |meters | Mean length of all flights.
|stdflightlen |meters | Standard deviation of the length of all flights.
|avgflightdur |seconds | Mean duration of all flights.
|stdflightdur |seconds | The standard deviation of the duration of all flights.
|probpause | - | The fraction of a day spent in a pause (as opposed to a flight)
|siglocentropy |nats | Shannon's entropy measurement based on the proportion of time spent at each significant location visited during a day.
|circdnrtn | - | A continuous metric quantifying a person's circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed day and 1 a routine the same as every other sensed day.
|wkenddayrtn | - | Same as circdnrtn but computed separately for weekends and weekdays.
!!! note "Assumptions/Observations"
**Barnett\'s et al features**
These features are based on a Pause-Flight model. A pause is defined as a mobility trace (location pings) within a certain duration and distance (by default 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See [Barnett et al](../../citation#barnett-locations) for more information. In RAPIDS we only expose two parameters for these features (timezone and accuracy limit). You can change other parameters in `src/features/phone_locations/barnett/library/MobilityFeatures.R`.
**Significant Locations**
Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) will count as a visit to that significant location. This description was adapted from the Supplementary Materials of [Barnett et al](../../citation#barnett-locations).
**The Circadian Calculation**
For a detailed description of how this is calculated, see [Canzian et al](../../citation#barnett-locations).
## DORYAB provider
These features are based on the original implementation by [Doryab et al.](../../citation#doryab-locations).
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_locations_raw.csv
- data/interim/{pid}/phone_locations_processed.csv
- data/interim/{pid}/phone_locations_processed_with_datetime.csv
- data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_locations.csv
```
Parameters description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `DORYAB` provider|
|`[FEATURES]` | Features to be computed, see table below
| `[DBSCAN_EPS]` | The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
| `[DBSCAN_MINSAMPLES]` | The number of samples (or total weight) in a neighborhood for a point to be considered as a core point of a cluster. This includes the point itself.
| `[THRESHOLD_STATIC]` | The threshold value in km/hr used to label a row as Static or Moving.
| `[MAXIMUM_GAP_ALLOWED]` | The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the phone was not sensing.
| `[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
| `[SAMPLING_FREQUENCY]` | Expected time difference between any two location rows in minutes. If set to `0`, the sampling frequency will be inferred automatically as the median of all the differences between any two consecutive row timestamps (recommended if you are using `FUSED_RESAMPLED` data). This parameter impacts all the time calculations.
Features description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|locationvariance |$meters^2$ |The sum of the variances of the latitude and longitude columns.
|loglocationvariance | - | Log of the sum of the variances of the latitude and longitude columns.
|totaldistance |meters |Total distance travelled in a time segment using the haversine formula.
|averagespeed |km/hr |Average speed in a time segment considering only the instances labeled as Moving.
|varspeed |km/hr |Speed variance in a time segment considering only the instances labeled as Moving.
|circadianmovement |- | "It encodes the extent to which a person's location patterns follow a 24-hour circadian cycle." [Doryab et al.](../../citation#doryab-locations).
|numberofsignificantplaces |places |Number of significant locations visited. It is calculated using the DBSCAN clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place.
|numberlocationtransitions |transitions |Number of movements between any two clusters in a time segment.
|radiusgyration |meters |Quantifies the area covered by a participant.
|timeattop1location |minutes |Time spent at the most significant location.
|timeattop2location |minutes |Time spent at the 2nd most significant location.
|timeattop3location |minutes |Time spent at the 3rd most significant location.
|movingtostaticratio | - | Ratio between the number of rows labeled Moving versus Static
|outlierstimepercent | - | Ratio between the number of rows that belong to non-significant clusters and the total number of rows in a time segment.
|maxlengthstayatclusters |minutes |Maximum time spent in a cluster (significant location).
|minlengthstayatclusters |minutes |Minimum time spent in a cluster (significant location).
|meanlengthstayatclusters |minutes |Average time spent in a cluster (significant location).
|stdlengthstayatclusters |minutes |Standard deviation of time spent in a cluster (significant location).
|locationentropy |nats |Shannon Entropy computed over the row count of each cluster (significant location), it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location).
|normalizedlocationentropy |nats |Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters, it will be higher the more rows belong to a cluster (i.e. the more time a participant spent at a significant location).
!!! note "Assumptions/Observations"
**Significant Locations Identified**
Significant locations are determined using DBSCAN clustering on the locations that a participant visited over the course of the data collection period.
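For illustration, a minimal sketch of this clustering step using scikit-learn's DBSCAN with a haversine metric (parameter values and column handling are assumptions; RAPIDS' provider script may differ):
```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000

# eps corresponds to [DBSCAN_EPS] (meters) and min_samples to [DBSCAN_MINSAMPLES];
# the haversine metric expects coordinates in radians. Illustrative only.
def significant_place_labels(latitudes: np.ndarray, longitudes: np.ndarray,
                             eps_meters: float = 10, min_samples: int = 5) -> np.ndarray:
    coords = np.radians(np.column_stack([latitudes, longitudes]))
    return DBSCAN(eps=eps_meters / EARTH_RADIUS_M,  # meters -> radians
                  min_samples=min_samples,
                  metric="haversine").fit_predict(coords)  # -1 marks outliers
```
`numberofsignificantplaces` would then be the number of distinct labels excluding `-1`, and `outlierstimepercent` the share of rows labeled `-1`.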
**The Circadian Calculation**
For a detailed description of how this is calculated, see [Canzian et al](../../citation#doryab-locations).

View File

@ -0,0 +1,46 @@
# Phone Messages
Sensor parameters description for `[PHONE_MESSAGES]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the messages data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android only
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_messages_raw.csv
- data/raw/{pid}/phone_messages_with_datetime.csv
- data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_messages.csv
```
Parameters description for `[PHONE_MESSAGES][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_MESSAGES` features from the `RAPIDS` provider|
|`[MESSAGES_TYPES]` | The `messages_type` that will be analyzed. The options for this parameter are `received` or `sent`.
|`[FEATURES]` | Features to be computed, see table below for `[MESSAGES_TYPES]` `received` and `sent`
Features description for `[PHONE_MESSAGES][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|count |messages |Number of messages of type `messages_type` that occurred during a particular `time_segment`.
|distinctcontacts |contacts |Number of distinct contacts that are associated with a particular `messages_type` during a particular `time_segment`.
|timefirstmessages |minutes |Number of minutes between 12:00am (midnight) and the first `message` of a particular `messages_type` during a particular `time_segment`.
|timelastmessages |minutes |Number of minutes between 12:00am (midnight) and the last `message` of a particular `messages_type` during a particular `time_segment`.
|countmostfrequentcontact |messages |Number of messages from the contact with the most messages of `messages_type` during a `time_segment` throughout the whole dataset of each participant.
!!! note "Assumptions/Observations"
1. `[MESSAGES_TYPES]` and `[FEATURES]` keys in `config.yaml` need to match. For example, `[MESSAGES_TYPES]` `sent` matches the `[FEATURES]` key `sent`

View File

@ -0,0 +1,58 @@
# Phone Screen
Sensor parameters description for `[PHONE_SCREEN]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the screen data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_screen_raw.csv
- data/raw/{pid}/phone_screen_with_datetime.csv
- data/raw/{pid}/phone_screen_with_datetime_unified.csv
- data/interim/{pid}/phone_screen_episodes.csv
- data/interim/{pid}/phone_screen_episodes_resampled.csv
- data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv
- data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_screen.csv
```
Parameters description for `[PHONE_SCREEN][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_SCREEN` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[REFERENCE_HOUR_FIRST_USE]` | The reference point from which `firstuseafter` is to be computed, default is midnight
|`[IGNORE_EPISODES_SHORTER_THAN]` | Ignore episodes that are shorter than this threshold (minutes). Set to 0 to disable this filter.
|`[IGNORE_EPISODES_LONGER_THAN]` | Ignore episodes that are longer than this threshold (minutes). Set to 0 to disable this filter.
|`[EPISODE_TYPES]` | Currently we only support `unlock` episodes (from when the phone is unlocked until the screen is off)
Features description for `[PHONE_SCREEN][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|sumduration |minutes |Total duration of all unlock episodes.
|maxduration |minutes |Longest duration of any unlock episode.
|minduration |minutes |Shortest duration of any unlock episode.
|avgduration |minutes |Average duration of all unlock episodes.
|stdduration |minutes |Standard deviation duration of all unlock episodes.
|countepisode |episodes |Number of all unlock episodes
<!-- |episodepersensedminutes |episodes/minute |The ratio between the total number of episodes in an epoch divided by the total time (minutes) the phone was sensing data. -->
|firstuseafter |minutes |Minutes until the first unlock episode.
!!! note "Assumptions/Observations"
1. In Android, `lock` events can happen right after an `off` event, after a few seconds of an `off` event, or never happen depending on the phone\'s settings, therefore, an `unlock` episode is defined as the time between an `unlock` and a `off` event. In iOS, `on` and `off` events do not exist, so an `unlock` episode is defined as the time between an `unlock` and a `lock` event.
2. Events in iOS are recorded reliably, albeit with some duplicated `lock` events within milliseconds of each other, so we only keep consecutive unlock/lock pairs. In Android you can find multiple consecutive `unlock` or `lock` events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected, and this happens because `ACTION_SCREEN_OFF` and `ACTION_SCREEN_ON` are `sent when the device becomes non-interactive which may have nothing to do with the screen turning off`. In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an `unlock` event as well as the total screen time (i.e. `ON` to `OFF`), but these are not implemented at the moment.
3. We transform iOS screen events to match Android's format; we replace `lock` episodes with `off` episodes (2 with 0) in iOS. However, as mentioned above, this still computes `unlock` to `lock` episodes.
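To illustrate assumption 1, a rough sketch of pairing events into `unlock` episodes (the status codes and column names below are assumptions based on AWARE's screen values, not necessarily RAPIDS' exact code):
```python
import pandas as pd

# AWARE-style status codes are an assumption here: 3 = unlocked, 0 = off.
UNLOCK, OFF = 3, 0

def unlock_episodes(events: pd.DataFrame) -> pd.DataFrame:
    ev = events.sort_values("timestamp")
    episodes, start = [], None
    for _, row in ev.iterrows():
        if row["screen_status"] == UNLOCK:
            start = row["timestamp"]          # (re)start a candidate episode
        elif row["screen_status"] == OFF and start is not None:
            episodes.append({"start": start, "end": row["timestamp"],
                             "duration_min": (row["timestamp"] - start) / 60_000})
            start = None                      # keep only consecutive unlock/off pairs
    return pd.DataFrame(episodes)
```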

View File

@ -0,0 +1,42 @@
# Phone WiFi Connected
Sensor parameters description for `[PHONE_WIFI_CONNECTED]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the wifi (connected) data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android and iOS
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_wifi_connected_raw.csv
- data/raw/{pid}/phone_wifi_connected_with_datetime.csv
- data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_wifi_connected.csv
```
Parameters description for `[PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_WIFI_CONNECTED` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[PHONE_WIFI_CONNECTED][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
| countscans | devices | Number of scanned WiFi access points connected during a `time_segment`, an access point can be detected multiple times over time and these appearances are counted separately |
| uniquedevices | devices | Number of unique access points during a `time_segment` as identified by their hardware address |
| countscansmostuniquedevice | scans | Number of scans of the most scanned access point during a `time_segment` across the whole monitoring period |
!!! note "Assumptions/Observations"
1. A connected WiFi access point is one that a phone was connected to.
2. By default AWARE stores this data in the `sensor_wifi` table.

View File

@ -0,0 +1,42 @@
# Phone WiFi Visible
Sensor parameters description for `[PHONE_WIFI_VISIBLE]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the wifi (visible) data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android only
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_wifi_visible_raw.csv
- data/raw/{pid}/phone_wifi_visible_with_datetime.csv
- data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_wifi_visible.csv
```
Parameters description for `[PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_WIFI_VISIBLE` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[PHONE_WIFI_VISIBLE][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
| countscans | devices | Number of scanned WiFi access points visible during a `time_segment`, an access point can be detected multiple times over time and these appearances are counted separately |
| uniquedevices | devices | Number of unique access points during a `time_segment` as identified by their hardware address |
| countscansmostuniquedevice | scans | Number of scans of the most scanned access point during a `time_segment` across the whole monitoring period |
!!! note "Assumptions/Observations"
1. A visible WiFi access point is one that a phone sensed around itself but was not connected to. Due to API restrictions, this sensor is not available on iOS.
2. By default AWARE stores this data in the `wifi` table.

View File

@ -0,0 +1,20 @@
# File Structure
!!! tip
- Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to [Installation](../setup/installation/), then to [Configuration](../setup/configuration/), and then to [Execution](../setup/execution/)
- All paths mentioned in this page are relative to RAPIDS' root folder.
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the [`.env` file](../setup/configuration/#database-credentials), [participants files](../setup/configuration/#participant-files), [time segment files](../setup/configuration/#time-segments), and the `config.yaml` file as instructed in the [Configuration page](../setup/configuration). The `config.yaml` file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
All data is saved in `data/`. The `data/external/` folder stores any data imported or created by the user, `data/raw/` stores sensor data as imported from your database, `data/interim/` has intermediate files necessary to compute behavioral features from raw data, and `data/processed/` has all the final files with the behavioral features in folders per participant and sensor.
RAPIDS source code is saved in `src/`. The `src/data/` folder stores scripts to download, clean and pre-process sensor data, `src/features` has scripts to extract behavioral features organized in their respective sensor subfolders, `src/models/` can host any script to create models or statistical analyses with the behavioral features you extract, and `src/visualization/` has scripts to create plots of the raw and processed data. There are other files and folders, but they are only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.).
In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the `Snakefile` file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.).
<figure>
<img src="../img/files.png" max-width="100%" />
<figcaption>Interaction diagram between the user, and important files in RAPIDS</figcaption>
</figure>

Binary image files not shown (docs/img/files.png and several other new images, 37–506 KiB); diffs of several large generated files suppressed because one or more lines are too long.
35
docs/index.md 100644
View File

@ -0,0 +1,35 @@
# Welcome to RAPIDS documentation
Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to [extract](features/feature-introduction.md) and [create](features/add-new-features.md) **behavioral features** (a.k.a. digital biomarkers), [visualize](visualizations/data-quality-visualizations.md) mobile sensor data and [structure](workflow-examples/analysis.md) your analysis into reproducible workflows.
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment we support smartphone data collected with [AWARE](https://awareframework.com/) and wearable data from Fitbit devices.
!!! tip
:material-slack: Questions or feedback can be posted on the \#rapids channel in AWARE Framework\'s [slack](http://awareframework.com:3000/).
:material-github: Bugs and feature requests should be posted on [Github](https://github.com/carissalow/rapids/issues).
:fontawesome-solid-tasks: Join our discussions on our algorithms and assumptions for feature [processing](https://github.com/carissalow/rapids/issues?q=is%3Aissue+is%3Aopen+label%3Adiscussion).
:fontawesome-solid-play: Ready to start? Go to [Installation](setup/installation/), then to [Configuration](setup/configuration/), and then to [Execution](setup/execution/)
## How does it work?
RAPIDS is formed by R and Python scripts orchestrated by [Snakemake](https://snakemake.readthedocs.io/en/stable/). We suggest you read Snakemake's docs but in short: every link in the analysis chain is atomic and has files as input and output. Behavioral features are processed per sensor and per participant.
## What are the benefits of using RAPIDS?
1. **Consistent analysis**. Every participant sensor dataset is analyzed in the exact same way and isolated from each other.
2. **Efficient analysis**. Every analysis step is executed only once. Whenever your data or configuration changes only the affected files are updated.
3. **Parallel execution**. Thanks to Snakemake, your analysis can be executed over multiple cores without changing your code.
4. **Code-free features**. Extract any of the behavioral features offered by RAPIDS without writing any code.
5. **Extensible code**. You can easily add your own behavioral features in R or Python, share them with the community, and keep authorship and citations.
6. **Timezone aware**. Your data is adjusted to the specified timezone (multiple timezones support *coming soon*).
7. **Flexible time segments**. You can extract behavioral features on time windows of any length (e.g. 5 minutes, 3 hours, 2 days), on every day or particular days (e.g. weekends, Mondays, the 1st of each month, etc.) or around events of interest (e.g. surveys or clinical relapses).
8. **Tested code**. We are constantly adding tests to make sure our behavioral features are correct.
9. **Reproducible code**. If you structure your analysis within RAPIDS, you can be sure your code will run in other computers as intended thanks to R and Python virtual environments. You can share your analysis code along your publications without any overhead.
10. **Private**. All your data is processed locally.
## How is it organized?
In broad terms the `config.yaml`, [`.env` file](../setup/configuration/#database-credentials), [participants files](../setup/configuration/#participant-files), and [time segment files](../setup/configuration/#time-segments) are the only ones that you will have to modify. All data is stored in `data/` and all scripts are stored in `src/`. For more information see RAPIDS' [File Structure](file-structure.md).

View File

@ -1,50 +0,0 @@
.. moshi-aware documentation master file, created by
sphinx-quickstart.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
RAPIDS
======
**R**\ eproducible **A**\ nalysis **Pi**\ peline for **D**\ ata **S**\ treams
Do you want to keep up to date with new functionality or have a question? Join the #rapids channel in AWARE Framework's slack_
Contents:
.. toctree::
:maxdepth: 2
:caption: Getting Started
usage/introduction
usage/installation
usage/quick_rule
usage/example
usage/snakemake_docs
usage/faq
.. toctree::
:maxdepth: 2
:caption: Features
features/extracted
.. toctree::
:maxdepth: 2
:caption: Visualization
visualization/data_exploration
.. toctree::
:maxdepth: 2
:caption: Developers
develop/remotesupport
develop/documentation
develop/features
develop/environments
develop/contributors
develop/testing
develop/test_cases
.. _slack: http://awareframework.com:3000/

View File

@ -0,0 +1,14 @@
window.addEventListener("DOMContentLoaded", function() {
    // Fetch the list of deployed doc versions generated by mike
    var xhr = new XMLHttpRequest();
    xhr.open("GET", window.location + "../versions.json");
    xhr.onload = function() {
        var versions = JSON.parse(this.responseText);
        // Find the version aliased as "latest"
        var latest_version = "";
        for (var id in versions)
            if (versions[id]["aliases"].length > 0 && versions[id]["aliases"].includes("latest"))
                latest_version = "/" + versions[id].version + "/";
        // If the user is browsing an older version, show a banner linking to the latest docs
        if (!window.location.pathname.includes("/latest/") && (latest_version.length > 0 && !window.location.pathname.includes(latest_version)))
            document.querySelector("div[data-md-component=announce]").innerHTML = "<div id='announce-msg'>You are seeing the docs for a previous version of RAPIDS, <a href='" + window.location.href + "latest/'>click here to go to the latest</a></div>";
    };
    xhr.send();
});

View File

@ -1,190 +0,0 @@
@ECHO OFF
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set BUILDDIR=_build
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
set I18NSPHINXOPTS=%SPHINXOPTS% .
if NOT "%PAPER%" == "" (
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
)
if "%1" == "" goto help
if "%1" == "help" (
:help
echo.Please use `make ^<target^>` where ^<target^> is one of
echo. html to make standalone HTML files
echo. dirhtml to make HTML files named index.html in directories
echo. singlehtml to make a single large HTML file
echo. pickle to make pickle files
echo. json to make JSON files
echo. htmlhelp to make HTML files and a HTML help project
echo. qthelp to make HTML files and a qthelp project
echo. devhelp to make HTML files and a Devhelp project
echo. epub to make an epub
echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
echo. text to make text files
echo. man to make manual pages
echo. texinfo to make Texinfo files
echo. gettext to make PO message catalogs
echo. changes to make an overview over all changed/added/deprecated items
echo. linkcheck to check all external links for integrity
echo. doctest to run all doctests embedded in the documentation if enabled
goto end
)
if "%1" == "clean" (
for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
del /q /s %BUILDDIR%\*
goto end
)
if "%1" == "html" (
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
goto end
)
if "%1" == "dirhtml" (
%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
goto end
)
if "%1" == "singlehtml" (
%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
goto end
)
if "%1" == "pickle" (
%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can process the pickle files.
goto end
)
if "%1" == "json" (
%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can process the JSON files.
goto end
)
if "%1" == "htmlhelp" (
%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can run HTML Help Workshop with the ^
.hhp project file in %BUILDDIR%/htmlhelp.
goto end
)
if "%1" == "qthelp" (
%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished; now you can run "qcollectiongenerator" with the ^
.qhcp project file in %BUILDDIR%/qthelp, like this:
echo.^> qcollectiongenerator %BUILDDIR%\qthelp\moshi-aware.qhcp
echo.To view the help file:
echo.^> assistant -collectionFile %BUILDDIR%\qthelp\moshi-aware.ghc
goto end
)
if "%1" == "devhelp" (
%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
if errorlevel 1 exit /b 1
echo.
echo.Build finished.
goto end
)
if "%1" == "epub" (
%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The epub file is in %BUILDDIR%/epub.
goto end
)
if "%1" == "latex" (
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
if errorlevel 1 exit /b 1
echo.
echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
goto end
)
if "%1" == "text" (
%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The text files are in %BUILDDIR%/text.
goto end
)
if "%1" == "man" (
%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The manual pages are in %BUILDDIR%/man.
goto end
)
if "%1" == "texinfo" (
%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
goto end
)
if "%1" == "gettext" (
%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
if errorlevel 1 exit /b 1
echo.
echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
goto end
)
if "%1" == "changes" (
%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
if errorlevel 1 exit /b 1
echo.
echo.The overview file is in %BUILDDIR%/changes.
goto end
)
if "%1" == "linkcheck" (
%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
if errorlevel 1 exit /b 1
echo.
echo.Link check complete; look for any errors in the above output ^
or in %BUILDDIR%/linkcheck/output.txt.
goto end
)
if "%1" == "doctest" (
%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
if errorlevel 1 exit /b 1
echo.
echo.Testing of doctests in the sources finished, look at the ^
results in %BUILDDIR%/doctest/output.txt.
goto end
)
:end

View File

@ -0,0 +1,19 @@
# Migrating from RAPIDS beta
If you were relying on the [old docs](https://rapidspitt.readthedocs.io/en/latest/) and the most recent version of RAPIDS you are working with is from or before [Oct 13, 2020](https://github.com/carissalow/rapids/commit/640890c7b49492d150accff5c87b1eb25bd97a49), you are using the beta version of RAPIDS.
You can start using the new RAPIDS (we are starting with `v0.1.0`) right away, just take into account the following:
1. [Install](setup/installation.md) a new copy of RAPIDS (the R and Python virtual environments didn't change so the cached versions will be reused)
1. Make sure you don't skip a new Installation step to give execution permissions to the RAPIDS script: `chmod +x rapids`
2. Follow the new [Configuration](setup/configuration.md) guide.
1. You can copy and paste your old `.env` file
2. You can migrate your old participant files:
```
python tools/update_format_participant_files.py
```
3. Get familiar with the new way of [Executing](setup/execution.md) RAPIDS
3. You can proceed to reconfigure your `config.yaml`, its structure is more consistent and should be familiar to you.
!!! info
If you have any questions reach out to us on [Slack](http://awareframework.com:3000/).

View File

@ -0,0 +1,435 @@
# Configuration
You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features:
1. Add your [database credentials](#database-credentials)
2. Choose the [timezone of your study](#timezone-of-your-study)
3. Create your [participants files](#participant-files)
4. Select what [time segments](#time-segments) you want to extract features on
5. Modify your [device data source configuration](#device-data-source-configuration)
6. Select what [sensors and features](#sensor-and-features-to-process) you want to process
When you are done with this configuration, go to [executing RAPIDS](setup/execution).
!!! hint
Every time you see `config["KEY"]` or `[KEY]` in these docs, we are referring to the corresponding key in the `config.yaml` file.
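For instance, as a quick sketch of this notation (the values shown are only illustrative), `[TIME_SEGMENTS][TYPE]` refers to the nested `TYPE` key under `TIME_SEGMENTS`:
```yaml
TIME_SEGMENTS:
  TYPE: PERIODIC # referred to as [TIME_SEGMENTS][TYPE] in these docs
```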
---
## Database credentials
1. Create an empty file called `#!bash .env` in your RAPIDS root directory
2. Add the following lines and replace your database-specific credentials (user, password, host, and database):
``` yaml
[MY_GROUP]
user=MY_USER
password=MY_PASSWORD
host=MY_HOST
port=3306
database=MY_DATABASE
```
!!! warning
The label `MY_GROUP` is arbitrary but it has to match the following `config.yaml` key:
```yaml
DATABASE_GROUP: &database_group
MY_GROUP
```
!!! note
You can ignore this step if you are only processing Fitbit data in CSV files.
---
## Timezone of your study
### Single timezone
If your study only happened in a single time zone, select the appropriate code from this [list](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) and change the following config key. Double-check your timezone code choice; for example, US Eastern Time is `America/New_York`, not `EST`
``` yaml
TIMEZONE: &timezone
America/New_York
```
### Multiple timezones
Support coming soon.
---
## Participant files
Participant files link together multiple devices (smartphones and wearables) to specific participants and identify them throughout RAPIDS. You can create these files manually or [automatically](#automatic-creation-of-participant-files). Participant files are stored in `data/external/participant_files/pxx.yaml` and follow a unified [structure](#structure-of-participants-files).
!!! note
The list `PIDS` in `config.yaml` needs to have the participant file names of the people you want to process. For example, if you created `p01.yaml`, `p02.yaml` and `p03.yaml` files in `/data/external/participant_files/`, then `PIDS` should be:
```yaml
PIDS: [p01, p02, p03]
```
!!! tip
Attribute *values* of the `[PHONE]` and `[FITBIT]` sections in every participant file are optional which allows you to analyze data from participants that only carried smartphones, only Fitbit devices, or both.
??? hint "Optional: Migrating participants files with the old format"
If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in `data/external/participant_files/`
```bash
python tools/update_format_participant_files.py
```
### Structure of participants files
!!! example "Example of the structure of a participant file"
In this example, the participant used an Android phone, an iOS phone, and a Fitbit device throughout the study between Apr 23rd 2020 and Oct 28th 2020
```yaml
PHONE:
DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524, dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43]
PLATFORMS: [android,ios]
LABEL: test01
START_DATE: 2020-04-23
END_DATE: 2020-10-28
FITBIT:
DEVICE_IDS: [fitbit1]
LABEL: test01
START_DATE: 2020-04-23
END_DATE: 2020-10-28
```
**For `[PHONE]`**
| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each smartphone. You can have more than one in case participants changed phones in the middle of the study; in this case, data from all their devices will be joined and relabeled with the last device id on this list. |
| `[PLATFORMS]` | An array that specifies the OS of each smartphone in `[DEVICE_IDS]` , use a combination of `android` or `ios` (we support participants that changed platforms in the middle of your study!). If you have an `aware_device` table in your database you can set `[PLATFORMS]: [multiple]` and RAPIDS will infer them automatically. |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` . Only data collected *after* this date will be included in the analysis |
| `[END_DATE]` | A string with format `YYYY-MM-DD` . Only data collected *before* this date will be included in the analysis |
**For `[FITBIT]`**
| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each Fitbit. You can have more than one in case the participant changed devices in the middle of the study; in this case, data from all devices will be joined and relabeled with the last `device_id` on this list. |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` . Only data collected *after* this date will be included in the analysis |
| `[END_DATE]` | A string with format `YYYY-MM-DD` . Only data collected *before* this date will be included in the analysis |
### Automatic creation of participant files
You have two options a) use the `aware_device` table in your database or b) use a CSV file. In either case, in your `config.yaml`, set `[PHONE_SECTION][ADD]` or `[FITBIT_SECTION][ADD]` to `TRUE` depending on what devices you used in your study. Set `[DEVICE_ID_COLUMN]` to the name of the column that uniquely identifies each device and include any device ids you want to ignore in `[IGNORED_DEVICE_IDS]`.
=== "aware_device table"
Set the following keys in your `config.yaml`
```yaml
CREATE_PARTICIPANT_FILES:
SOURCE:
TYPE: AWARE_DEVICE_TABLE
DATABASE_GROUP: *database_group
CSV_FILE_PATH: ""
TIMEZONE: *timezone
PHONE_SECTION:
ADD: TRUE # or FALSE
DEVICE_ID_COLUMN: device_id # column name
IGNORED_DEVICE_IDS: []
FITBIT_SECTION:
ADD: TRUE # or FALSE
DEVICE_ID_COLUMN: fitbit_id # column name
IGNORED_DEVICE_IDS: []
```
Then run
```bash
snakemake -j1 create_participants_files
```
=== "CSV file"
Set the following keys in your `config.yaml`.
```yaml
CREATE_PARTICIPANT_FILES:
SOURCE:
TYPE: CSV_FILE
DATABASE_GROUP: ""
CSV_FILE_PATH: "your_path/to_your.csv"
TIMEZONE: *timezone
PHONE_SECTION:
ADD: TRUE # or FALSE
DEVICE_ID_COLUMN: device_id # column name
IGNORED_DEVICE_IDS: []
FITBIT_SECTION:
ADD: TRUE # or FALSE
DEVICE_ID_COLUMN: fitbit_id # column name
IGNORED_DEVICE_IDS: []
```
Your CSV file (`[SOURCE][CSV_FILE_PATH]`) should have the following columns, but you can omit any values you don't have in each column:
| Column | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| phone device id | The name of this column has to match `[PHONE_SECTION][DEVICE_ID_COLUMN]`. Separate multiple ids with `;` |
| fitbit device id | The name of this column has to match `[FITBIT_SECTION][DEVICE_ID_COLUMN]`. Separate multiple ids with `;` |
| pid | Unique identifiers with the format pXXX (your participant files will be named with this string) |
| platform | Use `android`, `ios` or `multiple` as explained above, separate values with `;` |
| label | A human readable string that is used in reports and visualizations. |
| start_date | A string with format `YYYY-MM-DD`. |
| end_date | A string with format `YYYY-MM-DD`. |
!!! example
```csv
device_id,pid,label,platform,start_date,end_date,fitbit_id
a748ee1a-1d0b-4ae9-9074-279a2b6ba524;dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43,p01,julio,android;ios,2020-01-01,2021-01-01,fitbit1
4c4cf7a1-0340-44bc-be0f-d5053bf7390c,p02,meng,ios,2021-01-01,2022-01-01,fitbit2
```
Then run
```bash
snakemake -j1 create_participants_files
```
---
## Time Segments
Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: **frequency** (short time windows every day), **periodic** (arbitrary time windows on any day), and **event** (arbitrary time windows around events of interest). See also our [examples](#segment-examples).
=== "Frequency Segments"
These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your `config.yaml`
```yaml
TIME_SEGMENTS: &time_segments
TYPE: FREQUENCY
FILE: "data/external/your_frequency_segments.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
```
The file pointed by `[TIME_SEGMENTS][FILE]` should have the following format and can only have 1 row.
| Column | Description |
|--------|----------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments |
| length | An integer representing the duration of your time segments in minutes |
!!! example
```csv
label,length
thirtyminutes,30
```
This configuration will compute 48 time segments for every day when any data from any participant was sensed. For example:
```csv
start_time,length,label
00:00,30,thirtyminutes0000
00:30,30,thirtyminutes0001
01:00,30,thirtyminutes0002
01:30,30,thirtyminutes0003
...
```
=== "Periodic Segments"
These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC
FILE: "data/external/your_periodic_segments.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
```
If `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is set to `TRUE`, RAPIDS will consider segment instances that start far enough in the past to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1st if `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is `TRUE`, or on Sunday March 8th if `FALSE`.
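As a sketch of that example, assuming Sundays correspond to `repeats_value` `7` (since `1-7` for `wday` start from Mondays), the 7-day segment could be defined as:
```csv
label,start_time,length,repeats_on,repeats_value
weekly_sunday,00:00:00,6D 23H 59M 59S,wday,7
```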
The file pointed by `[TIME_SEGMENTS][FILE]` should have the following format and can have multiple rows.
| Column | Description |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments. It has to be **unique** between rows |
| start_time | A string with format `HH:MM:SS` representing the starting time of this segment on any day |
| length | A string representing the length of this segment. It can have one or more of the following strings **`XXD XXH XXM XXS`** to represent days, hours, minutes and seconds. For example `7D 23H 59M 59S` |
| repeats_on | One of the following options: `every_day`, `wday`, `qday`, `mday`, and `yday`. The last four represent a week, quarter, month and year day |
| repeats_value | An integer complementing `repeats_on`. If you set `repeats_on` to `every_day` set this to `0`, otherwise `1-7` represent a `wday` starting from Mondays, `1-31` represent a `mday`, `1-91` represent a `qday`, and `1-366` represent a `yday` |
!!! example
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
morning,06:00:00,5H 59M 59S,every_day,0
afternoon,12:00:00,5H 59M 59S,every_day,0
evening,18:00:00,5H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
```
This configuration will create five segment instances (`daily`, `morning`, `afternoon`, `evening`, `night`) on any given day (`every_day` set to 0). The `daily` segment will start at midnight and will last `23:59:59`; the other four segments will start at 6am, 12pm, 6pm, and 12am respectively and last for `05:59:59`.
=== "Event segments"
These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
TYPE: EVENT
FILE: "data/external/your_event_segments.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
```
The file pointed by `[TIME_SEGMENTS][FILE]` should have the following format and can have multiple rows.
| Column | Description |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the `most frequent contact` for calls (the most frequent contact will be computed across all these segments). There cannot be two *overlapping* event segments with the same label (RAPIDS will throw an error) |
| event_timestamp | A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using `length`, `shift`, and `shift_direction` |
| length | A string representing the length of this segment. It can have one or more of the following keys `XXD XXH XXM XXS` to represent a number of days, hours, minutes, and seconds. For example `7D 23H 59M 59S` |
| shift | A string representing the time shift from `event_timestamp`. It can have one or more of the following keys `XXD XXH XXM XXS` to represent a number of days, hours, minutes and seconds. For example `7D 23H 59M 59S`. Use this value to change the start of a segment with respect to its `event_timestamp`. For example, set this variable to `1H` to create a segment that starts 1 hour from an event of interest (`shift_direction` determines if it's before or after). |
| shift_direction | An integer representing whether the `shift` is before (`-1`) or after (`1`) an `event_timestamp` |
| device_id | The device id (smartphone or Fitbit) to which this segment belongs. You have to create a line in this event segment file for each event of a participant that you want to analyze. If you have participants with multiple device ids you can choose any of them |
!!! example
```csv
label,event_timestamp,length,shift,shift_direction,device_id
stress1,1587661220000,1H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress2,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress3,1587906020000,3H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress4,1584291600000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress5,1588172420000,9H,5M,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587661220000,1H,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587747620000,1D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
```
This example will create eight segments for a single participant (`a748ee1a...`), five independent `stressX` segments with various lengths (1, 4, 3, 7, and 9 hours). Segments `stress1`, `stress3`, and `stress5` are shifted forwards by 5 minutes and `stress2` and `stress4` are shifted backwards by 4 hours (that is, if the `stress4` event happened on March 15th at 1pm EST (`1584291600000`), the time segment will start on that day at 9am and end at 4pm).
The three `mood` segments are 1 hour, 1 day and 7 days long and have no shift. In addition, these `mood` segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some necessary information to compute a few of such features will be extracted from all three segments, for example the phone contact that called a participant the most or the location clusters visited by a participant.
### Segment Examples
=== "5-minutes"
Use the following `Frequency` segment file to create 288 (12 * 60 * 24) 5-minute segments starting from midnight of every day in your study
```csv
label,length
fiveminutes,5
```
=== "Daily"
Use the following `Periodic` segment file to create daily segments starting from midnight of every day in your study
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
```
=== "Morning"
Use the following `Periodic` segment file to create morning segments starting at 06:00:00 and ending at 11:59:59 of every day in your study
```csv
label,start_time,length,repeats_on,repeats_value
morning,06:00:00,5H 59M 59S,every_day,0
```
=== "Overnight"
Use the following `Periodic` segment file to create overnight segments starting at 20:00:00 and ending at 07:59:59 (next day) of every day in your study
```csv
label,start_time,length,repeats_on,repeats_value
overnight,20:00:00,11H 59M 59S,every_day,0
```
=== "Weekly"
Use the following `Periodic` segment file to create **non-overlapping** weekly segments starting at midnight of every **Monday** in your study
```csv
label,start_time,length,repeats_on,repeats_value
weekly,00:00:00,6D 23H 59M 59S,wday,1
```
Use the following `Periodic` segment file to create **overlapping** weekly segments starting at midnight of **every day** in your study
```csv
label,start_time,length,repeats_on,repeats_value
weekly,00:00:00,6D 23H 59M 59S,every_day,0
```
=== "Week-ends"
Use the following `Periodic` segment file to create week-end segments starting at midnight of every **Saturday** in your study
```csv
label,start_time,length,repeats_on,repeats_value
weekend,00:00:00,1D 23H 59M 59S,wday,6
```
=== "Around surveys"
Use the following `Event` segment file to create two 2-hour segments that start 1 hour before surveys answered by 3 participants
```csv
label,event_timestamp,length,shift,shift_direction,device_id
survey1,1587661220000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey2,1587747620000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey1,1587906020000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr
survey2,1584291600000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr
survey1,1588172420000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3
survey2,1584291600000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3
```
---
## Device Data Source Configuration
You might need to modify the following config keys in your `config.yaml` depending on what devices your participants used and where you are storing your data. You can ignore `[PHONE_DATA_CONFIGURATION]` or `[FITBIT_DATA_CONFIGURATION]` if you are not working with either device.
=== "Phone"
The relevant `config.yaml` section looks like this by default:
```yaml
PHONE_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE # SINGLE (MULTIPLE support coming soon)
VALUE: *timezone
```
**Parameters for `[PHONE_DATA_CONFIGURATION]`**
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[SOURCE] [TYPE]` | Only `DATABASE` is supported (phone data will be pulled from a database) |
| `[SOURCE] [DATABASE_GROUP]` | `*database_group` points to the value defined before in [Database credentials](#database-credentials) |
| `[SOURCE] [DEVICE_ID_COLUMN]` | A column that contains strings that uniquely identify smartphones. For data collected with AWARE this is usually `device_id` |
| `[TIMEZONE] [TYPE]` | Only `SINGLE` is supported for now |
| `[TIMEZONE] [VALUE]` | `*timezone` points to the value defined before in [Timezone of your study](#timezone-of-your-study) |
=== "Fitbit"
The relevant `config.yaml` section looks like this by default:
```yaml
FITBIT_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE # DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute with a table name or a file path accordingly)
COLUMN_FORMAT: JSON # JSON or PLAIN_TEXT
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE # Fitbit devices don't support time zones so we read this data in the timezone indicated by VALUE
VALUE: *timezone
```
**Parameters for `[FITBIT_DATA_CONFIGURATION]`**
| Key | Description |
|------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `[SOURCE]` `[TYPE]` | `DATABASE` or `FILES` (set each `[FITBIT_SENSOR]` `[TABLE]` attribute accordingly with a table name or a file path) |
| `[SOURCE]` `[COLUMN_FORMAT]` | `JSON` or `PLAIN_TEXT`. Column format of the source data. If you pulled your data directly from the Fitbit API the column containing the sensor data will be in `JSON` format |
| `[SOURCE]` `[DATABASE_GROUP]` | `*database_group` points to the value defined before in [Database credentials](#database-credentials). Only used if `[TYPE]` is `DATABASE` . |
| `[SOURCE]` `[DEVICE_ID_COLUMN]` | A column that contains strings that uniquely identify Fitbit devices. |
| `[TIMEZONE]` `[TYPE]` | Only `SINGLE` is supported (Fitbit devices always store data in local time). |
| `[TIMEZONE]` `[VALUE]` | `*timezone` points to the value defined before in [Timezone of your study](#timezone-of-your-study) |
---
## Sensor and Features to Process
Finally, you need to modify the `config.yaml` section of the sensors you want to extract behavioral features from. All sensors follow the same naming convention (`DEVICE_SENSOR`) and parameter structure, which we explain in the [Behavioral Features Introduction](../../features/feature-introduction/).
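As an illustrative sketch only (each sensor's actual parameters are documented in its own page; the feature names below are placeholders), a sensor section generally follows this shape, with one or more feature `PROVIDERS` and a `COMPUTE` flag per provider:
```yaml
PHONE_MESSAGES:        # sensor sections follow the DEVICE_SENSOR naming convention
  TABLE: messages      # database table (or file) holding this sensor's data
  PROVIDERS:
    RAPIDS:            # a feature provider
      COMPUTE: True    # set to True to extract this provider's features
      FEATURES: [feature1, feature2] # sensor-specific list of features
```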
!!! done
Head over to [Execution](../execution/) to learn how to execute RAPIDS.

View File

@ -0,0 +1,36 @@
# Execution
After you have [installed](../installation) and [configured](../configuration) RAPIDS, use the following command to execute it.
```bash
./rapids -j1
```
!!! done "Ready to extract behavioral features"
If you are ready to extract features head over to the [Behavioral Features Introduction](../../features/feature-introduction/)
!!! info
The script `#!bash ./rapids` is a wrapper around Snakemake so you can pass any parameters that Snakemake accepts (e.g. `-j1`).
!!! hint "Updating RAPIDS output after modifying `config.yaml`"
Any changes to the `config.yaml` file will be applied automatically and only the relevant files will be updated. This means that, for example, after modifying the features list for `PHONE_MESSAGES`, RAPIDS will update the output file with the correct features.
!!! hint "Multi-core"
You can run RAPIDS over multiple cores by modifying the `-j` argument (e.g. use `-j8` to use 8 cores). **However**, take into account that this means multiple sensor datasets for different participants will be loaded into memory at the same time. If RAPIDS crashes because it ran out of memory, reduce the number of cores and try again.
As a reference, we have run RAPIDS over 12 cores and 32 GB of RAM without problems for a study with 200 participants with 14 days of low-frequency smartphone data (no accelerometer, gyroscope, or magnetometer).
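For example, a run over 8 cores would look like this (assuming your machine has enough memory):
```bash
./rapids -j8
```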
!!! hint "Forcing a complete rerun"
If you want to update your data from your database or rerun the whole pipeline from scratch, run one or both of the following commands depending on the devices you are using:
```bash
./rapids -j1 -R download_phone_data
./rapids -j1 -R download_fitbit_data
```
!!! hint "Deleting RAPIDS output"
If you want to delete all the output files RAPIDS produces you can execute the following command:
```bash
./rapids -j1 --delete-all-output
```

View File

@ -0,0 +1,203 @@
# Installation
You can install RAPIDS using Docker (the fastest option) or following native instructions for macOS and Ubuntu.
=== "Docker"
1. Install [Docker](https://docs.docker.com/desktop/)
2. Pull our RAPIDS container
``` bash
docker pull agamk/rapids:latest
```
3. Run RAPIDS' container (after this step is done you should see a
prompt in the main RAPIDS folder with its Python environment active)
``` bash
docker run -it agamk/rapids:latest
```
4. Pull the latest version of RAPIDS
``` bash
git pull
```
5. Make RAPIDS script executable
```bash
chmod +x rapids
```
6. Check that RAPIDS is working
``` bash
./rapids -j1
```
7. *Optional*. You can edit RAPIDS files with `vim` but we recommend using `Visual Studio Code` and its `Remote Containers` extension
??? info "How to configure Remote Containers extension"
- Make sure RAPIDS container is running
- Install the [Remote - Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers)
- Go to the `Remote Explorer` panel on the left hand sidebar
- On the top right dropdown menu choose `Containers`
- Double click on the `agamk/rapids` container in the `CONTAINERS` tree
- A new VS Code session should open on RAPIDS main folder inside the container.
=== "MacOS"
We tested these instructions on Catalina
1. Install [brew](https://brew.sh/)
2. Install MySQL
``` bash
brew install mysql
brew services start mysql
```
3. Install R 4.0, pandoc and rmarkdown. If you have other instances of R, we recommend uninstalling them
``` bash
brew install r
brew install pandoc
Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'
```
4. Install miniconda (restart your terminal afterwards)
``` bash
brew cask install miniconda
conda init zsh # (or conda init bash)
```
5. Clone our repo
``` bash
git clone https://github.com/carissalow/rapids
```
6. Create a python virtual environment
``` bash
cd rapids
conda env create -f environment.yml -n rapids
conda activate rapids
```
7. Install R packages and virtual environment:
``` bash
snakemake -j1 renv_install
snakemake -j1 renv_restore
```
!!! note
This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
8. Make RAPIDS script executable
```bash
chmod +x rapids
```
9. Check that RAPIDS is working
``` bash
./rapids -j1
```
=== "Ubuntu"
We tested on Ubuntu 18.04 & 20.04
1. Install dependencies
``` bash
sudo apt install libcurl4-openssl-dev
sudo apt install libssl-dev
sudo apt install libxml2-dev
```
2. Install MySQL
``` bash
sudo apt install libmysqlclient-dev
sudo apt install mysql-server
```
3. Add key for R's repository.
``` bash
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
```
4. Add R's repository
1. For 18.04
``` bash
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/'
```
2. For 20.04
``` bash
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
```
5. Install R 4.0. If you have other instances of R, we recommend uninstalling them
``` bash
sudo apt update
sudo apt install r-base
```
6. Install Pandoc and rmarkdown
``` bash
sudo apt install pandoc
Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'
```
7. Install git
``` bash
sudo apt install git
```
8. Install [miniconda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
9. Restart your current shell
10. Clone our repo:
``` bash
git clone https://github.com/carissalow/rapids
```
11. Create a python virtual environment:
``` bash
cd rapids
conda env create -f environment.yml -n MY_ENV_NAME
conda activate MY_ENV_NAME
```
12. Install R packages and virtual environment:
``` bash
snakemake -j1 renv_install
snakemake -j1 renv_restore
```
!!! note
This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
13. Make RAPIDS script executable
```bash
chmod +x rapids
```
14. Check that RAPIDS is working
``` bash
./rapids -j1
```

View File

@ -0,0 +1,28 @@
@media screen and (min-width: 76.25em) {
.md-nav__item--section {
display: block;
margin: 1.75em 0;
}
.md-nav :not(.md-nav--primary) > .md-nav__list {
padding-left: 7px;
}
}
.md-nav__item .md-nav__link--active {
color: var(--md-typeset-a-color);
background-color: var(--md-code-bg-color);
}
div[data-md-component=announce] {
background-color: rgba(255,145,0,.1);
}
div[data-md-component=announce]>div#announce-msg{
color: var(--md-admonition-fg-color);
font-size: .8rem;
text-align: center;
margin: 15px;
}
div[data-md-component=announce]>div#announce-msg>a{
color: var(--md-typeset-a-color);
text-decoration: underline;
}

81
docs/team.md 100644
View File

@ -0,0 +1,81 @@
# RAPIDS Team
If you are interested in contributing, feel free to submit a pull request or contact us.
## Core Team
### Julio Vega (Designer and Lead Developer)
??? abstract "About"
Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data.
- *vegaju* at *upmc* . *edu*
- [Personal Website](https://juliovega.info/)
### Meng Li
??? abstract "About"
Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field.
- *lim11* at *upmc* . *edu*
- [Linkedin Profile](https://www.linkedin.com/in/meng-li-57238414a)
- [Github Profile](https://github.com/Meng6)
### Abhineeth Reddy Kunta
??? abstract "About"
Abhineeth Reddy Kunta is a Senior Software Engineer with the Mobile Sensing + Health Institute. He is experienced in software development and specializes in building solutions using machine learning. Abhineeth likes exploring ways to leverage technology in advancing medicine and education. Previously he worked as a Computer Programmer at Georgia Department of Public Health. He has a master's degree in Computer Science from George Mason University.
### Kwesi Aguillera
??? abstract "About"
Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Science in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full stack developer and looks forward to applying this knowledge to big data analysis.
- [Linkedin Profile](https://www.linkedin.com/in/kwesi-aguillera-29529823)
### Echhit Joshi
??? abstract "About"
Echhit Joshi is a Masters student at the School of Computing and Information at University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics.
- [Linkedin Profile](https://www.linkedin.com/in/echhitjoshi/)
### Nicolas Leo
??? abstract "About"
Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for an MS in Computer Science with a focus on Intelligent Systems.
### Nikunj Goel
??? abstract "About"
Nik is a graduate student at the University of Pittsburgh pursuing a Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology from India. He is a data enthusiast and passionate about finding meaning in raw data. In the long term, his goal is to create a breakthrough in Data Science and Deep Learning.
- [Linkedin Profile](https://www.linkedin.com/in/nikunjgoel95/)
## Community Contributors
### Agam Kumar
??? abstract "About"
Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science. He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science. His research interests include ML applications in medicine.
- [Linkedin Profile](https://www.linkedin.com/in/agam-kumar)
- [Github Profile](https://github.com/agam-kumar)
### Yasaman S. Sefidgar
??? abstract "About"
- [Linkedin Profile](https://www.linkedin.com/in/ysefidgar/)
## Advisors
### Afsaneh Doryab
??? abstract "About"
- [Personal Website](https://sites.google.com/view/afsanehdoryab)
### Carissa Low
??? abstract "About"
- [Profile](https://www.moshi.pitt.edu/people/carissa-low-phd)

View File

@ -1,49 +0,0 @@
.. _analysis-workflow-example:
Analysis Workflow Example
==========================
This is a quick guide for creating and running a simple pipeline to analyze an example dataset with 2 participants.
#. Install RAPIDS. See :ref:`Installation Section <install-page>`.
#. Configure your database credentials (see the example below or step 1 of :ref:`Usage Section <db-configuration>` for more information).
- Create an ``.env`` file at the root of RAPIDS folder
- Your MySQL user must have write permissions because we will restore our example database
- Name your credentials group ``MY_GROUP``.
- If you are trying to connect to a local MySQL server from our docker container set your host according to this link_.
- You can name your database any way you want, for example ``rapids_example``
.. code-block:: bash
[MY_GROUP]
user=rapids
password=rapids
host=127.0.0.1
port=3306
database=rapids_example
#. Make sure your conda environment is active (the environment is already active in our docker container). See step 6 of :ref:`install-page`.
#. If you installed RAPIDS from GitHub (did not use docker) you need to download the `example db backup <https://osf.io/skqfv/files/>`_ and save it to ``data/external/rapids_example.sql``.
#. Run the following command to restore database from ``rapids_example.sql`` file::
snakemake -j1 restore_sql_file
#. Create example participants files with the following command::
snakemake -j1 create_example_participant_files
#. Run the following command to analyze the example dataset.
- Execute over a single core::
snakemake -j1 --profile example_profile
- Execute over multiple cores (here, we use 8 cores)::
snakemake -j8 --profile example_profile
.. _link: https://stackoverflow.com/questions/24319662/from-inside-of-a-docker-container-how-do-i-connect-to-the-localhost-of-the-mach

View File

@ -1,182 +0,0 @@
Frequently Asked Questions
============================
1. Cannot connect to the MySQL server
"""""""""""""""""""""""""""""""""""""""
**Error in .local(drv, ...) :**
**Failed to connect to database: Error: Can't initialize character set unknown (path: compiled_in)**
::
Calls: dbConnect -> dbConnect -> .local -> .Call
Execution halted
[Tue Mar 10 19:40:15 2020]
Error in rule download_dataset:
jobid: 531
output: data/raw/p60/locations_raw.csv
RuleException:
CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
**Solution:**
Please make sure the ``DATABASE_GROUP`` in ``config.yaml`` matches your DB credentials group in ``.env``.
2. Cannot start mysql in linux via ``brew services start mysql``
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Use the following command instead:
``mysql.server start``
3. Every time I run ``snakemake -R download_dataset`` all rules are executed
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This is expected behavior. The advantage of using ``snakemake`` under the hood is that every time a file containing data is modified every rule that depends on that file will be re-executed to update their results. In this case, since ``download_dataset`` updates all the raw data, and you are forcing the rule with the flag ``-R`` every single rule that depends on those raw files will be executed.
4. Got an error ``Table XXX doesn't exist`` while running the download_dataset rule.
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
::
Error in .local(conn, statement, ...) :
could not run statement: Table 'db_name.table_name' doesn't exist
Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted
**Solution:**
Please make sure the sensors listed in ``[PHONE_VALID_SENSED_BINS][TABLES]`` and each sensor section you activated in ``config.yaml`` match your database tables.
5. How do I install on Ubuntu 16.04
""""""""""""""""""""""""""""""""""""
#. Install dependencies (Homebrew - if not installed):
- ``sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev``
- Install brew_ for linux and add the following line to ~/.bashrc: ``export PATH=$HOME/.linuxbrew/bin:$PATH``
- ``source ~/.bashrc``
#. Install MySQL
- ``brew install mysql``
- ``brew services start mysql``
#. Install R, pandoc and rmarkdown:
- ``brew install r``
- ``brew install gcc@6`` (needed due to this bug_)
- ``HOMEBREW_CC=gcc-6 brew install pandoc``
#. Install miniconda using these instructions_
#. Clone our repo:
- ``git clone https://github.com/carissalow/rapids``
#. Create a python virtual environment:
- ``cd rapids``
- ``conda env create -f environment.yml -n MY_ENV_NAME``
- ``conda activate MY_ENV_NAME``
#. Install R packages and virtual environment:
- ``snakemake renv_install``
- ``snakemake renv_init``
- ``snakemake renv_restore``
- This step could take several minutes to complete. Please be patient and let it run until completion.
#. See :ref:`Usage section <usage-section>`.
6. Configuration failed for package ``RMySQL``
""""""""""""""""""""""""""""""""""""""""""""""""
::
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: mysql.h: No such file or directory
compilation terminated.
-----------------------------------------------------------------------
ERROR: configuration failed for package 'RMySQL'
Run ``sudo apt install libmariadbclient-dev``
7. No package ``libcurl`` found
"""""""""""""""""""""""""""""""""
The ``libcurl`` library needs to be installed using the following command
Run ``sudo apt install libcurl4-openssl-dev``
8. Configuration failed because ``openssl`` was not found.
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Install the ``openssl`` library using the following command
Run ``sudo apt install libssl-dev``
9. Configuration failed because ``libxml-2.0`` was not found
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Install the ``xml`` library using the following command
Run ``sudo apt install libxml2-dev``
10. SSL connection error when running RAPIDS
""""""""""""""""""""""""""""""""""""""""""""""
You are getting the following error message when running RAPIDS:
``Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol``.
This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option ``--ssl-mode=DISABLED`` but we can't do this from the R connector.
If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace ``mysql-client`` and ``libmysqlclient-dev`` with ``mariadb-client`` and ``libmariadbclient-dev`` and reinstall renv. More info about this issue here https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1872541
11. ``DB_TABLES`` key not found
""""""""""""""""""""""""""""""""
If you get the following error, ``KeyError in line 43 of preprocessing.smk: 'DB_TABLES'``, it means that the indentation of the key ``DB_TABLES`` does not match the other child elements of ``PHONE_VALID_SENSED_BINS`` and you need to add or remove leading whitespace as needed.
::
PHONE_VALID_SENSED_BINS:
COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
BIN_SIZE: &bin_size 5 # (in minutes)
# Add as many sensor tables as you have, they all improve the computation of PHONE_VALID_SENSED_BINS and PHONE_VALID_SENSED_DAYS.
# If you are extracting screen or Barnett's location features, screen and locations tables are mandatory.
DB_TABLES: []
12. Error while updating your conda environment in Ubuntu
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
If you get the following error try reinstalling conda.
::
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
appears to be corrupted. The path 'include/mysqlStubs.h'
specified in the package manifest cannot be found.
ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
path: 'lib/libiomp5.so'
.. ------------------------ Links --------------------------- ..
.. _bug: https://github.com/Homebrew/linuxbrew-core/issues/17812
.. _instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
.. _brew: https://docs.brew.sh/Homebrew-on-Linux

View File

@ -1,209 +0,0 @@
.. _install-page:
Installation
===============
These instructions have been tested on macOS (Catalina and Mojave) and Ubuntu 16.04. If you find a problem, please create a GitHub issue or contact us. If you want to test RAPIDS quickly try our docker image or follow the Linux instructions on a virtual machine.
Docker (the fastest and easiest way)
------------------------------------
#. Install docker
#. Pull RAPIDS' container
``docker pull agamk/rapids:latest``
#. Run RAPIDS' container (after this step is done you should see a prompt in the main RAPIDS folder with its python environment active)
``docker run -it agamk/rapids:latest``
#. Pull the latest version of RAPIDS
``git pull``
#. Optional. You can start editing files with vim but we recommend using Visual Studio Code and its Remote extension
- Make sure RAPIDS container is running
- Install the Remote - Containers extension: https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers
- Go to the ``Remote Explorer`` panel on the left hand sidebar
- On the top right dropdown menu choose ``Containers``
- Double click on the ``agamk/rapids`` container in the ``CONTAINERS`` tree
- A new VS Code session should open on RAPIDS main folder inside the container.
#. See Usage section below.
macOS (tested on Catalina 10.15)
--------------------------------
#. Install dependencies (Homebrew if not installed):
- Install brew_ for Mac: ``/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"``
#. Install MySQL
- ``brew install mysql``
- ``brew services start mysql``
#. Install R 4.0, pandoc and rmarkdown. If you have other instances of R, we recommend uninstalling them.
- ``brew install r``
- ``brew install pandoc``
- ``Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'``
#. Install miniconda:
- ``brew cask install miniconda``
- ``conda init zsh`` or ``conda init bash``
- Restart terminal if necessary
#. Clone our repo:
- ``git clone https://github.com/carissalow/rapids``
#. Create a python virtual environment:
- ``cd rapids``
- ``conda env create -f environment.yml -n rapids``
- ``conda activate rapids``
#. Install R packages and virtual environment:
- ``snakemake -j1 renv_install``
- ``snakemake -j1 renv_restore``
- This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
#. See Usage section below.
Linux (tested on Ubuntu 18.04 & 20.04)
---------------------------------------
#. Install dependencies :
- ``sudo apt install libcurl4-openssl-dev``
- ``sudo apt install libssl-dev``
- ``sudo apt install libxml2-dev``
#. Install MySQL
- ``sudo apt install libmysqlclient-dev``
- ``sudo apt install mysql-server``
#. Install R 4.0 . If you have other instances of R, we recommend uninstalling them.
- ``sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9``
- Add R's repository:
- For 18.04 do: ``sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/'``
- For 20.04 do: ``sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'``
- ``sudo apt update``
- ``sudo apt install r-base``
#. Install Pandoc and rmarkdown
- ``sudo apt install pandoc``
- ``Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'``
#. Install GIT
- ``sudo apt install git``
#. Install miniconda using these instructions_
#. Restart your current shell
#. Clone our repo:
- ``git clone https://github.com/carissalow/rapids``
#. Create a python virtual environment:
- ``cd rapids``
- ``conda env create -f environment.yml -n MY_ENV_NAME``
- ``conda activate MY_ENV_NAME``
#. Install R packages and virtual environment:
- ``snakemake -j1 renv_install``
- ``snakemake -j1 renv_restore``
- This step could take several minutes to complete, especially if you have less than 3Gb of RAM or packages need to be compiled from source. Please be patient and let it run until completion.
#. See Usage section below.
.. _usage-section:
Usage
======
Once RAPIDS is installed, follow these steps to start processing mobile data.
.. _db-configuration:
#. Configure the database connection:
- Create an empty file called `.env` in the root directory (``rapids/``)
- Add the following lines and replace your database-specific credentials (user, password, host, and database):
.. code-block:: bash
[MY_GROUP]
user=MY_USER
password=MY_PASSWORD
host=MY_HOST
port=3306
database=MY_DATABASE
.. note::
``MY_GROUP`` is a custom label for your credentials. It has to match ``DATABASE_GROUP`` in the ``config.yaml`` file_. It is not related to your database configuration.
#. Setup the participants' devices whose data you want to analyze, for this you have two options:
#. **Automatically**. You can automatically include all devices that are stored in the ``aware_device`` table. If you want to control what devices and dates are included, see the Manual configuration::
snakemake -j1 download_participants
#. **Manually**. Create one file per participant in the ``rapids/data/external/`` directory. The file should NOT have an extension (i.e., no .txt). The name of the file will become the label for that participant in the pipeline.
- The first line of the file should be the Aware ``device_id`` for that participant. If one participant has multiple device_ids (i.e. Aware had to be re-installed), add all device_ids separated by commas.
- The second line should list the device's operating system (``android`` or ``ios``). If a participant used more than one device (i.e., the participant changed phones and/or platforms mid-study) you can a) list each platform matching the order of the first line (``android,ios``), b) use ``android`` or ``ios`` if all phones belong to the same platform, or c) if you have an ``aware_device`` table in your database, set this line to ``multiple`` and RAPIDS will infer the multiple platforms automatically.
- The third line is an optional human-friendly label that will appear in any plots for that participant.
- The fourth line is optional and contains a start and end date separated by a comma ``YYYYMMDD,YYYYMMDD`` (e.g., ``20201301,20202505``). If these dates are specified, only data within this range will be processed, otherwise, all data from the device(s) will be used.
For example, let's say participant `p01` had two AWARE device_ids and they were running Android between February 1st 2020 and March 3rd 2020. Their participant file would be named ``p01`` and contain:
.. code-block:: bash
3a7b0d0a-a9ce-4059-ab98-93a7b189da8a,44f20139-50cc-4b13-bdde-0d5a3889e8f9
android
Participant01
2020/02/01,2020/03/03
#. Choose what features to extract:
- See :ref:`Minimal Working Example<minimal-working-example>`.
#. Execute RAPIDS
- Standard execution over a single core::
snakemake -j1
- Standard execution over multiple cores::
snakemake -j8
- Force a rule (useful if you modify your code and want to update its results)::
snakemake -j1 -R RULE_NAME
.. _bug: https://github.com/Homebrew/linuxbrew-core/issues/17812
.. _instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
.. _brew: https://docs.brew.sh/Homebrew-on-Linux
.. _AWARE: https://awareframework.com/what-is-aware/
.. _file: https://github.com/carissalow/rapids/blob/master/config.yaml#L22

View File

@ -1,44 +0,0 @@
Quick Introduction
==================
The goal of this pipeline is to standardize the data cleaning, feature extraction, analysis, and evaluation of mobile sensing projects. It leverages Conda_, Cookiecutter_, SciPy_, Snakemake_, Sphinx_, and R_ to create an end-to-end reproducible environment that can be published along with research papers.
At the moment, mobile data can be collected using different sensing frameworks (AWARE_, Beiwe_) and hardware (Fitbit_). The pipeline is agnostic to these data sources and can unify their analysis. The current implementation only handles data collected with AWARE_ and Fitbit_. However, it can be easily extended to other providers.
We recommend reading Snakemake_ docs, but the main idea behind the pipeline is that every link in the analysis chain is a rule with an input and an output. Input and output are files, which can be manipulated using any programming language (although Snakemake_ has wrappers for Julia_, Python_, and R_ that can make development slightly more comfortable). Snakemake_ also allows the pipeline rules to be executed in parallel on multiple cores without any code changes. This can drastically reduce the time needed to complete an analysis.
Do you want to keep up to date with new functionality or have a question? Join the #rapids channel in AWARE Framework's slack_
Available features:
- :ref:`accelerometer-sensor-doc`
- :ref:`applications-foreground-sensor-doc`
- :ref:`battery-sensor-doc`
- :ref:`bluetooth-sensor-doc`
- :ref:`wifi-sensor-doc`
- :ref:`call-sensor-doc`
- :ref:`activity-recognition-sensor-doc`
- :ref:`light-doc`
- :ref:`location-sensor-doc`
- :ref:`screen-sensor-doc`
- :ref:`messages-sensor-doc`
- :ref:`fitbit-sleep-sensor-doc`
- :ref:`fitbit-heart-rate-sensor-doc`
- :ref:`fitbit-steps-sensor-doc`
We are updating these docs constantly, but if you think something needs clarification, feel free to reach out or submit a pull request on GitHub.
.. _Conda: https://docs.conda.io/en/latest/
.. _Cookiecutter: http://drivendata.github.io/cookiecutter-data-science/
.. _SciPy: https://www.scipy.org/index.html
.. _Snakemake: https://snakemake.readthedocs.io/en/stable/
.. _Sphinx: https://www.sphinx-doc.org/en/master/
.. _R: https://www.r-project.org/
.. _AWARE: https://awareframework.com/what-is-aware/
.. _Beiwe: https://www.beiwe.org/
.. _Fitbit: https://www.fitbit.com/us/home
.. _Python: https://www.python.org/
.. _Julia: https://julialang.org/
.. _slack: http://awareframework.com:3000/

View File

@ -1,42 +0,0 @@
.. _minimal-working-example:
Minimal Working Example
========================
This is a quick guide for creating and running a simple pipeline to extract call features for daily and night epochs of one participant monitored on the US East coast.
#. Make sure your database connection credentials in ``.env`` are correct. See step 1 of :ref:`Usage Section <db-configuration>`.
#. Create at least one participant file ``p01`` under ``data/external/``. See step 2 of :ref:`Usage Section <db-configuration>`.
#. Make sure your Conda (python) environment is active. See step 6 of :ref:`install-page`.
#. Modify the following settings in the ``config.yaml`` file with the values shown below (leave all other settings as they are)
::
PIDS: [p01]
DAY_SEGMENTS: &day_segments
[daily, night]
TIMEZONE: &timezone
America/New_York
DATABASE_GROUP: &database_group
MY_GROUP (change this if you added your DB credentials to .env with a different label)
CALLS:
COMPUTE: True
DB_TABLE: calls (only change DB_TABLE if your database calls table has a different name)
For more information on the ``calls`` sensor see :ref:`call-sensor-doc`
#. Run the following command to execute RAPIDS
::
snakemake -j1
#. Daily and night call metrics will be found in files under the ``data/processed/p01/`` directory.

View File

@ -1,238 +0,0 @@
.. _rapids-structure:
RAPIDS Structure
=================
.. _the-config-file:
The ``config.yaml`` File
------------------------
RAPIDS configuration settings are defined in ``config.yaml`` (See `config.yaml`_). This is the only file that you need to understand in order to compute the features that RAPIDS ships with.
It has global settings like ``PIDS``, ``DAY_SEGMENTS``, among others (see :ref:`global-sensor-doc` for more information). As well as per sensor settings, for example, for the :ref:`messages-sensor-doc`::
| ``MESSAGES:``
| ``COMPUTE: True``
| ``DB_TABLE: messages``
| ``...``
.. _the-snakefile-file:
The ``Snakefile`` File
----------------------
The ``Snakefile`` file (see the actual `Snakefile`_) pulls the entire system together. The first line in this file identifies the configuration file. Next is a list of include directives that import the rules used to pull, clean, process, analyze and report data. It compiles the list of ``files_to_compute`` by scanning the config file looking for the sensors with a ``COMPUTE`` flag equal to ``True``. Then, the ``all`` rule is called with this list, which prompts Snakemake to execute the pipeline (raw files, intermediate files, feature files, reports, etc).
.. _includes-section:
Includes
"""""""""
There are 5 included files in the ``Snakefile`` file.
- ``renv.smk`` - Rules to create, backup and restore the R renv virtual environment for RAPIDS. (See `renv`_)
- ``preprocessing.smk`` - Rules that are used to pre-preprocess the data such as downloading, cleaning and formatting. (See `preprocessing`_)
- ``features.smk`` - Rules that used for behavioral feature extraction. (See `features`_)
- ``models.smk`` - Rules that are used to build models from features that have been extracted from the sensor data. (See `models`_)
- ``reports.smk`` - Rules that are used to produce reports and visualizations. (See `reports`_)
Includes are relative to the root directory.
.. _rule-all-section:
``Rule all:``
"""""""""""""
In RAPIDS the ``all`` rule lists the output files we expect the pipeline to compute. Before the ``all`` rule is called snakemake checks the ``config.yaml`` and adds all the rules for which the sensors ``COMPUTE`` parameter is ``True``. The ``expand`` function allows us to generate a list of file paths that have a common structure except for PIDS or other parameters. Consider the following::
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
If ``pids = ['p01','p02']`` and ``config["MESSAGES"]["DB_TABLE"] = messages`` then the above directive would produce::
["data/raw/p01/messages_raw.csv", "data/raw/p02/messages_raw.csv"]
Thus, this allows us to define all the desired output files without having to manually list each path for every participant and every sensor. The way Snakemake works is that it looks for the rule that produces the desired output files and then executes that rule. For more information on ``expand`` see `The Expand Function`_
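As a rough, hypothetical illustration (this is plain Python, not RAPIDS code), the following sketch mimics what ``expand`` does for simple patterns by taking the Cartesian product of the wildcard values::
from itertools import product

def expand_like(pattern, **wildcards):
    # Build one path per combination of wildcard values (Cartesian product),
    # which is roughly what Snakemake's expand() does for simple patterns.
    keys = list(wildcards)
    return [pattern.format(**dict(zip(keys, combo)))
            for combo in product(*(wildcards[k] for k in keys))]

print(expand_like("data/raw/{pid}/{sensor}_raw.csv",
                  pid=["p01", "p02"], sensor=["messages"]))
# ['data/raw/p01/messages_raw.csv', 'data/raw/p02/messages_raw.csv']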
.. _the-env-file:
The ``.env`` File
-------------------
Your database credentials are stored in the ``.env`` file (See :ref:`install-page`)::
[MY_GROUP_NAME]
user=MyUSER
password=MyPassword
host=MyIP/DOMAIN
port=3306
.. _rules-syntax:
The ``Rules`` Directory
------------------------
The ``rules`` directory contains the ``snakefiles`` that were included in the main ``Snakefile`` file. A short description of these files is given in the :ref:`includes-section` section.
Rules
""""""
A Snakemake workflow is defined by rules (See the features_ snakefile as an actual example). Rules decompose the workflow into small steps by specifying what output files should be created by running a script on a set of input files. Snakemake automatically determines the dependencies between the rules by matching file names. Thus, a rule can consist of a name, input files, output files, and a command to generate the output from the input. The following is the basic structure of a Snakemake rule::
rule NAME:
input: "path/to/inputfile", "path/to/other/inputfile"
output: "path/to/outputfile", "path/to/another/outputfile"
script: "path/to/somescript.R"
A sample rule from the RAPIDS source code is shown below::
rule messages_features:
input:
expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["MESSAGES"]["DB_TABLE"])
params:
messages_type = "{messages_type}",
day_segment = "{day_segment}",
features = lambda wildcards: config["MESSAGES"]["FEATURES"][wildcards.messages_type]
output:
"data/processed/{pid}/messages_{messages_type}_{day_segment}.csv"
script:
"../src/features/messages_features.R"
The ``rule`` directive specifies the name of the rule that is being defined. ``params`` defines additional parameters for the rule's script. In the example above, the parameters are passed to the ``messages_features.R`` script as a dictionary. Instead of ``script``, a ``shell`` command can be used by replacing the ``script`` directive of the rule with::
shell: "somecommand {input} {output}"
It should be noted that rules can be defined without input and output, as seen in ``renv.smk``. For more information see the `Rules documentation`_ and for an actual example see the `renv`_ snakefile.
.. _wildcards:
Wildcards
""""""""""
There are times when the same rule should be applied to different participants and day segments. For this we use wildcards ``{my_wildcard}``. All wildcards are inferred from the files listed in the ``all`` rule of the ``Snakefile`` file and therefore from the output of any rule::
rule messages_features:
input:
expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["MESSAGES"]["DB_TABLE"])
params:
messages_type = "{messages_type}",
day_segment = "{day_segment}",
features = lambda wildcards: config["MESSAGES"]["FEATURES"][wildcards.messages_type]
output:
"data/processed/{pid}/messages_{messages_type}_{day_segment}.csv"
script:
"../src/features/messages_features.R"
If the rule's output matches a requested file, the substrings matched by the wildcards are propagated to the ``input`` and ``params`` directives. For example, if another rule in the workflow requires the file ``data/processed/p01/messages_sent_daily.csv``, Snakemake recognizes that the above rule is able to produce it by setting ``pid=p01``, ``messages_type=sent`` and ``day_segment=daily``. Thus, it requests the input file ``data/raw/p01/messages_with_datetime.csv``, sets ``messages_type=sent`` and ``day_segment=daily`` in the ``params`` directive, and executes the script ``../src/features/messages_features.R``. See the preprocessing_ snakefile for an actual example.
.. _the-data-directory:
The ``data`` Directory
-----------------------
This directory contains the data files for the project. These directories are as follows:
- ``external`` - This directory stores the participant `pxxx` files as well as data from third party sources (see :ref:`install-page` page).
- ``raw`` - This directory contains the original, immutable data dump from your database.
- ``interim`` - This directory contains intermediate data that has been transformed but does not represent features.
- ``processed`` - This directory contains all behavioral features.
.. _the-src-directory:
The ``src`` Directory
----------------------
The ``src`` directory holds all the scripts used by the pipeline for data manipulation. These scripts can be in any programming language including but not limited to Python_, R_ and Julia_. This directory is organized into the following directories:
- ``data`` - This directory contains scripts that are used to download and preprocess raw data that will be used in analysis. See `data directory`_
- ``features`` - This directory contains scripts to extract behavioral features. See `features directory`_
- ``models`` - This directory contains the scripts for building and training models. See `models directory`_
- ``visualization`` - This directory contains the scripts to create plots and reports. See `visualization directory`_
.. _RAPIDS_directory_structure:
::
├── LICENSE
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── config.yaml <- The configuration settings for the pipeline.
├── environment.yml <- Environment settings - channels and dependencies that are installed in the env
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- A default Sphinx project; see sphinx-doc.org for details
├── models <- Trained and serialized models, model predictions, or model summaries
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
`1.0-jqp-initial-data-exploration`.
├── packrat <- Installed R dependencies. (Packrat is a dependency management system for R)
│ (Deprecated - replaced by renv)
├── references <- Data dictionaries, manuals, and all other explanatory materials.
├── renv.lock <- List of R packages and dependencies that are installed for the pipeline.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting.
├── rules
│ ├── features <- Rules to process the feature data pulled into the pipeline.
│ ├── models <- Rules for building models.
│ ├── mystudy <- Rules added by you that are specifically tailored to your project/study.
│ ├── packrat <- Rules for setting up packrat. (Deprecated - replaced by renv)
│ ├── preprocessing <- Preprocessing rules to clean data before processing.
│ ├── renv <- Rules for setting up renv and R packages.
│ └── reports <- Snakefile used to produce reports.
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
├── Snakefile <- The root snakemake file (the equivalent of a Makefile)
├── src <- Source code for use in this project. Can be in any language e.g. Python,
│ │ R, Julia, etc.
│ │
│ ├── data <- Scripts to download or generate data. Can be in any language e.g. Python,
│ │ R, Julia, etc.
│ │
│ ├── features <- Scripts to turn raw data into features for modeling. Can be in any language
│ │ e.g. Python, R, Julia, etc.
│ │
│ ├── models <- Scripts to train models and then use trained models to make predictions. Can
│ │ be in any language e.g. Python, R, Julia, etc.
│ │
│ └── visualization <- Scripts to create exploratory and results oriented visualizations. Can be
│ in any language e.g. Python, R, Julia, etc.
├── tests
│ ├── data <- Replication of the project root data directory for testing.
│ ├── scripts <- Scripts for testing.
│ ├── settings <- The config and settings files for running tests.
│ └── Snakefile <- The Snakefile for testing only.
└── tox.ini <- tox file with settings for running tox; see tox.testrun.org
.. _Python: https://www.python.org/
.. _Julia: https://julialang.org/
.. _R: https://www.r-project.org/
.. _`List of Timezone`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
.. _`The Expand Function`: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-expand-function
.. _`example snakefile`: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile
.. _renv: https://github.com/carissalow/rapids/blob/master/rules/renv.snakefile
.. _preprocessing: https://github.com/carissalow/rapids/blob/master/rules/preprocessing.snakefile
.. _features: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile
.. _models: https://github.com/carissalow/rapids/blob/master/rules/models.snakefile
.. _reports: https://github.com/carissalow/rapids/blob/master/rules/reports.snakefile
.. _mystudy: https://github.com/carissalow/rapids/blob/master/rules/mystudy.snakefile
.. _`Rules documentation`: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#rules
.. _`data directory`: https://github.com/carissalow/rapids/tree/master/src/data
.. _`features directory`: https://github.com/carissalow/rapids/tree/master/src/features
.. _`models directory`: https://github.com/carissalow/rapids/tree/master/src/models
.. _`visualization directory`: https://github.com/carissalow/rapids/tree/master/src/visualization
.. _`config.yaml`: https://github.com/carissalow/rapids/blob/master/config.yaml
.. _`Snakefile`: https://github.com/carissalow/rapids/blob/master/Snakefile

View File

@ -1,216 +0,0 @@
.. _data_exploration:
Data Exploration
================
These plots are in beta; if you get an error while computing them, please let us know.
.. _histogram-of-valid-sensed-hours:
Histogram of valid sensed hours
"""""""""""""""""""""""""""""""
See `Histogram of Valid Sensed Hours Config Code`_
**Rule Chain:**
- Rule: ``rules/preprocessing.smk/download_dataset``
- Rule: ``rules/preprocessing.smk/readable_datetime``
- Rule: ``rules/preprocessing.smk/phone_sensed_bins``
- Rule: ``rules/preprocessing.smk/phone_valid_sensed_days``
- Rule: ``rules/reports.smk/histogram_valid_sensed_hours``
.. _figure1-parameters:
**Parameters of histogram_valid_sensed_hours Rule:**
======================= =======================
Name Description
======================= =======================
plot Whether the rule is executed or not. The available options are ``True`` and ``False``.
min_valid_bins_per_hour The minimum valid bins an hour should have to be considered valid. A valid bin has at least 1 row of data. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_BINS<phone-valid-sensed-bins>` for more information.
min_valid_hours_per_day The minimum valid hours a day should have to be considered valid. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_DAYS<phone-valid-sensed-days>` for more information.
======================= =======================
**Observations:**
This histogram shows the valid sensed hours of all participants processed in RAPIDS (See valid sensed :ref:`bins<phone-valid-sensed-bins>` and :ref:`days<phone-valid-sensed-days>` sections). It can be used as a rough indication of the AWARE client monitoring coverage during a study for all participants. See Figure 1.
.. figure:: figures/Figure1.png
:scale: 90 %
:align: center
Figure 1 Histogram of valid sensed hours for all participants
.. _heatmap-of-phone-sensed-bins:
Heatmap of phone sensed bins
""""""""""""""""""""""""""""
See `Heatmap of Phone Sensed Bins Config Code`_
**Rule Chain:**
- Rule: ``rules/preprocessing.smk/download_dataset``
- Rule: ``rules/preprocessing.smk/readable_datetime``
- Rule: ``rules/preprocessing.smk/phone_sensed_bins``
- Rule: ``rules/reports.smk/heatmap_sensed_bins``
.. _figure2-parameters:
**Parameters of heatmap_sensed_bins Rule:**
======================= =======================
Name Description
======================= =======================
plot Whether the rule is executed or not. The available options are ``True`` and ``False``.
bin_size Every hour is divided into N bins of size ``BIN_SIZE`` (in minutes). It modifies the way we compute ``data/interim/pXX/phone_sensed_bins.csv`` file.
======================= =======================
**Observations:**
In this heatmap, rows are dates, columns are sensed bins for a participant, and cell color shows the number of mobile sensors that logged at least one row of data during that bin. This plot shows the periods of time without data for a participant and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes). See Figure 2.
.. figure:: figures/Figure2.png
:scale: 90 %
:align: center
Figure 2 Heatmap of phone sensed bins for a single participant
.. _heatmap-of-days-by-sensors:
Heatmap of days by sensors
""""""""""""""""""""""""""
See `Heatmap of Days by Sensors Config Code`_
**Rule Chain:**
- Rule: ``rules/preprocessing.smk/download_dataset``
- Rule: ``rules/preprocessing.smk/readable_datetime``
- Rule: ``rules/preprocessing.smk/phone_sensed_bins``
- Rule: ``rules/preprocessing.smk/phone_valid_sensed_days``
- Rule: ``rules/reports.smk/heatmap_days_by_sensors``
.. _figure3-parameters:
**Parameters of heatmap_days_by_sensors Rule:**
======================= =======================
Name Description
======================= =======================
plot Whether the rule is executed or not. The available options are ``True`` and ``False``.
min_valid_bins_per_hour The minimum valid bins an hour should have to be considered valid. A valid bin has at least 1 row of data. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_BINS<phone-valid-sensed-bins>` for more information.
min_valid_hours_per_day The minimum valid hours a day should have to be considered valid. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_DAYS<phone-valid-sensed-days>` for more information.
expected_num_of_days The number of days of data to show starting from the first day of each participant.
db_tables List of sensor tables to compute valid bins & hours.
======================= =======================
**Observations:**
In this heatmap, rows are sensors, columns are days, and cell color shows the normalized (0 to 1) number of valid sensed hours (See valid sensed :ref:`bins<phone-valid-sensed-bins>` and :ref:`days<phone-valid-sensed-days>` sections) collected by a sensor during a day for a participant. The user can decide how many days of data to show starting from the first day of each participant. This plot can be used to judge missing data on a per-participant, per-sensor basis as well as the number of valid sensed hours (usable data) for each day. See Figure 3.
.. figure:: figures/Figure3.png
:scale: 90 %
:align: center
Figure 3 Heatmap of days by sensors for a participant
.. _overall-compliance-heatmap:
Overall compliance heatmap
""""""""""""""""""""""""""
See `Overall Compliance Heatmap Config Code`_
**Rule Chain:**
- Rule: ``rules/preprocessing.smk/download_dataset``
- Rule: ``rules/preprocessing.smk/readable_datetime``
- Rule: ``rules/preprocessing.smk/phone_sensed_bins``
- Rule: ``rules/preprocessing.smk/phone_valid_sensed_days``
- Rule: ``rules/reports.smk/overall_compliance_heatmap``
.. _figure4-parameters:
**Parameters of overall_compliance_heatmap Rule:**
======================= =======================
Name Description
======================= =======================
plot Whether the rule is executed or not. The available options are ``True`` and ``False``.
only_show_valid_days Whether the plot only shows valid days or not. The available options are ``True`` and ``False``.
expected_num_of_days The number of days to show before today.
bin_size Every hour is divided into N bins of size ``BIN_SIZE`` (in minutes). It modifies the way we compute ``data/interim/pXX/phone_sensed_bins.csv`` file.
min_valid_bins_per_hour The minimum valid bins an hour should have to be considered valid. A valid bin has at least 1 row of data. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_BINS<phone-valid-sensed-bins>` for more information.
min_valid_hours_per_day The minimum valid hours a day should have to be considered valid. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_DAYS<phone-valid-sensed-days>` for more information.
======================= =======================
**Observations:**
In this heatmap, rows are participants, columns are days, and cell color shows the valid sensed hours for a participant during a day (See valid sensed :ref:`bins<phone-valid-sensed-bins>` and :ref:`days<phone-valid-sensed-days>` sections). This plot can be configured to show a certain number of days before today using the ``EXPECTED_NUM_OF_DAYS`` parameter (by default -1, showing all days for every participant). As different participants might join the study on different dates, the x-axis has a day index instead of a date. This plot gives the user a quick overview of the amount of data collected per person and is complementary to the histogram of valid sensed hours as it is broken down per participant and per day. See Figure 4.
.. figure:: figures/Figure4.png
:scale: 90 %
:align: center
Figure 4 Overall compliance heatmap for all participants
.. _heatmap-of-correlation-matrix-between-features:
Heatmap of correlation matrix between features
""""""""""""""""""""""""""""""""""""""""""""""
See `Heatmap of Correlation Matrix Config Code`_
**Rule Chain:**
- Rules to extract features
- Rule: ``rules/preprocessing.smk/download_dataset``
- Rule: ``rules/preprocessing.smk/readable_datetime``
- Rule: ``rules/preprocessing.smk/phone_sensed_bins``
- Rule: ``rules/preprocessing.smk/phone_valid_sensed_days``
- Rule: ``rules/reports.smk/heatmap_features_correlations``
.. _figure5-parameters:
**Parameters of heatmap_features_correlations Rule:**
======================= ==============
Name Description
======================= ==============
plot Whether the rule is executed or not. The available options are ``True`` and ``False``.
min_valid_bins_per_hour The minimum valid bins an hour should have to be considered valid. A valid bin has at least 1 row of data. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_BINS<phone-valid-sensed-bins>` for more information.
min_valid_hours_per_day The minimum valid hours a day should have to be considered valid. It modifies the way we compute phone valid days. Read :ref:`PHONE_VALID_SENSED_DAYS<phone-valid-sensed-days>` for more information.
corr_method Method of correlation. The available options are ``pearson``, ``kendall`` and ``spearman``.
min_rows_ratio Minimum number of observations required per pair of columns to have a valid correlation coefficient. Currently, only available for ``pearson`` and ``spearman`` correlation.
phone_features The list of phone features.
fitbit_features The list of Fitbit features.
corr_threshold Only correlation coefficients larger than ``corr_threshold`` can be shown in the heatmap.
======================= ==============
**Observations:**
Columns and rows are features computed in RAPIDS; cell color represents the correlation coefficient between all days of data for every pair of features of all participants. The user can specify a minimum number of observations required to compute the correlation between two features using the ``MIN_ROWS_RATIO`` parameter (0.5 by default). In addition, this plot can be configured to only display correlation coefficients above a threshold using the ``CORR_THRESHOLD`` parameter (0.1 by default). See Figure 5.
.. figure:: figures/Figure5.png
:scale: 90 %
:align: center
Figure 5 Correlation matrix heatmap for all the data of all participants
.. _`Histogram of Valid Sensed Hours Config Code`: https://github.com/carissalow/rapids/blob/master/config.yaml#L221
.. _`Heatmap of Phone Sensed Bins Config Code`: https://github.com/carissalow/rapids/blob/master/config.yaml#L233
.. _`Heatmap of Days by Sensors Config Code`: https://github.com/carissalow/rapids/blob/master/config.yaml#L226
.. _`Overall Compliance Heatmap Config Code`: https://github.com/carissalow/rapids/blob/master/config.yaml#L237
.. _`Heatmap of Correlation Matrix Config Code`: https://github.com/carissalow/rapids/blob/master/config.yaml#L211

Binary figure files not shown (five images removed: 22 KiB, 274 KiB, 121 KiB, 108 KiB, 113 KiB)

View File

@ -0,0 +1,64 @@
# Data Quality Visualizations
We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (t01 and t02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night).
!!! note
[Time segments](../../setup/configuration#time-segments) (e.g. `daily`, `morning`, etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.)
## 1. Histograms of phone data yield
RAPIDS provides two histograms that show the number of time segment instances that had a certain ratio of valid [yielded minutes and hours](../../features/phone-data-yield/#rapids-provider), respectively. A valid yielded minute has at least 1 row of data from any smartphone sensor and a valid yielded hour contains at least M valid minutes.
These plots can be used as a rough indication of the smartphone monitoring coverage during a study, aggregated across all participants. For example, the figure below shows a valid yielded minutes histogram for daily segments, and we can infer that the monitoring coverage was very good since almost all segments contain 90 to 100% of the expected sensed minutes.
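As a rough sketch (not RAPIDS code), the ratios behind these histograms can be reproduced from per-minute sensor row counts; the dataframe below and the `M = 30` threshold are assumptions for illustration only.
```python
import pandas as pd

# One row per minute of a time segment instance with the number of sensor
# rows logged during that minute (hypothetical data).
minutes = pd.DataFrame({"rows_logged": [0, 3, 1, 0, 5, 2]})

M = 30  # assumed threshold: minimum valid minutes for an hour to be valid

valid_minutes = minutes["rows_logged"] >= 1    # a valid yielded minute has >= 1 row
valid_minute_ratio = valid_minutes.mean()      # fraction of valid minutes in the segment

hours = valid_minutes.groupby(minutes.index // 60).sum()  # valid minutes per 60-minute block
valid_hour_ratio = (hours >= M).mean()         # fraction of valid hours in the segment

print(valid_minute_ratio, valid_hour_ratio)
```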
!!! example
Click [here](../../img/h-data-yield.html) to see an example of these interactive visualizations in HTML format
<figure>
<img src="../../img/h-data-yield.png" max-width="100%" />
<figcaption>Histogram of the data yielded minute ratio for a single participant during five time segments (daily, morning, afternoon, evening, and night)</figcaption>
</figure>
## 2. Heatmaps of overall data yield
These heatmaps are a breakdown per time segment and per participant of [Visualization 1](#1-histograms-of-phone-data-yield). The heatmap's rows represent participants, columns represent time segment instances, and cell color represents the valid yielded minute or hour ratio for a participant during a time segment instance.
As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis is labelled with the time delta between the start of each time segment instance and the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment.
The figure below shows the heatmap of the valid yielded minute ratio for participants t01 and t02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicates that both phones sensed data without interruptions for most days (except for the first and last ones).
!!! example
Click [here](../../img/hm-data-yield-participants.html) to see an example of these interactive visualizations in HTML format
<figure>
<img src="../../img/hm-data-yield-participants.png" max-width="100%" />
<figcaption>Overall compliance heatmap for all participants</figcaption>
</figure>
## 3. Heatmap of recorded phone sensors
In these heatmaps, rows represent time segment instances, columns represent minutes since the start of a time segment instance, and cell color shows the number of phone sensors that logged at least one row of data during each 1-minute window.
RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes).
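The quantity encoded in each cell can be approximated outside RAPIDS with a short pandas sketch; the `rows` table and its column names below are assumptions, not actual RAPIDS intermediate files.
```python
import pandas as pd

# Hypothetical long table with one record per raw sensor row.
rows = pd.DataFrame({
    "sensor": ["calls", "screen", "screen", "locations"],
    "timestamp": pd.to_datetime([
        "2020-04-23 10:00:15", "2020-04-23 10:00:40",
        "2020-04-23 10:01:05", "2020-04-23 10:01:30"]),
})

# Count how many distinct sensors logged at least one row per 1-minute window,
# which is what each cell of this heatmap encodes.
rows["minute"] = rows["timestamp"].dt.floor("min")
sensors_per_minute = rows.groupby("minute")["sensor"].nunique()
print(sensors_per_minute)
```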
The figure below shows this heatmap for phone sensors collected by participant t01 in daily time segments from Apr 23rd 2020 to May 4th 2020. We can infer that for most of the monitoring time, the participant's phone logged data from at least 8 sensors each minute.
!!! example
Click [here](../../img/hm-phone-sensors.html) to see an example of these interactive visualizations in HTML format
<figure>
<img src="../../img/hm-phone-sensors.png" max-width="100%" />
<figcaption>Heatmap of the recorded phone sensors per minute and per time segment of a single participant</figcaption>
</figure>
## 4. Heatmap of sensor row count
These heatmaps are a per-sensor breakdown of [Visualization 1](#1-histograms-of-phone-data-yield) and [Visualization 2](#2-heatmaps-of-overall-data-yield). Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant's (bottom) row of the screenshot in Visualization 2.
In these heatmaps, rows represent phone or Fitbit sensors, columns represent time segment instances, and cell color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per-participant and per-sensor basis.
The figure below shows data for 16 phone sensors (including data yield) of t01's daily segments (only half of the sensor names and dates are visible in the screenshot but all can be accessed in the interactive plot). From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
!!! example
Click [here](../../img/hm-sensor_rows.html) to see an example of these interactive visualizations in HTML format
<figure>
<img src="../../img/hm-sensor_rows.png" max-width="100%" />
<figcaption>Heatmap of the sensor row count per time segment of a single participant</figcaption>
</figure>

View File

@ -0,0 +1,14 @@
# Feature Visualizations
## 1. Heatmap Correlation Matrix
Columns and rows are the behavioral features computed in RAPIDS; cell color represents the correlation coefficient, computed across all days of data from all participants, for every pair of features.
The user can specify a minimum number of observations ([time segment](../../setup/configuration#time-segments) instances) required to compute the correlation between two features using the `MIN_ROWS_RATIO` parameter (0.5 by default) and the correlation method (Pearson, Spearman or Kendall) with the `CORR_METHOD` parameter. In addition, this plot can be configured to only display correlation coefficients above a threshold using the `CORR_THRESHOLD` parameter (0.1 by default).
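A minimal sketch of the underlying computation, assuming a pandas dataframe with one row per time segment instance and one column per feature (the column names and values below are made up), could look like this:
```python
import pandas as pd

features = pd.DataFrame({
    "calls_count":        [2, 5, 1, 4, 3],
    "screen_sumduration": [120, 300, 90, 260, 200],
    "steps_sum":          [4000, 9000, 3000, 8000, 7000],
})

min_rows_ratio = 0.5   # fraction of rows required per pair of features
corr_threshold = 0.1   # hide weak correlations
min_rows = int(min_rows_ratio * len(features))

corr = features.corr(method="spearman", min_periods=min_rows)
corr = corr.where(corr.abs() > corr_threshold)  # keep only coefficients above the threshold
print(corr)
```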
!!! example
Click [here](../../img/hm-feature-correlations.html) to see an example of these interactive visualizations in HTML format
<figure>
<img src="../../img/hm-feature-correlations.png" max-width="100%" />
<figcaption>Correlation matrix heatmap for all the features of all participants</figcaption>
</figure>

View File

@ -0,0 +1,97 @@
# Analysis Workflow Example
!!! info "TL;DR"
- In addition to using RAPIDS to extract behavioral features and create plots, you can structure your data analysis within RAPIDS (i.e. cleaning your features and creating ML/statistical models)
- We include an analysis example in RAPIDS that covers raw data processing, cleaning, feature extraction, machine learning modeling, and evaluation
- Use this example as a guide to structure your own analysis within RAPIDS
- RAPIDS analysis workflows are compatible with your favorite data science tools and libraries
- RAPIDS analysis workflows are reproducible and we encourage you to publish them along with your research papers
## Why should I integrate my analysis in RAPIDS?
Even though the bulk of RAPIDS' current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced on other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS.
We clarify that to create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS, then, they can use Jupyter notebooks to explore feature cleaning approaches and once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models and once a model is finished, it can also be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is.
In the following sections we share an example of how we structured an analysis workflow in RAPIDS.
## Analysis workflow structure
To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files and scripts are grouped into Snakemake rules that are described in `smk` files in the rules folder (we point the reader to the relevant rule(s) of each step).
Researchers can use these rules and scripts as a guide to create their own, as every modeling project is expected to have different requirements, data, and goals, but ultimately most follow a similar chained pattern.
!!! hint
The example's config file is `example_profile/example_config.yaml` and its Snakefile is in `example_profile/Snakefile`. The config file is already configured to process the sensor data as explained in [Analysis workflow modules](#analysis-workflow-modules).
## Description of the study modeled in our analysis workflow example
Our example is based on a hypothetical study that recruited 2 participants who underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client in their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10 that we summed up into a daily symptom burden score.
The goal of this workflow is to find out if we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes: high and low symptom burden, based on scores above or below each participant's average. We also want to compare the performance of individual (personalized) models vs a population model.
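A minimal sketch of this per-participant binarization, assuming a pandas dataframe of daily scores (the column names and values are hypothetical), could look like this:
```python
import pandas as pd

# One row per participant-day with the summed symptom burden score.
scores = pd.DataFrame({
    "pid":    ["p01", "p01", "p01", "p02", "p02", "p02"],
    "burden": [10, 25, 30, 5, 8, 20],
})

# Binarize against each participant's own average: 1 = high burden, 0 = low burden.
participant_mean = scores.groupby("pid")["burden"].transform("mean")
scores["target"] = (scores["burden"] > participant_mean).astype(int)
print(scores)
```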
In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share a database with [test data](https://osf.io/skqfv/files/) in an Open Science Framework repository.
<figure>
<img src="../../img/analysis_workflow.png" max-width="100%" />
<figcaption>Modules of RAPIDS example workflow, from raw data to model evaluation</figcaption>
</figure>
## Configure and run the analysis workflow example
1. [Install](../../setup/installation) RAPIDS
2. Configure the [user credentials](../../setup/configuration/#database-credentials) of a local or remote MySQL server with write permissions in your `.env` file. The example config file is at `example_profile/example_config.yaml`.
3. Unzip the [test database](https://osf.io/skqfv/files/) to `data/external/rapids_example.sql` and run:
```bash
./rapids -j1 restore_sql_file --profile example_profile
```
4. Create the participant files for this example by running:
```bash
./rapids -j1 create_example_participant_files
```
5. Run the example pipeline with:
```bash
./rapids -j1 --profile example_profile
```
## Modules of our analysis workflow example
??? info "1. Feature extraction"
We extract daily behavioral features for data yield, received and sent messages, missed, incoming and outgoing calls, resampled fused location data using the Doryab provider, activity recognition, battery, Bluetooth, screen, light, applications foreground, conversations, Wi-Fi connected, Wi-Fi visible, Fitbit heart rate summary and intraday data, Fitbit sleep summary data, and Fitbit step summary and intraday data without excluding sleep periods, with an active bout threshold of 10 steps. In total, we obtained 237 daily sensor features over 12 days per participant.
??? info "2. Extract demographic data."
It is common to have demographic data in addition to mobile and target (ground truth) data. In this example we include participants' age, gender, and the number of days they spent in hospital after their surgery as features in our model. We extract these three columns from the participant_info table of our test database. As these three features remain the same within participants, they are used only in the population model. Refer to the `demographic_features` rule in `rules/models.smk`.
??? info "3. Create target labels."
The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the `participant_target` table of our test database and transferred to a CSV file. A new rule/script can be created if further manipulation is necessary. Refer to the `parse_targets` rule in `rules/models.smk`.
??? info "4. Feature merging."
These daily features are stored in a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case each column represents a feature and each row represents a day). Refer to the `merge_sensor_features_for_individual_participants` and `merge_features_for_population_model` rules in `rules/features.smk`.
??? info "5. Data visualization."
At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to `rules/reports.smk` to find the rules that generate these plots.
??? info "6. Feature cleaning."
In this stage we perform four steps to clean our sensor feature file (see the sketch below). First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% missing columns (features). Several parameters for this cleaning stage are created and exposed in `example_profile/example_config.yaml`.
After this step, we kept 162 features over 11 days for the individual model of p01, 107 features over 12 days for the individual model of p02 and 101 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stop researchers from collecting the same number of sensors as on Android phones.
Feature cleaning for the individual models is done in the `clean_sensor_features_for_individual_participants` rule and for the population model in the `clean_sensor_features_for_all_participants` rule in `rules/models.smk`.
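The following is a minimal sketch of these four cleaning steps, not the actual RAPIDS cleaning script; the yield column name and the thresholds are assumptions taken from the description above.
```python
import pandas as pd

def clean_features(features: pd.DataFrame) -> pd.DataFrame:
    # `features` is assumed to have one row per day, one column per feature,
    # and a data yield hour ratio column (hypothetical name).
    features = features[features["phone_data_yield_ratiovalidyieldedhours"] > 0.75]  # 1. keep days with > 0.75 yield
    features = features.loc[:, features.isna().mean() <= 0.30]                       # 2. drop columns with > 30% missing rows
    features = features.loc[:, features.nunique(dropna=True) > 1]                    # 3. drop zero-variance columns
    features = features[features.isna().mean(axis=1) <= 0.30]                        # 4. drop days with > 30% missing features
    return features
```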
??? info "7. Merge features and targets."
In this step we merge the cleaned features and target labels for our individual models in the `merge_features_and_targets_for_individual_model` rule in `rules/models.smk`. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the `merge_features_and_targets_for_population_model` rule in `rules/models.smk`. These two merged files are the input for our individual and population models.
??? info "8. Modelling."
This stage has three phases: model building, training and evaluation.
In the building phase we impute, normalize and oversample our dataset (see the sketch below). Missing numeric values in each column are imputed with their mean and missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, or scikit-learn's robust scaler) and we one-hot encode each categorical feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a random over-sampler from scikit-learn. All these parameters are exposed in `example_profile/example_config.yaml`.
In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in `example_profile/example_config.yaml`.
Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle.
Refer to the `modelling_for_individual_participants` rule for the individual modeling and to the `modelling_for_all_participants` rule for the population modeling, both in `rules/models.smk`.
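A minimal sketch of this build/train/evaluate chain, assuming scikit-learn and imbalanced-learn are available (the stand-in data, the single classifier, and the tiny hyper-parameter grid are placeholders, not the actual workflow code):
```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_predict
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn pipeline so SMOTE only touches training folds

X, y = np.random.rand(12, 5), np.array([0, 1] * 6)  # stand-in features and binary targets

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # impute numeric values with the mean
    ("scale", MinMaxScaler()),                    # one of the three normalization strategies
    ("oversample", SMOTE(k_neighbors=2)),         # oversample the minority class
    ("clf", LogisticRegression(max_iter=1000)),   # one of the eight classifiers
])

# Inner leave-one-out cycle tunes hyper-parameters; the outer cycle predicts the held-out sample.
inner = GridSearchCV(pipeline, {"clf__C": [0.1, 1, 10]}, scoring="f1_macro", cv=LeaveOneOut())
predictions = cross_val_predict(inner, X, y, cv=LeaveOneOut())
print(predictions)
```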
??? info "9. Compute model baselines."
We create three baselines to evaluate our classification models.
First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation by sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features for the individual models, this baseline is only available for the population model (see the sketch below).
Our baseline metrics (e.g. accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the `baselines_for_individual_model` rule for the individual model baselines and to the `baselines_for_population_model` rule for population model baselines, both in `rules/models.smk`.
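A minimal sketch of the three baselines with scikit-learn (the stand-in data and feature splits below are placeholders, not the actual workflow code):
```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = np.random.rand(16, 5), np.array([0, 1] * 8)       # stand-in sensor features and targets
X_test, y_test = np.random.rand(4, 5), np.array([0, 1, 1, 0])
demo_train, demo_test = np.random.rand(16, 3), np.random.rand(4, 3)  # stand-in age, gender, days in hospital

majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)      # 1. majority class baseline
random_weighted = DummyClassifier(strategy="stratified").fit(X_train, y_train)  # 2. random draw from the training class ratio
demographic_tree = DecisionTreeClassifier().fit(demo_train, y_train)            # 3. decision tree on demographics (population model only)

for name, model, feats in [("majority", majority, X_test),
                           ("random weighted", random_weighted, X_test),
                           ("demographic tree", demographic_tree, demo_test)]:
    print(name, model.score(feats, y_test))
```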

View File

@ -0,0 +1,88 @@
Minimal Working Example
=======================
This is a quick guide for creating and running a simple pipeline to extract missed, outgoing, and incoming call features for `daily` and `night` epochs of one participant monitored on the US East coast.
1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
2. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) step (we provide an example of what the relevant sections in your `config.yaml` will look like after you are done)
??? info "Things to change on each configuration step"
1\. Set up your database connection credentials in `.env`. We assume your credentials group is called `MY_GROUP`.
2\. `America/New_York` should be the default timezone
3\. Create a participant file `p01.yaml` based on one of your participants and add `p01` to `[PIDS]` in `config.yaml`. The following would be the content of your `p01.yaml` participant file:
```yaml
PHONE:
DEVICE_IDS: [aaaaaaaa-1111-bbbb-2222-cccccccccccc] # your participant's AWARE device id
PLATFORMS: [android] # or ios
LABEL: MyTestP01 # any string
START_DATE: 2020-01-01 # this can also be empty
END_DATE: 2021-01-01 # this can also be empty
```
4\. `[TIME_SEGMENTS][TYPE]` should be the default `PERIODIC`. Change `[TIME_SEGMENTS][FILE]` to the path of a file containing the following lines:
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
```
5\. If you collected data with AWARE you won't need to modify the attributes of `[DEVICE_DATA][PHONE]`
6\. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True`
??? example "Example of the `config.yaml` sections after the changes outlined above"
```
PIDS: [p01]
TIMEZONE: &timezone
America/New_York
DATABASE_GROUP: &database_group
MY_GROUP
# ... other irrelevant sections
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC
FILE: "data/external/timesegments_periodic.csv" # make sure the three lines specified above are in the file
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
# No need to change this if you collected AWARE data on a database and your credentials are grouped under `MY_GROUP` in `.env`
DEVICE_DATA:
PHONE:
SOURCE:
TYPE: DATABASE
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE # SINGLE or MULTIPLE
VALUE: *timezone
############## PHONE ###########################################################
################################################################################
# ... other irrelevant sections
# Communication call features config, TYPES and FEATURES keys need to match
PHONE_CALLS:
TABLE: calls # change if your calls table has a different name
PROVIDERS:
RAPIDS:
COMPUTE: True # set this to True!
CALL_TYPES: ...
```
3. Run RAPIDS
```bash
./rapids -j1
```
4. The call features for daily and night time segments will be in
```
data/processed/features/p01/phone_calls.csv
```
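Once the pipeline finishes, a quick way to inspect the output (a sketch assuming pandas is available in your environment) is:
```python
import pandas as pd

# Load the extracted call features produced by the configuration above.
features = pd.read_csv("data/processed/features/p01/phone_calls.csv")
print(features.shape)
print(features.columns.tolist())  # one column per time segment field and per call feature
print(features.head())
```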

View File

@ -13,272 +13,262 @@ files_to_compute = []
if len(config["PIDS"]) == 0:
raise ValueError("Add participants IDs to PIDS in config.yaml. Remember to create their participant files in data/external")
if config["PHONE_VALID_SENSED_BINS"]["COMPUTE"] or config["PHONE_VALID_SENSED_DAYS"]["COMPUTE"]: # valid sensed bins is necessary for sensed days, so we add these files anyways if sensed days are requested
if len(config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]) == 0:
raise ValueError("If you want to compute PHONE_VALID_SENSED_BINS or PHONE_VALID_SENSED_DAYS, you need to add at least one table to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml")
for provider in config["PHONE_DATA_YIELD"]["PROVIDERS"].keys():
if config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=map(str.lower, config["PHONE_DATA_YIELD"]["SENSORS"])))
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
tables_android = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["IOS"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]] # for android, discard any ios tables that may exist
tables_ios = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"]]] # for ios, discard any android tables that may exist
for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
if config["PHONE_MESSAGES"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_messages_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_messages_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_MESSAGES"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_messages.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for pids,table in zip([pids_android, pids_ios], [tables_android, tables_ios]):
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
for provider in config["PHONE_CALLS"]["PROVIDERS"].keys():
if config["PHONE_CALLS"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_CALLS"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_calls.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if config["PHONE_VALID_SENSED_DAYS"]["COMPUTE"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_valid_sensed_days_{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins.csv",
for provider in config["PHONE_BLUETOOTH"]["PROVIDERS"].keys():
if config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_bluetooth.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"].keys():
if config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_activity_recognition.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_BATTERY"]["PROVIDERS"].keys():
if config["PHONE_BATTERY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_battery_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_BATTERY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_battery.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_SCREEN"]["PROVIDERS"].keys():
if config["PHONE_SCREEN"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_SCREEN"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_screen.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_LIGHT"]["PROVIDERS"].keys():
if config["PHONE_LIGHT"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_light_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_light_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_LIGHT"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_light.csv", pid=config["PIDS"],))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_ACCELEROMETER"]["PROVIDERS"].keys():
if config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_accelerometer.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"].keys():
if config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_foreground.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_WIFI_VISIBLE"]["PROVIDERS"].keys():
if config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_visible.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_WIFI_CONNECTED"]["PROVIDERS"].keys():
if config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_connected.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_CONVERSATION"]["PROVIDERS"].keys():
if config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_conversation.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_LOCATIONS"]["PROVIDERS"].keys():
if config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["COMPUTE"]:
if config["PHONE_LOCATIONS"]["LOCATIONS_TO_USE"] == "FUSED_RESAMPLED":
if "PHONE_LOCATIONS" in config["PHONE_DATA_YIELD"]["SENSORS"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
else:
raise ValueError("Error: Add PHONE_LOCATIONS (and as many PHONE_SENSORS as you have) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
files_to_compute.extend(expand("data/raw/{pid}/phone_locations_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_locations.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_sleep_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
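# Every sensor block above follows the same pattern: a provider's outputs are only requested when
# config["<SENSOR>"]["PROVIDERS"]["<PROVIDER>"]["COMPUTE"] is True in config.yaml, e.g. for the light
# sensor (values shown match the config.yaml further down in this diff):
#   PHONE_LIGHT:
#     PROVIDERS:
#       RAPIDS:
#         COMPUTE: True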
# Visualization for Data Exploration
if config["HISTOGRAM_PHONE_DATA_YIELD"]["PLOT"]:
files_to_compute.append("reports/data_exploration/histogram_phone_data_yield.html")
if config["HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/{pid}/heatmap_sensors_per_minute_per_time_segment.html", pid=config["PIDS"]))
files_to_compute.append("reports/data_exploration/heatmap_sensors_per_minute_per_time_segment.html")
if config["HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/{pid}/heatmap_sensor_row_count_per_time_segment.html", pid=config["PIDS"]))
files_to_compute.append("reports/data_exploration/heatmap_sensor_row_count_per_time_segment.html")
if config["HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.append("reports/data_exploration/heatmap_phone_data_yield_per_participant_per_time_segment.html")
if config["HEATMAP_FEATURE_CORRELATION_MATRIX"]["PLOT"]:
files_to_compute.append("reports/data_exploration/heatmap_feature_correlation_matrix.html")
# Analysis Workflow Example
models, scalers = [], []
for model_name in config["PARAMS_FOR_ANALYSIS"]["MODEL_NAMES"]:
models = models + [model_name] * len(config["PARAMS_FOR_ANALYSIS"]["MODEL_SCALER"][model_name])
scalers = scalers + config["PARAMS_FOR_ANALYSIS"]["MODEL_SCALER"][model_name]
results = config["PARAMS_FOR_ANALYSIS"]["RESULT_COMPONENTS"]
# Demographic features
files_to_compute.extend(expand("data/raw/{pid}/participant_info_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/demographic_features.csv", pid=config["PIDS"]))
# Targets
files_to_compute.extend(expand("data/raw/{pid}/participant_target_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/participant_target_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/targets/{pid}/parsed_targets.csv", pid=config["PIDS"]))
# Individual model
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/models/individual_model/{pid}/input.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/models/individual_model/{pid}/output_{cv_method}/baselines.csv", pid=config["PIDS"], cv_method=config["PARAMS_FOR_ANALYSIS"]["CV_METHODS"]))
files_to_compute.extend(expand(
    expand("data/processed/models/individual_model/{pid}/output_{cv_method}/{{model}}/{{scaler}}/{result}.csv",
        pid=config["PIDS"],
        cv_method=config["PARAMS_FOR_ANALYSIS"]["CV_METHODS"],
        result = results),
    zip,
    model=models,
    scaler=scalers))
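# Note on the nested expand above: the inner expand builds the product of pid x cv_method x result while
# keeping {{model}} and {{scaler}} escaped as literal wildcards; the outer expand then fills model and
# scaler pairwise (zip) from the lists built out of MODEL_NAMES and MODEL_SCALER. Sketch of the pairing
# with hypothetical values, not taken from config.yaml:
#   models  = ["LogReg", "LogReg", "RF"]
#   scalers = ["minmaxscaler", "standardscaler", "notnormalized"]
#   list(zip(models, scalers))  # -> [("LogReg", "minmaxscaler"), ("LogReg", "standardscaler"), ("RF", "notnormalized")]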
if config["MESSAGES"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/messages_{messages_type}_{day_segment}.csv", pid=config["PIDS"], messages_type = config["MESSAGES"]["TYPES"], day_segment = config["MESSAGES"]["DAY_SEGMENTS"]))
# Population model
files_to_compute.append("data/processed/features/all_participants/all_sensor_features_cleaned.csv")
files_to_compute.append("data/processed/models/population_model/input.csv")
files_to_compute.extend(expand("data/processed/models/population_model/output_{cv_method}/baselines.csv", cv_method=config["PARAMS_FOR_ANALYSIS"]["CV_METHODS"]))
files_to_compute.extend(expand(
expand("data/processed/models/population_model/output_{cv_method}/{{model}}/{{scaler}}/{result}.csv",
cv_method=config["PARAMS_FOR_ANALYSIS"]["CV_METHODS"],
result = results),
zip,
model=models,
scaler=scalers))
if config["CALLS"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/calls_{call_type}_{day_segment}.csv", pid=config["PIDS"], call_type=config["CALLS"]["TYPES"], day_segment = config["CALLS"]["DAY_SEGMENTS"]))
if config["BARNETT_LOCATION"]["COMPUTE"]:
if config["BARNETT_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
if config["BARNETT_LOCATION"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_resampled.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
else:
raise ValueError("Error: Add your locations table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/location_barnett_{day_segment}.csv", pid=config["PIDS"], day_segment = config["BARNETT_LOCATION"]["DAY_SEGMENTS"]))
if config["BLUETOOTH"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BLUETOOTH"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BLUETOOTH"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/bluetooth_{day_segment}.csv", pid=config["PIDS"], day_segment = config["BLUETOOTH"]["DAY_SEGMENTS"]))
if config["ACTIVITY_RECOGNITION"]["COMPUTE"]:
pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
for pids,table in zip([pids_android, pids_ios], [config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]):
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/processed/{pid}/{sensor}_deltas.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/processed/{pid}/activity_recognition_{day_segment}.csv",pid=config["PIDS"], day_segment = config["ACTIVITY_RECOGNITION"]["DAY_SEGMENTS"]))
if config["BATTERY"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/battery_deltas.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/{pid}/battery_{day_segment}.csv", pid = config["PIDS"], day_segment = config["BATTERY"]["DAY_SEGMENTS"]))
if config["SCREEN"]["COMPUTE"]:
if config["SCREEN"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
else:
raise ValueError("Error: Add your screen table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/screen_deltas.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/{pid}/screen_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SCREEN"]["DAY_SEGMENTS"]))
if config["LIGHT"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["LIGHT"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["LIGHT"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/light_{day_segment}.csv", pid = config["PIDS"], day_segment = config["LIGHT"]["DAY_SEGMENTS"]))
if config["ACCELEROMETER"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["ACCELEROMETER"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["ACCELEROMETER"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/accelerometer_{day_segment}.csv", pid = config["PIDS"], day_segment = config["ACCELEROMETER"]["DAY_SEGMENTS"]))
if config["APPLICATIONS_FOREGROUND"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
files_to_compute.extend(expand("data/interim/{pid}/{sensor}_with_datetime_with_genre.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/applications_foreground_{day_segment}.csv", pid = config["PIDS"], day_segment = config["APPLICATIONS_FOREGROUND"]["DAY_SEGMENTS"]))
if config["WIFI"]["COMPUTE"]:
if len(config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]) > 0:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/processed/{pid}/wifi_{day_segment}.csv", pid = config["PIDS"], day_segment = config["WIFI"]["DAY_SEGMENTS"]))
if len(config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]) > 0:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]))
files_to_compute.extend(expand("data/processed/{pid}/wifi_{day_segment}.csv", pid = config["PIDS"], day_segment = config["WIFI"]["DAY_SEGMENTS"]))
if config["HEARTRATE"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["HEARTRATE"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
files_to_compute.extend(expand("data/processed/{pid}/fitbit_heartrate_{day_segment}.csv", pid = config["PIDS"], day_segment = config["HEARTRATE"]["DAY_SEGMENTS"]))
if config["STEP"]["COMPUTE"]:
if config["STEP"]["EXCLUDE_SLEEP"]["EXCLUDE"] == True and config["STEP"]["EXCLUDE_SLEEP"]["TYPE"] == "FITBIT_BASED":
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["STEP"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_step_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["intraday"]))
files_to_compute.extend(expand("data/processed/{pid}/fitbit_step_{day_segment}.csv", pid = config["PIDS"], day_segment = config["STEP"]["DAY_SEGMENTS"]))
if config["SLEEP"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SLEEP"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["intraday", "summary"]))
files_to_compute.extend(expand("data/processed/{pid}/fitbit_sleep_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SLEEP"]["DAY_SEGMENTS"]))
if config["CONVERSATION"]["COMPUTE"]:
pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
for pids,table in zip([pids_android, pids_ios], [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["CONVERSATION"]["DB_TABLE"]["IOS"]]):
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=pids, sensor=table))
files_to_compute.extend(expand("data/processed/{pid}/conversation_{day_segment}.csv",pid=config["PIDS"], day_segment = config["CONVERSATION"]["DAY_SEGMENTS"]))
if config["DORYAB_LOCATION"]["COMPUTE"]:
if config["DORYAB_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
if config["DORYAB_LOCATION"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_resampled.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
else:
raise ValueError("Error: Add your locations table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/location_doryab_{segment}.csv", pid=config["PIDS"], segment = config["DORYAB_LOCATION"]["DAY_SEGMENTS"]))
# visualization for data exploration
if config["HEATMAP_FEATURES_CORRELATIONS"]["PLOT"]:
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/heatmap_features_correlations.html", min_valid_hours_per_day=config["HEATMAP_FEATURES_CORRELATIONS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
if config["HISTOGRAM_VALID_SENSED_HOURS"]["PLOT"]:
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/histogram_valid_sensed_hours.html", min_valid_hours_per_day=config["HISTOGRAM_VALID_SENSED_HOURS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
if config["HEATMAP_DAYS_BY_SENSORS"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{pid}/heatmap_days_by_sensors.html", pid=config["PIDS"], min_valid_hours_per_day=config["HEATMAP_DAYS_BY_SENSORS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/heatmap_days_by_sensors_all_participants.html", min_valid_hours_per_day=config["HEATMAP_DAYS_BY_SENSORS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
if config["HEATMAP_SENSED_BINS"]["PLOT"]:
files_to_compute.extend(expand("reports/interim/heatmap_sensed_bins/{pid}/heatmap_sensed_bins.html", pid=config["PIDS"]))
files_to_compute.extend(["reports/data_exploration/heatmap_sensed_bins_all_participants.html"])
if config["OVERALL_COMPLIANCE_HEATMAP"]["PLOT"]:
files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/overall_compliance_heatmap.html", min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
# analysis example
if config["PARAMS_FOR_ANALYSIS"]["COMPUTE"]:
rows_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["ROWS_NAN_THRESHOLD"]
cols_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_NAN_THRESHOLD"]
models, scalers, rows_nan_thresholds, cols_nan_thresholds = [], [], [], []
for model_name in config["PARAMS_FOR_ANALYSIS"]["MODEL_NAMES"]:
models = models + [model_name] * len(config["PARAMS_FOR_ANALYSIS"]["MODEL_SCALER"][model_name]) * len(rows_nan_threshold)
scalers = scalers + config["PARAMS_FOR_ANALYSIS"]["MODEL_SCALER"][model_name] * len(rows_nan_threshold)
rows_nan_thresholds = rows_nan_thresholds + list(itertools.chain.from_iterable([threshold] * len(config["PARAMS_FOR_ANALYSIS"]["MODEL_SCALER"][model_name]) for threshold in rows_nan_threshold))
cols_nan_thresholds = cols_nan_thresholds + list(itertools.chain.from_iterable([threshold] * len(config["PARAMS_FOR_ANALYSIS"]["MODEL_SCALER"][model_name]) for threshold in cols_nan_threshold))
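# For intuition, with hypothetical values (not from config.yaml) ROWS_NAN_THRESHOLD: [0.3, 0.5] and a
# single model "LogReg" with two scalers, the loop above lines the lists up element-wise as:
#   models              = ["LogReg", "LogReg", "LogReg", "LogReg"]
#   scalers             = ["minmaxscaler", "standardscaler", "minmaxscaler", "standardscaler"]
#   rows_nan_thresholds = [0.3, 0.3, 0.5, 0.5]
# cols_nan_thresholds is built the same way, so the zip-ed expand calls below evaluate each scaler once
# per position in ROWS_NAN_THRESHOLD, paired with the matching entry of COLS_NAN_THRESHOLD.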
results = config["PARAMS_FOR_ANALYSIS"]["RESULT_COMPONENTS"] + ["merged_population_model_results"]
files_to_compute.extend(expand("data/processed/{pid}/data_for_individual_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{source}_{day_segment}_original.csv",
pid = config["PIDS"],
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"]))
files_to_compute.extend(expand("data/processed/data_for_population_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{source}_{day_segment}_original.csv",
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"]))
files_to_compute.extend(expand(
expand("data/processed/{pid}/data_for_individual_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{{rows_nan_threshold}}|{{cols_nan_threshold}}_{days_before_threshold}|{days_after_threshold}_{cols_var_threshold}/{source}_{day_segment}_clean.csv",
pid = config["PIDS"],
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
days_before_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_BEFORE_THRESHOLD"],
days_after_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_AFTER_THRESHOLD"],
cols_var_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_VAR_THRESHOLD"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"]),
zip,
rows_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["ROWS_NAN_THRESHOLD"],
cols_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_NAN_THRESHOLD"]))
files_to_compute.extend(expand(
expand("data/processed/data_for_population_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{{rows_nan_threshold}}|{{cols_nan_threshold}}_{days_before_threshold}|{days_after_threshold}_{cols_var_threshold}/{source}_{day_segment}_clean.csv",
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
days_before_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_BEFORE_THRESHOLD"],
days_after_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_AFTER_THRESHOLD"],
cols_var_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_VAR_THRESHOLD"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"]),
zip,
rows_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["ROWS_NAN_THRESHOLD"],
cols_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_NAN_THRESHOLD"]))
files_to_compute.extend(expand("data/processed/data_for_population_model/demographic_features.csv"))
files_to_compute.extend(expand("data/processed/data_for_population_model/targets_{summarised}.csv",
summarised = config["PARAMS_FOR_ANALYSIS"]["SUMMARISED"]))
files_to_compute.extend(expand(
expand("data/processed/data_for_population_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{{rows_nan_threshold}}|{{cols_nan_threshold}}_{days_before_threshold}|{days_after_threshold}_{cols_var_threshold}/{source}_{day_segment}_nancellsratio.csv",
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
days_before_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_BEFORE_THRESHOLD"],
days_after_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_AFTER_THRESHOLD"],
cols_var_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_VAR_THRESHOLD"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"]),
zip,
rows_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["ROWS_NAN_THRESHOLD"],
cols_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_NAN_THRESHOLD"]))
files_to_compute.extend(expand(
expand("data/processed/data_for_population_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{{rows_nan_threshold}}|{{cols_nan_threshold}}_{days_before_threshold}|{days_after_threshold}_{cols_var_threshold}/{source}_{day_segment}_{summarised}.csv",
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
days_before_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_BEFORE_THRESHOLD"],
days_after_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_AFTER_THRESHOLD"],
cols_var_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_VAR_THRESHOLD"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"],
summarised = config["PARAMS_FOR_ANALYSIS"]["SUMMARISED"]),
zip,
rows_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["ROWS_NAN_THRESHOLD"],
cols_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_NAN_THRESHOLD"]))
files_to_compute.extend(expand(
expand("data/processed/output_population_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{{rows_nan_threshold}}|{{cols_nan_threshold}}_{days_before_threshold}|{days_after_threshold}_{cols_var_threshold}/baseline/{cv_method}/{source}_{day_segment}_{summarised}.csv",
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
days_before_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_BEFORE_THRESHOLD"],
days_after_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_AFTER_THRESHOLD"],
cols_var_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_VAR_THRESHOLD"],
cv_method = config["PARAMS_FOR_ANALYSIS"]["CV_METHODS"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"],
summarised = config["PARAMS_FOR_ANALYSIS"]["SUMMARISED"]),
zip,
rows_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["ROWS_NAN_THRESHOLD"],
cols_nan_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_NAN_THRESHOLD"]))
files_to_compute.extend(expand(
expand("data/processed/output_population_model/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{{rows_nan_threshold}}|{{cols_nan_threshold}}_{days_before_threshold}|{days_after_threshold}_{cols_var_threshold}/{{model}}/{cv_method}/{source}_{day_segment}_{summarised}_{{scaler}}/{result}.csv",
min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"],
min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"],
days_before_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_BEFORE_THRESHOLD"],
days_after_threshold = config["PARAMS_FOR_ANALYSIS"]["PARTICIPANT_DAYS_AFTER_THRESHOLD"],
cols_var_threshold = config["PARAMS_FOR_ANALYSIS"]["COLS_VAR_THRESHOLD"],
cv_method = config["PARAMS_FOR_ANALYSIS"]["CV_METHODS"],
source = config["PARAMS_FOR_ANALYSIS"]["SOURCES"],
day_segment = config["PARAMS_FOR_ANALYSIS"]["DAY_SEGMENTS"],
summarised = config["PARAMS_FOR_ANALYSIS"]["SUMMARISED"],
result = results),
zip,
rows_nan_threshold = rows_nan_thresholds,
cols_nan_threshold = cols_nan_thresholds,
model = models,
scaler = scalers))
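# Every parameter combination above is encoded in the output path itself; a path produced by the last
# expand would look roughly like the following (all values after "16hours_12bins" are hypothetical):
#   data/processed/output_population_model/16hours_12bins/0.3|0.3_4|4_0.1/LogReg/leave_one_out/phone_fitbit_daily_notsummarised_minmaxscaler/overall_results.csv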
rule all:
input:

View File

@ -1,313 +1,401 @@
# Participants to include in the analysis
# You must create a file for each participant named pXXX containing their device_id. This can be done manually or automatically
PIDS: [example01, example02]
# Global var with common day segments
DAY_SEGMENTS: &day_segments
[daily]
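# "&day_segments" defines a YAML anchor so sensor sections below can reuse this list through the
# "*day_segments" alias; the same anchor/alias pattern is used for TIMEZONE and DATABASE_GROUP.
# Minimal illustration of the mechanism (not part of this config):
#   defaults: &defaults [daily]
#   SOME_SENSOR:
#     DAY_SEGMENTS: *defaults   # resolves to [daily]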
# Global timezone
# Use codes from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
# Double check your code, for example EST is not US Eastern Time.
TIMEZONE: &timezone
America/New_York
# See https://www.rapids.science/setup/configuration/#database-credentials
DATABASE_GROUP: &database_group
MY_GROUP
DOWNLOAD_PARTICIPANTS:
IGNORED_DEVICE_IDS: [] # for example "5a1dd68c-6cd1-48fe-ae1e-14344ac5215f"
GROUP: *database_group
# See https://www.rapids.science/setup/configuration/#timezone-of-your-study
TIMEZONE: &timezone
America/New_York
# Download data config
DOWNLOAD_DATASET:
GROUP: *database_group
# See https://www.rapids.science/setup/configuration/#participant-files
PIDS: [example01, example02]
# Readable datetime config
READABLE_DATETIME:
FIXED_TIMEZONE: *timezone
# See https://www.rapids.science/setup/configuration/#automatic-creation-of-participant-files
CREATE_PARTICIPANT_FILES:
SOURCE:
TYPE: AWARE_DEVICE_TABLE #AWARE_DEVICE_TABLE or CSV_FILE
DATABASE_GROUP: *database_group
CSV_FILE_PATH: "data/external/example_participants.csv" # see docs for required format
TIMEZONE: *timezone
PHONE_SECTION:
ADD: TRUE
DEVICE_ID_COLUMN: device_id # column name
IGNORED_DEVICE_IDS: []
FITBIT_SECTION:
ADD: TRUE
DEVICE_ID_COLUMN: device_id # column name
IGNORED_DEVICE_IDS: []
PHONE_VALID_SENSED_BINS:
  COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS, screen, or Barnett's location features
BIN_SIZE: &bin_size 5 # (in minutes)
# Add as many sensor tables as you have, they all improve the computation of PHONE_VALID_SENSED_BINS and PHONE_VALID_SENSED_DAYS.
# If you are extracting screen or Barnett's location features, screen and locations tables are mandatory.
DB_TABLES: [messages, calls, locations, plugin_google_activity_recognition, plugin_ios_activity_recognition, battery, screen, light, applications_foreground, plugin_studentlife_audio_android, plugin_studentlife_audio, wifi, sensor_wifi, bluetooth, applications_notifications, aware_log, ios_status_monitor, push_notification, significant, timezone, touch, keyboard]
# See https://www.rapids.science/setup/configuration/#time-segments
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC # FREQUENCY, PERIODIC, EVENT
FILE: "example_profile/exampleworkflow_timesegments.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # Only relevant if TYPE=PERIODIC, see docs
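  # FILE above points to a CSV that defines each segment; for TYPE=PERIODIC a row is assumed to look
  # roughly like the sketch below (column names and format are an assumption, double check them in the
  # time-segments section of the docs linked above):
  #   label,start_time,length,repeats_on,repeats_value
  #   morning,06:00:00,5H 59M 59S,every_day,0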
PHONE_VALID_SENSED_DAYS:
COMPUTE: False
MIN_VALID_HOURS_PER_DAY: &min_valid_hours_per_day [16, 20] # (out of 24) MIN_HOURS_PER_DAY
MIN_VALID_BINS_PER_HOUR: &min_valid_bins_per_hour [12] # (out of 60min/BIN_SIZE bins)
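  # With BIN_SIZE 5 there are 60/5 = 12 bins per hour, so MIN_VALID_BINS_PER_HOUR: [12] requires every
  # 5-minute bin in an hour to contain data for that hour to be valid; listing [16, 20] above means
  # outputs are produced twice, once counting days with at least 16 valid hours and once with at least 20.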
# Communication SMS features config, TYPES and FEATURES keys need to match
MESSAGES:
COMPUTE: True
DB_TABLE: messages
TYPES : [received, sent]
FEATURES:
received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
DAY_SEGMENTS: *day_segments
# Communication call features config, TYPES and FEATURES keys need to match
CALLS:
COMPUTE: True
DB_TABLE: calls
TYPES: [missed, incoming, outgoing]
FEATURES:
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
DAY_SEGMENTS: *day_segments
########################################################################################################################
# PHONE #
########################################################################################################################
APPLICATION_GENRES:
  CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
  UPDATE_CATALOGUE_FILE: false # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE; if CATALOGUE_SOURCE is equal to GOOGLE, all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_GENRES: false # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
# See https://www.rapids.science/setup/configuration/#device-data-source-configuration
PHONE_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE # SINGLE or MULTIPLE
VALUE: *timezone # IF TYPE=SINGLE, see docs
RESAMPLE_FUSED_LOCATION:
CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
TIMEZONE: *timezone
# Sensors ------
BARNETT_LOCATION:
COMPUTE: False
DB_TABLE: locations
DAY_SEGMENTS: [daily] # These features are only available on a daily basis
FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
LOCATIONS_TO_USE: ALL # ALL, ALL_EXCEPT_FUSED OR RESAMPLE_FUSED
ACCURACY_LIMIT: 51 # meters, drops location coordinates with an accuracy higher than this. This number means there's a 68% probability the true location is within this radius
TIMEZONE: *timezone
  MINUTES_DATA_USED: False # Use this for quality control purposes; it reports how many minutes of data (location coordinates grouped by minute) were used to compute features
PHONE_ACCELEROMETER:
TABLE: accelerometer
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
PANDA:
COMPUTE: False
VALID_SENSED_MINUTES: False
FEATURES:
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
DORYAB_LOCATION:
COMPUTE: True
DB_TABLE: locations
DAY_SEGMENTS: *day_segments
FEATURES: ["locationvariance","loglocationvariance","totaldistance","averagespeed","varspeed","circadianmovement","numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","meanlengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy"]
LOCATIONS_TO_USE: RESAMPLE_FUSED # ALL, ALL_EXCEPT_FUSED OR RESAMPLE_FUSED
DBSCAN_EPS: 10 # meters
DBSCAN_MINSAMPLES: 5
THRESHOLD_STATIC : 1 # km/h
MAXIMUM_GAP_ALLOWED: 300
MINUTES_DATA_USED: False
BLUETOOTH:
COMPUTE: True
DB_TABLE: bluetooth
DAY_SEGMENTS: *day_segments
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
ACTIVITY_RECOGNITION:
COMPUTE: True
DB_TABLE:
PHONE_ACTIVITY_RECOGNITION:
TABLE:
ANDROID: plugin_google_activity_recognition
IOS: plugin_ios_activity_recognition
DAY_SEGMENTS: *day_segments
FEATURES: ["count","mostcommonactivity","countuniqueactivities","activitychangecount","sumstationary","summobile","sumvehicle"]
  EPISODE_THRESHOLD_BETWEEN_ROWS: 5 # minutes. Max time difference for two consecutive rows to be considered within the same activity episode.
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["count", "mostcommonactivity", "countuniqueactivities", "durationstationary", "durationmobile", "durationvehicle"]
ACTIVITY_CLASSES:
STATIONARY: ["still", "tilting"]
MOBILE: ["on_foot", "walking", "running", "on_bicycle"]
VEHICLE: ["in_vehicle"]
SRC_FOLDER: "rapids" # inside src/features/phone_activity_recognition
SRC_LANGUAGE: "python"
BATTERY:
COMPUTE: True
DB_TABLE: battery
DAY_SEGMENTS: *day_segments
FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
PHONE_APPLICATIONS_FOREGROUND:
TABLE: applications_foreground
APPLICATION_CATEGORIES:
    CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
    UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE; if CATALOGUE_SOURCE is equal to GOOGLE, all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS:
RAPIDS:
COMPUTE: True
SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES:
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: ["system_apps"]
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"]
FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
SRC_FOLDER: "rapids" # inside src/features/phone_applications_foreground
SRC_LANGUAGE: "python"
SCREEN:
COMPUTE: True
DB_TABLE: screen
DAY_SEGMENTS: *day_segments
REFERENCE_HOUR_FIRST_USE: 0
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
IGNORE_EPISODES_LONGER_THAN: 0 # in minutes, set to 0 to disable
FEATURES_DELTAS: ["countepisode", "episodepersensedminutes", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"]
EPISODE_TYPES: ["unlock"]
PHONE_BATTERY:
TABLE: battery
EPISODE_THRESHOLD_BETWEEN_ROWS: 30 # minutes. Max time difference for two consecutive rows to be considered within the same battery episode.
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
SRC_FOLDER: "rapids" # inside src/features/phone_battery
SRC_LANGUAGE: "python"
LIGHT:
COMPUTE: True
DB_TABLE: light
DAY_SEGMENTS: *day_segments
FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
PHONE_BLUETOOTH:
TABLE: bluetooth
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_bluetooth
SRC_LANGUAGE: "r"
ACCELEROMETER:
COMPUTE: False
DB_TABLE: accelerometer
DAY_SEGMENTS: *day_segments
FEATURES:
MAGNITUDE: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
EXERTIONAL_ACTIVITY_EPISODE: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
NONEXERTIONAL_ACTIVITY_EPISODE: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
VALID_SENSED_MINUTES: True
PHONE_CALLS:
TABLE: calls
PROVIDERS:
RAPIDS:
COMPUTE: True
CALL_TYPES: [missed, incoming, outgoing]
FEATURES:
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_calls
APPLICATIONS_FOREGROUND:
COMPUTE: True
DB_TABLE: applications_foreground
DAY_SEGMENTS: *day_segments
SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES:
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: ["system_apps"]
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"]
FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
HEARTRATE:
COMPUTE: True
DB_TABLE: fitbit_data
DAY_SEGMENTS: *day_segments
SUMMARY_FEATURES: ["restinghr"] # calories features' accuracy depend on the accuracy of the participants fitbit profile (e.g. heigh, weight) use with care: ["caloriesoutofrange", "caloriesfatburn", "caloriescardio", "caloriespeak"]
INTRADAY_FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
STEP:
COMPUTE: True
DB_TABLE: fitbit_data
DAY_SEGMENTS: *day_segments
EXCLUDE_SLEEP:
EXCLUDE: False
TYPE: FIXED # FIXED OR FITBIT_BASED (CONFIGURE FITBIT's SLEEP DB_TABLE)
FIXED:
START: "23:00"
END: "07:00"
FEATURES:
ALL_STEPS: ["sumallsteps", "maxallsteps", "minallsteps", "avgallsteps", "stdallsteps"]
SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
THRESHOLD_ACTIVE_BOUT: 10 # steps
INCLUDE_ZERO_STEP_ROWS: False
SLEEP:
COMPUTE: True
DB_TABLE: fitbit_data
DAY_SEGMENTS: *day_segments
SLEEP_TYPES: ["main", "nap", "all"]
SUMMARY_FEATURES: ["sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgefficiency", "countepisode"]
WIFI:
COMPUTE: True
DB_TABLE:
VISIBLE_ACCESS_POINTS: "wifi" # if you only have a CONNECTED_ACCESS_POINTS table, set this value to ""
CONNECTED_ACCESS_POINTS: "sensor_wifi" # if you only have a VISIBLE_ACCESS_POINTS table, set this value to ""
DAY_SEGMENTS: *day_segments
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
CONVERSATION:
COMPUTE: True
DB_TABLE:
PHONE_CONVERSATION:
TABLE:
ANDROID: plugin_studentlife_audio_android
IOS: plugin_studentlife_audio
DAY_SEGMENTS: *day_segments
FEATURES: ["minutessilence", "minutesnoise", "minutesvoice", "minutesunknown","sumconversationduration","avgconversationduration",
"sdconversationduration","minconversationduration","maxconversationduration","timefirstconversation","timelastconversation","sumenergy",
"avgenergy","sdenergy","minenergy","maxenergy","silencesensedfraction","noisesensedfraction",
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["minutessilence", "minutesnoise", "minutesvoice", "minutesunknown","sumconversationduration","avgconversationduration",
"sdconversationduration","minconversationduration","maxconversationduration","timefirstconversation","timelastconversation","noisesumenergy",
"noiseavgenergy","noisesdenergy","noiseminenergy","noisemaxenergy","voicesumenergy",
"voiceavgenergy","voicesdenergy","voiceminenergy","voicemaxenergy","silencesensedfraction","noisesensedfraction",
"voicesensedfraction","unknownsensedfraction","silenceexpectedfraction","noiseexpectedfraction","voiceexpectedfraction",
"unknownexpectedfraction","countconversation"]
RECORDINGMINUTES: 1
PAUSEDMINUTES : 3
RECORDING_MINUTES: 1
PAUSED_MINUTES : 3
SRC_FOLDER: "rapids" # inside src/features/phone_conversation
SRC_LANGUAGE: "python"
### Visualizations ################################################################
HEATMAP_FEATURES_CORRELATIONS:
PHONE_DATA_YIELD:
SENSORS: [PHONE_ACCELEROMETER, PHONE_ACTIVITY_RECOGNITION, PHONE_APPLICATIONS_FOREGROUND, PHONE_BATTERY, PHONE_BLUETOOTH, PHONE_CALLS, PHONE_CONVERSATION, PHONE_LIGHT, PHONE_LOCATIONS, PHONE_MESSAGES, PHONE_SCREEN, PHONE_WIFI_CONNECTED, PHONE_WIFI_VISIBLE]
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
      MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1, the minimum ratio of minutes in an hour that must have sensed data for that hour to count as a valid yielded hour
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_data_yield
PHONE_LIGHT:
TABLE: light
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
SRC_FOLDER: "rapids" # inside src/features/phone_light
SRC_LANGUAGE: "python"
PHONE_LOCATIONS:
TABLE: locations
LOCATIONS_TO_USE: FUSED_RESAMPLED # ALL, GPS OR FUSED_RESAMPLED
FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
PROVIDERS:
DORYAB:
COMPUTE: True
FEATURES: ["locationvariance","loglocationvariance","totaldistance","averagespeed","varspeed","circadianmovement","numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","meanlengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy"]
DBSCAN_EPS: 10 # meters
DBSCAN_MINSAMPLES: 5
THRESHOLD_STATIC : 1 # km/h
MAXIMUM_GAP_ALLOWED: 300
MINUTES_DATA_USED: False
SAMPLING_FREQUENCY: 0
SRC_FOLDER: "doryab" # inside src/features/phone_locations
SRC_LANGUAGE: "python"
BARNETT:
COMPUTE: False
FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
ACCURACY_LIMIT: 51 # meters, drops location coordinates with an accuracy higher than this. This number means there's a 68% probability the true location is within this radius
TIMEZONE: *timezone
      MINUTES_DATA_USED: False # Use this for quality control purposes; it reports how many minutes of data (location coordinates grouped by minute) were used to compute features
SRC_FOLDER: "barnett" # inside src/features/phone_locations
SRC_LANGUAGE: "r"
PHONE_MESSAGES:
TABLE: messages
PROVIDERS:
RAPIDS:
COMPUTE: True
MESSAGES_TYPES : [received, sent]
FEATURES:
received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_messages
PHONE_SCREEN:
TABLE: screen
PROVIDERS:
RAPIDS:
COMPUTE: True
REFERENCE_HOUR_FIRST_USE: 0
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
IGNORE_EPISODES_LONGER_THAN: 0 # in minutes, set to 0 to disable
FEATURES: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"] # "episodepersensedminutes" needs to be added later
EPISODE_TYPES: ["unlock"]
SRC_FOLDER: "rapids" # inside src/features/phone_screen
SRC_LANGUAGE: "python"
PHONE_WIFI_CONNECTED:
TABLE: "sensor_wifi"
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_wifi_connected
SRC_LANGUAGE: "r"
PHONE_WIFI_VISIBLE:
TABLE: "wifi"
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_wifi_visible
SRC_LANGUAGE: "r"
########################################################################################################################
# FITBIT #
########################################################################################################################
# See https://www.rapids.science/latest/setup/configuration/#device-data-source-configuration
FITBIT_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE # DATABASE or FILES (set each [FITBIT_SENSOR][TABLE] attribute with a table name or a file path accordingly)
COLUMN_FORMAT: JSON # JSON or PLAIN_TEXT
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE # Fitbit only supports SINGLE timezones
VALUE: *timezone # see docs
HIDDEN:
SINGLE_FITBIT_TABLE: TRUE
FITBIT_HEARTRATE_SUMMARY:
TABLE: fitbit_data
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["maxrestinghr", "minrestinghr", "avgrestinghr", "medianrestinghr", "moderestinghr", "stdrestinghr", "diffmaxmoderestinghr", "diffminmoderestinghr", "entropyrestinghr"] # calories features' accuracy depend on the accuracy of the participants fitbit profile (e.g. height, weight) use these with care: ["sumcaloriesoutofrange", "maxcaloriesoutofrange", "mincaloriesoutofrange", "avgcaloriesoutofrange", "mediancaloriesoutofrange", "stdcaloriesoutofrange", "entropycaloriesoutofrange", "sumcaloriesfatburn", "maxcaloriesfatburn", "mincaloriesfatburn", "avgcaloriesfatburn", "mediancaloriesfatburn", "stdcaloriesfatburn", "entropycaloriesfatburn", "sumcaloriescardio", "maxcaloriescardio", "mincaloriescardio", "avgcaloriescardio", "mediancaloriescardio", "stdcaloriescardio", "entropycaloriescardio", "sumcaloriespeak", "maxcaloriespeak", "mincaloriespeak", "avgcaloriespeak", "mediancaloriespeak", "stdcaloriespeak", "entropycaloriespeak"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_heartrate_summary
SRC_LANGUAGE: "python"
FITBIT_HEARTRATE_INTRADAY:
TABLE: fitbit_data
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_heartrate_intraday
SRC_LANGUAGE: "python"
FITBIT_SLEEP_SUMMARY:
TABLE: fitbit_data
  SLEEP_EPISODE_TIMESTAMP: end # summary sleep episodes are treated as single events anchored at either their start or end timestamp
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["countepisode", "avgefficiency", "sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgdurationafterwakeup", "avgdurationasleep", "avgdurationawake", "avgdurationtofallasleep", "avgdurationinbed"]
SLEEP_TYPES: ["main", "nap", "all"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_sleep_summary
SRC_LANGUAGE: "python"
FITBIT_STEPS_SUMMARY:
TABLE: fitbit_data
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: ["maxsumsteps", "minsumsteps", "avgsumsteps", "mediansumsteps", "stdsumsteps"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_steps_summary
SRC_LANGUAGE: "python"
FITBIT_STEPS_INTRADAY:
TABLE: fitbit_data
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES:
STEPS: ["sum", "max", "min", "avg", "std"]
SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
THRESHOLD_ACTIVE_BOUT: 10 # steps
INCLUDE_ZERO_STEP_ROWS: False
SRC_FOLDER: "rapids" # inside src/features/fitbit_steps_intraday
SRC_LANGUAGE: "python"
########################################################################################################################
# PLOTS #
########################################################################################################################
HISTOGRAM_PHONE_DATA_YIELD:
PLOT: True
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
PLOT: True
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
PLOT: True
SENSORS: [PHONE_ACCELEROMETER, PHONE_ACTIVITY_RECOGNITION, PHONE_APPLICATIONS_FOREGROUND, PHONE_BATTERY, PHONE_BLUETOOTH, PHONE_CALLS, PHONE_CONVERSATION, PHONE_LIGHT, PHONE_LOCATIONS, PHONE_MESSAGES, PHONE_SCREEN, PHONE_WIFI_CONNECTED, PHONE_WIFI_VISIBLE]
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
PLOT: True
HEATMAP_FEATURE_CORRELATION_MATRIX:
  PLOT: True
MIN_ROWS_RATIO: 0.5
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
PHONE_FEATURES: [activity_recognition, applications_foreground, battery, calls_incoming, calls_missed, calls_outgoing, conversation, light, location_doryab, messages_received, messages_sent, screen]
FITBIT_FEATURES: [fitbit_heartrate, fitbit_step, fitbit_sleep]
CORR_THRESHOLD: 0.1
CORR_METHOD: "pearson" # choose from {"pearson", "kendall", "spearman"}
HISTOGRAM_VALID_SENSED_HOURS:
PLOT: True
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
HEATMAP_DAYS_BY_SENSORS:
PLOT: True
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
EXPECTED_NUM_OF_DAYS: -1
DB_TABLES: [applications_foreground, battery, bluetooth, calls, light, locations, messages, screen, wifi, sensor_wifi, plugin_google_activity_recognition, plugin_ios_activity_recognition, plugin_studentlife_audio_android, plugin_studentlife_audio]
HEATMAP_SENSED_BINS:
PLOT: True
BIN_SIZE: *bin_size
########################################################################################################################
# Analysis Workflow Example #
########################################################################################################################
OVERALL_COMPLIANCE_HEATMAP:
PLOT: True
ONLY_SHOW_VALID_DAYS: False
EXPECTED_NUM_OF_DAYS: -1
BIN_SIZE: *bin_size
MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
### Example Analysis ################################################################
PARAMS_FOR_ANALYSIS:
COMPUTE: True
GROUNDTRUTH_TABLE: participant_info
TARGET_TABLE: participant_target
SOURCES: &sources ["phone_features", "fitbit_features", "phone_fitbit_features"]
DAY_SEGMENTS: *day_segments
PHONE_FEATURES: [activity_recognition, applications_foreground, battery, bluetooth, calls_incoming, calls_missed, calls_outgoing, conversation, light, location_doryab, messages_received, messages_sent, screen, wifi]
FITBIT_FEATURES: [fitbit_heartrate, fitbit_step, fitbit_sleep]
PHONE_FITBIT_FEATURES: "" # This array is merged in the input_merge_features_of_single_participant function in models.snakefile
DEMOGRAPHIC_FEATURES: [age, gender, inpatientdays]
CATEGORICAL_DEMOGRAPHIC_FEATURES: ["gender"]
FEATURES_EXCLUDE_DAY_IDX: True
CATEGORICAL_OPERATORS: [mostcommon]
# Whether or not to include only days with enough valid sensed hours
# logic can be found in rule phone_valid_sensed_days of rules/preprocessing.snakefile
DROP_VALID_SENSED_DAYS:
ENABLED: True
# Whether or not to include certain days in the analysis, logic can be found in rule days_to_analyse of rules/mystudy.snakefile
# If you want to include all days downloaded for each participant, set ENABLED to False
DAYS_TO_ANALYSE:
ENABLED: True
DAYS_BEFORE_SURGERY: 6 #15
DAYS_IN_HOSPITAL: F # T or F
DAYS_AFTER_DISCHARGE: 5 #7
DEMOGRAPHIC:
TABLE: participant_info
FEATURES: [age, gender, inpatientdays]
CATEGORICAL_FEATURES: [gender]
SOURCE:
DATABASE_GROUP: *database_group
TIMEZONE: *timezone
TARGET:
TABLE: participant_target
SOURCE:
DATABASE_GROUP: *database_group
TIMEZONE: *timezone
# Cleaning Parameters
COLS_NAN_THRESHOLD: [0.1, 0.3]
COLS_NAN_THRESHOLD: 0.3
COLS_VAR_THRESHOLD: True
ROWS_NAN_THRESHOLD: [0.1, 0.3]
PARTICIPANT_DAYS_BEFORE_THRESHOLD: 3
PARTICIPANT_DAYS_AFTER_THRESHOLD: 3
# Extract summarised features from daily features with any of the following substrings
NUMERICAL_OPERATORS: ["count", "sum", "length", "avg", "restinghr"]
CATEGORICAL_OPERATORS: ["mostcommon"]
MODEL_NAMES: ["LogReg", "kNN", "SVM", "DT", "RF", "GB", "XGBoost", "LightGBM"]
CV_METHODS: ["LeaveOneOut"]
SUMMARISED: ["notsummarised"] # "summarised" or "notsummarised"
RESULT_COMPONENTS: ["fold_predictions", "fold_metrics", "overall_results", "fold_feature_importances"]
ROWS_NAN_THRESHOLD: 0.3
DATA_YIELDED_HOURS_RATIO_THRESHOLD: 0.75
  MODEL_NAMES: [LogReg, kNN, SVM, DT, RF, GB, XGBoost, LightGBM]
CV_METHODS: [LeaveOneOut]
RESULT_COMPONENTS: [fold_predictions, fold_metrics, overall_results, fold_feature_importances]
MODEL_SCALER:
LogReg: ["notnormalized", "minmaxscaler", "standardscaler", "robustscaler"]
kNN: ["minmaxscaler", "standardscaler", "robustscaler"]
SVM: ["minmaxscaler", "standardscaler", "robustscaler"]
DT: ["notnormalized"]
RF: ["notnormalized"]
GB: ["notnormalized"]
XGBoost: ["notnormalized"]
LightGBM: ["notnormalized"]
LogReg: [notnormalized, minmaxscaler, standardscaler, robustscaler]
kNN: [minmaxscaler, standardscaler, robustscaler]
SVM: [minmaxscaler, standardscaler, robustscaler]
DT: [notnormalized]
RF: [notnormalized]
GB: [notnormalized]
XGBoost: [notnormalized]
LightGBM: [notnormalized]
MODEL_HYPERPARAMS:
LogReg:
{"clf__C": [0.01, 0.1, 1, 10, 100], "clf__solver": ["newton-cg", "lbfgs", "liblinear", "saga"], "clf__penalty": ["l2"]}
kNN:
{"clf__n_neighbors": [1, 3, 5], "clf__weights": ["uniform", "distance"], "clf__metric": ["euclidean", "manhattan", "minkowski"]}
{"clf__n_neighbors": [3, 5, 7], "clf__weights": ["uniform", "distance"], "clf__metric": ["euclidean", "manhattan", "minkowski"]}
SVM:
{"clf__C": [0.01, 0.1, 1, 10, 100], "clf__gamma": ["scale", "auto"], "clf__kernel": ["rbf", "poly", "sigmoid"]}
DT:
{"clf__criterion": ["gini", "entropy"], "clf__max_depth": [null, 3, 5, 7, 9], "clf__max_features": [null, "auto", "sqrt", "log2"]}
{"clf__criterion": ["gini", "entropy"], "clf__max_depth": [null, 3, 7, 15], "clf__max_features": [null, "auto", "sqrt", "log2"]}
RF:
{"clf__n_estimators": [2, 5, 10, 100],"clf__max_depth": [null, 3, 5, 7, 9]}
{"clf__n_estimators": [10, 100, 200],"clf__max_depth": [null, 3, 7, 15]}
GB:
{"clf__learning_rate": [0.01, 0.1, 1], "clf__n_estimators": [5, 10, 100, 200], "clf__subsample": [0.5, 0.7, 1.0], "clf__max_depth": [3, 5, 7, 9]}
{"clf__learning_rate": [0.01, 0.1, 1], "clf__n_estimators": [10, 100, 200], "clf__subsample": [0.5, 0.7, 1.0], "clf__max_depth": [null, 3, 5, 7]}
XGBoost:
{"clf__learning_rate": [0.01, 0.1, 1], "clf__n_estimators": [5, 10, 100, 200], "clf__num_leaves": [5, 16, 31, 62]}
{"clf__learning_rate": [0.01, 0.1, 1], "clf__n_estimators": [10, 100, 200], "clf__max_depth": [3, 5, 7]}
LightGBM:
{"clf__learning_rate": [0.01, 0.1, 1], "clf__n_estimators": [5, 10, 100, 200], "clf__num_leaves": [5, 16, 31, 62]}
{"clf__learning_rate": [0.01, 0.1, 1], "clf__n_estimators": [10, 100, 200], "clf__num_leaves": [3, 5, 7], "clf__colsample_bytree": [0.6, 0.8, 1]}


@ -0,0 +1,2 @@
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
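The two rows above define a single periodic segment named daily: it starts at midnight, lasts 23 hours 59 minutes 59 seconds, and repeats every day. A minimal sketch of loading and sanity-checking such a file with pandas (RAPIDS' own time-segment rules do the authoritative parsing):

import pandas as pd

EXPECTED_COLUMNS = ["label", "start_time", "length", "repeats_on", "repeats_value"]

def load_periodic_segments(path):
    # Read a periodic time-segments file and verify it has the documented columns.
    segments = pd.read_csv(path)
    missing = [column for column in EXPECTED_COLUMNS if column not in segments.columns]
    if missing:
        raise ValueError("time segments file is missing columns: " + ", ".join(missing))
    return segments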

119
mkdocs.yml 100644

@ -0,0 +1,119 @@
site_name: RAPIDS
markdown_extensions:
- toc:
permalink: True
- admonition
- smarty
- wikilinks
- codehilite:
linenums: True
# - urlize # requires: pip install git+https://github.com/r0wb0t/markdown-urlize.git
- pymdownx.arithmatex:
generic: true
- pymdownx.betterem:
smart_enable: all
- pymdownx.caret
- pymdownx.critic
- pymdownx.details
- pymdownx.emoji:
emoji_index: !!python/name:materialx.emoji.twemoji
emoji_generator: !!python/name:materialx.emoji.to_svg
- pymdownx.highlight
- pymdownx.inlinehilite
- pymdownx.magiclink
- pymdownx.mark
- pymdownx.smartsymbols
- pymdownx.superfences
- pymdownx.tabbed
- pymdownx.tasklist:
custom_checkbox: True
- pymdownx.tilde
- attr_list
- pymdownx.keys
extra:
version:
method: mike
social:
- icon: fontawesome/brands/twitter
link: 'https://twitter.com/julio_ui'
extra_javascript:
- https://polyfill.io/v3/polyfill.min.js?features=es6
- https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
- javascripts/extra.js
repo_name: 'carissalow/rapids'
repo_url: 'https://github.com/carissalow/rapids'
copyright: 'Released under AGPL'
theme:
name: material
icon:
logo: material/air-filter
palette:
- scheme: default
primary: blue
accent: blue
toggle:
icon: material/toggle-switch
name: Switch to light mode
- scheme: slate
primary: blue
accent: blue
toggle:
icon: material/toggle-switch-off-outline
name: Switch to dark mode
features:
- navigation.sections
- search.suggest
- search.highlight
extra_css:
- stylesheets/extra.css
nav:
- Home: 'index.md'
- Setup:
- File Structure: file-structure.md
- Installation: 'setup/installation.md'
- Configuration: setup/configuration.md
- Execution: setup/execution.md
- Example Workflows:
- Minimal: workflow-examples/minimal.md
- Analysis: workflow-examples/analysis.md
- Behavioral Features:
- Introduction: features/feature-introduction.md
- Phone:
- Phone Accelerometer: features/phone-accelerometer.md
- Phone Activity Recognition: features/phone-activity-recognition.md
- Phone Applications Foreground: features/phone-applications-foreground.md
- Phone Battery: features/phone-battery.md
- Phone Bluetooth: features/phone-bluetooth.md
- Phone Calls: features/phone-calls.md
- Phone Conversation: features/phone-conversation.md
- Phone Data Yield: features/phone-data-yield.md
- Phone Light: features/phone-light.md
- Phone Locations: features/phone-locations.md
- Phone Messages: features/phone-messages.md
- Phone Screen: features/phone-screen.md
      - Phone WiFi Connected: features/phone-wifi-connected.md
      - Phone WiFi Visible: features/phone-wifi-visible.md
- Fitbit:
- Fitbit Heart Rate Summary: features/fitbit-heartrate-summary.md
- Fitbit Heart Rate Intraday: features/fitbit-heartrate-intraday.md
- Fitbit Sleep Summary: features/fitbit-sleep-summary.md
- Fitbit Steps Summary: features/fitbit-steps-summary.md
- Fitbit Steps Intraday: features/fitbit-steps-intraday.md
- Add New Features: features/add-new-features.md
- Visualizations:
- Data Quality: visualizations/data-quality-visualizations.md
- Features: visualizations/feature-visualizations.md
- Developers:
- Remote Support: developers/remote-support.md
- Virtual Environments: developers/virtual-environments.md
- Documentation: developers/documentation.md
- Testing: developers/testing.md
- Test cases: developers/test-cases.md
- Others:
- Migrating from beta: migrating-from-old-versions.md
- Code of Conduct: code_of_conduct.md
- FAQ: faq.md
- Team: team.md
- Change Log: change-log.md
- Citation: citation.md

4
rapids 100755

@ -0,0 +1,4 @@
#!/usr/bin/env python
# Wrapper around Snakemake: forces re-execution (-R) of every rule whose parameters
# changed since the last run, forwarding any extra command-line arguments to snakemake.
from sys import argv
import subprocess
subprocess.run(" ".join(["snakemake", "-R", "`snakemake --list-params-changes`"] + argv[1:]), shell=True)


@ -86,12 +86,12 @@
"Repository": "CRAN",
"Hash": "e031418365a7f7a766181ab5a41a5716"
},
"RMySQL": {
"Package": "RMySQL",
"Version": "0.10.20",
"RMariaDB": {
"Package": "RMariaDB",
"Version": "1.0.10",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "022066398851453f187950137e21daad"
"Hash": "d149479ea03e5cbb1e788449ef5e04b4"
},
"Rcpp": {
"Package": "Rcpp",
@ -156,6 +156,20 @@
"Repository": "CRAN",
"Hash": "543776ae6848fde2f48ff3816d0628bc"
},
"bit": {
"Package": "bit",
"Version": "4.0.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "f36715f14d94678eea9933af927bc15d"
},
"bit64": {
"Package": "bit64",
"Version": "4.0.5",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "9fe98599ca456d6552421db0d6772d8f"
},
"boot": {
"Package": "boot",
"Version": "1.3-24",


@ -1,6 +1,8 @@
local({
options(tidyverse.quiet = TRUE)
# the requested version of renv
version <- "0.10.0"
@ -302,7 +304,7 @@ local({
renv_bootstrap_validate_version(version)
# load the project
renv::load(project)
renv::load(project, quiet = TRUE)
TRUE


@ -1,62 +1,10 @@
# Common.smk ##########################################################################################################
def infer_participant_platform(participant_file):
with open(participant_file, encoding="ISO-8859-1") as external_file:
external_file_content = external_file.readlines()
platforms = external_file_content[1].strip().split(",")
if platforms[0] == "multiple" or (len(platforms) > 1 and "android" in platforms and "ios" in platforms):
platform = "android"
else:
platform = platforms[0]
if platform not in ["android", "ios"]:
raise ValueError("Platform (line 2) in a participant file should be 'android', 'ios', or 'multiple'. You typed '" + platforms + "'")
return platform
# Preprocessing.smk ####################################################################################################
def optional_phone_sensed_bins_input(wildcards):
platform = infer_participant_platform("data/external/"+wildcards.pid)
if platform == "android":
tables_platform = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["IOS"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]] # for android, discard any ios tables that may exist
elif platform == "ios":
tables_platform = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"]]] # for ios, discard any android tables that may exist
return expand("data/raw/{{pid}}/{table}_with_datetime.csv", table = tables_platform)
# Features.smk #########################################################################################################
def optional_ar_input(wildcards):
platform = infer_participant_platform("data/external/"+wildcards.pid)
if platform == "android":
return ["data/raw/{pid}/" + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "_with_datetime_unified.csv",
"data/processed/{pid}/" + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "_deltas.csv"]
elif platform == "ios":
return ["data/raw/{pid}/"+config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]+"_with_datetime_unified.csv",
"data/processed/{pid}/"+config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]+"_deltas.csv"]
def optional_conversation_input(wildcards):
platform = infer_participant_platform("data/external/"+wildcards.pid)
if platform == "android":
return ["data/raw/{pid}/" + config["CONVERSATION"]["DB_TABLE"]["ANDROID"] + "_with_datetime_unified.csv"]
elif platform == "ios":
return ["data/raw/{pid}/" + config["CONVERSATION"]["DB_TABLE"]["IOS"] + "_with_datetime_unified.csv"]
def optional_location_barnett_input(wildcards):
if config["BARNETT_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
return expand("data/raw/{{pid}}/{sensor}_resampled.csv", sensor=config["BARNETT_LOCATION"]["DB_TABLE"])
else:
return expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["BARNETT_LOCATION"]["DB_TABLE"])
def optional_location_doryab_input(wildcards):
if config["DORYAB_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
return expand("data/raw/{{pid}}/{sensor}_resampled.csv", sensor=config["DORYAB_LOCATION"]["DB_TABLE"])
else:
return expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["DORYAB_LOCATION"]["DB_TABLE"])
def find_features_files(wildcards):
feature_files = []
for provider_key, provider in config[(wildcards.sensor_key).upper()]["PROVIDERS"].items():
if provider["COMPUTE"]:
feature_files.extend(expand("data/interim/{{pid}}/{sensor_key}_features/{sensor_key}_{language}_{provider_key}.csv", sensor_key=wildcards.sensor_key.lower(), language=provider["SRC_LANGUAGE"].lower(), provider_key=provider_key.lower()))
return(feature_files)
def optional_steps_sleep_input(wildcards):
if config["STEP"]["EXCLUDE_SLEEP"]["EXCLUDE"] == True and config["STEP"]["EXCLUDE_SLEEP"]["TYPE"] == "FITBIT_BASED":
@ -64,50 +12,13 @@ def optional_steps_sleep_input(wildcards):
else:
return []
def optional_wifi_input(wildcards):
if len(config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]) > 0 and len(config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]) == 0:
return {"visible_access_points": expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"])}
elif len(config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]) == 0 and len(config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]) > 0:
return {"connected_access_points": expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"])}
elif len(config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]) > 0 and len(config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]) > 0:
return {"visible_access_points": expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]), "connected_access_points": expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"])}
else:
raise ValueError("If you are computing WIFI features you need to provide either VISIBLE_ACCESS_POINTS, CONNECTED_ACCESS_POINTS or both")
def input_merge_sensor_features_for_individual_participants(wildcards):
feature_files = []
for config_key in config.keys():
if config_key.startswith(("PHONE", "FITBIT")) and "PROVIDERS" in config[config_key]:
for provider_key, provider in config[config_key]["PROVIDERS"].items():
if "COMPUTE" in provider.keys() and provider["COMPUTE"]:
feature_files.append("data/processed/features/{pid}/" + config_key.lower() + ".csv")
break
return feature_files
# Models.smk ###########################################################################################################
def input_merge_features_of_single_participant(wildcards):
if wildcards.source == "phone_fitbit_features":
return expand("data/processed/{pid}/{features}_{day_segment}.csv", pid=wildcards.pid, features=config["PARAMS_FOR_ANALYSIS"]["PHONE_FEATURES"] + config["PARAMS_FOR_ANALYSIS"]["FITBIT_FEATURES"], day_segment=wildcards.day_segment)
else:
return expand("data/processed/{pid}/{features}_{day_segment}.csv", pid=wildcards.pid, features=config["PARAMS_FOR_ANALYSIS"][wildcards.source.upper()], day_segment=wildcards.day_segment)
def optional_input_days_to_include(wildcards):
if config["PARAMS_FOR_ANALYSIS"]["DAYS_TO_ANALYSE"]["ENABLED"]:
        # This input automatically triggers the rule days_to_analyse in mystudy.snakefile
return ["data/interim/{pid}/days_to_analyse" + \
"_" + str(config["PARAMS_FOR_ANALYSIS"]["DAYS_TO_ANALYSE"]["DAYS_BEFORE_SURGERY"]) + \
"_" + str(config["PARAMS_FOR_ANALYSIS"]["DAYS_TO_ANALYSE"]["DAYS_IN_HOSPITAL"]) + \
"_" + str(config["PARAMS_FOR_ANALYSIS"]["DAYS_TO_ANALYSE"]["DAYS_AFTER_DISCHARGE"]) + ".csv"]
else:
return []
def optional_input_valid_sensed_days(wildcards):
if config["PARAMS_FOR_ANALYSIS"]["DROP_VALID_SENSED_DAYS"]["ENABLED"]:
        # This input automatically triggers the rule phone_valid_sensed_days in preprocessing.snakefile
return ["data/interim/{pid}/phone_valid_sensed_days_{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins.csv"]
else:
return []
# Reports.smk ###########################################################################################################
def optional_heatmap_days_by_sensors_input(wildcards):
platform = infer_participant_platform("data/external/"+wildcards.pid)
if platform == "android":
tables_platform = [table for table in config["HEATMAP_DAYS_BY_SENSORS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["IOS"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]] # for android, discard any ios tables that may exist
elif platform == "ios":
tables_platform = [table for table in config["HEATMAP_DAYS_BY_SENSORS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"]]] # for ios, discard any android tables that may exist
return expand("data/raw/{{pid}}/{table}_with_datetime.csv", table = tables_platform)

Some files were not shown because too many files have changed in this diff.