Compare commits


610 Commits

Author SHA1 Message Date
junos 63f5a526fc Bring back requested fields in config.yaml.
Update coding files based on 7e565c34db98265afcda922a337493781fdd8ed5 in supermodule.
2023-04-19 11:07:58 +02:00
junos 1cc7339fc8 Completely remove PACKAGE_NAMES_HASHED and instead provide a differently structured file. 2023-04-18 22:58:42 +02:00
junos 5307c71df0 Missing comma. 2023-04-18 22:45:12 +02:00
junos f261286542 Add package_names_hashed param for rule phone_application_categories. 2023-04-18 22:40:11 +02:00
junos a6bc0a90d1 Do not ignore application categories. 2023-04-18 21:34:59 +02:00
junos f161da41f4 Merge branch 'master' into runner 2023-04-18 21:23:26 +02:00
junos 8ffd934fd3 Categorize applications in config.yaml. 2023-04-18 20:39:57 +02:00
junos cf6af7c9a4 Add a TODO. 2023-04-18 16:11:30 +02:00
junos 4dacb7129d Change targets for 30 before.
Further increase resources for acc.
2023-04-18 10:47:54 +02:00
junos f542a97eab Change targets for 90 before. 2023-04-15 16:29:06 +02:00
junos 5cb2dcfb00 Run 90 before event. 2023-04-15 16:18:55 +02:00
junos 8cef60ba87 Limit memory usage by readable_datetime.
Especially important for accelerometer data.
2023-04-14 16:01:44 +02:00
junos 0d634f3622 Remove deprecated numpy dtype. 2023-04-14 13:43:20 +02:00
junos 00e4f8deae More numeric_only arguments.
See 1d903f3629 for explanation.
2023-04-13 13:04:53 +02:00
junos 03687a1ac2 Fix deprecated attribute. 2023-04-12 18:21:43 +02:00
junos a36da99ccb Catch another possible exception. 2023-04-12 16:37:25 +02:00
junos 1d903f3629 Specify numeric_only for pandas.core.groupby.DataFrameGroupBy.mean.
This parameter used to be None by default, but this usage is deprecated since pandas 2.0.
See [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.mean.html):

> Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.
2023-04-12 16:05:58 +02:00
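
For reference, a minimal sketch of the explicit call (toy frame, not from the pipeline):

import pandas as pd

df = pd.DataFrame({"pid": ["p01", "p01", "p02"],
                   "value": [1.0, 2.0, 4.0],
                   "label": ["a", "b", "c"]})
# Under pandas >= 2.0 the old numeric_only=None default is gone; with the new
# default (False) the non-numeric "label" column raises an error, so the
# numeric subset must be requested explicitly.
means = df.groupby("pid").mean(numeric_only=True)
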
junos d678be0641 Extract definition of function. 2023-04-12 16:00:13 +02:00
junos 27b90421bf Add a missing pip dependency. 2023-04-11 19:06:29 +02:00
junos cb006ed0cf Completely overhaul environment.yml. 2023-04-11 19:04:28 +02:00
junos 9ca58ed204 Fix Python libraries. 2023-04-11 17:21:03 +02:00
junos 982fa982f7 Remove Python libraries versions. 2023-04-11 17:16:42 +02:00
junos f8088172e9 Update more R packages. 2023-04-11 15:31:44 +02:00
junos 801fbe1c10 Update R packages. 2023-04-11 15:26:21 +02:00
Primoz 8721b944ca Revert "Set config to testing mode."
This reverts commit e825aa7c89.
2023-02-16 09:56:23 +00:00
Primoz 36651a11c8 Drop all window count related features in cleaning script. 2023-02-15 14:15:56 +00:00
Primoz 8ae5ad0e88 Add missing rapids_columns for speech sensor. 2023-02-15 13:42:44 +00:00
Primoz e825aa7c89 Set config to testing mode. 2023-02-15 13:36:00 +00:00
Primoz 5958948af2 Merge branch 'speech_sensor' 2023-02-15 13:30:43 +00:00
Primoz 7e37eb9067 Change SPEECH sensor place in config 2023-01-23 15:32:52 +00:00
Primoz 4d0497a5e0 Set appropriate calculations for speech sensor. 2023-01-17 14:00:42 +00:00
Primoz 75b054d358 Integrate phone_speech into rapids pipeline. 2023-01-17 14:00:14 +00:00
Primoz e27ec0269f Revert "Add Speech sensor - preparation."
This reverts commit 74fd4dfbd7.
2023-01-17 12:06:53 +00:00
Primoz 9b45188a61 Revert "PHONE SPEECH - continuation ..."
This reverts commit 3e6b34babc.
2023-01-17 12:06:44 +00:00
Primoz 3e6b34babc PHONE SPEECH - continuation ... 2023-01-11 13:05:42 +00:00
Primoz 74fd4dfbd7 Add Speech sensor - preparation. 2023-01-11 12:48:38 +00:00
Primoz 7b8538ce51 Fix a bug and remove sys.exit line from cleaning script. 2022-12-21 10:40:07 +00:00
Primoz 41a17d35f1 Update ERS stress_event logic. 2022-12-19 15:40:40 +00:00
Primoz 7f5a4e6744 Make stress events equal in duration. 2022-12-14 14:52:20 +00:00
Primoz 3ce7f2c2a5 Separate target standardization from the rest of the features. 2022-12-13 15:31:39 +00:00
Primoz e40f0fd8dc Bug fixed (mixed dtype warning). 2022-12-12 11:29:59 +00:00
Primoz 8af3bdf768 Reset PIDS 2022-12-09 16:07:13 +00:00
Primoz 01931b8873 Update README 2022-12-09 16:04:11 +00:00
Primoz 569854ddf5 Merge branch 'master' of https://repo.ijs.si/junoslukan/rapids 2022-12-09 16:01:52 +00:00
Primoz 3b2001f570 Modify the stress_event logic so that it includes events where stressfulness is 0. 2022-12-09 16:01:46 +00:00
junos 44a87c53eb Clarify runtime vs installation export of TZ. 2022-12-09 15:34:06 +01:00
junos 8da7bd71b2 Merge branch 'master' of https://repo.ijs.si/junoslukan/rapids 2022-12-09 15:26:23 +01:00
junos 788a81d96f Update readme with info from supermodule. 2022-12-09 15:26:12 +01:00
Primoz 87e5209a9f Squashed commit of the following:
commit 8a6b52a97c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 29 11:35:49 2022 +0000

    Switch to 30_before ERS with corresponding targets.

commit 244a053730
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 29 11:19:43 2022 +0000

    Change output files settings to nonstandardized.

commit be0324fd01
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Nov 28 12:44:25 2022 +0000

    Fix some bugs and set categorical columns as categories dtypes.

commit 99c2fab8f9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Nov 16 09:50:18 2022 +0000

    Fix a bug in the making of the individual model (when there is no target in the participants columns).

commit 286de93bfd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 15 11:21:51 2022 +0000

    Fix some bugs and extend ERS and cleaning scripts with multiple stress event targets logic.

commit ab803ee49c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 15 10:14:07 2022 +0000

    Add additional appraisal targets.

commit 621f11b2d9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 15 09:53:31 2022 +0000

    Fix a bug related to wrong user input (duplicated events).

commit bd41f42a5d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Nov 14 15:07:36 2022 +0000

    Rename target_ to segmenting_ method.

commit a543ce372f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Nov 14 15:04:16 2022 +0000

    Add comments for event_related_script understanding.

commit 74b454b07b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 11 09:15:12 2022 +0000

    Apply changes to string answers to make them language-generic.

commit 6ebe83e47e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 12:42:52 2022 +0000

    Improve the ERS extract method with a couple of validations.

commit 00350ef8ca
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 10:32:58 2022 +0000

    Change config for stressfulness event target method.

commit e4985c9121
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 10:29:11 2022 +0000

    Override stressfulness event target with extracted values from csv.

commit a668b6e8da
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 09:37:27 2022 +0000

    Extract ERS and stress event targets to csv files (completed).

commit 9199b53ded
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Nov 9 15:11:51 2022 +0000

    Get, join and start processing required ERS stress event data.

commit f3c6a66da9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 15:53:43 2022 +0000

    Begin with stress events in the ERS script.

commit 0b3e9226b3
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 14:44:24 2022 +0000

    Make small corrections in ERS file.

commit 2d83f7ddec
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 11:32:05 2022 +0000

    Begin the ERS logic for 90-minutes events.

commit 1da72a7cbe
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 09:45:37 2022 +0000

    Rename targets method in config.

commit 9f441afc16
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 15:09:04 2022 +0000

    Begin ERS logic for 90-minutes events.

commit c1c9f4d05a
Merge: 62f46ea3 7ab0280d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 09:11:58 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 62f46ea376
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 09:11:53 2022 +0000

    Prepare method-based logic for ERS generating.

commit 7ab0280d7e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 08:58:08 2022 +0000

    Correctly rename stressful event target variable.

commit eefa9f3f4d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 14:49:54 2022 +0000

    Add new target: stressfulness_event.

commit 5e8174dd41
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 13:52:45 2022 +0000

    Add new target: stressfulness_period.

commit 35c1a762e7
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 13:51:18 2022 +0000

    Improve filtering by esm_session and device_id.

commit 02264b21fd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 09:30:12 2022 +0000

    Add logic for target selection in ERS processing.

commit 0ce8723bdb
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Nov 2 14:01:21 2022 +0000

    Extend imputation logic within the cleaning script.

commit 30b38bfc02
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Oct 28 09:00:13 2022 +0000

    Fix the generating procedure of ERS file for participants with multiple devices.

commit cd137af15a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 27 14:20:15 2022 +0000

    Config for 30 minute EMA segments.

commit 3c0585a566
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 27 14:12:56 2022 +0000

    Remove obsolete comments.

commit 6b487fcf7b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 27 14:11:42 2022 +0000

    Set E4 data yield to 1 if it is over 1. Optimize E4 data_yield script.
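
A minimal sketch of such a cap, assuming a pandas column of per-segment yield ratios (column name hypothetical):

import pandas as pd

features = pd.DataFrame({"empatica_data_yield": [0.8, 1.0, 1.3]})
# Ratios above 1 are treated as full coverage and capped at 1.
features["empatica_data_yield"] = features["empatica_data_yield"].clip(upper=1)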

commit 5d17c92e54
Merge: a31fdd14 0d143e6a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:18:20 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit a31fdd1479
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:18:08 2022 +0000

    Start to test a perceived empatica_data_yield error.

commit 936324d234
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:17:27 2022 +0000

    Switch config for 30 minutes event related segments.

commit da0a4596f8
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:16:25 2022 +0000

    Add additional ESM processing logic for ERS csv extraction.

commit d4d74818e6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:14:32 2022 +0000

    Fix a bug - missing time_segment column when df is empty

commit 14ff59914b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 09:59:46 2022 +0000

    Fix to correct dtypes.

commit 6ab0ac5329
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 09:57:26 2022 +0000

    Optimize memory consumption with dtype definition while reading csv file.
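
Roughly the following pattern, with hypothetical column names:

import pandas as pd

# Pinning narrow dtypes up front avoids pandas' default float64/object
# inference, which is what inflates memory on large accelerometer CSVs.
dtypes = {"device_id": "category",
          "double_values_0": "float32",
          "double_values_1": "float32"}
data = pd.read_csv("phone_accelerometer.csv", dtype=dtypes)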

commit 0d143e6aad
Merge: 8acac501 b92a3aa3
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 15:28:27 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 8acac50125
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 15:26:43 2022 +0000

    Add a safety net for when the features dataframe is empty.

commit b92a3aa37a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 15:25:22 2022 +0000

    Remove unwanted output and other error-producing code.

commit bfd637eb9c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 08:53:44 2022 +0000

    Improve strings formatting in straw_events file.

commit 0d81ad5756
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 19 13:35:04 2022 +0000

    Debug assignment of segments to rows

commit cea451d344
Merge: e88bbd54 cf38d9f1
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 18 09:15:06 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit e88bbd548f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 18 09:15:00 2022 +0000

    Add new daily segment and filter by segment in the cleaning script.

commit cf38d9f175
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 17 15:07:33 2022 +0000

    Implement ERS generating logic.

commit f3ca56cdbf
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Oct 14 14:46:28 2022 +0000

    Start with ERS logic integration within Snakemake.

commit 797aa98f4f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 15:51:50 2022 +0000

    Config for ERS testing.

commit 9baff159cd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 15:51:23 2022 +0000

    Changes needed for testing and starting of the Event-Related Segments.

commit 0f21273508
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 12:32:51 2022 +0000

    Bug fixes

commit 55517eb737
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 12:23:11 2022 +0000

    Necessary commit before proceeding.

commit de15a52dba
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 11 08:36:23 2022 +0000

    Bug fix

commit 1ad25bb572
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 11 08:26:17 2022 +0000

    A few modifications of imputation values in the cleaning script and feature extraction.

commit 9884b383cf
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 10 16:45:38 2022 +0000

    Testing new data with AutoML.

commit 2dc89c083c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Oct 7 08:52:12 2022 +0000

    Small changes in cleaning overall

commit 001d400729
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 6 14:28:12 2022 +0000

    Clean features and create input files based on all possible targets.

commit 1e38d9bf1e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 6 13:27:38 2022 +0000

    Standardization and correlation visualization in overall cleaning script.

commit a34412a18d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 5 14:16:55 2022 +0000

    E4 data yield corrections. Changes in overall cleaning script - standardization.

commit 437459648f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 5 13:35:05 2022 +0000

    Error fixes: individual script - treat participants' missing data.

commit 53f6cc60d5
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 3 13:06:39 2022 +0000

    Config and cleaning script necessary changes ...

commit bbeabeee6f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 3 12:53:31 2022 +0000

    Last changes before processing on the server.

commit 44531c6d94
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 30 10:04:07 2022 +0000

    Code cleaning; reworked individual cleaning based on changes in the overall script. Changes in thresholds.

commit 7ac7cd5a37
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 29 14:33:21 2022 +0000

    Preparation of the overall cleaning script.

commit 68fd69dada
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 29 11:55:25 2022 +0000

    Cleaning script for individuals: corrections and comments.

commit a4f0d056a0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 29 11:44:27 2022 +0000

    Fillna for app foreground and activity recognition

commit 6286e7a44c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 12:47:08 2022 +0000

    firstuseafter column removed from contextual imputation

commit 9b3447febd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 12:40:05 2022 +0000

    Contextual imputation correction

commit d6adda30cf
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 12:37:51 2022 +0000

    Contextual imputation on time(first/last) features.

commit 8af4ef11dc
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 10:02:47 2022 +0000

    Contextual imputation by feature type.

commit 536b9494cd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 14:12:08 2022 +0000

    Cleaning script corrections

commit f0b87c9dd0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 09:54:15 2022 +0000

    Debugging of the empatica data yield integration.

commit 7fcdb873fe
Merge: 5c7bb0f4 bd53dc16
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 07:50:29 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 5c7bb0f4c1
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 07:48:32 2022 +0000

    Config changes

commit bd53dc1684
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 26 15:54:00 2022 +0000

    Empatica data yield usage in the cleaning script.

commit d9a574c550
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 23 13:24:50 2022 +0000

    Changes in the cleaning script and preparation of empatica data yield method.

commit 19aa8707c0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 22 13:45:51 2022 +0000

    Redefined cleaning steps after revision

commit 247d758cb7
Merge: 90ee99e4 7493aaa6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 21 07:18:01 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 90ee99e4b9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 21 07:16:00 2022 +0000

    Remove TODO comments

commit 7493aaa643
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 20 12:57:55 2022 +0000

    Small changes in cleaning script and missing values testing.

commit eaf4340afd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 20 08:03:48 2022 +0000

    Small imputation and cleaning corrections.

commit a96ea508c6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 07:34:02 2022 +0000

    Fill NaN of Empatica's SD second order feature (must be tested).

commit 52e11cdcab
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 07:25:54 2022 +0000

    Configurations for new standardization path.

commit 92aff93e65
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 07:25:16 2022 +0000

    Remove standardization script.

commit 18b63127de
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 06:16:26 2022 +0000

    Removed all standardization rules and configurations.

commit 62982866cd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 13:24:21 2022 +0000

    Phone wifi visible inspection (WIP)

commit 0ce6da5444
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 11:30:08 2022 +0000

    kNN imputation relocation and execution only on specific columns.
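
A minimal sketch of column-restricted kNN imputation, assuming scikit-learn and hypothetical column names:

import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({"f1": [1.0, None, 3.0],
                   "f2": [0.5, 0.7, None],
                   "pid": ["p01", "p02", "p03"]})
cols = ["f1", "f2"]  # impute only the numeric feature columns, not identifiers
df[cols] = KNNImputer(n_neighbors=2).fit_transform(df[cols])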

commit e3b78c8a85
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 10:58:57 2022 +0000

    Impute selected phone features with 0.
    Wifi visible, screen, and light.
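
That is, roughly (column names hypothetical):

import pandas as pd

features = pd.DataFrame({"phone_wifi_visible_countscans": [None, 2.0],
                         "phone_light_meanlux": [None, 80.0]})
# No scans or readings in a segment is read as zero activity, not missing data.
zero_cols = [c for c in features.columns
             if c.startswith(("phone_wifi_visible", "phone_screen", "phone_light"))]
features[zero_cols] = features[zero_cols].fillna(0)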

commit 7d85f75d21
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 09:03:30 2022 +0000

    Changes in phone features NaN values script.

commit 385e21409d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 15 14:16:58 2022 +0000

    Changes in NaN values testing script.

commit 18002f59e1
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 15 10:48:59 2022 +0000

    Doryab bluetooth and locations features fill in NaN values.

commit 3cf7ca41aa
Merge: d27a4a71 d5ab5a03
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 14 15:38:32 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit d5ab5a0394
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 14 14:13:03 2022 +0000

    Writing testing scripts to determine the point of manual imputation.

commit dfbb758902
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 13 13:54:06 2022 +0000

    Changes in AutoML params and environment.yml

commit 4ec371ed96
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 13 09:51:03 2022 +0000

    Testing auto-sklearn

commit d27a4a71c8
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 12 13:44:17 2022 +0000

    Reorganisation and reordering of the cleaning script.

commit 15d792089d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 1 10:33:36 2022 +0000

    Changes in cleaning script:
    - target extracted from config to remove rows where target is nan
    - prepared sns.heatmap for further missing values analysis
    - necessary changes in config and participant p01
    - picture of heatmap which shows the values state after cleaning

commit cb351e0ff6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 1 10:06:57 2022 +0000

    Unnecessary line (rows with no target value will be removed in cleaning script).

commit 86299d346b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 1 09:57:21 2022 +0000

    Impute phone and sms NAs with 0

commit 3f7ec80c18
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Aug 31 10:18:50 2022 +0000

    Preparation: a) phone_calls 0 imputation; b) remove rows with NaN target
2022-12-08 16:04:39 +00:00
Primoz f78aa3e7b3 Preparation for cleaning & imputation 2022-08-26 10:56:14 +00:00
Primoz a620def209 Generate standardized model input files (NOTE: commented unstandardized sections!) 2022-08-24 13:42:39 +00:00
Primoz c498ecb742 Include baseline models (+corrections), disable columns drop in cleaning function. 2022-08-23 14:12:14 +00:00
Primoz f088e9586f Handle empty ACC.csv 2022-08-22 14:20:47 +00:00
Primoz 0aa0e82673 Handle empty Empatica csv files. 2022-08-22 14:18:12 +00:00
Primoz 4cfe5a3a98 Disable discarding rows if DATA_YIELD_RATIO_THRESHOLD==0. 2022-08-19 13:10:56 +00:00
Primoz 607da820f2 Configuration and cleaning changes 2022-08-18 14:21:05 +00:00
Primoz fb577bc9ad Squashed commit of the following:
commit 43ecc243cb62bb31eed85cb477ca4131555c7fe7
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 22 15:26:09 2022 +0000

    Adding TODO comments

commit 2df1ebf90c3a93812b112b8ed0ee4e23cd74533f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 21 13:59:23 2022 +0000

    README update

commit 5182c2b16dff3537aad42984b8ea5214743cdb32
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 21 11:03:01 2022 +0000

    Few corrections for all_cleaning

commit 3d9254c1b3bed6e95e631d4e0402548830a19534
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 21 10:28:05 2022 +0000

    Adding the min overlap for corr threshold and preservation of esm cols.

commit e27c49cc8fa4c51f9fe8e593a8d25e9a032ab393
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 21 09:02:00 2022 +0000

    Commenting and cleaning.

commit 31a47a5ee4569264e39d7c445525a6e64bb7700a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Jul 20 13:49:22 2022 +0000

    Environment version change.

commit 5b274ed8993f58e783bda6d82fce936764209c28
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 16:10:07 2022 +0000

    Enabled cleaning for all participants + standardization files.

commit 203fdb31e0f3c647ef8c8a60cb9531831b7ab924
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 14:14:51 2022 +0000

    Features cleaning fixes after testing. Visualization script for phone features values.

commit 176178d73b154c30b9eb9eb4a67514f00d6a924e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 09:05:14 2022 +0000

    Revert "Necessary config changes."

    This reverts commit 6ec1ef50430d2e1f5ce4670d505d5e84ac47f0a0.

commit 26ea6512c9d512f95837e7b047fe510c1d196403
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 13:19:47 2022 +0000

    Adding cleaning function condition and cleaning functionality.

commit 575c29eef9c21e6f2d7832871e73bc0941643734
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 12:51:56 2022 +0000

    Translation of the cleaning individual RAPIDS function from R to py.

commit 6ec1ef50430d2e1f5ce4670d505d5e84ac47f0a0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 12:02:18 2022 +0000

    Necessary config changes.

commit b5669f51612fbd8378848615d639677851ab032f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 15:26:00 2022 +0000

    Modified snakemake rule to dynamically choose script extension.

commit 66636be1e8ae4828228b37c59b9df1faf3fc3d3d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 14:43:08 2022 +0000

    Trying to modify the snakefile rule to execute scripts in two languages depending on the provider.

commit 574778b00f3cbb368ef4bc74de15cf5070c65ea9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 09:49:41 2022 +0000

    gitignore: adding required files so that RAPIDS can be run successfully.

commit 71018ab178256970535e78961602ab8c7f0ebb14
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 08:34:19 2022 +0000

    Standardization bug fixes

commit 6253c470a624e6bfbb02e0c453b652452eb2dbbc
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 14 15:28:02 2022 +0000

    Separate rules for empatica vs. non-empatica standardization.
    Parameter in config that controls the creation of standardized merged files for individual and all participants.

commit 90f902778565e0896d3bae22ae8551be8b487e67
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 12 14:23:03 2022 +0000

    Preparing for final csvs' standardization.

commit d25dde3998786a9a582f5cda544ee104386778f9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 12:08:47 2022 +0000

    Revert "Changes in config to be reverted."

    This reverts commit bea7608e7095021fb7c53a9afa07074448fe4313.

commit 6b23e70857e63deda98eb98d190af9090626c84b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 12:08:26 2022 +0000

    Enabled standardization for the rest of the (previously active) phone features.
    Testing still needed.

commit 8ec58a6f34ba3d42e5cc71d26e6d91837472ca5f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 09:07:55 2022 +0000

    Enabled standardization for phone calls.
    All steps completed and tested.

commit bea7608e7095021fb7c53a9afa07074448fe4313
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 07:47:51 2022 +0000

    Changes in config to be reverted.

commit 4e84ca0e51bf709bff56fd09437b95310ec6bedd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 8 14:11:24 2022 +0000

    Standardization for the rest of the features.

commit cc581aa788e3d5c17131af8f3d5dd6b0c3b5aff7
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 8 14:11:08 2022 +0000

    README update again
2022-07-22 15:31:30 +00:00
Primoz 6ba4a66deb Squashed commit of the following:
commit 31a47a5ee4569264e39d7c445525a6e64bb7700a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Jul 20 13:49:22 2022 +0000

    Environment version change.

commit 5b274ed8993f58e783bda6d82fce936764209c28
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 16:10:07 2022 +0000

    Enabled cleaning for all participants + standardization files.

commit 203fdb31e0f3c647ef8c8a60cb9531831b7ab924
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 14:14:51 2022 +0000

    Features cleaning fixes after testing. Visualization script for phone features values.

commit 176178d73b154c30b9eb9eb4a67514f00d6a924e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 09:05:14 2022 +0000

    Revert "Necessary config changes."

    This reverts commit 6ec1ef50430d2e1f5ce4670d505d5e84ac47f0a0.

commit 26ea6512c9d512f95837e7b047fe510c1d196403
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 13:19:47 2022 +0000

    Adding cleaning function condition and cleaning functionality.

commit 575c29eef9c21e6f2d7832871e73bc0941643734
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 12:51:56 2022 +0000

    Translation of the cleaning individual RAPIDS function from R to py.

commit 6ec1ef50430d2e1f5ce4670d505d5e84ac47f0a0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 12:02:18 2022 +0000

    Necessary config changes.

commit b5669f51612fbd8378848615d639677851ab032f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 15:26:00 2022 +0000

    Modified snakemake rule to dynamically choose script extension.

commit 66636be1e8ae4828228b37c59b9df1faf3fc3d3d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 14:43:08 2022 +0000

    Trying to modify the snakefile rule to execute scripts in two languages depending on the provider.

commit 574778b00f3cbb368ef4bc74de15cf5070c65ea9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 09:49:41 2022 +0000

    gitignore: adding required files so that RAPIDS can be run successfully.

commit 71018ab178256970535e78961602ab8c7f0ebb14
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 08:34:19 2022 +0000

    Standardization bug fixes

commit 6253c470a624e6bfbb02e0c453b652452eb2dbbc
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 14 15:28:02 2022 +0000

    Separate rules for empatica vs. non-empatica standardization.
    Parameter in config that controls the creation of standardized merged files for individual and all participants.

commit 90f902778565e0896d3bae22ae8551be8b487e67
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 12 14:23:03 2022 +0000

    Preparing for final csvs' standardization.

commit d25dde3998786a9a582f5cda544ee104386778f9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 12:08:47 2022 +0000

    Revert "Changes in config to be reverted."

    This reverts commit bea7608e7095021fb7c53a9afa07074448fe4313.

commit 6b23e70857e63deda98eb98d190af9090626c84b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 12:08:26 2022 +0000

    Enabled standardization for the rest of the (previously active) phone features.
    Testing still needed.

commit 8ec58a6f34ba3d42e5cc71d26e6d91837472ca5f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 09:07:55 2022 +0000

    Enabled standardization for phone calls.
    All steps completed and tested.

commit bea7608e7095021fb7c53a9afa07074448fe4313
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 07:47:51 2022 +0000

    Changes in config to be reverted.

commit 4e84ca0e51bf709bff56fd09437b95310ec6bedd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 8 14:11:24 2022 +0000

    Standardization for the rest of the features.

commit cc581aa788e3d5c17131af8f3d5dd6b0c3b5aff7
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 8 14:11:08 2022 +0000

    README update again
2022-07-20 13:51:22 +00:00
Primoz 788ac31190 Bug fix: if df has no rows write an empty zscore file. 2022-07-08 10:40:45 +00:00
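
A minimal sketch of that guard (file names hypothetical):

import pandas as pd

df = pd.read_csv("features.csv")
if df.empty:
    # Still emit the expected output so downstream Snakemake rules can proceed.
    df.to_csv("z_features.csv", index=False)
else:
    numeric = df.select_dtypes("number")
    df[numeric.columns] = (numeric - numeric.mean()) / numeric.std()
    df.to_csv("z_features.csv", index=False)
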
Primoz 21eb2665d7 README: few changes. 2022-07-08 10:40:08 +00:00
Primoz a65a85cce9 Merge branch 'empatica_calculating_features' 2022-07-07 15:35:47 +00:00
Primoz fa961fe2f5 gitignore 2022-07-07 15:34:31 +00:00
Primoz 6c8014ba8e Updated R and Python package files. Updated README. 2022-07-07 15:30:07 +00:00
Primoz 5a777ac79f Working version that integrates both phone and empatica feature calculations. 2022-07-07 15:00:47 +00:00
Primoz 0425403951 Merge branch 'master' of https://repo.ijs.si/junoslukan/rapids 2022-07-06 11:53:31 +00:00
Primoz 887fd7dc72 Merge branch 'empatica_calculating_features' 2022-07-06 11:53:21 +00:00
Primoz 5a4696c548 Misc. changes 2022-07-06 11:30:18 +00:00
Primoz d2758eef46 Set non-NaN sum instead of 0 sum for HRV feature windows. 2022-07-06 07:36:35 +00:00
Primoz 2d5d23b615 Testing files change and remove standardization from hrv sensors main files. 2022-07-06 07:35:39 +00:00
Primoz a5480f1369 Few changes during addition to file structure. 2022-07-04 13:00:47 +00:00
Primoz 505c3a86b9 Testing different EDA findPeaks parameters. 2022-06-30 15:15:37 +00:00
junos ce04394679 Merge commit 'c05b047c2d9452151553961928c846c01d7395bc' 2022-06-25 20:06:24 +02:00
Primoz c851ab0763 Fill EDA NaN values where numPeak is zero. Other small changes. 2022-06-21 14:09:49 +00:00
Primoz a8cd16f88c Debugging eda_explorer 2022-06-16 11:32:59 +00:00
Primoz dda4554d46 Various small changes. 2022-06-15 13:57:46 +00:00
Primoz 212cf300f8 Debugging EDA signal - preliminary step for imputation. 2022-06-14 15:09:14 +00:00
Primoz 9ea39dc557 Standardization as a Snakefile's rule enabled for all E4 sensors. 2022-06-13 18:17:30 +00:00
Primoz 402059871f Making standardization as a rule. WIP: done only for BVP. 2022-06-13 14:12:03 +00:00
Primoz 094743244d Added SO feature for summing all non-zero rows for BVP and IBI sensors. 2022-06-13 10:51:22 +00:00
primoz e1d7607de4 Extraction of additional SO features. Min/max has been changed to nsmallest/nlargest means. 2022-06-10 12:34:48 +00:00
primoz f371249b99 First order features standardization WIP 2022-06-09 13:35:15 +00:00
primoz 64e41cfa35 Second order features standardization in config.yaml. 2022-06-07 10:39:48 +00:00
primoz 2c7ac21465 Added standardization on SO features. 2022-06-06 13:51:15 +00:00
primoz 2acf6ff9fb Exception handling in case of empty ibi. Changes of the method EDA uses in main.py. Other small corrections. 2022-06-03 12:34:36 +00:00
primoz d300f0f8f0 Fixed RAPIDS bug: error when IBI.csv is empty. 2022-06-02 11:43:49 +00:00
Primoz fbf6a77dfc Small misc changes 2022-06-02 06:41:53 +00:00
Primoz 5532043b1f Patching IBI with BVP - completed. 2022-05-25 19:39:47 +00:00
Primoz bb62497ba6 Patching IBI with BVP - selecting appropriate pipeline entry point. WIP 2022-05-24 11:07:18 +00:00
Primoz 2a8f58f5c8 Patching IBI with BVP. WIP 2022-05-20 13:18:45 +00:00
Primoz 1471c86c62 Cr-features version update in rapids venv. 2022-05-13 13:37:12 +00:00
Primoz 6864cfe775 Changes after thorough testing with available data. 2022-05-13 13:35:34 +00:00
Primoz c1564f0cae Changed wrapper method calculate_feature to its newest version (for TEMP and ACC). 2022-05-11 14:21:21 +00:00
Primoz 31e36e7400 Alternating Second order and full segment features corresponding to config settings. 2022-05-11 08:50:15 +00:00
Primoz 9cf9e1fe14 Testing and modifying the code with different E4 data. 2022-05-10 11:36:49 +00:00
Primoz f62a1302dd Cr-features corrections for ACC and TEMP sensors 2022-05-09 11:01:52 +00:00
Primoz 5638367999 Implementation of the second order features. 2022-04-25 13:07:03 +00:00
Primoz 66451160e9 Calculating HRV features with IBI.csv. 2022-04-20 10:44:51 +00:00
= 8c8fe1fec7 Modifications, mostly imports, after changes in cr-features module. 2022-04-19 13:24:46 +00:00
= 075c64d1e5 HRV: changed wrapper calcFeat method with specialized one. 2022-04-14 11:51:53 +00:00
junos c05b047c2d Correct outstanding baseline feature mistake. 2022-04-13 17:05:16 +02:00
junos 53ec52a954 Disable (SOME) feature cleaning for ESM data. 2022-04-13 16:01:31 +02:00
= 3c058e4463 Add option to calculate features within windows and store it in CSV (all sensors). 2022-04-13 13:18:23 +00:00
junos 144f0d0dcf Account for missing baseline data. 2022-04-13 14:56:28 +02:00
junos ed5314aa98 Merge remote-tracking branch 'origin/master' 2022-04-12 17:27:25 +02:00
junos 11c64cfc1a Include all participants again. 2022-04-12 17:20:19 +02:00
junos a6a37c7bd9 Drop NaN targets.
This mirrors INNER join in merge_features_and_targets_for_individual_model.py:

data = pd.concat([sensor_features, targets[["target"]]], axis=1, join="inner")
2022-04-12 17:01:49 +02:00
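
A minimal sketch of that equivalence on toy frames (names taken from the quoted line):

import pandas as pd

sensor_features = pd.DataFrame({"f1": [0.1, 0.2, 0.3]}, index=["p01", "p02", "p03"])
targets = pd.DataFrame({"target": [1.0, None, 0.0]}, index=["p01", "p02", "p03"])
# Dropping NaN targets first leaves the same rows that the inner join keeps,
# because join="inner" intersects the two indexes.
targets = targets.dropna(subset=["target"])
data = pd.concat([sensor_features, targets[["target"]]], axis=1, join="inner")
# data now holds p01 and p03 only.
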
junos 9f5edf1c2b Revert "Add a rule for model baselines."
The example was for a classification rather than regression problem.

This reverts commit 9ab0c8f289.

# Conflicts:
#	rules/models.smk
2022-04-12 16:59:42 +02:00
junos 4ad261fae5 Rename baseline features AGAIN.
Correct other mistakes.
2022-04-12 16:55:01 +02:00
= 74cf4ada1c Cr-features window length for all Empatica sensors. 2022-04-12 14:00:44 +00:00
junos 9ab0c8f289 Add a rule for model baselines.
Add baselines and helper functions to main models dir.
2022-04-12 14:23:58 +02:00
junos 570d2eb656 Add the file for population model to Snakefile. 2022-04-12 14:11:40 +02:00
junos f5688f6154 Add a rule to merge sensor and baseline features.
And select target as before.
2022-04-08 15:42:04 +02:00
junos b1f356c3f7 Extract a function to be used elsewhere. 2022-04-08 15:36:32 +02:00
junos 7ff3dcf5fc Move and rename target variable. 2022-04-06 18:21:09 +02:00
junos 50c0defca7 Select target columns (no parsing necessary). 2022-04-06 18:16:49 +02:00
junos ac86221662 [WIP] Add a rule to parse targets.
Does nothing for now.
2022-04-06 17:47:03 +02:00
junos baa94c4c4e Correct additional error in feature file naming.
Add the final feature file to the list in Snakefile.
2022-04-06 17:29:17 +02:00
junos d2fbef5234 Merge branch 'labels' of https://repo.ijs.si/junoslukan/rapids into labels
# Conflicts:
#	src/features/phone_esm/straw/preprocess.py
2022-04-05 19:28:37 +02:00
junos d326a1b09d Include the constant directly in main.py. 2022-04-05 19:08:43 +02:00
junos 2e545e81f0 Include feature calculations for different scales. 2022-04-05 19:05:34 +02:00
junos cbc8ae4e03 Add necessary checks for empty data frames. 2022-04-05 18:58:09 +02:00
junos f50a13167e Add feature files back to Snakefile. 2022-04-05 18:37:58 +02:00
junos e84c35a36a Remove unnecessary parameters from preprocess_esm.
And correct the newly named interim file.
2022-04-05 18:36:09 +02:00
junos e2ce68f591 Defer creation of feature files to esm_features rule. 2022-04-05 18:30:04 +02:00
junos 751b04f3f4 Pass scale names to Snakemake correctly. 2022-04-05 18:14:37 +02:00
junos 99245afca3 Try a different approach for preprocessing ESMs.
It is important that this follows generic RAPIDS pattern.
In the subsequent step of calculating features,
there is an expected file and folder structure of data/interim.
See rules/common.smk/find_features_files()
2022-04-05 18:02:31 +02:00
junos ed298a9479 Implement the basic feature extraction steps. 2022-04-05 15:46:02 +02:00
= 1c42347b9b Small changes. 2022-04-04 12:19:33 +00:00
Primoz c050174ca3 Various minimal changes. 2022-03-31 09:16:00 +00:00
Primoz f9e40711e7 Modified README for RAPIDS-CalculatingFeatures integration. 2022-03-30 16:17:07 +00:00
Primoz a357138f6e Added CF for HRV and shortened test data 2022-03-30 15:01:24 +00:00
Primoz 470993eeb0 Modification of getSampleRate method for all CF scripts. 2022-03-30 15:00:11 +00:00
junos 798ec973b4 [WIP] Add a rule for ESM features. 2022-03-30 10:43:30 +02:00
junos 3af8de6235 Create feature provider script. 2022-03-30 10:40:53 +02:00
junos 7173ca13e3 Rename a parameter. 2022-03-30 10:40:53 +02:00
junos 9478dc94f2 Add an else.
This is to make sure that in case the reversing fails, we do not get any output items.
Snakemake will inform us of an error in this event.
2022-03-30 10:40:53 +02:00
= ab0b9227d7 Added ACC calculated features and shorter version of ACC data. 2022-03-29 09:41:51 +00:00
= a9244a60fc Corrections for TEMP cf src script. 2022-03-28 14:26:37 +00:00
= 8b76c96e47 Cleaning existing CF mains' and preparing src script for ACC. 2022-03-28 14:18:29 +00:00
= ca59a54d8f Get a sample rate from two sequential timestamps. 2022-03-28 13:50:08 +00:00
= 393dab72f5 Added components for the temperature features extraction. 2022-03-28 12:37:02 +00:00
Primoz 1902d02a86 Updating conda env. 2022-03-25 16:27:28 +00:00
Primoz f389ac9d89 Delete CF features folder 2022-03-25 16:24:52 +00:00
Primoz 191e53e543 Added cf provider for EDA feature processing. 2022-03-23 15:13:53 +00:00
Primoz d3a3f01f29 Preparation for the EDA features integration from CF. 2022-03-22 15:36:52 +00:00
Primoz 2da0911d4c Skeleton file main.py for EDA CalcFt. integration. 2022-03-22 12:48:43 +00:00
Primoz bd5a811256 Shortening input CSVs' and added test for ACC. 2022-03-21 12:00:57 +00:00
Primoz d1c59de2e9 Add folder structure for CF testing and EDA test. 2022-03-21 10:40:18 +00:00
Primoz a80f7c0cc4 Change of the relative import statements. 2022-03-21 10:38:15 +00:00
Primoz d63158c199 Build calc features lib and related packages. 2022-03-21 08:28:28 +00:00
junos b18dba366e Add an else.
This is to make sure that in case the reversing fails, we do not get any output items.
Snakemake will inform us of an error in this event.
2022-03-16 18:59:29 +01:00
junos 916bb21a53 Merge branch 'labels' into run_test_participant 2022-03-16 18:56:00 +01:00
junos c6144f8403 Reverse JCQ items. 2022-03-16 18:55:46 +01:00
junos fec7cc9550 Merge branch 'labels' into run_test_participant 2022-03-16 18:30:03 +01:00
junos 23f0aaba3a Get the name of the questionnaire from Snakefile. 2022-03-16 18:28:57 +01:00
junos 8ed7d23348 Merge branch 'labels' into run_test_participant 2022-03-16 17:56:07 +01:00
junos 679f00dc19 Enable selecting any questionnaire as target. 2022-03-16 17:55:44 +01:00
junos 1374eda171 Flatten questionnaire ID dict. 2022-03-16 17:38:09 +01:00
junos 3e9cdde66e Merge branch 'master' into run_test_participant 2022-03-16 17:27:50 +01:00
junos 155395512c Merge branch 'labels' into run_test_participant 2022-03-16 17:09:53 +01:00
junos cb116100dd Move preprocessing to features. 2022-03-16 17:06:42 +01:00
junos 19b9da0ba3 Separate function definitions from main. 2022-03-16 16:49:28 +01:00
Primoz 3f8e1cc252 Empatica features calculations with an example ZIP 2022-03-16 15:03:32 +00:00
junos 83a8bb6689 Add an option to disable calculation of baseline features. 2022-03-16 15:51:12 +01:00
Primoz dc2b462145 Reseting files to defaults - for Minimal Working Example 2022-03-16 13:30:19 +00:00
Primoz 50358978cc Testing Git integration on PC 2022-03-15 12:49:51 +00:00
junos ef57103bac Add questionnaire ID key. 2022-03-15 13:41:33 +01:00
junos 5f293211a7 Reformat. 2022-03-15 13:28:51 +01:00
Primoz 86c6312574 Added .devcontainer to gitignore 2022-03-14 18:36:10 +00:00
junos d470eef27e Add a rule to preprocess and clean ESM. 2022-03-09 18:38:46 +01:00
junos b09522a8af Merge branch 'labels' into run_test_participant 2022-03-09 17:58:44 +01:00
junos d4a4bbbff0 Remove unused columns. 2022-03-09 17:58:36 +01:00
junos 085a6d144b Add files to compute and create an empty script. 2022-03-09 17:32:02 +01:00
junos 42d62f16d0 Add RAPIDS mandatory columns for ESM. 2022-03-09 17:31:37 +01:00
junos a159ca3d3a Merge branch 'labels' into run_test_participant 2022-03-08 15:43:42 +01:00
junos 2bef86b1da Add a format for ESM and add to config. 2022-03-08 15:43:25 +01:00
junos d8e9a309f7 Rename features and write baseline_interim. 2022-03-08 15:10:36 +01:00
junos ba7c3e620b Merge branch 'master' into run_test_participant 2022-03-01 12:03:14 +01:00
junos a3a4f04ffe Setting with : produces NaNs. 2022-03-01 12:02:57 +01:00
junos aedb8b6785 Write questionnaire data to data/interim. 2022-03-01 12:02:36 +01:00
junos 631581cc8a Merge branch 'master' into run_test_participant 2022-03-01 11:42:19 +01:00
junos d3ebfeeabd Write questionnaire data to data/interim. 2022-03-01 11:42:08 +01:00
junos 70e077f6ab Merge branch 'master' into run_test_participant 2022-03-01 11:40:17 +01:00
junos f13a91044d Write questionnaire data to data/interim. 2022-03-01 11:39:58 +01:00
junos b5a6317f4b Calculate JCQ control and demand control ratio.
Include norms and corresponding quartile.
2022-02-28 18:51:47 +01:00
junos 2fed962644 Calculate JCQ demand score.
Hardcode question IDs to be reversed.
2022-02-28 18:30:41 +01:00
junos 30ac8b1cd5 Start calculating demand control features. 2022-02-23 19:08:10 +01:00
junos 9a74e74d08 Add the baseline features rule to snakefile.
Correct age calculation for a single value instead of dataframe.
2022-02-23 18:15:26 +01:00
junos 43e5ac7918 Merge branch 'master' into run_test_participant 2022-02-23 18:06:07 +01:00
junos 07da6be398 Add age, gender, and language as features.
Move calculation of age from merge_baseline_data.py to baseline_features.py.
2022-02-23 18:05:23 +01:00
junos c801f66533 Retain a single participant ID.
Do not plot heatmaps as this is bugged.
2022-02-23 11:10:54 +01:00
junos 176367631b Prepare baseline feature rule. 2022-02-23 11:09:33 +01:00
Meng Li 28e580e597 Update change-log for v1.8.0 2022-02-10 15:05:55 -05:00
junos bf9c764c97 Split baseline data to participants.
And some csv I/O settings.
2022-02-04 18:37:57 +01:00
junos 16e608db74 First merge baseline datasets. 2022-02-04 18:21:42 +01:00
junos 204f6f50b0 Read the relevant files. 2022-02-04 18:06:02 +01:00
junos 685ed6a546 Set up demographic data download. 2022-02-04 17:37:00 +01:00
junos ffa7a30575 Make place for STRAW models. 2022-02-04 17:25:24 +01:00
Meng Li 463ac0a2aa
Fix bug#169 (#174) 2022-01-27 11:27:32 -05:00
Sam 10e896ca1d
Add data stream for AWARE Micro server (#173)
* Add data stream for AWARE Micro server

* Fix one documentation typo and one omission
2022-01-27 10:47:50 -05:00
junos afa3b8546f Mutate data in an R script.
The Python script did not read the timestamp correctly for some reason. All timestamps were 0.
2022-01-26 16:34:19 +01:00
junos 1efb8e3112 Clean features across participants.
Explore the best linear regression feature.
2022-01-19 13:41:09 +01:00
Sam e5dbbfce44
Avoid NA problem in barnett location evaluation (#172)
* Avoid occasional issue where does_not_span evaluates to NA, which breaks the if()

* Restored original warning
2022-01-18 10:16:37 -05:00
Sam 8ae26fb845
Fixes issue where 'duration' in the 'ios_calls' dataframe is seen as a character type. (#171) 2022-01-18 10:15:53 -05:00
junos b17a7eff1a Deal with inexplicable snakemake failure. 2022-01-07 18:11:38 +01:00
junos 2fb068cb8b Do not calculate accelerometer features.
Add data cleaning.
2022-01-07 12:20:51 +01:00
junos e1499a5ae2 Account for missing device_ids. 2021-12-15 20:41:28 +01:00
junos b29f902915 Look into ESM table for device_id. 2021-12-15 20:18:12 +01:00
junos c03ee788f6 Add missing dependencies for caret and corrr. 2021-12-15 19:26:16 +01:00
junos 5a9252e46e Merge remote-tracking branch 'origin/master' 2021-12-15 18:32:36 +01:00
junos e5cc02501f Set the timezone.csv path in config.
Take into account that TZCODES_FILE can be created with a rule.
2021-12-15 18:09:30 +01:00
junos 352598f3da Use absolute path to avoid RuleException. 2021-12-15 17:27:13 +01:00
junos 15653b6e70 Add forgotten line for hashed app names in config. 2021-12-15 17:26:54 +01:00
junos a66a7d0cc3 Keep track of warning messages.
These are not runtime errors, but might still indicate a problem.
2021-12-15 16:19:29 +01:00
junos 70cada8bb8 Consider a subset of columns when dropping. 2021-12-15 16:14:33 +01:00
junos d2ed73dccf Debug ValueError for index.
See exploration/debug_heatmap.py for illustration.
2021-12-15 16:03:04 +01:00
junos 6f451e05ac Bring back application_name.
This column still needs to be in the data, so add it in app_add_name.py.
Later, join categories by package hash.
2021-12-15 12:58:27 +01:00
junos 4485c4c95e Delete columns we don't have.
Rename light table.
Correct timesegments.
2021-12-08 20:02:47 +01:00
junos 633384c6a9 Use all available sensors for PHONE_YIELD. 2021-12-08 19:04:19 +01:00
junos 8e2222f307 Bring back deleted lines which are required. 2021-12-08 18:59:10 +01:00
junos 712ff74898 Set table names and calculate all relevant features. 2021-12-08 18:37:34 +01:00
junos 1f54195437 Configure timezone file to be created automatically. 2021-12-08 18:21:29 +01:00
junos 2b52f686b3 Define daily segments. 2021-12-08 18:20:22 +01:00
junos 22513415e9 Do not ask for specific patch numbers of libraries! 2021-12-03 14:38:25 +01:00
junos 0b8a493ff2 Incorporate multiple timezones into RAPIDS. 2021-12-01 18:20:27 +01:00
junos f0d29d0d1a Incorporate DB query for usernames into snakemake workflow. 2021-12-01 18:14:27 +01:00
junos 37b3460b76 Use Empatica wristband numbers as provided in CSV. 2021-12-01 17:20:57 +01:00
junos 22f9e0722d Start preparing the true usernames CSV file. 2021-12-01 11:29:22 +01:00
junos 0be4cd5a8f Remove unnecessary library. 2021-11-30 17:08:07 +01:00
junos b99a3c19ed Update dbplyr to the latest version.
distinct changed its behaviour from 2.0.0 to 2.1.0.
2021-11-29 18:34:26 +01:00
junos 04ad2d0b81 Source specific container script.
It is probably not worth the effort of making this general.
2021-11-29 18:19:47 +01:00
junos da5ff0f36e Correct small errors in settings. 2021-11-29 18:04:06 +01:00
junos 35d9779026 Prepare the tibble in requested format.
Write it to a CSV file.
2021-11-29 17:54:16 +01:00
junos 32025cbd8c Start with a tibble from CSV. 2021-11-29 17:51:07 +01:00
junos 181e4f0118 Add parameters to yaml file.
And use these in the prepare_participants_file function.
2021-11-29 16:57:50 +01:00
junos 39bd244511 [WIP] Prepare yaml files.
These will be used to create participants files.
2021-11-24 19:11:19 +01:00
junos ab84109d55 Prepare a function to compile participants data.
It combines functions from container.R
2021-11-24 19:07:56 +01:00
junos f9863ec622 Fix small mistakes. 2021-11-24 19:01:30 +01:00
junos c1f56c61e8 Add a function to pull start and end datetimes. 2021-11-24 18:33:06 +01:00
junos 3acf6ece14 Add a function to pull device IDs. 2021-11-24 18:23:53 +01:00
junos 8b2717122d Add a function to get participants' IDs. 2021-11-24 18:05:17 +01:00
Meng Li 9338f77ae6 Update docs for Git Flow section & RAPIDS paper info 2021-11-19 13:57:10 -05:00
Meng Li 5bad3eb8b5
Data cleaning (#166)
* Refactor data cleaning module: move it from example workflow to main directory

* Replace NAs with 0 in selected event-based features

* Add one step to drop highly correlated features

Co-authored-by: Weiyu <weiyuhuang7@gmail.com>
2021-11-19 10:34:36 -05:00
Meng Li 296960f425 Fix the bug of location doryab features when a participant is moving during the whole time segment 2021-11-18 18:42:19 -05:00
Meng Li 3d34036eae
Add firststeptime and laststeptime features to FITBIT_STEPS_INTRADAY RAPIDS provider (#168)
* Add firststeptime and laststeptime features to FITBIT_STEPS_INTRADAY RAPIDS provider

* Update test config files
2021-11-18 18:35:27 -05:00
junos ed193d2290 Revert "Correct the name of a field."
This reverts commit b335561a55.

It was actually correct.
2021-11-17 19:16:35 +01:00
junos 24b11ea101 Force Unix style end of line. 2021-11-17 19:12:40 +01:00
junos 4829b155d5 Make config changes for minimal workflow. 2021-11-17 18:53:44 +01:00
junos b335561a55 Correct the name of a field. 2021-11-17 18:50:06 +01:00
junos fcec3e2f93 Implement the necessary functions for PSQL. 2021-11-17 18:49:25 +01:00
junos 7a1e4f7139 Add the format file copied from MySQL. 2021-11-17 18:46:16 +01:00
junos ae8ed3999f Add RPostgres to renv.
And update its dependencies.
2021-11-17 12:56:14 +01:00
JulioV 399dbc7d75
Update team.md (#167) 2021-11-13 16:47:12 -05:00
Meng Li dfa11acf87
Updated phone battery test data and results (#165)
Co-authored-by: Weiyu <weiyuhuang7@gmail.com>
2021-10-21 22:54:19 -04:00
Meng Li da633c5d08 Update change-log for v1.6.0 2021-10-18 17:03:21 -04:00
Meng Li 7f4683e0fe
Feature/screen default maxlength (#164)
Updated the default IGNORE_EPISODES_LONGER_THAN to be 6 hours for screen RAPIDS provider
2021-10-14 09:26:06 -04:00
Meng Li 3744367aa9 Updated docs and workflow example for location features with DORYAB provider 2021-10-13 17:06:53 -04:00
Meng Li c7e8777a6e Merge branch 'feature/phone_locations_refactor' into develop 2021-09-23 18:22:11 -04:00
Meng Li f340b89c58 Temporary revert PHONE_LOCATIONS BARNETT provider to use R script 2021-09-23 18:16:13 -04:00
Meng Li a3fb718aea Refactor PHONE_LOCATIONS DORYAB provider to compute features based on location episodes 2021-09-23 17:40:06 -04:00
Meng Li 80522e6b7f Merge branch 'feature/phone_calls_refactor' into develop 2021-09-15 11:47:41 -04:00
Weiyu ff38e36809 Tested phone call episodes 2021-09-15 10:29:56 -04:00
Meng Li a8a178486b Refactor PHONE_CALLS RAPIDS provider to compute features based on call episodes or events 2021-09-15 10:28:37 -04:00
JulioV 2e553dc9e7 Add tqdm package to environment.yaml 2021-08-16 11:04:03 -04:00
Meng Li 3ac12e7dad Fix the bug of step intraday features when INCLUDE_ZERO_STEP_ROWS is False 2021-08-11 12:40:40 -04:00
JulioV 1520a1e755 v1.5.0 Update changelog and team pages 2021-08-09 18:21:58 -04:00
Weiyu 46e4425323 Updated test data for data yield feature 2021-08-09 17:56:29 -04:00
Weiyu 35eebe8a51 Bug fixed: set ratiovalidyielded mins/hours value to the range 0 to 1 2021-08-09 17:56:29 -04:00
Shirley 3c46a1c878 Adding Ian and Shirley as community contributors 2021-08-05 18:25:37 -04:00
JulioV 3e69966c91 Update error message 2021-08-04 15:33:02 -04:00
Shirley 4ddb2845a6 Update initialize_params 2021-08-04 15:33:02 -04:00
JulioV 834bd3b93d Refactor in Python of Barnett provider
Co-authored-by: Shirley Hayati <sahayati@ucdavis.edu>
Co-authored-by: JulioV <JulioV@users.noreply.github.com>
2021-08-04 15:33:02 -04:00
Weiyu c41e1f08cc Tested data yield feature 2021-08-04 11:06:31 -04:00
Weiyu 64976919e4 Tested phone accelerometer feature 2021-08-04 11:06:31 -04:00
Weiyu 7f1c502ea0 Fixed bug: Added local_segment column if no data left after filtered 2021-08-04 11:06:31 -04:00
Weiyu 2e3e433d2b Tested fitbit step summary feature 2021-07-28 10:32:16 -04:00
Weiyu 6f5a143191 Tested fitbit steps intraday feature 2021-07-28 10:32:16 -04:00
JulioV 872125fbb2
Update link to paper that used RAPIDS 2021-07-23 11:57:04 -04:00
Hannah Roberts b52059b027 Ensure date/time format is maintained
Within the 'determine which is home' for loop, 'xx' is the midpoint of two datetime objects. When the midpoint is calculated to be midnight, only the date is returned. This can be replicated with:

mydates <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
mydates
[1] "2018-01-01 UTC"

This results in 'hourofday' being NA as an hour cannot be found. By adding the suggested format wrapper, the time is maintained and 'hourofday' can be determined. It can then successfully be applied to the embedded if-statement within the loop.

mydates <- format(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"), "%Y-%m-%d %H:%M:%S")
mydates
[1] "01-01-2018 00:00:00"
2021-07-23 10:12:11 -04:00
Weiyu 5a465873c4 Tested fitbit heartrate intraday feature 2021-07-21 10:24:02 -04:00
Weiyu e9c07924fd Tested fitbit heartrate summary feature 2021-07-20 16:54:08 -04:00
Kennedy Opoku Asare b1e3360e2b Update index.md 2021-07-19 09:54:57 -04:00
JulioV 18463f5e8e Fix bug with internal test script
Date times with a 00:00:00 would not be saved correctly for Fitbits
2021-07-16 16:56:02 -04:00
JulioV ad5796ed5e Update changelog for v1.4.1 2021-07-12 18:22:52 -04:00
JulioV 96d7b6e170 Fix links in home page 2021-07-12 18:15:27 -04:00
JulioV a323f6c390
Merge pull request #150 from carissalow/feature/phone_messages_test
Phone message tests
2021-07-07 10:51:42 -04:00
Weiyu 2e147fb89c Finished phone message test 2021-07-07 01:22:49 -04:00
JulioV 07eb2e7917 Update v1.4.0 changelog 2021-07-01 18:32:14 -04:00
JulioV c8dbd5c5ac Update app foreground docs 2021-07-01 18:27:46 -04:00
Meng Li e1cfcd46e4 Update example workflow for app episode features 2021-07-01 18:08:33 -04:00
JulioV d7fbee5914
Merge pull request #149 from carissalow/feature/own_categories
Feature/own categories
2021-07-01 17:02:28 -04:00
Weiyu 013425e36c Finished phone applications foreground feature test 2021-07-01 16:25:38 -04:00
JulioV 6fa1875bf3 Add app foreground episode count 2021-07-01 16:20:16 -04:00
JulioV bc5c0c9a4f Fix app episode length bug 2021-07-01 16:20:16 -04:00
JulioV 065a926a87 Change own to custom categories name 2021-07-01 16:20:16 -04:00
JulioV e74c745f86 Add own categories to app foreground features 2021-07-01 16:20:16 -04:00
JulioV 5892b6d838 Fix create_participants_files.R to handle numeric PIDs 2021-07-01 16:20:16 -04:00
Weiyu 9593667f38 Finished phone light test 2021-07-01 16:17:08 -04:00
Meng Li b7eafc8d5c Update changelog for v1.4.0 2021-06-29 11:54:42 -04:00
JulioV 0d7b7f3dad
Merge pull request #147 from carissalow/visualization_fix
Fix bugs of visualization module and analysis workflow example
2021-06-29 11:50:55 -04:00
Meng Li 2e12a061c7 Update docs of visualization module 2021-06-29 10:51:22 -04:00
Meng Li 97ef8a8368 Set color range and avoid SettingWithCopyWarning 2021-06-29 09:50:19 -04:00
Meng Li bb3c614135 Update analysis workflow example 2021-06-29 09:50:19 -04:00
Meng Li 1c57320ab3 Update segment labels and fix the bug when we do not have any labels for event segments 2021-06-29 09:49:24 -04:00
Meng Li cefcb0635b Update heatmap of recorded phone sensors 2021-06-29 09:49:24 -04:00
Meng Li bc06477d89 Update heatmap of sensor row count 2021-06-29 09:49:24 -04:00
Meng Li e98a8ff7ca Update histogram of phone data yield 2021-06-29 09:49:24 -04:00
Meng Li f436f1f530 Update heatmap of correlation matrix 2021-06-29 09:49:23 -04:00
Meng Li 4d37696158 Update heatmaps of overall data yield 2021-06-29 09:48:30 -04:00
JulioV 654f6f3c3d
Merge pull request #145 from carissalow/feature/phone_activity_recognition_test
Feature/phone activity recognition test
2021-06-23 19:11:26 -04:00
Weiyu 4efc247575 Tested phone activity recognition feature 2021-06-23 19:08:14 -04:00
Weiyu f374c67bd5 Bug fixed: Added unknown activity case 2021-06-23 19:04:55 -04:00
JulioV e4af893d25
Merge pull request #144 from carissalow/feature/phone_bluetooth_test
Feature/phone bluetooth test
2021-06-23 19:00:56 -04:00
Weiyu c07358df70 Tested phone bluetooth feature 2021-06-23 18:56:24 -04:00
Weiyu 3e4d167adc Bug fixed: sort bt_address alphabetically before picking the most frequent bt_address 2021-06-22 17:40:00 -04:00
Kirtiraj Khandekar 5924f251d9 Update phone-applications-foreground.md 2021-06-21 14:59:06 -04:00
Meng Li 339781252a Merge branch 'data_stream_fix' into develop 2021-06-11 18:37:30 -04:00
Meng Li f248b6c97d Fix bugs of Fitbit mutation scripts 2021-06-11 18:18:33 -04:00
kirtirajk 4b8698a4c6 adding app_episode with the changes as mentioned in the comments 2021-06-10 14:17:56 -04:00
Weiyu a7e720e1a8 Validated phone conversation feature results 2021-06-10 10:49:22 -04:00
Weiyu ae20d22d1e validated phone wifi feature results 2021-06-10 10:49:22 -04:00
Weiyu 65d5cb7bd4 Bug fixed: countscansmostuniquedevice stays the same for all time segments 2021-06-10 10:49:22 -04:00
Weiyu 56b344f9ce Updated phone call and screen test description
Updated phone screen description
2021-06-10 10:49:22 -04:00
Weiyu 93622b6781 Completed phone calls test 2021-06-10 10:49:22 -04:00
JulioV f1960f00b1 Update Docker VS Code setup 2021-06-08 17:50:11 -04:00
JulioV daced51ca9 Update changelog for v1.3.0 2021-06-01 15:55:50 -04:00
JulioV 9764a0b378 Fix outdated warning link 2021-06-01 15:26:51 -04:00
JulioV e123a14082 Improve aware_csv msg when CSV files don't exist 2021-06-01 10:57:17 -04:00
Meng Li 9687081fbe Refactor the rule phone_locations_add_doryab_extra_columns 2021-05-28 09:48:36 -04:00
Meng Li 91a767699d Merge branch 'feature/phone_locations_doryab_refactor' into develop 2021-05-26 17:39:22 -04:00
Meng Li 0d6f51be8b Refactor location features from Doryab provider & add a new strategy to infer home location & fix bugs 2021-05-26 17:36:52 -04:00
JulioV 32472461ec - Fix bug when no phone data yield is needed to process location data
- Remove location rows with the same timestamp based on their accuracy
2021-05-26 14:04:29 -04:00
Nikunj Goel 9b21196f35
Fixed `expected_minutes` to account for different time segments. (#136) 2021-05-26 11:44:48 -04:00
Meng Li 772e114eb5 Merge branch 'feature/fitbit_steps_fix' into develop 2021-05-21 15:36:38 -04:00
Meng Li edf71e055d Add the EXCLUDE_SLEEP module for steps intraday features 2021-05-21 15:23:21 -04:00
Nikunj Goel 5e451f99b0
Added phone keyboard features including docs/tests (#134) 2021-05-21 11:45:27 -04:00
JulioV e9cd9c94d7 Fix PID matching when joining data from participants 2021-05-11 16:49:04 -04:00
JulioV 32818a4802 Fix parsing of pids with more than 1 device 2021-05-11 16:42:20 -04:00
JulioV b3c05128fa Update docs with contributing guide 2021-05-10 15:48:14 -04:00
Meng Li 73090e4bee Merge branch 'feature/fitbit_sleep' into develop 2021-04-27 15:23:56 -04:00
Meng Li c900712960 Update config & docs of sleep features for example workflow 2021-04-27 14:41:48 -04:00
Meng Li 809845143f Test & fix bugs of sleep intraday features 2021-04-27 14:40:14 -04:00
Meng Li 7c7f34ec45 Test & fix bugs of sleep summary features 2021-04-27 14:40:14 -04:00
Meng Li 50fe09cfac Update data streams mutation of fitbit data 2021-04-27 14:40:14 -04:00
Meng Li 66d9a9d640 Update params & docs of sleep features 2021-04-27 14:33:19 -04:00
JulioV 29cc3f00e9 Release v1.1.1 2021-04-22 14:30:08 -04:00
JulioV 4beafd233d Fix crash when scraping data for an app that does not exist 2021-04-22 14:28:52 -04:00
JulioV ea8094e028 Fix length of periodic segments on days with DST 2021-04-22 11:32:10 -04:00
Weiyu 0b4704de29 Validated phone screen tests 2021-04-21 12:15:28 -04:00
JulioV 17f520d5e6 Delete android from fitbit test pid 2021-04-20 17:28:06 -04:00
JulioV 1fa9481933 Release v1.1.0 2021-04-20 15:14:29 -04:00
JulioV 04fe60eced Merge branch 'feature/calories' into develop 2021-04-20 12:06:37 -04:00
JulioV 0e127e4412 Add calories features tests 2021-04-20 12:02:21 -04:00
JulioV 9c56422529 Add calories intraday features 2021-04-20 12:00:38 -04:00
Meng Li 20910bf1dc Remove fitbit device rows from timesegments_event file of testing module 2021-04-19 14:21:00 -04:00
Meng Li 84cca10f04 Update event segments with multi timezones for testing module 2021-04-19 13:39:44 -04:00
Meng Li 305cc9d4ad Update testing module for Fitbit 2021-04-16 18:32:20 -04:00
Meng Li 5d732a45ec Revert tests module's sleep config 2021-04-16 15:25:33 -04:00
Meng Li 59c3e367ff Add empty folder for tests module 2021-04-14 21:02:17 -04:00
Meng Li 8385582e13 Refactor tests module 2021-04-14 20:52:45 -04:00
JulioV 3e22c3a197 Refactor testing guide 2021-04-14 12:57:20 -04:00
JulioV aafe35107d Fix missing pid files directory 2021-04-08 16:50:18 -04:00
Weiyu 415ec4a716 reformatted test cases and updated the test doc 2021-04-08 16:50:18 -04:00
Weiyu 580ec6d350 Added iOS test case 5 and 6 2021-04-08 16:50:18 -04:00
Weiyu 502c1c36e2 documented phone_battery_test and added event tests 2021-04-08 16:50:18 -04:00
Weiyu 7c1a6bc91d Added event config file 2021-04-08 16:50:18 -04:00
Meng Li 00a3335623 Add device_id column for sleep intraday episodes 2021-04-08 11:21:28 -04:00
JulioV 286d317af4 Fix crash when there are no periodic segments to assign
This includes a simplification of how periodic segments are computed based on all local dates in the data independently of their time zones
2021-04-07 12:03:25 -04:00
JulioV 9551669d47 Fix periodic segments bug when there are no segments to assign 2021-04-06 20:29:30 -04:00
Meng Li e043dd8815 Update configuration.md: device_id will not be relabelled for both phone&fitbit 2021-04-06 23:42:42 +00:00
Meng Li 78173c54ab Convert date time object to string in assign_tz_code() function 2021-04-06 23:28:53 +00:00
JulioV 1025e6d9d8 Fix datetime labels of event segments across multiple tzs 2021-04-06 13:58:58 -04:00
Meng Li 8909876cff Add local_segment column for phone data yield features 2021-04-05 21:13:36 +00:00
Meng Li 68125dc1bf Fix the bug of phone data yield features when the input is empty 2021-04-05 20:57:05 +00:00
JulioV 46f5e24814 Fix Fitbit tz inference from phone data 2021-04-05 11:51:57 -04:00
JulioV 636b64c61a Revert "Added more keyboard features."
This reverts commit 94c72e3172.
2021-04-05 11:25:00 -04:00
Meng Li 68e12a2563 Fix bugs of bluetooth feature extraction when number of unique bt_address is 2 2021-04-05 14:09:50 +00:00
Meng Li 8414977331 Fix the bug of utils.py when one participant has multiple timezones 2021-04-03 21:20:10 -04:00
nikunjgoel95 ce1f2e1c95 Merge branch 'feature/keyboard' into develop 2021-04-01 20:54:40 -04:00
nikunjgoel95 94c72e3172 Added more keyboard features. 2021-04-01 20:54:13 -04:00
Meng Li 1ea5b74eff Fix the bug of utils.py when one participant has multiple timezones 2021-03-31 19:34:14 -04:00
Meng Li 136dfef56b Fix the bug of Analysis Workflows while parsing targets with updated segments 2021-03-30 16:41:50 -04:00
JulioV 99dae079d5 Add iOS BT and Wifi visible to formats for old devices 2021-03-30 15:32:50 -04:00
JulioV ca2dfacbdd Small doc updates 2021-03-29 16:09:46 -04:00
JulioV 81be67d74d Add outdated warning to develop 2021-03-29 10:46:35 -04:00
JulioV 61d0300adc Merge branch 'fix/overlapping_periodic_segments' into develop 2021-03-28 15:30:11 -04:00
JulioV 30ad3cd586 Validate participant files without device ids 2021-03-28 15:29:08 -04:00
JulioV 87fbbbe402 Refactor and simplify time segments 2021-03-28 15:29:07 -04:00
JulioV c48c1c8f24 Optimize Barnett's computation multi-day segments 2021-03-28 15:29:07 -04:00
JulioV d0858f8833 Fix overlapping periodic time segments 2021-03-28 15:29:07 -04:00
Meng Li 1314f1d1cf Fix typos of fitbit-sleep-intraday.md 2021-03-25 16:59:50 -04:00
Meng Li f8afef8767 Add script & docs to create multi timezones file 2021-03-25 16:22:25 -04:00
Meng Li 7d175030c6 Update docs of creating participant files section 2021-03-25 11:50:21 -04:00
Meng Li e177aa6386 Update create participant files section 2021-03-25 11:39:31 -04:00
Meng Li a5eb535126 Update visualizations docs & add time flag for heatmap of overall data yield 2021-03-23 21:40:55 -04:00
JulioV c7e834d654 Update default value for DBSCAN_EPS 2021-03-19 12:04:12 -04:00
nikunjgoel95 7815c380a2 Merge branch 'feature/doryab_location_empty_df_fix' into develop
Adding the branch to fix infer home locations and empty dataframe.
2021-03-19 11:17:43 -04:00
nikunjgoel95 cfc5039918 Fixed the empty dataframe case in infer_home_locations.py and added array condition in doryab location 2021-03-19 11:15:57 -04:00
Meng Li 294d84277d Fix bug of sleep intraday PRICE provider when the dataframe of a segment is empty 2021-03-17 15:21:31 -04:00
Weiyu 7e919eaaeb Merge branch 'develop' of https://github.com/carissalow/rapids into develop 2021-03-16 22:52:50 -04:00
Weiyu 73c980e846 Added test cases for test05 and test06, updated the results 2021-03-16 22:51:30 -04:00
JulioV 771c14a928 Improve mysql containers error messages 2021-03-16 20:02:44 -04:00
JulioV 6e234f7951 Fix warn instead of stop when there are no device ids 2021-03-16 20:02:16 -04:00
JulioV 4c2f60fffd Fix bugs in readable datetime and screen episodes 2021-03-16 20:01:43 -04:00
JulioV bb737237d0 Fixes for aware_influxdb 2021-03-16 11:26:46 -04:00
JulioV d6c22fdbc7 Fix an import bug and docs 2021-03-15 19:35:58 -04:00
JulioV d9f4ea6fad Fix links in home page 2021-03-15 12:12:41 -04:00
JulioV f6ccc3c08c
Merge pull request #128 from carissalow/feature/multi_smartphone_app
Feature/multi smartphone app
2021-03-15 12:01:19 -04:00
JulioV 6b7ba28fc2 Change utterances theme based on global theme 2021-03-15 11:56:40 -04:00
JulioV 4528ab3641 Replace SRC LANGUAGE and FOLDER with SCRIPT 2021-03-14 22:14:13 -04:00
JulioV 5f355560de Update CSV example links and feature introduction 2021-03-14 17:53:57 -04:00
JulioV 8583fa1db0 Add utterances comments 2021-03-14 16:13:18 -04:00
JulioV 93916e8242 Update migration guide 2021-03-14 13:41:27 -04:00
JulioV f4b2bd1fb2 Cleanup data/ 2021-03-14 13:36:22 -04:00
JulioV 42cee67664 Add aware_influxdb in beta 2021-03-14 13:33:43 -04:00
JulioV e38d1fa8ba Update RAPIDS diagrams 2021-03-14 13:17:31 -04:00
JulioV 3ee8199574 Update docs 2021-03-14 11:42:29 -04:00
JulioV 3d4a04effe Refactor testing 2021-03-14 00:09:08 -05:00
JulioV 61edbbfb00 Update migration guide 2021-03-13 12:15:38 -05:00
Meng Li 4cf19075b3 Update analysis.md 2021-03-12 20:51:56 -05:00
Meng Li ffaa8f8a1c Update create_example_participant_files rule 2021-03-12 20:04:43 -05:00
Meng Li 2b6447105a Migrate analysis example to new data stream 2021-03-12 19:52:34 -05:00
JulioV fae0c2ac05 Swap TABLE for CONTAINER 2021-03-12 18:14:49 -05:00
JulioV 8df629b403 Update FAQ 2021-03-12 17:26:47 -05:00
JulioV 8e9a3bf4c5 Update location accuracy default 2021-03-12 17:18:08 -05:00
JulioV 76460357bd Update change log and migration guide 2021-03-12 16:31:37 -05:00
JulioV f9ebcd35ff Create migration guide for 0.4.0 2021-03-12 16:31:37 -05:00
JulioV 57bd1a75dc Add where to start guide and update docs 2021-03-12 16:31:37 -05:00
Meng Li d529490999 Migrate fitbit features to new data stream 2021-03-12 12:38:36 -05:00
JulioV 6e898beca5 Add aware_csv 2021-03-11 19:32:11 -05:00
JulioV 2e030b377d Update minimal workflow 2021-03-11 15:22:23 -05:00
JulioV 1b8453bec4 Remove unused params from config.yaml 2021-03-11 14:57:34 -05:00
JulioV 13174b0c2a Fix a bug when fitbit data is empty 2021-03-11 14:51:16 -05:00
JulioV 2ee45995f2 Update config docs and create participant files script 2021-03-11 14:40:33 -05:00
JulioV 1e66dad838 Fix bug in empatica_zip container script 2021-03-11 14:39:26 -05:00
JulioV a79997e0ac Add empatica_zip docs 2021-03-11 14:39:26 -05:00
JulioV 8c4ac1fd43 Update renv 2021-03-11 14:39:26 -05:00
JulioV d48194fc07 Add fitbitparsed_csv 2021-03-11 14:39:26 -05:00
JulioV b97b70e3a1 Add fitbitjson_csv 2021-03-11 14:39:26 -05:00
JulioV 470f4276af Add fitbitparsed_mysql 2021-03-11 14:39:26 -05:00
JulioV 1b0ee4bbf0 Add sleep intraday to fitbitjson_mysql 2021-03-11 14:39:20 -05:00
JulioV a420f5ef92 Add sleep summary to fitbitjson_mysql 2021-03-11 14:37:22 -05:00
Meng Li cf0afeb08d Update docs & links in config.yaml 2021-03-11 14:37:22 -05:00
Meng Li 93baff9f83 Migrate phone keyboard sensor to new data stream 2021-03-11 14:37:22 -05:00
Meng Li 35968e2fd0 Migrate phone log sensor to new data stream 2021-03-11 14:37:22 -05:00
Meng Li 091f9c048a Migrate phone apps notifications sensor to new data stream 2021-03-11 14:37:22 -05:00
Meng Li b49dab0949 Migrate phone apps crashes sensor to new data stream 2021-03-11 14:37:22 -05:00
JulioV 47e1b33816 Add hr intraday to fitbitjson_mysql 2021-03-11 14:37:22 -05:00
JulioV 47f449555a Add hr summary to fitbitjson_mysql 2021-03-11 14:37:22 -05:00
JulioV 72f6b2d621 Add steps intraday to fitbitjson_mysql 2021-03-11 14:37:22 -05:00
JulioV 9a276c1c66 Add steps summary to jsonfitbit_mysql 2021-03-11 14:37:22 -05:00
JulioV 2ea9944059 Update docs for add new data stream 2021-03-11 14:37:22 -05:00
Meng Li 13290cd444 Migrate phone wifi visible sensor to new data stream 2021-03-11 14:36:52 -05:00
Meng Li d42c6e9c91 Migrate phone wifi connected sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 4825962361 Migrate phone screen sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li bd4f647d37 Migrate phone messages sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 3a65b3864d Migrate phone locations sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 7a50a52a9d Migrate phone light sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li c1682d8cd3 Migrate phone calls sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 0e96f39599 Migrate phone bluetooth sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 6d06d2b1eb Update MUTATION structure code & docs for AR, Apps foreground, and battery sensors 2021-03-11 14:35:34 -05:00
JulioV 6970954358 Change MUTATION structure 2021-03-11 14:35:34 -05:00
JulioV 58ef276179 Add stream_parameters arg to phone and empatica mutation scripts 2021-03-11 14:35:34 -05:00
JulioV 1063b4ca65 Add steps summary to fitbitjson_mysql 2021-03-11 14:35:34 -05:00
Meng Li f7cf316133 Migrate phone battery sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 824523e32c Migrate phone apps foreground sensor to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 6b13c80e40 Fix bug while checking OS in stream_schema 2021-03-11 14:35:34 -05:00
JulioV 8c79cfc56f Fix OS-specific dependencies 2021-03-11 14:35:34 -05:00
JulioV 7372fca0dd Fix bug when sensor is not available for an OS 2021-03-11 14:35:34 -05:00
JulioV 41711fcdb7 Rename download_data add support for py containers 2021-03-11 14:35:34 -05:00
JulioV 4b33ee43ba Replace .env with credentials.yaml 2021-03-11 14:35:34 -05:00
JulioV f65e3c8b1a Migrate empatica sensors to new data stream 2021-03-11 14:35:34 -05:00
Meng Li 2eae84ff05 Add docs of AR & conversation sensors 2021-03-11 14:35:34 -05:00
JulioV fb054b539f Add support for multiple time zones 2021-03-11 14:35:34 -05:00
JulioV f53b74e280 Update docs for multiple time zones 2021-03-11 14:35:34 -05:00
JulioV 6c51c6c239 Move phone_conversation to aware_mysql stream 2021-03-11 14:35:34 -05:00
JulioV ab1a3dbf79 Move phone_activity_recognition to aware_mysql stream 2021-03-11 14:35:34 -05:00
JulioV dc11cb593d Add support for smartphone sources and schemas.
Initial support for accelerometer
Update docs for automatically create participants
Update docs for initial multiple time zones
2021-03-11 14:35:33 -05:00
Weiyu e417aa3a6a added verified phone screen result files 2021-03-09 18:08:38 -05:00
Meng Li fc5b5eead8 Update fitbit-sleep-intraday.md 2021-03-04 15:55:49 -05:00
Meng Li aac87311e8 Update socialjetlag feature of sleep intraday: replace bedtime with midpoint sleep 2021-03-04 15:49:25 -05:00
Meng Li 8992a9c9e2 Merge branch 'feature/fitbit_sleep_intraday' into develop 2021-02-26 17:50:03 -05:00
Meng Li 7b4598357d Update the PRICE provider's example in sleep intraday docs 2021-02-26 17:47:01 -05:00
Meng Li 46d1575ce8 Add "device_id" col to PLAIN_TEXT example of sleep intraday docs 2021-02-26 17:47:01 -05:00
Meng Li 716ff3c592 Fix PRICE provider's bug when input is an empty dataframe 2021-02-26 17:47:01 -05:00
Meng Li 5c84c71f60 Add validation for FITBIT_SLEEP_INTRADAY config section 2021-02-26 17:47:01 -05:00
Meng Li d74196cab4 Add sleep intraday features with PRICE provider 2021-02-26 17:47:01 -05:00
Meng Li 2d5e966530 Update fitbit sleep intraday docs & config file 2021-02-26 17:47:01 -05:00
JulioV 9a49644fc6 Update docs 2021-02-26 17:47:01 -05:00
Meng Li c82f5952e6 Add Docs of fitbit sleep intraday features with PRICE provider 2021-02-26 17:47:01 -05:00
Meng Li bedf7106e5 Fix sleep episodes bug 2021-02-26 17:47:01 -05:00
Meng Li 8377c12efb Add sleep intraday features with RAPIDS provider 2021-02-26 17:47:01 -05:00
JulioV f565ac8a11 Review doc changes 2021-02-26 17:47:01 -05:00
Meng Li d0c0f876f4 Add docs of Fitbit sleep intraday features 2021-02-26 17:47:01 -05:00
JulioV 44c2a191f4 Merge branch 'feature/phone-battery-test' into develop 2021-02-25 13:46:20 -05:00
Weiyu edf741306a added raw data for battery testing to test05 and test06
Validation for phone battery feature (tests 3, 4, 5, 6)

Added test case for battery status 3 on iOS devices

Set all battery levels with battery status 4 to the same value as the previous one, so there is no change in battery level

Marked phone battery frequency as available
2021-02-25 13:45:25 -05:00
JulioV aa429bef64 Update git flow to add new features 2021-02-25 13:41:55 -05:00
JulioV 0b57b80e54 Merge branch 'feature/location_doryab_home_location' into develop 2021-02-24 17:51:30 -05:00
JulioV 724027e383 Small fixes to timeathome docs, add config validation 2021-02-24 17:49:22 -05:00
nikunjgoel95 3d6caea6c4 Added the timeathome feature using infer_home_location.py as interim file. 2021-02-24 16:57:25 -05:00
JulioV a16ebca563 Merge branch 'feature/config_schema_documentary' into develop 2021-02-23 16:28:03 -05:00
JulioV 1870f513a6 Remove unsused pngs 2021-02-23 16:27:46 -05:00
JulioV d73bfdde1f Add `config.yaml` validation documentation. 2021-02-23 16:22:14 -05:00
Weiyu 1a9bcf8e37 Initial documentation for configuration schema 2021-02-23 11:07:51 -05:00
JulioV bef868671e Merge branch 'feature/config_validation' into develop 2021-02-21 20:00:41 -05:00
JulioV 5abca8bb0f Missing config validations 2021-02-21 19:57:21 -05:00
Weiyu 6bad9066f8 Added time_segment feature 2021-02-21 19:55:55 -05:00
JulioV e2d45460f7 Apply suggestions from code review 2021-02-21 19:55:55 -05:00
Weiyu 327b015206 Add validation for config keys 2021-02-21 19:55:55 -05:00
JulioV 84a8a93082 Initial support for a config schema 2021-02-21 19:55:27 -05:00
JulioV 12fffb9c63 Set TZDIR on M1 only, update installation for M1s 2021-02-21 18:23:12 -05:00
JulioV 135ebb2478 Disable line for M1 timezone fix 2021-02-21 17:44:39 -05:00
JulioV 4819e22fd5 Merge branch 'feature/dbdp-empatica' into develop 2021-02-21 17:33:33 -05:00
JulioV 9668dfac7a Update docs to support Empatica 2021-02-21 17:32:41 -05:00
JulioV faefca8b9a Fix extra index column when dataset is empty 2021-02-21 17:32:41 -05:00
JulioV 2e46f56111 Empatica zips must be placed in pid folder and small fixes 2021-02-21 17:32:41 -05:00
Joe Kim a26a44819a Add stats features for empatica bvp, eda, ibi, temp
Fix Snakefile indentation
2021-02-21 17:32:41 -05:00
Joe Kim 4469cfd6bb add stats features for bvp, eda, ibi, temp 2021-02-21 17:32:41 -05:00
JulioV c6dc7e675a Add stats features for empatica heartrate
Turn off all empatica compute features
2021-02-21 17:32:41 -05:00
JulioV 3bb0230bac Add statistic features for empatica accelerometer 2021-02-21 17:32:41 -05:00
Juseong Kim 5f5f19866f implement extract_empatica_data script
add support for all data types

Fix name comparison of zipped files
2021-02-21 17:32:41 -05:00
JulioV 4b9857562b Add support for zip input files 2021-02-21 17:32:41 -05:00
JulioV 8c726f5d4f Start empatica support 2021-02-21 17:32:41 -05:00
abhineethreddyk 8b2f8c3ce1 Merge branch 'feature/update_testcase_docs' into develop 2021-02-16 19:09:53 -05:00
abhineethreddyk 407ef14925 Updated test cases docs with active tests table 2021-02-16 19:08:32 -05:00
JulioV d8813e2d04 Fix bug when any of the rows from any sensor do not belong to a time segment 2021-02-09 14:51:54 -05:00
JulioV 47bd695249 Deprecate Doryab circadian movement feature until it is fixed 2021-02-03 09:46:29 -05:00
Meng Li 6e59452c34 Update mkdocs.yml and add-new-features.md 2021-02-02 18:50:20 -05:00
Meng Li b67f990816 Add new `FITBIT_DATA_YIELD` `RAPIDS` provider 2021-02-02 18:30:21 -05:00
JulioV ec1c211599 Update change log for v0.4.2 so far 2021-02-02 12:05:01 -05:00
JulioV 1685b5cba6 Change default value for CLUSTER_ON 2021-02-02 12:04:48 -05:00
JulioV 8ddb431e9f
Merge pull request #114 from carissalow/feature/doryab_timeDuration_fix
Feature/doryab time duration fix
2021-02-02 12:01:47 -05:00
nikunjgoel95 9b248c449d Fixing and adding MAXIMUM_ROW_DURATION. 2021-02-02 11:38:13 -05:00
nikunjgoel95 e7fc8f44f2 Removing Sampling Frequency and fixing ROG, location entropy and normalized location entropy. 2021-02-02 11:38:08 -05:00
nikunjgoel95 cc2127e72d Updated documentation for Duration. 2021-02-02 11:36:48 -05:00
nikunjgoel95 0bbf15f52e Fixed the features dependent on time duration. 2021-02-02 11:36:14 -05:00
nikunjgoel95 4746d7ab6c Added Observation in Doryab Feature docs. 2021-02-02 11:34:56 -05:00
Meng Li 3d0d062491 Fix HR summary bug: do not consider rows with restinghr=0 2021-02-01 17:29:30 -05:00
JulioV 1b664d9766 Update docs/faq.md 2021-02-01 16:12:57 -05:00
ThisWei777 77991d72f1 Added two FAQs
Added FAQs for Unimplemented MAX_NO_FIELD_TYPES and Running RAPIDS on Apple Silicon M1 Mac
2021-02-01 16:12:57 -05:00
Meng Li f83c5a585e Fix HR intraday bug: minutesonZONE features are 0 2021-02-01 13:57:12 -05:00
JulioV aefc794274 Fix location processing when certain columns don't exist 2021-02-01 11:49:22 -05:00
abhineethreddyk 5296f9130f Merge branch 'feature/update_testing_battery_periodic' into develop 2021-01-31 20:25:48 -05:00
abhineethreddyk dbf57f43f3 Updated battery feature and its testing for periodic 2021-01-31 20:24:43 -05:00
abhineethreddyk 91168dac0d Merge branch 'feature/testing_battery_periodic' into develop 2021-01-27 23:06:16 -05:00
abhineethreddyk 47ad6261cb Updated testing for battery (periodic) 2021-01-27 23:04:43 -05:00
JulioV 0b87aa3b36 Fix bug when no error message was displayed for an empty `[PHONE_DATA_YIELD][SENSORS]` when resampling location data 2021-01-27 17:12:53 -05:00
ThisWei777 f1118bbf63
R 4.0.3 timezone issue (#112)
* Update docs/faq.md
2021-01-27 16:35:21 -05:00
JulioV ea6894f2ff Update change log for v0.4.0 2021-01-26 13:40:45 -05:00
JulioV eb258d874a Fix minor issues in docs 2021-01-26 13:27:53 -05:00
Meng Li 25a3492eba Drop rows without "assigned_segments" column before feature extraction 2021-01-21 19:41:17 -05:00
Meng Li 797de54b34 Fix merge bug of fetch_provider_features() function 2021-01-21 14:58:31 -05:00
abhineethreddyk c33baa8fd9 Merge branch 'feature/update_testing_docs' into develop 2021-01-20 22:42:46 -05:00
abhineethreddyk 8ce9059e93 Updated testing docs 2021-01-20 22:40:35 -05:00
Meng Li 5f60aac5c8 Fix KeyError bug of parsing steps data 2021-01-20 11:26:28 -05:00
abhineethreddyk 3a62f09101 Merge branch 'feature/updating_testing' into develop 2021-01-14 20:46:09 -05:00
abhineethreddyk 7317821e02 Updated testing for light and conversation (periodic and frequency) 2021-01-14 20:44:18 -05:00
abhineethreddyk 8fb0fda861 Merge branch 'feature/testing_wifi' into develop 2021-01-14 17:44:56 -05:00
abhineethreddyk 377f1a6a5e Updated testing for wifi (periodic and frequency) 2021-01-14 17:41:57 -05:00
JulioV 17f3e5d598 Update nul FAQ 2021-01-14 15:25:08 -05:00
JulioV 8ef2e6191f Update snakefile and config for tests 2021-01-14 15:24:55 -05:00
JulioV fa10cbf86d Merge branch 'feature/fix-location-processing' into develop 2021-01-14 14:35:36 -05:00
JulioV d0fe4d4c28 Add ALL_RESAMPLED flag and accuracy limit 2021-01-14 14:34:25 -05:00
JulioV 4c1e311135 Add error msg for invalid phone data yield sensors 2021-01-14 14:31:40 -05:00
JulioV 585bf7bc5d Add code so new feature providers can be added for the new four sensors 2021-01-14 14:31:40 -05:00
JulioV 8fd1d9dc29 Add four new sensors without providers 2021-01-14 14:31:40 -05:00
JulioV 38fadbf202
Feature/doryab location clustering (#111)
* Added OPTICS, a lightweight clustering algorithm.

* Changed the error message for inconsistent parameters in CONFIG

* Removing hardcoded values and changing default EPS value in the clustering algorithm.

* Added Observation in Doryab Feature docs.

Co-authored-by: nikunjgoel95 <nikunjgoel2009@gmail.com>
2021-01-14 14:22:51 -05:00
JulioV 22f2bfd211 Docker repo only builds on release 2021-01-07 17:16:01 -05:00
JulioV ee2882fe2a Modified change log 2021-01-07 16:27:29 -05:00
JulioV b7ba3c6407
Feature/location doryab fix (#109)
* Fixing the doryab location features for context of clustering.

* Fixed the wrong shifting while calculating the distance.

* Refactoring the haversine function.

* Removed comments.

* Cleaning parts of the code.

* Updated the documentation for CLUSTER_ON parameter.

Co-authored-by: nikunjgoel95 <nikunjgoel2009@gmail.com>
2021-01-07 16:20:46 -05:00
JulioV 9fc48ee0dc Merge branch 'feature/fitbit-fix' into develop 2021-01-06 12:10:38 -05:00
JulioV 3dd0e989a7 Update Doryab location docs 2021-01-06 12:09:06 -05:00
JulioV 4926497ae2 Fix bugs in Fitbit data parsing
- Fix the script that was breaking with an empty file
- Fix the script that was breaking when start/end dates were empty
- Ambiguous and nonexistent DST times are handled now
- Remove unnecessary else clause
2021-01-06 11:43:01 -05:00
JulioV 5203aa60d1 Fix bugs in create participants files script
- The PHONE and FITBIT flags were mixed up
- The start/end dates from the CSV file weren't being parsed correctly
2021-01-06 11:14:15 -05:00
JulioV 3a80f93771 Fix segment error when device ids is empty 2021-01-06 11:12:10 -05:00
JulioV af048b213d Merge branch 'feature/rlib-fix' into develop 2021-01-05 19:26:53 -05:00
JulioV b343399ffa Add libglpk40 dependency
To CI tests, dockerfile and installation

Add libglpk40 to dockerfile and installation
2021-01-05 19:24:49 -05:00
JulioV 7921bc28a9 Update RSPM URL in tests CI 2021-01-05 18:43:19 -05:00
JulioV 9a55efaced Merge branch 'feature/docker-updates' into develop 2021-01-05 18:38:36 -05:00
JulioV aaa9ad22a2 Update changelog 2021-01-05 18:37:21 -05:00
JulioV f6e66a43f0 Clarify in DB credential configuration that we only support MySQL 2021-01-05 18:36:55 -05:00
JulioV f521d45b42 Update CI to create a release on a tagged push that passes the tests 2021-01-05 18:36:33 -05:00
JulioV 9418e6e936 Update docker and linux instructions to use RSPM binary repo for faster installation 2021-01-05 18:36:19 -05:00
JulioV 2b1f3f230c v0.3.1 2020-12-21 16:30:46 -05:00
JulioV 29e3d9bf37 - Update R and Python virtual environments
- Add GH actions CI support for tests and docker
- Add release and test badges to README
2020-12-20 18:35:54 -05:00
JulioV 5551a1e6f3 Merge branch 'feature/docker-ci' into develop 2020-12-20 18:11:56 -05:00
JulioV 146db9db61 Add docker gha 2020-12-20 18:06:01 -05:00
JulioV 57d51378a3 Merge branch 'hotfix/v0.2.6' into develop 2020-12-20 17:04:08 -05:00
JulioV 67e0caa2fe Fix old versions banner on nested pages 2020-12-20 17:03:11 -05:00
JulioV 51c7739bfc Revert "Start support for phone_keyboard"
This reverts commit dd95b4f941.
2020-12-20 16:29:40 -05:00
JulioV 5e87db9952 Merge branch 'feature/tests_ci' into develop 2020-12-20 15:56:43 -05:00
JulioV 62067a865c Add curl dependency for R 2020-12-20 14:26:18 -05:00
JulioV b2f903cb6d Remove unused files 2020-12-20 14:23:40 -05:00
JulioV 46b99a83c8 Merge branch 'feature/conda-cache' into develop 2020-12-20 13:57:03 -05:00
JulioV d39b700b5f Update git flow docs, set keyboard flag to false 2020-12-20 13:55:41 -05:00
JulioV ac4526df5d Update docs, add renv and conda cache,& release CI 2020-12-20 13:54:31 -05:00
JulioV 9bf6042ea9 Merge branch 'hotfix/v0.2.5' into develop 2020-12-18 22:47:12 -05:00
JulioV d352b2c607 Fix docs deploy typo 2020-12-18 22:46:30 -05:00
JulioV 9e8178edbd Merge branch 'hotfix/v0.2.4' into develop 2020-12-18 22:36:25 -05:00
JulioV 265163d228 Fix broken links in landing page and docs deploy 2020-12-18 22:35:10 -05:00
JulioV 079547e60d Update virtual envs 2020-12-18 14:06:28 -05:00
JulioV 41f19ed781 Merge branch 'hotfix/v0.2.3' into develop 2020-12-18 11:27:39 -05:00
JulioV 1b65f11b89 Fix participant IDS in the example analysis workflow 2020-12-18 11:27:04 -05:00
JulioV e263546bf8 Add release and test badges 2020-12-18 11:03:58 -05:00
abhineethreddyk 749ee55955 Migrated testing from Travis to Github Actions - part 4 2020-12-15 01:03:14 -05:00
abhineethreddyk 29c85ee46f Migrated testing from Travis to Github Actions - part 3 2020-12-14 22:33:50 -05:00
abhineethreddyk e41b14af6d Migrated testing from Travis to Github Actions - part 2 2020-12-14 22:26:07 -05:00
abhineethreddyk 6234675a36 Merge branch 'develop' of https://github.com/carissalow/rapids into develop 2020-12-14 22:16:12 -05:00
abhineethreddyk c59d4a66eb Migrated testing from Travis to Github Actions and updated config and Snakemake files in the testing directory - part 1 2020-12-14 22:15:39 -05:00
JulioV dd95b4f941 Start support for phone_keyboard 2020-12-14 13:42:22 -05:00
1253 changed files with 53243 additions and 41616 deletions

.gitattributes vendored 100644 (7 changes)

@@ -0,0 +1,7 @@
# We'll let Git's auto-detection algorithm infer if a file is text. If it is,
# enforce LF line endings regardless of OS or git configurations.
* text=auto eol=lf
# Isolate binary files in case the auto-detection algorithm fails and
# marks them as text files (which could brick them).
*.{png,jpg,jpeg,gif,webp,woff,woff2} binary


@@ -7,28 +7,16 @@ assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
This form is only for bug reports. For questions, feature requests, or feedback use our [Github discussions](https://github.com/carissalow/rapids/discussions)
**To Reproduce**
Steps to reproduce the behavior:
1. Enable ... feature provider
2. Setup ... sensor parameters
3. Run RAPIDS
4. etc ...
Please make sure to:
**Expected behavior**
A clear and concise description of what you expected to happen.
* [ ] Debug and simplify the problem to create a minimal example. For example, reduce the problem to a single participant, sensor, and a few rows of data.
* [ ] Provide a clear and succinct description of the problem (expected behavior vs actual behavior).
* [ ] Attach your `config.yaml`, time segments file, and time zones file if appropriate.
* [ ] Attach test data if possible, and any screenshots or extra resources that will help us debug the problem.
* [ ] Share the commit you are running: `git rev-parse --short HEAD`
* [ ] Share your OS version (e.g. Windows 10)
* [ ] Share the device/sensor you are processing (e.g. phone accelerometer)
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Please complete the following information:**
- OS: [e.g. MacOS]
- RAPIDS current commit, paste the output of `git rev-parse --short HEAD`
- A link to your `config.yaml`
- Type of mobile data you are dealing with (Android/iOS)
**Additional context**
Add any other context about the problem here.
<!-- You can erase any parts of this template not applicable to your Issue. -->


@@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

.github/workflows/docker.yaml vendored 100644 (30 changes)

@@ -0,0 +1,30 @@
name: docker
on:
release:
types: [edited, released]
jobs:
main:
runs-on: ubuntu-20.04
steps:
-
name: Set up QEMU
uses: docker/setup-qemu-action@v1
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
-
name: Build and push
id: docker_build
uses: docker/build-push-action@v2
with:
push: true
tags: moshiresearch/rapids:latest
-
name: Image digest
run: echo ${{ steps.docker_build.outputs.digest }}


@@ -9,14 +9,18 @@ jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- if: ${{ github.ref == 'refs/heads/develop' }} #we delay develop because when we release a hotfix (tag + develop push), one of these pushes will be out of sync
uses: jakejarvis/wait-action@master
with:
fetch-depth: 0
time: '60s'
- uses: actions/setup-python@v2
with:
python-version: 3.x
- run: pip install git+https://${GH_TOKEN}@github.com/carissalow/mkdocs-material-insiders.git
- run: pip install mike
- uses: actions/checkout@v2
with:
fetch-depth: 0
- run: |
git config user.name github-actions
git config user.email github-actions@github.com

.github/workflows/tests.yaml vendored 100644 (83 changes)

@@ -0,0 +1,83 @@
name: tests
on:
push:
branches-ignore:
- "master"
tags:
- "v[0-9]+.[0-9]+.[0-9]+"
pull_request:
branches:
- "develop"
env:
RENV_PATHS_ROOT: ~/.local/share/renv
jobs:
test-on-latest-ubuntu:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0
- run: "sed -i 's/name:.*/name: rapidstests/g' environment.yml"
- run: echo "RELEASE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
- run: echo "RELEASE_VERSION_URL=$(echo $RELEASE_VERSION | sed -e 's/\.//g')" >> $GITHUB_ENV
- run : |
sudo apt update
sudo apt install libglpk40
# sudo apt install libcurl4-openssl-dev
# sudo apt install libssl-dev
# sudo apt install libxml2-dev
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
sudo apt install r-base
- name: Cache R packages
uses: actions/cache@v2
id: cacherenv
with:
path: ${{ env.RENV_PATHS_ROOT }}
key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
restore-keys: |
${{ runner.os }}-renv-
- name: Install R dependencies
if: steps.cacherenv.outputs.cache-hit != 'true'
run: sudo apt install libcurl4-openssl-dev
- name: Restore R packages
shell: Rscript {0}
run: |
if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))
- name: Cache conda packages
uses: actions/cache@v1
env:
# Increase this value to reset cache if environment.yml has not changed
CACHE_NUMBER: 0
with:
path: ~/conda_pkgs_dir
key:
${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{
hashFiles('**/environment.yml') }}
- name: Restore conda packages
uses: conda-incubator/setup-miniconda@v2
with:
activate-environment: rapidstests
environment-file: environment.yml
use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
- name: Run tests
shell: bash -l {0}
run : |
conda activate rapidstests
bash tests/scripts/run_tests.sh -t all
- name: Release tag
if: success() && startsWith(github.ref, 'refs/tags')
id: create_release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.RAPIDS_RELEASES_TOKEN }} # This token is provided by Actions, you do not need to create your own token
with:
tag_name: ${{ github.ref }}
release_name: ${{ github.ref }}
body: |
See [change log](http://www.rapids.science/latest/change-log/#${{ env.RELEASE_VERSION_URL }})
draft: false
prerelease: false

.gitignore vendored (20 changes)

@@ -93,9 +93,17 @@ packrat/*
# exclude data from source control by default
data/external/*
!/data/external/empatica/empatica1/E4 Data.zip
!/data/external/.gitkeep
!/data/external/stachl_application_genre_catalogue.csv
!/data/external/timesegments*.csv
!/data/external/wiki_tz.csv
!/data/external/main_study_usernames.csv
!/data/external/timezone.csv
!/data/external/play_store_application_genre_catalogue.csv
!/data/external/play_store_categories_count.csv
data/raw/*
!/data/raw/.gitkeep
data/interim/*
@@ -111,4 +119,14 @@ sn_profile_*/
!sn_profile_rapids
settings.dcf
tests/fakedata_generation/
site/
site/
credentials.yaml
# Docker container and other files
.devcontainer
# Calculating features module
calculatingfeatures/
# Temp folder for rapids data/external
rapids_temp_data/


@@ -1,74 +0,0 @@
services:
- mysql
- docker
sudo: required
language: python
jobs:
include:
- stage: Tests
name: Python 3.7 on Xenial Linux
os: linux
language: python
python: 3.7
before_install:
- /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
- export PATH=/home/linuxbrew/.linuxbrew/bin:$PATH
- source ~/.bashrc
- sudo apt-get install linuxbrew-wrapper
- brew tap --shallow linuxbrew/xorg
- brew install r
- R --version
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O
miniconda.sh;
- bash miniconda.sh -b -p $HOME/miniconda
- source "$HOME/miniconda/etc/profile.d/conda.sh"
- hash -r
- conda config --set always_yes yes --set changeps1 no
install:
- conda init bash
- conda update -q --all --yes conda
- conda env create -q -n test-environment python=$TRAVIS_PYTHON_VERSION --file
environment.yml
- conda activate test-environment
- snakemake -j1 renv_install
- R -e 'renv::settings$use.cache(FALSE)'
- snakemake -j1 renv_restore
cache:
directories:
- "/home/travis/.linuxbrew"
- "$HOME/.local/share/renv"
- "$TRAVIS_BUILD_DIR/renv/library"
script:
- bash tests/scripts/run_tests.sh all test
- stage: deploy
name: Python 3.7 on Xenial Linux Docker
os: linux
language: python
script:
- docker build -t rapids .
- docker login -u "agamk" -p $DOCKERPWD
- docker tag rapids agamk/rapids:travislatest
- docker push agamk/rapids:travislatest
branches:
only:
- master
- time_segment
stages:
- name: deploy
if: branch = master AND \
type = push
notifications:
email: false
slack:
secure: cJIpmIjb3zA5AMDBo9axF1v6fYNIgMm6s6UdMNOlHiT511xHGsaLUFej3lACwQLig4Gr94ySI61YdrP+RX1lFcYxusH+kUU/c8LX0PmSKNeKnycM3w/pCM+yTp/6oQG6ZrJD7pNm6zhB0xPL61uSmYhcr+JJ1sh4iLiON+J8/C+IfnAHm1ORkxJ0IxASkiP/LvaiAQDw8lNyYIZNWjSDNZbx68o1VNakyk6Vik3x8omiE3w33rzI2/JAx//QTxOq2J0dtV1AqYYSOWS4iXblV09NLBqgGrhAhrQ6+TbPHSPIyL/4EdhvS+YXO+SBWS7ODD7j/MuL6XiA4SujW72od2rgXNmOjFnlQvIrULO5bzv39BKKDkldvz9+XCyXLcjoLIwA/rmUnwMndNoC7NoD/CkQEevUxswXXB9811BmIFx/7GOHouVxwB2gaMAzkCroZJVwgbrc6ESSOVE5SMcb3wPMbpd8cXOgVZXJcmk5wK206zxXPigCvFfknqOnwDqRgyIWSFoTd/2wHppA7ND3R5U42nQTbEQ7MiONsOo61GlJTTxJELz32sLKl388AuAgOY7+0sqPibxMaHJkF1V4nYVTH0/H5bO/edK4VHMloJ6s0kuyko7LT5EMQf3pBJij5TnYmD2E60t+bSBAxHuH7WA5dvL+igjGEwROnxDc9pc=
on_success: always
template:
- Repo `%{repository_slug}` *%{result}* build (<%{build_url}|#%{build_number}>)
for commit (<%{compare_url}|%{commit}>) on branch `%{branch}`.
- 'Execution time: *%{duration}*'
- 'Message: %{message}'
env:
global:
secure: FD2aOa8L3lWf1xClZ24uS59SOBjMH16sdSLPGkb6bQLwrKAQw6BVna5wOw3iRscZtx2iqEQw3LdLmNb6ftI4fgHhf7qoAZlKVlc2Q8wU4L623Ad8S//2Ny1AxXRyzwmRw4emmIUqRXiGaeZYkzcptf38+2d9PjHazVsL3A6T2vFK+VAQmZBq3Iblx0i3g25qevQxFUACH1FIpZsmn08cesblZp0MiQ7GOq4YhBAqmbraT4/w7yFe1rwm/yPSWeBQKu8tZeZnEW6/FPYidxxuBgl/BxTdVuIKHcVzL95Mu4q6Y7uVaYeGYgyxai8eyntpY2dPu0wN1ng4JxulwqKBdxkWFPdbBJSGnYQq5EmrqULjro7wk9GVLSN9Lx0QjcmZRbNbDH0rpgxcXS9mtvzmgFbmatdsMa3VrObqKL2yYMsPZ6e5N4ve3gTU5+sm6oz/zYNWK2CDN2f08BJuaoKv9hETTfvWaZitKT7lFZ2LpsDdHSPUtRiAviDcLZcCZsTQjyCi6JeKSF2aMQ0+4rCsZgFkqpmjEVJB5N6DMkdZaUn+4HrbGsivAHWQsDcvPTD4n2CUcboV407NFsckr3PlDy0+fNNHr2h45VjO7DxAwDIJAdiwlhbj9l9gn8i3aZOtMCT6p4xIC2CgqOcY4yOTHmyOswJwnkz3uoSOq3eNLR4=


@@ -6,6 +6,7 @@ RUN apt update && apt install -y \
libssl-dev \
libxml2-dev \
libmysqlclient-dev \
libglpk40 \
mysql-server
RUN apt-get update && apt-get install -y gnupg
RUN apt-get update && apt-get install -y software-properties-common
@@ -15,6 +16,7 @@ RUN apt update && apt install -y r-base
RUN apt install -y pandoc
RUN apt install -y git
RUN apt-get update && apt-get install -y vim
RUN apt-get update && apt-get install -y nano
RUN apt update && apt install -y unzip
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
ENV PATH /opt/conda/bin:$PATH
@@ -42,7 +44,7 @@ RUN conda update -n base -c defaults conda
WORKDIR /rapids
RUN conda env create -f environment.yml -n rapids
RUN Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'
RUN R -e 'renv::restore()'
RUN R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
ADD https://osf.io/587wc/download data/external
RUN mv data/external/download data/external/rapids_example.sql.zip
RUN unzip data/external/rapids_example.sql.zip

README.md (191 changes)

@@ -1,6 +1,7 @@
![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/carissalow/rapids?style=plastic)
[![Snakemake](https://img.shields.io/badge/snakemake-≥5.7.1-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io)
[![Documentation Status](https://github.com/carissalow/rapids/workflows/docs/badge.svg)](https://www.rapids.science/)
[![Build Status](https://travis-ci.com/carissalow/rapids.svg?branch=master)](https://travis-ci.com/carissalow/rapids)
![tests](https://github.com/carissalow/rapids/workflows/tests/badge.svg)
[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](code_of_conduct.md)
# RAPIDS
@@ -10,3 +11,191 @@
For more information refer to our [documentation](http://www.rapids.science)
By [MoSHI](https://www.moshi.pitt.edu/), [University of Pittsburgh](https://www.pitt.edu/)
## Installation
For RAPIDS installation refer to the [documentation](https://www.rapids.science/1.8/setup/installation/)
### For the installation of the Docker version
1. Follow the [instructions](https://www.rapids.science/1.8/setup/installation/) to setup RAPIDS via Docker (from scratch).
2. Delete the current contents of the /rapids/ folder from within a container session.
```
cd ..
rm -rf rapids/{*,.*}
cd rapids
```
3. Clone the RAPIDS workspace from Git and check out a specific branch.
```
git clone "https://repo.ijs.si/junoslukan/rapids.git" .
git checkout <branch_name>
```
4. Install missing “libpq-dev” dependency with bash.
```
apt-get update -y
apt-get install -y libpq-dev
```
5. Restore the R venv.
Type `R` to open an interactive R session and then run:
```
renv::restore()
```
6. Install the cr-features module from https://repo.ijs.si/matjazbostic/calculatingfeatures.git (branch master).
Then follow the "cr-features module" section below.
7. Install all required packages from environment.yml; the `--prune` option also deletes conda packages not present in the environment file.
```
conda env update --file environment.yml --prune
```
8. If you wish to update your R or Python venvs:
```
# In an interactive R session:
renv::snapshot()
# For Python:
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
```
### cr-features module
This RAPIDS extension uses the cr-features library, accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
To use cr-features library:
- Follow the installation instructions in the [README.md](https://repo.ijs.si/matjazbostic/calculatingfeatures/-/blob/master/README.md).
- Copy built calculatingfeatures folder into the RAPIDS workspace.
- Install the cr-features package by:
```
pip install path/to/the/calculatingfeatures/folder
# e.g. pip install ./calculatingfeatures if the folder is copied to the main parent directory
# The cr-features package has to be built and installed every time to get the newest version,
# or the newest version of the Docker image must be used.
```
## Updating RAPIDS
To update RAPIDS, first pull and merge [origin]( https://github.com/carissalow/rapids), such as with:
```commandline
git fetch --progress "origin" refs/heads/master
git merge --no-ff origin/master
```
Next, update the conda and R virtual environment.
```bash
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
```
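The conda environment can be brought up to date the same way as in installation step 7 above:
```bash
conda env update --file environment.yml --prune
```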
## Custom configuration
### Credentials
As mentioned under [Database in RAPIDS documentation](https://www.rapids.science/1.6/snippets/database/), a `credentials.yaml` file is needed to connect to a database.
It should contain:
```yaml
PSQL_STRAW:
database: staw
host: 212.235.208.113
password: password
port: 5432
user: staw_db
```
where `password` needs to be specified as well.
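As a minimal sketch of one way to create this file by hand (the values below are the placeholders from the example above), it can also be locked down so only your user can read the password; note that `credentials.yaml` is already excluded from version control by the `.gitignore` rules shown earlier:
```bash
# Create credentials.yaml with placeholder values (replace them with real ones)
cat > credentials.yaml <<'EOF'
PSQL_STRAW:
  database: staw
  host: 212.235.208.113
  password: password
  port: 5432
  user: staw_db
EOF
# Restrict read access to the current user, since the file holds a password
chmod 600 credentials.yaml
```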
## Possible installation issues
### Missing dependencies for RPostgres
To install `RPostgres` R package (used to connect to the PostgreSQL database), an error might occur:
```text
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libpq was not found. Try installing:
* deb: libpq-dev (Debian, Ubuntu, etc)
* rpm: postgresql-devel (Fedora, EPEL)
* rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
* csw: postgresql_dev (Solaris)
* brew: libpq (OSX)
If libpq is already installed, check that either:
(i) 'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
(ii) 'pg_config' is in your PATH.
If neither can detect <libpq>, you can set INCLUDE_DIR
and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: libpq-fe.h: No such file or directory
compilation terminated.
```
The library requires `libpq` for compiling from source, so install accordingly.
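For example, on Debian or Ubuntu this is the same dependency installed in step 4 of the Docker setup above:
```bash
sudo apt-get update
sudo apt-get install -y libpq-dev
```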
### Timezone environment variable for tidyverse (relevant for WSL2)
One of the R packages, `tidyverse`, might need access to the `TZ` environment variable during the installation.
On Ubuntu 20.04 on WSL2 this triggers the following error:
```text
> install.packages('tidyverse')
ERROR: configuration failed for package xml2
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Warning in system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace xml2 1.3.1 is already loaded, but >= 1.3.2 is required
Calls: <Anonymous> ... namespaceImportFrom -> asNamespace -> loadNamespace
Execution halted
ERROR: lazy loading failed for package tidyverse
```
This happens because WSL2 does not use the `timedatectl` service, which provides this variable.
```bash
~$ timedatectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
```
and later
```bash
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Execution halted
```
This can be amended by setting the environment variable manually before attempting to install `tidyverse`:
```bash
export TZ='Europe/Ljubljana'
```
Note: if this is needed to avoid runtime issues, you need to either define this environment variable in each new terminal window or (better) define it in your `~/.bashrc` or `~/.bash_profile`.
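As a sketch of the second option (assuming bash and the Ljubljana timezone used above):
```bash
# Persist the TZ variable so every new shell inherits it
echo "export TZ='Europe/Ljubljana'" >> ~/.bashrc
# Apply it to the current session as well
source ~/.bashrc
```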
## Possible runtime issues
### Unix end of line characters
Upon running RAPIDS, an error might occur:
```bash
/usr/bin/env: python3\r: No such file or directory
```
This is due to Windows-style end-of-line characters.
To amend this, I added a `.gitattributes` file to force `git` to check out `rapids` using Unix EOL characters.
If this still fails, `dos2unix` can be used to change them.
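As an illustrative sketch (assuming `dos2unix` is installed; the file path is only an example):
```bash
sudo apt-get install -y dos2unix
# Convert a script's CRLF line endings to LF in place
dos2unix path/to/script.py
```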
### System has not been booted with systemd as init system (PID 1)
See [the installation issue above](#Timezone-environment-variable-for-tidyverse-(relevant-for-WSL2)).

Snakefile (293 changes)

@@ -1,8 +1,11 @@
from snakemake.utils import validate
configfile: "config.yaml"
validate(config, "tools/config.schema.yaml")
include: "rules/common.smk"
include: "rules/renv.smk"
include: "rules/preprocessing.smk"
include: "rules/features.smk"
include: "rules/models.smk"
include: "rules/reports.smk"
import itertools
@@ -14,10 +17,19 @@ if len(config["PIDS"]) == 0:
for provider in config["PHONE_DATA_YIELD"]["PROVIDERS"].keys():
if config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["COMPUTE"]:
allowed_phone_sensors = get_phone_sensor_names()
if not (set(config["PHONE_DATA_YIELD"]["SENSORS"]) <= set(allowed_phone_sensors)):
raise ValueError('\nInvalid sensor(s) for PHONE_DATA_YIELD. config["PHONE_DATA_YIELD"]["SENSORS"] can have '
'one or more of the following phone sensors: {}.\nInstead you provided "{}".\n'
'Keep in mind that the sensors\' CONTAINER attribute must point to a valid database table or file'\
.format(', '.join(allowed_phone_sensors),
', '.join(set(config["PHONE_DATA_YIELD"]["SENSORS"]) - set(allowed_phone_sensors))))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=map(str.lower, config["PHONE_DATA_YIELD"]["SENSORS"])))
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -26,7 +38,7 @@ for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
if config["PHONE_MESSAGES"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_messages_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_messages_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_MESSAGES"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_MESSAGES"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_messages.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -34,9 +46,13 @@ for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
for provider in config["PHONE_CALLS"]["PROVIDERS"].keys():
if config["PHONE_CALLS"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_CALLS"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
if (provider == "RAPIDS") and (config["PHONE_CALLS"]["PROVIDERS"][provider]["FEATURES_TYPE"] == "EPISODES"):
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
else:
files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_CALLS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_calls.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -45,7 +61,7 @@ for provider in config["PHONE_BLUETOOTH"]["PROVIDERS"].keys():
if config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_bluetooth.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -54,11 +70,10 @@ for provider in config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"].keys():
if config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_activity_recognition.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -69,7 +84,7 @@ for provider in config["PHONE_BATTERY"]["PROVIDERS"].keys():
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_BATTERY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_BATTERY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_battery.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -82,11 +97,10 @@ for provider in config["PHONE_SCREEN"]["PROVIDERS"].keys():
# raise ValueError("Error: Add PHONE_SCREEN (and as many PHONE_SENSORS as you have in your database) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data)")
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_SCREEN"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_SCREEN"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_screen.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -95,7 +109,7 @@ for provider in config["PHONE_LIGHT"]["PROVIDERS"].keys():
if config["PHONE_LIGHT"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_light_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_light_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_LIGHT"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_LIGHT"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_light.csv", pid=config["PIDS"],))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -104,7 +118,7 @@ for provider in config["PHONE_ACCELEROMETER"]["PROVIDERS"].keys():
if config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_accelerometer.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -114,7 +128,11 @@ for provider in config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"].keys():
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
if config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["INCLUDE_EPISODE_FEATURES"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_app_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_app_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_app_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_foreground.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -123,7 +141,7 @@ for provider in config["PHONE_WIFI_VISIBLE"]["PROVIDERS"].keys():
if config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_visible.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -132,7 +150,7 @@ for provider in config["PHONE_WIFI_CONNECTED"]["PROVIDERS"].keys():
if config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_connected.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -141,34 +159,117 @@ for provider in config["PHONE_CONVERSATION"]["PROVIDERS"].keys():
if config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime_unified.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_conversation.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_ESM"]["PROVIDERS"].keys():
if config["PHONE_ESM"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_esm_raw.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_esm_with_datetime.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_esm_clean.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_esm_features/phone_esm_{language}_{provider_key}.csv",pid=config["PIDS"],language=get_script_language(config["PHONE_ESM"]["PROVIDERS"][provider]["SRC_SCRIPT"]),provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_esm.csv", pid=config["PIDS"]))
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"]))
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_SPEECH"]["PROVIDERS"].keys():
if config["PHONE_SPEECH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_raw.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_with_datetime.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_speech_features/phone_speech_{language}_{provider_key}.csv",pid=config["PIDS"],language=get_script_language(config["PHONE_SPEECH"]["PROVIDERS"][provider]["SRC_SCRIPT"]),provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_speech.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
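Every sensor block above leans on Snakemake's expand() to fan a single file pattern out over all participants (and, where used, providers and languages). A quick illustration with assumed participant ids:

from snakemake.io import expand

# expand() substitutes each wildcard with every value provided for it:
expand("data/processed/features/{pid}/phone_speech.csv", pid=["p031", "p032"])
# -> ["data/processed/features/p031/phone_speech.csv",
#     "data/processed/features/p032/phone_speech.csv"]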
# We can delete these if's as soon as we add feature PROVIDERS to any of these sensors
if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict):
for provider in config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"].keys():
if config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_crashes_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_crashes_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_crashes_with_datetime_with_categories.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_applications_crashes_features/phone_applications_crashes_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_crashes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if isinstance(config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"], dict):
for provider in config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"].keys():
if config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_notifications_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_notifications_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_applications_notifications_with_datetime_with_categories.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_applications_notifications_features/phone_applications_notifications_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_notifications.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if isinstance(config["PHONE_KEYBOARD"]["PROVIDERS"], dict):
for provider in config["PHONE_KEYBOARD"]["PROVIDERS"].keys():
if config["PHONE_KEYBOARD"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_keyboard_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_keyboard_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_keyboard_features/phone_keyboard_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_KEYBOARD"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_keyboard.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if isinstance(config["PHONE_LOG"]["PROVIDERS"], dict):
for provider in config["PHONE_LOG"]["PROVIDERS"].keys():
if config["PHONE_LOG"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_log_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_log_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_log_features/phone_log_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_LOG"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_log.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_LOCATIONS"]["PROVIDERS"].keys():
if config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["COMPUTE"]:
if config["PHONE_LOCATIONS"]["LOCATIONS_TO_USE"] == "FUSED_RESAMPLED":
if config["PHONE_LOCATIONS"]["LOCATIONS_TO_USE"] in ["FUSED_RESAMPLED","ALL_RESAMPLED"]:
if "PHONE_LOCATIONS" in config["PHONE_DATA_YIELD"]["SENSORS"]:
files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
else:
raise ValueError("Error: Add PHONE_LOCATIONS (and as many PHONE_SENSORS as you have) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")
raise ValueError("Error: Add PHONE_LOCATIONS (and as many PHONE_SENSORS as you have) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data) which is used to resample fused location data (ALL_RESAMPLED and RESAMPLED_FUSED)")
if provider == "BARNETT":
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_barnett_daily.csv", pid=config["PIDS"]))
if provider == "DORYAB":
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_locations_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_locations.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_intraday_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_calories_intraday_features/fitbit_calories_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_calories_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_DATA_YIELD"]["PROVIDERS"].keys():
if config["FITBIT_DATA_YIELD"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_data_yield.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -176,9 +277,8 @@ for provider in config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"].keys():
for provider in config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
@@ -186,46 +286,113 @@ for provider in config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"].keys():
for provider in config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_sleep_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# for provider in config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"].keys():
# if config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_raw.csv", pid=config["PIDS"]))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_parsed.csv", pid=config["PIDS"]))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
for provider in config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_episodes.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_sleep_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"].keys():
if config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_summary.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"].keys():
if config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
if config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["TIME_BASED"]["EXCLUDE"] or config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["FITBIT_BASED"]["EXCLUDE"]:
if config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["FITBIT_BASED"]["EXCLUDE"]:
files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_with_datetime_exclude_sleep.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_parsed.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_parsed_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["SRC_LANGUAGE"].lower(), provider_key=provider.lower()))
files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_intraday.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# for provider in config["FITBIT_CALORIES"]["PROVIDERS"].keys():
# if config["FITBIT_CALORIES"]["PROVIDERS"][provider]["COMPUTE"]:
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_{fitbit_data_type}_raw.csv", pid=config["PIDS"], fitbit_data_type=(["json"] if config["FITBIT_CALORIES"]["TABLE_FORMAT"] == "JSON" else ["summary", "intraday"])))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_{fitbit_data_type}_parsed.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
# files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_{fitbit_data_type}_parsed_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["EMPATICA_ACCELEROMETER"]["PROVIDERS"].keys():
if config["EMPATICA_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_accelerometer_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_accelerometer_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_accelerometer_features/empatica_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_accelerometer.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["EMPATICA_HEARTRATE"]["PROVIDERS"].keys():
if config["EMPATICA_HEARTRATE"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_heartrate_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_heartrate_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_heartrate_features/empatica_heartrate_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_HEARTRATE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_heartrate.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["EMPATICA_TEMPERATURE"]["PROVIDERS"].keys():
if config["EMPATICA_TEMPERATURE"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_temperature_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_temperature_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_temperature_features/empatica_temperature_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_TEMPERATURE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_temperature.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"].keys():
if config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_electrodermal_activity_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_electrodermal_activity_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_electrodermal_activity_features/empatica_electrodermal_activity_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_electrodermal_activity.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"].keys():
if config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_blood_volume_pulse_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_blood_volume_pulse_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_blood_volume_pulse_features/empatica_blood_volume_pulse_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_blood_volume_pulse.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"].keys():
if config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_inter_beat_interval_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_inter_beat_interval_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_inter_beat_interval_features/empatica_inter_beat_interval_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_inter_beat_interval.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
if isinstance(config["EMPATICA_TAGS"]["PROVIDERS"], dict):
for provider in config["EMPATICA_TAGS"]["PROVIDERS"].keys():
if config["EMPATICA_TAGS"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/empatica_tags_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/empatica_tags_with_datetime.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/empatica_tags_features/empatica_tags_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_TAGS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_tags.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# Visualization for Data Exploration
if config["HISTOGRAM_PHONE_DATA_YIELD"]["PLOT"]:
@@ -240,11 +407,41 @@ if config["HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT"]["PLOT"]:
files_to_compute.append("reports/data_exploration/heatmap_sensor_row_count_per_time_segment.html")
if config["HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT"]["PLOT"]:
if not config["PHONE_DATA_YIELD"]["PROVIDERS"]["RAPIDS"]["COMPUTE"]:
raise ValueError("Error: [PHONE_DATA_YIELD][PROVIDERS][RAPIDS][COMPUTE] must be True in config.yaml to get heatmaps of overall data yield.")
files_to_compute.append("reports/data_exploration/heatmap_phone_data_yield_per_participant_per_time_segment.html")
if config["HEATMAP_FEATURE_CORRELATION_MATRIX"]["PLOT"]:
files_to_compute.append("reports/data_exploration/heatmap_feature_correlation_matrix.html")
# Data Cleaning
for provider in config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"].keys():
if config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"][provider]["COMPUTE"]:
if provider == "STRAW":
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() + "_py.csv", pid=config["PIDS"]))
else:
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() + "_R.csv", pid=config["PIDS"]))
for provider in config["ALL_CLEANING_OVERALL"]["PROVIDERS"].keys():
if config["ALL_CLEANING_OVERALL"]["PROVIDERS"][provider]["COMPUTE"]:
if provider == "STRAW":
for target in config["PARAMS_FOR_ANALYSIS"]["TARGET"]["ALL_LABELS"]:
files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +"_py_(" + target + ").csv"))
else:
files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +"_R.csv"))
# Baseline features
if config["PARAMS_FOR_ANALYSIS"]["BASELINE"]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/baseline_merged.csv"))
files_to_compute.extend(expand("data/raw/{pid}/participant_baseline_raw.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/baseline_questionnaires.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/baseline_features.csv", pid=config["PIDS"]))
# Targets (labels)
if config["PARAMS_FOR_ANALYSIS"]["TARGET"]["COMPUTE"]:
files_to_compute.extend(expand("data/processed/models/individual_model/{pid}/input.csv", pid=config["PIDS"]))
for target in config["PARAMS_FOR_ANALYSIS"]["TARGET"]["ALL_LABELS"]:
files_to_compute.extend(expand("data/processed/models/population_model/input_" + target + ".csv"))
rule all:
input:

automl_test.py 100644

@@ -0,0 +1,57 @@
from pprint import pprint
import sklearn.metrics
import autosklearn.regression
import datetime
import importlib
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import yaml
from sklearn import linear_model, svm, kernel_ridge, gaussian_process
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer
model_input = pd.read_csv("data/processed/models/population_model/input_PANAS_negative_affect_mean.csv") # Standardized data
model_input.dropna(axis=1, how="all", inplace=True)
model_input.dropna(axis=0, how="any", subset=["target"], inplace=True)
categorical_feature_colnames = ["gender", "startlanguage"]
categorical_feature_colnames += [col for col in model_input.columns if "mostcommonactivity" in col or "homelabel" in col]
categorical_features = model_input[categorical_feature_colnames].copy()
mode_categorical_features = categorical_features.mode().iloc[0]
categorical_features = categorical_features.fillna(mode_categorical_features)
categorical_features = categorical_features.apply(lambda col: col.astype("category"))
if not categorical_features.empty:
categorical_features = pd.get_dummies(categorical_features)
numerical_features = model_input.drop(categorical_feature_colnames, axis=1)
model_in = pd.concat([numerical_features, categorical_features], axis=1)
index_columns = ["local_segment", "local_segment_label", "local_segment_start_datetime", "local_segment_end_datetime"]
model_in.set_index(index_columns, inplace=True)
X_train, X_test, y_train, y_test = train_test_split(model_in.drop(["target", "pid"], axis=1), model_in["target"], test_size=0.30)
automl = autosklearn.regression.AutoSklearnRegressor(
time_left_for_this_task=7200,
per_run_time_limit=120
)
automl.fit(X_train, y_train, dataset_name='straw')
print(automl.leaderboard())
pprint(automl.show_models(), indent=4)
train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))
sys.exit()
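As a sanity check on the AutoML leaderboard, a plain baseline built from estimators this script already imports could be run before the final sys.exit(). A sketch (median imputation is an assumption here; auto-sklearn handles missing values internally, plain sklearn estimators do not):

# Hypothetical baseline for comparison with the AutoSklearnRegressor above:
imputer = SimpleImputer(strategy="median")
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)
lasso = linear_model.Lasso().fit(X_train_imp, y_train)
print("Lasso test R2 score:", r2_score(y_test, lasso.predict(X_test_imp)))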

config.yaml

@@ -1,134 +1,183 @@
# See https://www.rapids.science/latest/setup/configuration/#database-credentials
DATABASE_GROUP: &database_group
MY_GROUP
# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
TIMEZONE: &timezone
America/New_York
########################################################################################################################
# GLOBAL CONFIGURATION #
########################################################################################################################
# See https://www.rapids.science/latest/setup/configuration/#participant-files
PIDS: [test01]
PIDS: ['p031', 'p032', 'p033', 'p034', 'p035', 'p036', 'p037', 'p038', 'p039', 'p040', 'p042', 'p043', 'p044', 'p045', 'p046', 'p049', 'p050', 'p052', 'p053', 'p054', 'p055', 'p057', 'p058', 'p059', 'p060', 'p061', 'p062', 'p064', 'p067', 'p068', 'p069', 'p070', 'p071', 'p072', 'p073', 'p074', 'p075', 'p076', 'p077', 'p078', 'p079', 'p080', 'p081', 'p082', 'p083', 'p084', 'p085', 'p086', 'p088', 'p089', 'p090', 'p091', 'p092', 'p093', 'p106', 'p107']
# See https://www.rapids.science/latest/setup/configuration/#automatic-creation-of-participant-files
CREATE_PARTICIPANT_FILES:
SOURCE:
TYPE: AWARE_DEVICE_TABLE #AWARE_DEVICE_TABLE or CSV_FILE
DATABASE_GROUP: *database_group
CSV_FILE_PATH: "data/external/example_participants.csv" # see docs for required format
TIMEZONE: *timezone
USERNAMES_CSV: "data/external/main_study_usernames.csv"
CSV_FILE_PATH: "data/external/main_study_participants.csv" # see docs for required format
PHONE_SECTION:
ADD: TRUE
DEVICE_ID_COLUMN: device_id # column name
ADD: True
IGNORED_DEVICE_IDS: []
FITBIT_SECTION:
ADD: TRUE
DEVICE_ID_COLUMN: device_id # column name
ADD: False
IGNORED_DEVICE_IDS: []
EMPATICA_SECTION:
ADD: True
IGNORED_DEVICE_IDS: []
# See https://www.rapids.science/latest/setup/configuration/#time-segments
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC # FREQUENCY, PERIODIC, EVENT
FILE: "data/external/timesegments_periodic.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # Only relevant if TYPE=PERIODIC, see docs
TYPE: EVENT # FREQUENCY, PERIODIC, EVENT
FILE: "data/external/straw_events.csv"
INCLUDE_PAST_PERIODIC_SEGMENTS: TRUE # Only relevant if TYPE=PERIODIC, see docs
TAILORED_EVENTS: # Only relevant if TYPE=EVENT
COMPUTE: True
SEGMENTING_METHOD: "30_before" # 30_before, 90_before, stress_event
INTERVAL_OF_INTEREST: 10 # duration of event of interest [minutes]
IOI_ERROR_TOLERANCE: 5 # interval of interest error tolerance (before and after IOI) [minutes]
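With TYPE: EVENT, segments are generated around rows of FILE. For reference, an illustrative straw_events.csv row (column layout as documented for RAPIDS event time segments; every value here is hypothetical):

label,event_timestamp,length,shift,shift_direction,device_id
stress_event,1618814100000,10minutes,5minutes,-1,a1b2c3d4-0000-0000-0000-000000000000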
# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
TIMEZONE:
TYPE: MULTIPLE
SINGLE:
TZCODE: Europe/Ljubljana
MULTIPLE:
TZ_FILE: data/external/timezone.csv
TZCODES_FILE: data/external/multiple_timezones.csv
IF_MISSING_TZCODE: USE_DEFAULT
DEFAULT_TZCODE: Europe/Ljubljana
FITBIT:
ALLOW_MULTIPLE_TZ_PER_DEVICE: False
INFER_FROM_SMARTPHONE_TZ: False
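With TIMEZONE TYPE: MULTIPLE, per-device timezone history is read from TZCODES_FILE. An illustrative snippet (device id hypothetical; format as documented for RAPIDS multiple-timezone setups):

device_id,tzcode,timestamp
a1b2c3d4-0000-0000-0000-000000000000,Europe/Ljubljana,0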
########################################################################################################################
# PHONE #
########################################################################################################################
# See https://www.rapids.science/latest/setup/configuration/#device-data-source-configuration
PHONE_DATA_CONFIGURATION:
SOURCE:
TYPE: DATABASE
DATABASE_GROUP: *database_group
DEVICE_ID_COLUMN: device_id # column name
TIMEZONE:
TYPE: SINGLE
VALUE: *timezone
# See https://www.rapids.science/latest/setup/configuration/#data-stream-configuration
PHONE_DATA_STREAMS:
USE: aware_postgresql
# AVAILABLE:
aware_mysql:
DATABASE_GROUP: MY_GROUP
aware_postgresql:
DATABASE_GROUP: PSQL_STRAW
aware_csv:
FOLDER: data/external/aware_csv
aware_influxdb:
DATABASE_GROUP: MY_GROUP
# Sensors ------
# https://www.rapids.science/latest/features/phone-accelerometer/
PHONE_ACCELEROMETER:
TABLE: accelerometer
CONTAINER: accelerometer
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_accelerometer/rapids/main.py
PANDA:
COMPUTE: False
VALID_SENSED_MINUTES: False
FEATURES:
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_accelerometer/panda/main.py
# See https://www.rapids.science/latest/features/phone-activity-recognition/
PHONE_ACTIVITY_RECOGNITION:
TABLE:
ANDROID: plugin_google_activity_recognition
CONTAINER:
ANDROID: google_ar
IOS: plugin_ios_activity_recognition
EPISODE_THRESHOLD_BETWEEN_ROWS: 5 # minutes. Max time difference for two consecutive rows to be considered within the same battery episode.
EPISODE_THRESHOLD_BETWEEN_ROWS: 5 # minutes. Max time difference for two consecutive rows to be considered within the same AR episode.
PROVIDERS:
RAPIDS:
COMPUTE: False
COMPUTE: True
FEATURES: ["count", "mostcommonactivity", "countuniqueactivities", "durationstationary", "durationmobile", "durationvehicle"]
ACTIVITY_CLASSES:
STATIONARY: ["still", "tilting"]
MOBILE: ["on_foot", "walking", "running", "on_bicycle"]
VEHICLE: ["in_vehicle"]
SRC_FOLDER: "rapids" # inside src/features/phone_activity_recognition
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_activity_recognition/rapids/main.py
# See https://www.rapids.science/latest/features/phone-applications-crashes/
PHONE_APPLICATIONS_CRASHES:
CONTAINER: applications_crashes
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
# See https://www.rapids.science/latest/features/phone-applications-foreground/
PHONE_APPLICATIONS_FOREGROUND:
TABLE: applications_foreground
CONTAINER: applications
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
# Refer to data/external/play_store_categories_count.csv for a list of categories (genres) and their frequency.
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS:
RAPIDS:
COMPUTE: True
INCLUDE_EPISODE_FEATURES: True
SINGLE_CATEGORIES: ["Productivity", "Tools", "Communication", "Education", "Social"]
MULTIPLE_CATEGORIES:
games: ["Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing"]
social: ["Communication", "Social", "Dating"]
productivity: ["Tools", "Productivity", "Finance", "Education", "News & Magazines", "Business", "Books & Reference"]
health: ["Health & Fitness", "Lifestyle", "Food & Drink", "Sports", "Medical", "Parenting"]
entertainment: ["Shopping", "Music & Audio", "Entertainment", "Travel & Local", "Photography", "Video Players & Editors", "Personalization", "House & Home", "Art & Design", "Auto & Vehicles", "Entertainment,Music & Video",
"Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing" # Add all games.
]
maps_weather: ["Maps & Navigation", "Weather"]
CUSTOM_CATEGORIES:
SINGLE_APPS: []
EXCLUDED_CATEGORIES: ["System", "STRAW"]
# Note: A special option here is "is_system_app".
# This excludes applications that have is_system_app = TRUE, which is a separate column in the table.
# However, all of these applications have been assigned System category.
# I will therefore filter by that category, which is a superset and is more complete. JL
EXCLUDED_APPS: []
FEATURES:
APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
IGNORE_EPISODES_LONGER_THAN: 300 # in minutes, set to 0 to disable
SRC_SCRIPT: src/features/phone_applications_foreground/rapids/main.py
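The two IGNORE_EPISODES_* thresholds above bound app-episode durations before episode features are computed; with the values in this config only the upper bound is active. A small illustration (durations assumed, in minutes):

durations = [0.4, 12.0, 350.0]  # hypothetical app episode durations, in minutes
lower, upper = 0, 300           # IGNORE_EPISODES_SHORTER_THAN / IGNORE_EPISODES_LONGER_THAN
kept = [d for d in durations
        if (lower == 0 or d >= lower) and (upper == 0 or d <= upper)]
# -> [0.4, 12.0]; the 350-minute episode is dropped, the lower bound (0) is disabled.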
# See https://www.rapids.science/latest/features/phone-applications-notifications/
PHONE_APPLICATIONS_NOTIFICATIONS:
CONTAINER: notifications
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS:
RAPIDS:
COMPUTE: False
SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES:
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: []
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"]
FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
SRC_FOLDER: "rapids" # inside src/features/phone_applications_foreground
SRC_LANGUAGE: "python"
PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
# See https://www.rapids.science/latest/features/phone-battery/
PHONE_BATTERY:
TABLE: battery
CONTAINER: battery
EPISODE_THRESHOLD_BETWEEN_ROWS: 30 # minutes. Max time difference for two consecutive rows to be considered within the same battery episode.
PROVIDERS:
RAPIDS:
COMPUTE: False
COMPUTE: True
FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
SRC_FOLDER: "rapids" # inside src/features/phone_battery
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_battery/rapids/main.py
# See https://www.rapids.science/latest/features/phone-bluetooth/
PHONE_BLUETOOTH:
TABLE: bluetooth
CONTAINER: bluetooth
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_bluetooth
SRC_LANGUAGE: "r"
SRC_SCRIPT: src/features/phone_bluetooth/rapids/main.R
DORYAB:
COMPUTE: FALSE
COMPUTE: True
FEATURES:
ALL:
DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
@@ -142,26 +191,25 @@ PHONE_BLUETOOTH:
DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
SCANS_LEAST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
SRC_FOLDER: "doryab" # inside src/features/phone_bluetooth
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_bluetooth/doryab/main.py
# See https://www.rapids.science/latest/features/phone-calls/
PHONE_CALLS:
TABLE: calls
CONTAINER: call
PROVIDERS:
RAPIDS:
COMPUTE: False
COMPUTE: True
FEATURES_TYPE: EPISODES # EVENTS or EPISODES
CALL_TYPES: [missed, incoming, outgoing]
FEATURES:
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_calls
SRC_SCRIPT: src/features/phone_calls/rapids/main.R
# See https://www.rapids.science/latest/features/phone-conversation/
PHONE_CONVERSATION:
TABLE:
PHONE_CONVERSATION: # TODO Adapt for speech
CONTAINER:
ANDROID: plugin_studentlife_audio_android
IOS: plugin_studentlife_audio
PROVIDERS:
@@ -175,104 +223,146 @@ PHONE_CONVERSATION:
"unknownexpectedfraction","countconversation"]
RECORDING_MINUTES: 1
PAUSED_MINUTES : 3
SRC_FOLDER: "rapids" # inside src/features/phone_conversation
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_conversation/rapids/main.py
# See https://www.rapids.science/latest/features/phone-data-yield/
PHONE_DATA_YIELD:
SENSORS: []
SENSORS: [#PHONE_ACCELEROMETER,
PHONE_ACTIVITY_RECOGNITION,
PHONE_APPLICATIONS_FOREGROUND,
PHONE_APPLICATIONS_NOTIFICATIONS,
PHONE_BATTERY,
PHONE_BLUETOOTH,
PHONE_CALLS,
PHONE_LIGHT,
PHONE_LOCATIONS,
PHONE_MESSAGES,
PHONE_SCREEN,
PHONE_WIFI_VISIBLE]
PROVIDERS:
RAPIDS:
COMPUTE: True
FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1, minimum percentage of valid minutes in an hour to be considered valid.
SRC_SCRIPT: src/features/phone_data_yield/rapids/main.R
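A quick reading of MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS as set above: an hour counts as a valid yielded hour when at least half of its minutes contain data from any sensor listed in SENSORS. Illustration with an assumed count:

minutes_with_data = 42  # hypothetical number of sensed minutes within one hour
valid_yielded_hour = (minutes_with_data / 60) >= 0.5  # True for this example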
PHONE_ESM:
CONTAINER: esm
PROVIDERS:
STRAW:
COMPUTE: True
SCALES: ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support",
"appraisal_stressfulness_period", "appraisal_stressfulness_event", "appraisal_threat", "appraisal_challenge"]
FEATURES: [mean]
SRC_SCRIPT: src/features/phone_esm/straw/main.py
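
The STRAW ESM provider reduces each questionnaire scale to a single score per time segment; with FEATURES: [mean] that is just the mean of the item answers. A minimal sketch with assumed column names:

import pandas as pd

def esm_scale_means(esm: pd.DataFrame) -> pd.Series:
    # esm: one row per answered item, with a "scale" label
    # (e.g. PANAS_positive_affect) and a numeric "answer" column.
    return esm.groupby("scale")["answer"].mean()
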
# See https://www.rapids.science/latest/features/phone-keyboard/
PHONE_KEYBOARD:
CONTAINER: keyboard
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["sessioncount","averageinterkeydelay","averagesessionlength","changeintextlengthlessthanminusone","changeintextlengthequaltominusone","changeintextlengthequaltoone","changeintextlengthmorethanone","maxtextlength","lastmessagelength","totalkeyboardtouches"]
SRC_SCRIPT: src/features/phone_keyboard/rapids/main.py
# See https://www.rapids.science/latest/features/phone-light/
PHONE_LIGHT:
  CONTAINER: light_sensor
PROVIDERS:
RAPIDS:
      COMPUTE: True
FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
SRC_FOLDER: "rapids" # inside src/features/phone_light
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_light/rapids/main.py
# See https://www.rapids.science/latest/features/phone-locations/
PHONE_LOCATIONS:
  CONTAINER: locations
  LOCATIONS_TO_USE: ALL_RESAMPLED # ALL, GPS, ALL_RESAMPLED, OR FUSED_RESAMPLED
FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
ACCURACY_LIMIT: 100 # meters, drops location coordinates with an accuracy equal or higher than this. This number means there's a 68% probability the true location is within this radius
PROVIDERS:
DORYAB:
      COMPUTE: True
      FEATURES: ["locationvariance","loglocationvariance","totaldistance","avgspeed","varspeed", "numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","avglengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy","timeathome", "homelabel"]
      DBSCAN_EPS: 100 # meters
DBSCAN_MINSAMPLES: 5
THRESHOLD_STATIC : 1 # km/h
      MAXIMUM_ROW_GAP: 300 # seconds
MINUTES_DATA_USED: False
SAMPLING_FREQUENCY: 0
SRC_FOLDER: "doryab" # inside src/features/phone_locations
SRC_LANGUAGE: "python"
CLUSTER_ON: PARTICIPANT_DATASET # PARTICIPANT_DATASET, TIME_SEGMENT, TIME_SEGMENT_INSTANCE
INFER_HOME_LOCATION_STRATEGY: DORYAB_STRATEGY # DORYAB_STRATEGY, SUN_LI_VEGA_STRATEGY
MINIMUM_DAYS_TO_DETECT_HOME_CHANGES: 3
CLUSTERING_ALGORITHM: DBSCAN # DBSCAN, OPTICS
RADIUS_FOR_HOME: 100
SRC_SCRIPT: src/features/phone_locations/doryab/main.py
BARNETT:
      COMPUTE: True
FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
ACCURACY_LIMIT: 51 # meters, drops location coordinates with an accuracy higher than this. This number means there's a 68% probability the true location is within this radius
TIMEZONE: *timezone
IF_MULTIPLE_TIMEZONES: USE_MOST_COMMON
      MINUTES_DATA_USED: False # Use this for quality control purposes: how many minutes of data (location coordinates grouped by minute) were used to compute features
SRC_FOLDER: "barnett" # inside src/features/phone_locations
SRC_LANGUAGE: "r"
SRC_SCRIPT: src/features/phone_locations/barnett/main.R
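
The DORYAB significant-places step clusters stationary coordinates; with CLUSTERING_ALGORITHM: DBSCAN, the DBSCAN_EPS (meters) and DBSCAN_MINSAMPLES values map naturally onto scikit-learn's DBSCAN, with the radius converted to radians for the haversine metric. A sketch of just that step; the provider also filters by speed, supports OPTICS, and infers home location.

import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000

def cluster_significant_places(lat_deg, lon_deg, eps_meters=100, min_samples=5):
    coords_rad = np.radians(np.column_stack([lat_deg, lon_deg]))
    labels = DBSCAN(eps=eps_meters / EARTH_RADIUS_M, min_samples=min_samples,
                    metric="haversine").fit_predict(coords_rad)
    return labels  # -1 = noise; other labels index candidate significant places
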
# See https://www.rapids.science/latest/features/phone-log/
PHONE_LOG:
CONTAINER:
ANDROID: aware_log
IOS: ios_aware_log
PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
# See https://www.rapids.science/latest/features/phone-messages/
PHONE_MESSAGES:
  CONTAINER: sms
PROVIDERS:
RAPIDS:
      COMPUTE: True
MESSAGES_TYPES : [received, sent]
FEATURES:
received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
SRC_LANGUAGE: "r"
SRC_FOLDER: "rapids" # inside src/features/phone_messages
SRC_SCRIPT: src/features/phone_messages/rapids/main.R
# See https://www.rapids.science/latest/features/phone-screen/
PHONE_SCREEN:
  CONTAINER: screen
PROVIDERS:
RAPIDS:
      COMPUTE: True
REFERENCE_HOUR_FIRST_USE: 0
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
      IGNORE_EPISODES_LONGER_THAN: 360 # in minutes, set to 0 to disable
FEATURES: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"] # "episodepersensedminutes" needs to be added later
EPISODE_TYPES: ["unlock"]
SRC_FOLDER: "rapids" # inside src/features/phone_screen
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/phone_screen/rapids/main.py
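
How the two IGNORE_EPISODES_* bounds could prune unlock episodes before the duration features are computed; a sketch with assumed column names (millisecond start/end timestamps), not the provider's actual code.

import pandas as pd

def filter_unlock_episodes(episodes: pd.DataFrame,
                           shorter_than_min: float = 0,
                           longer_than_min: float = 360) -> pd.DataFrame:
    duration_min = (episodes["end_timestamp"] - episodes["start_timestamp"]) / 60_000
    keep = pd.Series(True, index=episodes.index)
    if shorter_than_min > 0:
        keep &= duration_min >= shorter_than_min
    if longer_than_min > 0:
        keep &= duration_min <= longer_than_min  # e.g. drop "unlocks" longer than 6 h
    return episodes[keep]
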
# Custom added sensor
PHONE_SPEECH:
CONTAINER: speech
PROVIDERS:
STRAW:
COMPUTE: True
FEATURES: ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
SRC_SCRIPT: src/features/phone_speech/straw/main.py
# See https://www.rapids.science/latest/features/phone-wifi-connected/
PHONE_WIFI_CONNECTED:
TABLE: "sensor_wifi"
CONTAINER: sensor_wifi
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_wifi_connected
SRC_LANGUAGE: "r"
SRC_SCRIPT: src/features/phone_wifi_connected/rapids/main.R
# See https://www.rapids.science/latest/features/phone-wifi-visible/
PHONE_WIFI_VISIBLE:
TABLE: "wifi"
CONTAINER: wifi
PROVIDERS:
RAPIDS:
      COMPUTE: True
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
SRC_FOLDER: "rapids" # inside src/features/phone_wifi_visible
SRC_LANGUAGE: "r"
SRC_SCRIPT: src/features/phone_wifi_visible/rapids/main.R
########################################################################################################################
# FITBIT #
########################################################################################################################
# See https://www.rapids.science/latest/setup/configuration/#data-stream-configuration
FITBIT_DATA_STREAMS:
USE: fitbitjson_mysql
# AVAILABLE:
fitbitjson_mysql:
DATABASE_GROUP: MY_GROUP
    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting minutes after midnight. By default, 660 (11:00).
fitbitparsed_mysql:
DATABASE_GROUP: MY_GROUP
    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting minutes after midnight. By default, 660 (11:00).
fitbitjson_csv:
FOLDER: data/external/fitbit_csv
    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting minutes after midnight. By default, 660 (11:00).
fitbitparsed_csv:
FOLDER: data/external/fitbit_csv
    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting minutes after midnight. By default, 660 (11:00).
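
SLEEP_SUMMARY_LAST_NIGHT_END is expressed as minutes after midnight, so 660 is 11:00; roughly, summary sleep episodes ending up to that time still count toward the previous night. A one-liner to sanity-check the value:

from datetime import time

def last_night_end(minutes_after_midnight: int = 660) -> time:
    return time(minutes_after_midnight // 60, minutes_after_midnight % 60)

print(last_night_end(660))  # 11:00:00
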
# Sensors ------
# See https://www.rapids.science/latest/features/fitbit-calories-intraday/
FITBIT_CALORIES_INTRADAY:
CONTAINER: fitbit_data
PROVIDERS:
RAPIDS:
COMPUTE: False
EPISODE_TYPE: [sedentary, lightlyactive, fairlyactive, veryactive, mvpa, lowmet, highmet]
EPISODE_TIME_THRESHOLD: 5 # minutes
EPISODE_MET_THRESHOLD: 3
EPISODE_MVPA_CATEGORIES: [fairlyactive, veryactive]
EPISODE_REFERENCE_TIME: MIDNIGHT # or START_OF_THE_SEGMENT
FEATURES: [count, sumduration, avgduration, minduration, maxduration, stdduration, starttimefirst, endtimefirst, starttimelast, endtimelast, starttimelongest, endtimelongest, summet, avgmet, maxmet, minmet, stdmet, sumcalories, avgcalories, maxcalories, mincalories, stdcalories]
SRC_SCRIPT: src/features/fitbit_calories_intraday/rapids/main.R
# See https://www.rapids.science/latest/features/fitbit-data-yield/
FITBIT_DATA_YIELD:
SENSOR: FITBIT_HEARTRATE_INTRADAY
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1, minimum percentage of valid minutes in an hour to be considered valid.
SRC_SCRIPT: src/features/fitbit_data_yield/rapids/main.R
# See https://www.rapids.science/latest/features/fitbit-heartrate-summary/
FITBIT_HEARTRATE_SUMMARY:
  CONTAINER: heartrate_summary
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxrestinghr", "minrestinghr", "avgrestinghr", "medianrestinghr", "moderestinghr", "stdrestinghr", "diffmaxmoderestinghr", "diffminmoderestinghr", "entropyrestinghr"] # calories features' accuracy depend on the accuracy of the participants fitbit profile (e.g. height, weight) use these with care: ["sumcaloriesoutofrange", "maxcaloriesoutofrange", "mincaloriesoutofrange", "avgcaloriesoutofrange", "mediancaloriesoutofrange", "stdcaloriesoutofrange", "entropycaloriesoutofrange", "sumcaloriesfatburn", "maxcaloriesfatburn", "mincaloriesfatburn", "avgcaloriesfatburn", "mediancaloriesfatburn", "stdcaloriesfatburn", "entropycaloriesfatburn", "sumcaloriescardio", "maxcaloriescardio", "mincaloriescardio", "avgcaloriescardio", "mediancaloriescardio", "stdcaloriescardio", "entropycaloriescardio", "sumcaloriespeak", "maxcaloriespeak", "mincaloriespeak", "avgcaloriespeak", "mediancaloriespeak", "stdcaloriespeak", "entropycaloriespeak"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_heartrate_summary
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/fitbit_heartrate_summary/rapids/main.py
# See https://www.rapids.science/latest/features/fitbit-heartrate-intraday/
FITBIT_HEARTRATE_INTRADAY:
  CONTAINER: heartrate_intraday
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_heartrate_intraday
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/fitbit_heartrate_intraday/rapids/main.py
# See https://www.rapids.science/latest/features/fitbit-sleep-summary/
FITBIT_SLEEP_SUMMARY:
SLEEP_EPISODE_TIMESTAMP: end # summary sleep episodes are considered as events based on either the start timestamp or end timestamp.
CONTAINER: sleep_summary
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["countepisode", "avgefficiency", "sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgdurationafterwakeup", "avgdurationasleep", "avgdurationawake", "avgdurationtofallasleep", "avgdurationinbed"]
FEATURES: ["firstwaketime", "lastwaketime", "firstbedtime", "lastbedtime", "countepisode", "avgefficiency", "sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgdurationafterwakeup", "avgdurationasleep", "avgdurationawake", "avgdurationtofallasleep", "avgdurationinbed"]
SLEEP_TYPES: ["main", "nap", "all"]
SRC_FOLDER: "rapids" # inside src/features/fitbit_sleep_summary
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/fitbit_sleep_summary/rapids/main.py
# See https://www.rapids.science/latest/features/fitbit-sleep-intraday/
FITBIT_SLEEP_INTRADAY:
CONTAINER: sleep_intraday
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES:
STEPS: ["sum", "max", "min", "avg", "std"]
LEVELS_AND_TYPES: [countepisode, sumduration, maxduration, minduration, avgduration, medianduration, stdduration]
RATIOS_TYPE: [count, duration]
RATIOS_SCOPE: [ACROSS_LEVELS, ACROSS_TYPES, WITHIN_LEVELS, WITHIN_TYPES]
SLEEP_LEVELS:
INCLUDE_ALL_GROUPS: True
CLASSIC: [awake, restless, asleep]
STAGES: [wake, deep, light, rem]
UNIFIED: [awake, asleep]
SLEEP_TYPES: [main, nap, all]
SRC_SCRIPT: src/features/fitbit_sleep_intraday/rapids/main.py
PRICE:
COMPUTE: False
FEATURES: [avgduration, avgratioduration, avgstarttimeofepisodemain, avgendtimeofepisodemain, avgmidpointofepisodemain, stdstarttimeofepisodemain, stdendtimeofepisodemain, stdmidpointofepisodemain, socialjetlag, rmssdmeanstarttimeofepisodemain, rmssdmeanendtimeofepisodemain, rmssdmeanmidpointofepisodemain, rmssdmedianstarttimeofepisodemain, rmssdmedianendtimeofepisodemain, rmssdmedianmidpointofepisodemain]
SLEEP_LEVELS:
INCLUDE_ALL_GROUPS: True
CLASSIC: [awake, restless, asleep]
STAGES: [wake, deep, light, rem]
UNIFIED: [awake, asleep]
DAY_TYPES: [WEEKEND, WEEK, ALL]
LAST_NIGHT_END: 660 # number of minutes after midnight (11:00) 11*60
SRC_SCRIPT: src/features/fitbit_sleep_intraday/price/main.py
# See https://www.rapids.science/latest/features/fitbit-steps-summary/
FITBIT_STEPS_SUMMARY:
CONTAINER: steps_summary
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES: ["maxsumsteps", "minsumsteps", "avgsumsteps", "mediansumsteps", "stdsumsteps"]
SRC_SCRIPT: src/features/fitbit_steps_summary/rapids/main.py
# See https://www.rapids.science/latest/features/fitbit-steps-intraday/
FITBIT_STEPS_INTRADAY:
CONTAINER: steps_intraday
EXCLUDE_SLEEP: # you can exclude step data that was logged during sleep periods
TIME_BASED:
EXCLUDE: False
START_TIME: "23:00"
END_TIME: "07:00"
FITBIT_BASED:
EXCLUDE: False
PROVIDERS:
RAPIDS:
COMPUTE: False
FEATURES:
STEPS: ["sum", "max", "min", "avg", "std", "firststeptime", "laststeptime"]
SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
REFERENCE_HOUR: 0
THRESHOLD_ACTIVE_BOUT: 10 # steps
INCLUDE_ZERO_STEP_ROWS: False
SRC_FOLDER: "rapids" # inside src/features/fitbit_steps_intraday
SRC_LANGUAGE: "python"
SRC_SCRIPT: src/features/fitbit_steps_intraday/rapids/main.py
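
Sketch of the sedentary/active bout logic: a minute with at least THRESHOLD_ACTIVE_BOUT steps is active, and runs of equal labels form bouts whose durations feed the *_BOUT features. The column handling here is an assumption; the provider's rapids/main.py is authoritative.

import pandas as pd

def label_bouts(steps_per_minute: pd.Series, threshold_active: int = 10) -> pd.DataFrame:
    active = steps_per_minute >= threshold_active
    bout_id = (active != active.shift()).cumsum()   # new id whenever the label flips
    return (pd.DataFrame({"active": active, "bout_id": bout_id})
              .groupby("bout_id")
              .agg(active=("active", "first"), duration_min=("active", "size")))

# [0, 0, 25, 40, 3] -> sedentary bout (2 min), active bout (2 min), sedentary bout (1 min)
print(label_bouts(pd.Series([0, 0, 25, 40, 3])))
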
# FITBIT_CALORIES:
# TABLE_FORMAT: JSON # JSON or CSV. If your JSON or CSV data are files change [DEVICE_DATA][FITBIT][SOURCE][TYPE] to FILES
# TABLE:
# JSON: fitbit_calories
# CSV:
# SUMMARY: calories_summary
# INTRADAY: calories_intraday
# PROVIDERS:
# RAPIDS:
# COMPUTE: False
# FEATURES: []
########################################################################################################################
# EMPATICA #
########################################################################################################################
EMPATICA_DATA_STREAMS:
USE: empatica_zip
# AVAILABLE:
empatica_zip:
FOLDER: data/external/empatica
# Sensors ------
# See https://www.rapids.science/latest/features/empatica-accelerometer/
EMPATICA_ACCELEROMETER:
CONTAINER: ACC
PROVIDERS:
DBDP:
COMPUTE: False
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_SCRIPT: src/features/empatica_accelerometer/dbdp/main.py
CR:
COMPUTE: True
FEATURES: ["totalMagnitudeBand", "absoluteMeanBand", "varianceBand"] # Acc features
WINDOWS:
COMPUTE: True
WINDOW_LENGTH: 15 # specify window length in seconds
SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows']
SRC_SCRIPT: src/features/empatica_accelerometer/cr/main.py
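
The WINDOWS mechanism shared by the CR providers, sketched: a first-order feature is computed per fixed-length window (15 s here), then the SECOND_ORDER_FEATURES summarize the per-window values over the whole segment. The magnitude feature and the reading of nlargest/nsmallest as means of the n extreme windows are assumptions for illustration.

import numpy as np

def second_order_features(signal: np.ndarray, fs_hz: float, window_s: int = 15, n: int = 3) -> dict:
    win = int(window_s * fs_hz)
    windows = [signal[i:i + win] for i in range(0, len(signal) - win + 1, win)]
    per_window = np.array([np.mean(np.abs(w)) for w in windows])  # first-order feature per window
    ranked = np.sort(per_window)
    return {"mean": per_window.mean(), "median": np.median(per_window),
            "sd": per_window.std(ddof=1),
            "nlargest": ranked[-n:].mean(), "nsmallest": ranked[:n].mean(),
            "count_windows": len(per_window)}
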
# See https://www.rapids.science/latest/features/empatica-heartrate/
EMPATICA_HEARTRATE:
CONTAINER: HR
PROVIDERS:
DBDP:
COMPUTE: False
FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr"]
SRC_SCRIPT: src/features/empatica_heartrate/dbdp/main.py
# See https://www.rapids.science/latest/features/empatica-temperature/
EMPATICA_TEMPERATURE:
CONTAINER: TEMP
PROVIDERS:
DBDP:
COMPUTE: False
FEATURES: ["maxtemp", "mintemp", "avgtemp", "mediantemp", "modetemp", "stdtemp", "diffmaxmodetemp", "diffminmodetemp", "entropytemp"]
SRC_SCRIPT: src/features/empatica_temperature/dbdp/main.py
CR:
COMPUTE: True
FEATURES: ["maximum", "minimum", "meanAbsChange", "longestStrikeAboveMean", "longestStrikeBelowMean",
"stdDev", "median", "meanChange", "sumSquared", "squareSumOfComponent", "sumOfSquareComponents"]
WINDOWS:
COMPUTE: True
WINDOW_LENGTH: 300 # specify window length in seconds
SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows']
SRC_SCRIPT: src/features/empatica_temperature/cr/main.py
# See https://www.rapids.science/latest/features/empatica-electrodermal-activity/
EMPATICA_ELECTRODERMAL_ACTIVITY:
CONTAINER: EDA
PROVIDERS:
DBDP:
COMPUTE: False
FEATURES: ["maxeda", "mineda", "avgeda", "medianeda", "modeeda", "stdeda", "diffmaxmodeeda", "diffminmodeeda", "entropyeda"]
SRC_SCRIPT: src/features/empatica_electrodermal_activity/dbdp/main.py
CR:
COMPUTE: True
FEATURES: ['mean', 'std', 'q25', 'q75', 'qd', 'deriv', 'power', 'numPeaks', 'ratePeaks', 'powerPeaks', 'sumPosDeriv', 'propPosDeriv', 'derivTonic',
'sigTonicDifference', 'freqFeats','maxPeakAmplitudeChangeBefore', 'maxPeakAmplitudeChangeAfter', 'avgPeakAmplitudeChangeBefore',
'avgPeakAmplitudeChangeAfter', 'avgPeakChangeRatio', 'maxPeakIncreaseTime', 'maxPeakDecreaseTime', 'maxPeakDuration', 'maxPeakChangeRatio',
'avgPeakIncreaseTime', 'avgPeakDecreaseTime', 'avgPeakDuration', 'signalOverallChange', 'changeDuration', 'changeRate', 'significantIncrease',
'significantDecrease']
WINDOWS:
COMPUTE: True
WINDOW_LENGTH: 60 # specify window length in seconds
          SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows', 'eda_num_peaks_non_zero']
IMPUTE_NANS: True
SRC_SCRIPT: src/features/empatica_electrodermal_activity/cr/main.py
# See https://www.rapids.science/latest/features/empatica-blood-volume-pulse/
EMPATICA_BLOOD_VOLUME_PULSE:
CONTAINER: BVP
PROVIDERS:
DBDP:
COMPUTE: False
FEATURES: ["maxbvp", "minbvp", "avgbvp", "medianbvp", "modebvp", "stdbvp", "diffmaxmodebvp", "diffminmodebvp", "entropybvp"]
SRC_SCRIPT: src/features/empatica_blood_volume_pulse/dbdp/main.py
CR:
COMPUTE: False
FEATURES: ['meanHr', 'ibi', 'sdnn', 'sdsd', 'rmssd', 'pnn20', 'pnn50', 'sd', 'sd2', 'sd1/sd2', 'numRR', # Time features
'VLF', 'LF', 'LFnorm', 'HF', 'HFnorm', 'LF/HF', 'fullIntegral'] # Freq features
WINDOWS:
COMPUTE: True
WINDOW_LENGTH: 300 # specify window length in seconds
SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows', 'hrv_num_windows_non_nan']
SRC_SCRIPT: src/features/empatica_blood_volume_pulse/cr/main.py
# See https://www.rapids.science/latest/features/empatica-inter-beat-interval/
EMPATICA_INTER_BEAT_INTERVAL:
CONTAINER: IBI
PROVIDERS:
DBDP:
COMPUTE: False
FEATURES: ["maxibi", "minibi", "avgibi", "medianibi", "modeibi", "stdibi", "diffmaxmodeibi", "diffminmodeibi", "entropyibi"]
SRC_SCRIPT: src/features/empatica_inter_beat_interval/dbdp/main.py
CR:
COMPUTE: True
FEATURES: ['meanHr', 'ibi', 'sdnn', 'sdsd', 'rmssd', 'pnn20', 'pnn50', 'sd', 'sd2', 'sd1/sd2', 'numRR', # Time features
'VLF', 'LF', 'LFnorm', 'HF', 'HFnorm', 'LF/HF', 'fullIntegral'] # Freq features
PATCH_WITH_BVP: True
WINDOWS:
COMPUTE: True
WINDOW_LENGTH: 300 # specify window length in seconds
SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows', 'hrv_num_windows_non_nan']
SRC_SCRIPT: src/features/empatica_inter_beat_interval/cr/main.py
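
Several of the listed time-domain HRV features follow textbook definitions; a sketch of those (the cr provider, optionally patched with BVP-derived beats via PATCH_WITH_BVP, is the actual implementation):

import numpy as np

def hrv_time_domain(ibi_s: np.ndarray) -> dict:
    ibi_ms = np.asarray(ibi_s, dtype=float) * 1000.0
    diffs = np.diff(ibi_ms)
    return {
        "meanHr": 60_000.0 / ibi_ms.mean(),            # beats per minute
        "sdnn": ibi_ms.std(ddof=1),                    # SD of all intervals
        "sdsd": diffs.std(ddof=1),                     # SD of successive differences
        "rmssd": float(np.sqrt(np.mean(diffs ** 2))),  # RMS of successive differences
        "pnn20": float(np.mean(np.abs(diffs) > 20) * 100),
        "pnn50": float(np.mean(np.abs(diffs) > 50) * 100),
        "numRR": int(ibi_ms.size),
    }
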
# See https://www.rapids.science/latest/features/empatica-tags/
EMPATICA_TAGS:
CONTAINER: TAGS
PROVIDERS: # None implemented yet
########################################################################################################################
# PLOTS #
########################################################################################################################
# Data quality ------
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#1-histograms-of-phone-data-yield
HISTOGRAM_PHONE_DATA_YIELD:
PLOT: False
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
PLOT: False
TIME: RELATIVE_TIME # ABSOLUTE_TIME or RELATIVE_TIME
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
PLOT: False
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
PLOT: False
  SENSORS: []
# Features ------
# See https://www.rapids.science/latest/visualizations/feature-visualizations/#1-heatmap-correlation-matrix
HEATMAP_FEATURE_CORRELATION_MATRIX:
PLOT: False
MIN_ROWS_RATIO: 0.5
CORR_THRESHOLD: 0.1
CORR_METHOD: "pearson" # choose from {"pearson", "kendall", "spearman"}
########################################################################################################################
# Data Cleaning #
########################################################################################################################
ALL_CLEANING_INDIVIDUAL:
PROVIDERS:
RAPIDS:
COMPUTE: False
IMPUTE_SELECTED_EVENT_FEATURES:
COMPUTE: False
MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
COLS_NAN_THRESHOLD: 1 # set to 1 to disable
COLS_VAR_THRESHOLD: True
ROWS_NAN_THRESHOLD: 1 # set to 1 to disable
DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
DATA_YIELD_RATIO_THRESHOLD: 0 # set to 0 to disable
DROP_HIGHLY_CORRELATED_FEATURES:
COMPUTE: True
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95
SRC_SCRIPT: src/features/all_cleaning_individual/rapids/main.R
STRAW:
COMPUTE: True
PHONE_DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_MINUTES # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
PHONE_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
EMPATICA_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
ROWS_NAN_THRESHOLD: 0.33 # set to 1 to disable
      COLS_NAN_THRESHOLD: 0.9 # set to 1 to drop only columns that contain all (100%) NaN
COLS_VAR_THRESHOLD: True
DROP_HIGHLY_CORRELATED_FEATURES:
COMPUTE: True
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95
STANDARDIZATION: True
SRC_SCRIPT: src/features/all_cleaning_individual/straw/main.py
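
Sketch of DROP_HIGHLY_CORRELATED_FEATURES as configured above: correlations are only trusted where at least MIN_OVERLAP_FOR_CORR_THRESHOLD of the rows overlap, and one column from every pair correlated above CORR_THRESHOLD is dropped. A simplified pandas variant, not the cleaning script itself:

import numpy as np
import pandas as pd

def drop_highly_correlated(features: pd.DataFrame,
                           min_overlap: float = 0.5,
                           corr_threshold: float = 0.95) -> pd.DataFrame:
    min_periods = int(min_overlap * len(features))
    corr = features.corr(method="pearson", min_periods=min_periods).abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # keep upper triangle
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return features.drop(columns=to_drop)
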
ALL_CLEANING_OVERALL:
PROVIDERS:
RAPIDS:
COMPUTE: False
IMPUTE_SELECTED_EVENT_FEATURES:
COMPUTE: False
MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
COLS_NAN_THRESHOLD: 1 # set to 1 to disable
COLS_VAR_THRESHOLD: True
ROWS_NAN_THRESHOLD: 1 # set to 1 to disable
DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
DATA_YIELD_RATIO_THRESHOLD: 0 # set to 0 to disable
DROP_HIGHLY_CORRELATED_FEATURES:
COMPUTE: True
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95
SRC_SCRIPT: src/features/all_cleaning_overall/rapids/main.R
STRAW:
COMPUTE: True
PHONE_DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_MINUTES # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
PHONE_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
EMPATICA_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
ROWS_NAN_THRESHOLD: 0.33 # set to 1 to disable
      COLS_NAN_THRESHOLD: 0.8 # set to 1 to drop only columns that contain all (100%) NaN
COLS_VAR_THRESHOLD: True
DROP_HIGHLY_CORRELATED_FEATURES:
COMPUTE: True
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95
STANDARDIZATION: True
TARGET_STANDARDIZATION: False
SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py
########################################################################################################################
# Baseline #
########################################################################################################################
PARAMS_FOR_ANALYSIS:
BASELINE:
COMPUTE: True
FOLDER: data/external/baseline
CONTAINER: [results-survey637813_final.csv, # Slovenia
results-survey358134_final.csv, # Belgium 1
results-survey413767_final.csv # Belgium 2
]
QUESTION_LIST: survey637813+question_text.csv
FEATURES: [age, gender, startlanguage, limesurvey_demand, limesurvey_control, limesurvey_demand_control_ratio, limesurvey_demand_control_ratio_quartile]
CATEGORICAL_FEATURES: [gender]
TARGET:
COMPUTE: True
LABEL: appraisal_stressfulness_event_mean
ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, JCQ_coworker_support_mean, appraisal_stressfulness_period_mean]
# PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean,
# JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean


@@ -0,0 +1,9 @@
"_id","timestamp","device_id","call_type","call_duration","trace"
1,1587663260695,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,14,"d5e84f8af01b2728021d4f43f53a163c0c90000c"
2,1587739118007,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"47c125dc7bd163b8612cdea13724a814917b6e93"
5,1587746544891,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,95,"9cc793ffd6e88b1d850ce540b5d7e000ef5650d4"
6,1587911379859,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,63,"51fb9344e988049a3fec774c7ca622358bf80264"
7,1587992647361,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"2a862a7730cfdfaf103a9487afe3e02935fd6e02"
8,1588020039448,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",1,11,"a2c53f6a086d98622c06107780980cf1bb4e37bd"
11,1588176189024,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,65,"56589df8c830c70e330b644921ed38e08d8fd1f3"
12,1588197745079,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"cab458018a8ed3b626515e794c70b6f415318adc"



@@ -0,0 +1,57 @@
label,empatica_id
uploader_79170,A0245B
uploader_89788,A02731
uploader_68294,A02705
uploader_92856,A024AF
uploader_23726,A0231C
uploader_66620,A02305
uploader_58435,A026B5
uploader_87801,A022A8
uploader_96055,A027BA
uploader_69549,A0226C
uploader_26363,A0263D
uploader_72010,A023FA
uploader_13997,A024AF
uploader_31156,A02305
uploader_63187,A027BA
uploader_94821,A022A8
uploader_65413,A023F1;A023FA
uploader_36488,A02713
uploader_91087,A0231C
uploader_35174,A025D1
uploader_73880,A02705
uploader_78650,A02731
uploader_70578,A0245B
uploader_88313,A02736
uploader_58482,A0261A
uploader_80601,A027BA
uploader_93729,A0226C
uploader_61663,A0245B
uploader_80848,A025D1
uploader_57312,A023F9;A02361;A027A0
uploader_52087,A02666
uploader_98770,A02953
uploader_51327,A0245F
uploader_11737,A02732
uploader_77440,A0264E
uploader_57277,A02422
uploader_13098,A026E5
uploader_80719,A023C8
uploader_54698,A02953
uploader_95571,A02853
uploader_21880,A024DC
uploader_92905,A02920
uploader_12108,A023F4
uploader_17436,A026E5
uploader_58440,A0273F
uploader_22172,A0245F
uploader_39250,A02422
uploader_15311,A023F9
uploader_45766,A02920
uploader_23096,A02361
uploader_78243,A02422
uploader_58777,A0245F
uploader_82941,A02666
uploader_89606,A023F4
uploader_82969,A023C8
uploader_53573,A024DC;A02361


@@ -0,0 +1,11 @@
PHONE:
DEVICE_IDS: [4b62a655-cbf0-4ac0-a448-06726f45b56a]
PLATFORMS: [android]
LABEL: uploader_53573
START_DATE: 2021-05-21 09:21:24
END_DATE: 2021-07-12 17:32:07
EMPATICA:
DEVICE_IDS: [uploader_53573]
LABEL: uploader_53573
START_DATE: 2021-05-21 09:21:24
END_DATE: 2021-07-12 17:32:07

File diff suppressed because it is too large.


@@ -0,0 +1,45 @@
genre,n
System,261
Tools,96
Productivity,71
Health & Fitness,60
Finance,54
Communication,39
Music & Audio,39
Shopping,38
Lifestyle,33
Education,28
News & Magazines,24
Maps & Navigation,23
Entertainment,21
Business,18
Travel & Local,18
Books & Reference,16
Social,16
Weather,16
Food & Drink,14
Sports,14
Other,13
Photography,13
Puzzle,13
Video Players & Editors,12
Card,9
Casual,9
Personalization,8
Medical,7
Board,5
Strategy,4
House & Home,3
Trivia,3
Word,3
Adventure,2
Art & Design,2
Auto & Vehicles,2
Dating,2
Role Playing,2
STRAW,2
Simulation,2
"Board,Brain Games",1
"Entertainment,Music & Video",1
Parenting,1
Racing,1


@@ -0,0 +1,3 @@
label,start_time,length,repeats_on,repeats_value
daily,04:00:00,23H 59M 59S,every_day,0
working_day,04:00:00,18H 00M 00S,every_day,0


label,event_timestamp,length,shift,shift_direction,device_id
stress,1587661220000,1H,0M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1587906020000,3H,0M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1588003220000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress,1588172420000,9H,0H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587661220000,1H,0H,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587747620000,1D,0H,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587906020000,7D,0H,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey1,1587661220000,10H,10H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey2,1587661220000,10H,5H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey3,1587661220000,10H,0H,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524


label,length
fiveminutes,5


label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0

data/external/timezone.csv (new file, 4109 lines, vendored)

File diff suppressed because it is too large.

data/external/wiki_tz.csv (new file, 595 lines, vendored)

@@ -0,0 +1,595 @@
Country code,"Latitude, longitude ±DDMM(SS)±DDDMM(SS)",TZ database name,Portion of country covered,Status,UTC offset ±hh:mm,UTC DST offset ±hh:mm,Notes
CI,+051900402,Africa/Abidjan,,Canonical,+00:00,+00:00,
GH,+053300013,Africa/Accra,,Canonical,+00:00,+00:00,
ET,+0902+03842,Africa/Addis_Ababa,,Alias,+03:00,+03:00,Link to Africa/Nairobi
DZ,+3647+00303,Africa/Algiers,,Canonical,+01:00,+01:00,
ER,+1520+03853,Africa/Asmara,,Alias,+03:00,+03:00,Link to Africa/Nairobi
ER,+1520+03853,Africa/Asmera,,Deprecated,+03:00,+03:00,Link to Africa/Nairobi
ML,+123900800,Africa/Bamako,,Alias,+00:00,+00:00,Link to Africa/Abidjan
CF,+0422+01835,Africa/Bangui,,Alias,+01:00,+01:00,Link to Africa/Lagos
GM,+132801639,Africa/Banjul,,Alias,+00:00,+00:00,Link to Africa/Abidjan
GW,+115101535,Africa/Bissau,,Canonical,+00:00,+00:00,
MW,1547+03500,Africa/Blantyre,,Alias,+02:00,+02:00,Link to Africa/Maputo
CG,0416+01517,Africa/Brazzaville,,Alias,+01:00,+01:00,Link to Africa/Lagos
BI,0323+02922,Africa/Bujumbura,,Alias,+02:00,+02:00,Link to Africa/Maputo
EG,+3003+03115,Africa/Cairo,,Canonical,+02:00,+02:00,
MA,+333900735,Africa/Casablanca,,Canonical,+01:00,+00:00,
ES,+355300519,Africa/Ceuta,"Ceuta, Melilla",Canonical,+01:00,+02:00,
GN,+093101343,Africa/Conakry,,Alias,+00:00,+00:00,Link to Africa/Abidjan
SN,+144001726,Africa/Dakar,,Alias,+00:00,+00:00,Link to Africa/Abidjan
TZ,0648+03917,Africa/Dar_es_Salaam,,Alias,+03:00,+03:00,Link to Africa/Nairobi
DJ,+1136+04309,Africa/Djibouti,,Alias,+03:00,+03:00,Link to Africa/Nairobi
CM,+0403+00942,Africa/Douala,,Alias,+01:00,+01:00,Link to Africa/Lagos
EH,+270901312,Africa/El_Aaiun,,Canonical,+01:00,+00:00,
SL,+083001315,Africa/Freetown,,Alias,+00:00,+00:00,Link to Africa/Abidjan
BW,2439+02555,Africa/Gaborone,,Alias,+02:00,+02:00,Link to Africa/Maputo
ZW,1750+03103,Africa/Harare,,Alias,+02:00,+02:00,Link to Africa/Maputo
ZA,2615+02800,Africa/Johannesburg,,Canonical,+02:00,+02:00,
SS,+0451+03137,Africa/Juba,,Canonical,+02:00,+02:00,
UG,+0019+03225,Africa/Kampala,,Alias,+03:00,+03:00,Link to Africa/Nairobi
SD,+1536+03232,Africa/Khartoum,,Canonical,+02:00,+02:00,
RW,0157+03004,Africa/Kigali,,Alias,+02:00,+02:00,Link to Africa/Maputo
CD,0418+01518,Africa/Kinshasa,Dem. Rep. of Congo (west),Alias,+01:00,+01:00,Link to Africa/Lagos
NG,+0627+00324,Africa/Lagos,West Africa Time,Canonical,+01:00,+01:00,
GA,+0023+00927,Africa/Libreville,,Alias,+01:00,+01:00,Link to Africa/Lagos
TG,+0608+00113,Africa/Lome,,Alias,+00:00,+00:00,Link to Africa/Abidjan
AO,0848+01314,Africa/Luanda,,Alias,+01:00,+01:00,Link to Africa/Lagos
CD,1140+02728,Africa/Lubumbashi,Dem. Rep. of Congo (east),Alias,+02:00,+02:00,Link to Africa/Maputo
ZM,1525+02817,Africa/Lusaka,,Alias,+02:00,+02:00,Link to Africa/Maputo
GQ,+0345+00847,Africa/Malabo,,Alias,+01:00,+01:00,Link to Africa/Lagos
MZ,2558+03235,Africa/Maputo,Central Africa Time,Canonical,+02:00,+02:00,
LS,2928+02730,Africa/Maseru,,Alias,+02:00,+02:00,Link to Africa/Johannesburg
SZ,2618+03106,Africa/Mbabane,,Alias,+02:00,+02:00,Link to Africa/Johannesburg
SO,+0204+04522,Africa/Mogadishu,,Alias,+03:00,+03:00,Link to Africa/Nairobi
LR,+061801047,Africa/Monrovia,,Canonical,+00:00,+00:00,
KE,0117+03649,Africa/Nairobi,,Canonical,+03:00,+03:00,
TD,+1207+01503,Africa/Ndjamena,,Canonical,+01:00,+01:00,
NE,+1331+00207,Africa/Niamey,,Alias,+01:00,+01:00,Link to Africa/Lagos
MR,+180601557,Africa/Nouakchott,,Alias,+00:00,+00:00,Link to Africa/Abidjan
BF,+122200131,Africa/Ouagadougou,,Alias,+00:00,+00:00,Link to Africa/Abidjan
BJ,+0629+00237,Africa/Porto-Novo,,Alias,+01:00,+01:00,Link to Africa/Lagos
ST,+0020+00644,Africa/Sao_Tome,,Canonical,+00:00,+00:00,
ML,,Africa/Timbuktu,,Deprecated,+00:00,+00:00,Link to Africa/Abidjan
LY,+3254+01311,Africa/Tripoli,,Canonical,+02:00,+02:00,
TN,+3648+01011,Africa/Tunis,,Canonical,+01:00,+01:00,
NA,2234+01706,Africa/Windhoek,,Canonical,+02:00,+02:00,
US,+5152481763929,America/Adak,Aleutian Islands,Canonical,10:00,09:00,
US,+6113051495401,America/Anchorage,Alaska (most areas),Canonical,09:00,08:00,
AI,+181206304,America/Anguilla,,Alias,04:00,04:00,Link to America/Port_of_Spain
AG,+170306148,America/Antigua,,Alias,04:00,04:00,Link to America/Port_of_Spain
BR,071204812,America/Araguaina,Tocantins,Canonical,03:00,03:00,
AR,343605827,America/Argentina/Buenos_Aires,"Buenos Aires (BA, CF)",Canonical,03:00,03:00,
AR,282806547,America/Argentina/Catamarca,Catamarca (CT); Chubut (CH),Canonical,03:00,03:00,
AR,,America/Argentina/ComodRivadavia,,Deprecated,03:00,03:00,Link to America/Argentina/Catamarca
AR,312406411,America/Argentina/Cordoba,"Argentina (most areas: CB, CC, CN, ER, FM, MN, SE, SF)",Canonical,03:00,03:00,
AR,241106518,America/Argentina/Jujuy,Jujuy (JY),Canonical,03:00,03:00,
AR,292606651,America/Argentina/La_Rioja,La Rioja (LR),Canonical,03:00,03:00,
AR,325306849,America/Argentina/Mendoza,Mendoza (MZ),Canonical,03:00,03:00,
AR,513806913,America/Argentina/Rio_Gallegos,Santa Cruz (SC),Canonical,03:00,03:00,
AR,244706525,America/Argentina/Salta,"Salta (SA, LP, NQ, RN)",Canonical,03:00,03:00,
AR,313206831,America/Argentina/San_Juan,San Juan (SJ),Canonical,03:00,03:00,
AR,331906621,America/Argentina/San_Luis,San Luis (SL),Canonical,03:00,03:00,
AR,264906513,America/Argentina/Tucuman,Tucumán (TM),Canonical,03:00,03:00,
AR,544806818,America/Argentina/Ushuaia,Tierra del Fuego (TF),Canonical,03:00,03:00,
AW,+123006958,America/Aruba,,Alias,04:00,04:00,Link to America/Curacao
PY,251605740,America/Asuncion,,Canonical,04:00,03:00,
CA,+4845310913718,America/Atikokan,EST - ON (Atikokan); NU (Coral H),Canonical,05:00,05:00,
US,,America/Atka,,Deprecated,10:00,09:00,Link to America/Adak
BR,125903831,America/Bahia,Bahia,Canonical,03:00,03:00,
MX,+204810515,America/Bahia_Banderas,Central Time - Bahía de Banderas,Canonical,06:00,05:00,
BB,+130605937,America/Barbados,,Canonical,04:00,04:00,
BR,012704829,America/Belem,Pará (east); Amapá,Canonical,03:00,03:00,
BZ,+173008812,America/Belize,,Canonical,06:00,06:00,
CA,+512505707,America/Blanc-Sablon,AST - QC (Lower North Shore),Canonical,04:00,04:00,
BR,+024906040,America/Boa_Vista,Roraima,Canonical,04:00,04:00,
CO,+043607405,America/Bogota,,Canonical,05:00,05:00,
US,+4336491161209,America/Boise,Mountain - ID (south); OR (east),Canonical,07:00,06:00,
AR,343605827,America/Buenos_Aires,,Deprecated,03:00,03:00,Link to America/Argentina/Buenos_Aires
CA,+6906501050310,America/Cambridge_Bay,Mountain - NU (west),Canonical,07:00,06:00,
BR,202705437,America/Campo_Grande,Mato Grosso do Sul,Canonical,04:00,04:00,
MX,+210508646,America/Cancun,Eastern Standard Time - Quintana Roo,Canonical,05:00,05:00,
VE,+103006656,America/Caracas,,Canonical,04:00,04:00,
AR,282806547,America/Catamarca,,Deprecated,03:00,03:00,Link to America/Argentina/Catamarca
GF,+045605220,America/Cayenne,,Canonical,03:00,03:00,
KY,+191808123,America/Cayman,,Alias,05:00,05:00,Link to America/Panama
US,+4151000873900,America/Chicago,Central (most areas),Canonical,06:00,05:00,
MX,+283810605,America/Chihuahua,Mountain Time - Chihuahua (most areas),Canonical,07:00,06:00,
CA,,America/Coral_Harbour,,Deprecated,05:00,05:00,Link to America/Atikokan
AR,312406411,America/Cordoba,,Deprecated,03:00,03:00,Link to America/Argentina/Cordoba
CR,+095608405,America/Costa_Rica,,Canonical,06:00,06:00,
CA,+490611631,America/Creston,MST - BC (Creston),Canonical,07:00,07:00,
BR,153505605,America/Cuiaba,Mato Grosso,Canonical,04:00,04:00,
CW,+121106900,America/Curacao,,Canonical,04:00,04:00,
GL,+764601840,America/Danmarkshavn,National Park (east coast),Canonical,+00:00,+00:00,
CA,+640413925,America/Dawson,MST - Yukon (west),Canonical,07:00,07:00,
CA,+594612014,America/Dawson_Creek,"MST - BC (Dawson Cr, Ft St John)",Canonical,07:00,07:00,
US,+3944211045903,America/Denver,Mountain (most areas),Canonical,07:00,06:00,
US,+4219530830245,America/Detroit,Eastern - MI (most areas),Canonical,05:00,04:00,
DM,+151806124,America/Dominica,,Alias,04:00,04:00,Link to America/Port_of_Spain
CA,+533311328,America/Edmonton,Mountain - AB; BC (E); SK (W),Canonical,07:00,06:00,
BR,064006952,America/Eirunepe,Amazonas (west),Canonical,05:00,05:00,
SV,+134208912,America/El_Salvador,,Canonical,06:00,06:00,
MX,,America/Ensenada,,Deprecated,08:00,07:00,Link to America/Tijuana
CA,+584812242,America/Fort_Nelson,MST - BC (Ft Nelson),Canonical,07:00,07:00,
US,,America/Fort_Wayne,,Deprecated,05:00,04:00,Link to America/Indiana/Indianapolis
BR,034303830,America/Fortaleza,"Brazil (northeast: MA, PI, CE, RN, PB)",Canonical,03:00,03:00,
CA,+461205957,America/Glace_Bay,Atlantic - NS (Cape Breton),Canonical,04:00,03:00,
GL,+641105144,America/Godthab,,Deprecated,03:00,02:00,Link to America/Nuuk
CA,+532006025,America/Goose_Bay,Atlantic - Labrador (most areas),Canonical,04:00,03:00,
TC,+212807108,America/Grand_Turk,,Canonical,05:00,04:00,
GD,+120306145,America/Grenada,,Alias,04:00,04:00,Link to America/Port_of_Spain
GP,+161406132,America/Guadeloupe,,Alias,04:00,04:00,Link to America/Port_of_Spain
GT,+143809031,America/Guatemala,,Canonical,06:00,06:00,
EC,021007950,America/Guayaquil,Ecuador (mainland),Canonical,05:00,05:00,
GY,+064805810,America/Guyana,,Canonical,04:00,04:00,
CA,+443906336,America/Halifax,Atlantic - NS (most areas); PE,Canonical,04:00,03:00,
CU,+230808222,America/Havana,,Canonical,05:00,04:00,
MX,+290411058,America/Hermosillo,Mountain Standard Time - Sonora,Canonical,07:00,07:00,
US,+3946060860929,America/Indiana/Indianapolis,Eastern - IN (most areas),Canonical,05:00,04:00,
US,+4117450863730,America/Indiana/Knox,Central - IN (Starke),Canonical,06:00,05:00,
US,+3822320862041,America/Indiana/Marengo,Eastern - IN (Crawford),Canonical,05:00,04:00,
US,+3829310871643,America/Indiana/Petersburg,Eastern - IN (Pike),Canonical,05:00,04:00,
US,+3757110864541,America/Indiana/Tell_City,Central - IN (Perry),Canonical,06:00,05:00,
US,+3844520850402,America/Indiana/Vevay,Eastern - IN (Switzerland),Canonical,05:00,04:00,
US,+3840380873143,America/Indiana/Vincennes,"Eastern - IN (Da, Du, K, Mn)",Canonical,05:00,04:00,
US,+4103050863611,America/Indiana/Winamac,Eastern - IN (Pulaski),Canonical,05:00,04:00,
US,+3946060860929,America/Indianapolis,,Deprecated,05:00,04:00,Link to America/Indiana/Indianapolis
CA,+6820591334300,America/Inuvik,Mountain - NT (west),Canonical,07:00,06:00,
CA,+634406828,America/Iqaluit,Eastern - NU (most east areas),Canonical,05:00,04:00,
JM,+1758050764736,America/Jamaica,,Canonical,05:00,05:00,
AR,241106518,America/Jujuy,,Deprecated,03:00,03:00,Link to America/Argentina/Jujuy
US,+5818071342511,America/Juneau,Alaska - Juneau area,Canonical,09:00,08:00,
US,+3815150854534,America/Kentucky/Louisville,Eastern - KY (Louisville area),Canonical,05:00,04:00,
US,+3649470845057,America/Kentucky/Monticello,Eastern - KY (Wayne),Canonical,05:00,04:00,
US,+4117450863730,America/Knox_IN,,Deprecated,06:00,05:00,Link to America/Indiana/Knox
BQ,+1209030681636,America/Kralendijk,,Alias,04:00,04:00,Link to America/Curacao
BO,163006809,America/La_Paz,,Canonical,04:00,04:00,
PE,120307703,America/Lima,,Canonical,05:00,05:00,
US,+3403081181434,America/Los_Angeles,Pacific,Canonical,08:00,07:00,
US,+3815150854534,America/Louisville,,Deprecated,05:00,04:00,Link to America/Kentucky/Louisville
SX,+1803050630250,America/Lower_Princes,,Alias,04:00,04:00,Link to America/Curacao
BR,094003543,America/Maceio,"Alagoas, Sergipe",Canonical,03:00,03:00,
NI,+120908617,America/Managua,,Canonical,06:00,06:00,
BR,030806001,America/Manaus,Amazonas (east),Canonical,04:00,04:00,
MF,+180406305,America/Marigot,,Alias,04:00,04:00,Link to America/Port_of_Spain
MQ,+143606105,America/Martinique,,Canonical,04:00,04:00,
MX,+255009730,America/Matamoros,"Central Time US - Coahuila, Nuevo León, Tamaulipas (US border)",Canonical,06:00,05:00,
MX,+231310625,America/Mazatlan,"Mountain Time - Baja California Sur, Nayarit, Sinaloa",Canonical,07:00,06:00,
AR,325306849,America/Mendoza,,Deprecated,03:00,03:00,Link to America/Argentina/Mendoza
US,+4506280873651,America/Menominee,Central - MI (Wisconsin border),Canonical,06:00,05:00,
MX,+205808937,America/Merida,"Central Time - Campeche, Yucatán",Canonical,06:00,05:00,
US,+5507371313435,America/Metlakatla,Alaska - Annette Island,Canonical,09:00,08:00,
MX,+192409909,America/Mexico_City,Central Time,Canonical,06:00,05:00,
PM,+470305620,America/Miquelon,,Canonical,03:00,02:00,
CA,+460606447,America/Moncton,Atlantic - New Brunswick,Canonical,04:00,03:00,
MX,+254010019,America/Monterrey,"Central Time - Durango; Coahuila, Nuevo León, Tamaulipas (most areas)",Canonical,06:00,05:00,
UY,3454330561245,America/Montevideo,,Canonical,03:00,03:00,
CA,,America/Montreal,,Deprecated,05:00,04:00,Link to America/Toronto
MS,+164306213,America/Montserrat,,Alias,04:00,04:00,Link to America/Port_of_Spain
BS,+250507721,America/Nassau,,Canonical,05:00,04:00,
US,+4042510740023,America/New_York,Eastern (most areas),Canonical,05:00,04:00,
CA,+490108816,America/Nipigon,"Eastern - ON, QC (no DST 1967-73)",Canonical,05:00,04:00,
US,+6430041652423,America/Nome,Alaska (west),Canonical,09:00,08:00,
BR,035103225,America/Noronha,Atlantic islands,Canonical,02:00,02:00,
US,+4715511014640,America/North_Dakota/Beulah,Central - ND (Mercer),Canonical,06:00,05:00,
US,+4706591011757,America/North_Dakota/Center,Central - ND (Oliver),Canonical,06:00,05:00,
US,+4650421012439,America/North_Dakota/New_Salem,Central - ND (Morton rural),Canonical,06:00,05:00,
GL,+641105144,America/Nuuk,Greenland (most areas),Canonical,03:00,02:00,
MX,+293410425,America/Ojinaga,Mountain Time US - Chihuahua (US border),Canonical,07:00,06:00,
PA,+085807932,America/Panama,,Canonical,05:00,05:00,
CA,+660806544,America/Pangnirtung,Eastern - NU (Pangnirtung),Canonical,05:00,04:00,
SR,+055005510,America/Paramaribo,,Canonical,03:00,03:00,
US,+3326541120424,America/Phoenix,MST - Arizona (except Navajo),Canonical,07:00,07:00,
HT,+183207220,America/Port-au-Prince,,Canonical,05:00,04:00,
TT,+103906131,America/Port_of_Spain,,Canonical,04:00,04:00,
BR,,America/Porto_Acre,,Deprecated,05:00,05:00,Link to America/Rio_Branco
BR,084606354,America/Porto_Velho,Rondônia,Canonical,04:00,04:00,
PR,+1828060660622,America/Puerto_Rico,,Canonical,04:00,04:00,
CL,530907055,America/Punta_Arenas,Region of Magallanes,Canonical,03:00,03:00,Magallanes Region
CA,+484309434,America/Rainy_River,"Central - ON (Rainy R, Ft Frances)",Canonical,06:00,05:00,
CA,+6249000920459,America/Rankin_Inlet,Central - NU (central),Canonical,06:00,05:00,
BR,080303454,America/Recife,Pernambuco,Canonical,03:00,03:00,
CA,+502410439,America/Regina,CST - SK (most areas),Canonical,06:00,06:00,
CA,+7441440944945,America/Resolute,Central - NU (Resolute),Canonical,06:00,05:00,
BR,095806748,America/Rio_Branco,Acre,Canonical,05:00,05:00,
AR,,America/Rosario,,Deprecated,03:00,03:00,Link to America/Argentina/Cordoba
MX,,America/Santa_Isabel,,Deprecated,08:00,07:00,Link to America/Tijuana
BR,022605452,America/Santarem,Pará (west),Canonical,03:00,03:00,
CL,332707040,America/Santiago,Chile (most areas),Canonical,04:00,03:00,
DO,+182806954,America/Santo_Domingo,,Canonical,04:00,04:00,
BR,233204637,America/Sao_Paulo,"Brazil (southeast: GO, DF, MG, ES, RJ, SP, PR, SC, RS)",Canonical,03:00,03:00,
GL,+702902158,America/Scoresbysund,Scoresbysund/Ittoqqortoormiit,Canonical,01:00,+00:00,
US,,America/Shiprock,,Deprecated,07:00,06:00,Link to America/Denver
US,+5710351351807,America/Sitka,Alaska - Sitka area,Canonical,09:00,08:00,
BL,+175306251,America/St_Barthelemy,,Alias,04:00,04:00,Link to America/Port_of_Spain
CA,+473405243,America/St_Johns,Newfoundland; Labrador (southeast),Canonical,03:30,02:30,
KN,+171806243,America/St_Kitts,,Alias,04:00,04:00,Link to America/Port_of_Spain
LC,+140106100,America/St_Lucia,,Alias,04:00,04:00,Link to America/Port_of_Spain
VI,+182106456,America/St_Thomas,,Alias,04:00,04:00,Link to America/Port_of_Spain
VC,+130906114,America/St_Vincent,,Alias,04:00,04:00,Link to America/Port_of_Spain
CA,+501710750,America/Swift_Current,CST - SK (midwest),Canonical,06:00,06:00,
HN,+140608713,America/Tegucigalpa,,Canonical,06:00,06:00,
GL,+763406847,America/Thule,Thule/Pituffik,Canonical,04:00,03:00,
CA,+482308915,America/Thunder_Bay,Eastern - ON (Thunder Bay),Canonical,05:00,04:00,
MX,+323211701,America/Tijuana,Pacific Time US - Baja California,Canonical,08:00,07:00,
CA,+433907923,America/Toronto,"Eastern - ON, QC (most areas)",Canonical,05:00,04:00,
VG,+182706437,America/Tortola,,Alias,04:00,04:00,Link to America/Port_of_Spain
CA,+491612307,America/Vancouver,Pacific - BC (most areas),Canonical,08:00,07:00,
VI,,America/Virgin,,Deprecated,04:00,04:00,Link to America/Port_of_Spain
CA,+604313503,America/Whitehorse,MST - Yukon (east),Canonical,07:00,07:00,
CA,+495309709,America/Winnipeg,Central - ON (west); Manitoba,Canonical,06:00,05:00,
US,+5932491394338,America/Yakutat,Alaska - Yakutat,Canonical,09:00,08:00,
CA,+622711421,America/Yellowknife,Mountain - NT (central),Canonical,07:00,06:00,
AQ,6617+11031,Antarctica/Casey,Casey,Canonical,+11:00,+11:00,
AQ,6835+07758,Antarctica/Davis,Davis,Canonical,+07:00,+07:00,
AQ,6640+14001,Antarctica/DumontDUrville,Dumont-d'Urville,Canonical,+10:00,+10:00,
AU,5430+15857,Antarctica/Macquarie,Macquarie Island,Canonical,+10:00,+11:00,
AQ,6736+06253,Antarctica/Mawson,Mawson,Canonical,+05:00,+05:00,
AQ,7750+16636,Antarctica/McMurdo,"New Zealand time - McMurdo, South Pole",Alias,+12:00,+13:00,Link to Pacific/Auckland
AQ,644806406,Antarctica/Palmer,Palmer,Canonical,03:00,03:00,Chilean Antarctica Region
AQ,673406808,Antarctica/Rothera,Rothera,Canonical,03:00,03:00,
AQ,,Antarctica/South_Pole,,Deprecated,+12:00,+13:00,Link to Pacific/Auckland
AQ,690022+0393524,Antarctica/Syowa,Syowa,Canonical,+03:00,+03:00,
AQ,720041+0023206,Antarctica/Troll,Troll,Canonical,+00:00,+02:00,Previously used +01:00 for a brief period between standard and daylight time.[2]
AQ,7824+10654,Antarctica/Vostok,Vostok,Canonical,+06:00,+06:00,
SJ,+7800+01600,Arctic/Longyearbyen,,Alias,+01:00,+02:00,Link to Europe/Oslo
YE,+1245+04512,Asia/Aden,,Alias,+03:00,+03:00,Link to Asia/Riyadh
KZ,+4315+07657,Asia/Almaty,Kazakhstan (most areas),Canonical,+06:00,+06:00,
JO,+3157+03556,Asia/Amman,,Canonical,+02:00,+03:00,
RU,+6445+17729,Asia/Anadyr,MSK+09 - Bering Sea,Canonical,+12:00,+12:00,
KZ,+4431+05016,Asia/Aqtau,Mangghystaū/Mankistau,Canonical,+05:00,+05:00,
KZ,+5017+05710,Asia/Aqtobe,Aqtöbe/Aktobe,Canonical,+05:00,+05:00,
TM,+3757+05823,Asia/Ashgabat,,Canonical,+05:00,+05:00,
TM,+3757+05823,Asia/Ashkhabad,,Deprecated,+05:00,+05:00,Link to Asia/Ashgabat
KZ,+4707+05156,Asia/Atyrau,Atyraū/Atirau/Gur'yev,Canonical,+05:00,+05:00,
IQ,+3321+04425,Asia/Baghdad,,Canonical,+03:00,+03:00,
BH,+2623+05035,Asia/Bahrain,,Alias,+03:00,+03:00,Link to Asia/Qatar
AZ,+4023+04951,Asia/Baku,,Canonical,+04:00,+04:00,
TH,+1345+10031,Asia/Bangkok,Indochina (most areas),Canonical,+07:00,+07:00,
RU,+5322+08345,Asia/Barnaul,MSK+04 - Altai,Canonical,+07:00,+07:00,
LB,+3353+03530,Asia/Beirut,,Canonical,+02:00,+03:00,
KG,+4254+07436,Asia/Bishkek,,Canonical,+06:00,+06:00,
BN,+0456+11455,Asia/Brunei,,Canonical,+08:00,+08:00,
IN,+2232+08822,Asia/Calcutta,,Deprecated,+05:30,+05:30,Link to Asia/Kolkata
RU,+5203+11328,Asia/Chita,MSK+06 - Zabaykalsky,Canonical,+09:00,+09:00,
MN,+4804+11430,Asia/Choibalsan,"Dornod, Sükhbaatar",Canonical,+08:00,+08:00,
CN,,Asia/Chongqing,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
CN,,Asia/Chungking,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
LK,+0656+07951,Asia/Colombo,,Canonical,+05:30,+05:30,
BD,+2343+09025,Asia/Dacca,,Deprecated,+06:00,+06:00,Link to Asia/Dhaka
SY,+3330+03618,Asia/Damascus,,Canonical,+02:00,+03:00,
BD,+2343+09025,Asia/Dhaka,,Canonical,+06:00,+06:00,
TL,0833+12535,Asia/Dili,,Canonical,+09:00,+09:00,
AE,+2518+05518,Asia/Dubai,,Canonical,+04:00,+04:00,
TJ,+3835+06848,Asia/Dushanbe,,Canonical,+05:00,+05:00,
CY,+3507+03357,Asia/Famagusta,Northern Cyprus,Canonical,+02:00,+03:00,
PS,+3130+03428,Asia/Gaza,Gaza Strip,Canonical,+02:00,+03:00,
CN,,Asia/Harbin,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
PS,+313200+0350542,Asia/Hebron,West Bank,Canonical,+02:00,+03:00,
VN,+1045+10640,Asia/Ho_Chi_Minh,Vietnam (south),Canonical,+07:00,+07:00,
HK,+2217+11409,Asia/Hong_Kong,,Canonical,+08:00,+08:00,
MN,+4801+09139,Asia/Hovd,"Bayan-Ölgii, Govi-Altai, Hovd, Uvs, Zavkhan",Canonical,+07:00,+07:00,
RU,+5216+10420,Asia/Irkutsk,"MSK+05 - Irkutsk, Buryatia",Canonical,+08:00,+08:00,
TR,+4101+02858,Asia/Istanbul,,Alias,+03:00,+03:00,Link to Europe/Istanbul
ID,0610+10648,Asia/Jakarta,"Java, Sumatra",Canonical,+07:00,+07:00,
ID,0232+14042,Asia/Jayapura,New Guinea (West Papua / Irian Jaya); Malukus/Moluccas,Canonical,+09:00,+09:00,
IL,+314650+0351326,Asia/Jerusalem,,Canonical,+02:00,+03:00,
AF,+3431+06912,Asia/Kabul,,Canonical,+04:30,+04:30,
RU,+5301+15839,Asia/Kamchatka,MSK+09 - Kamchatka,Canonical,+12:00,+12:00,
PK,+2452+06703,Asia/Karachi,,Canonical,+05:00,+05:00,
CN,,Asia/Kashgar,,Deprecated,+06:00,+06:00,Link to Asia/Urumqi[note 1]
NP,+2743+08519,Asia/Kathmandu,,Canonical,+05:45,+05:45,
NP,+2743+08519,Asia/Katmandu,,Deprecated,+05:45,+05:45,Link to Asia/Kathmandu
RU,+623923+1353314,Asia/Khandyga,"MSK+06 - Tomponsky, Ust-Maysky",Canonical,+09:00,+09:00,
IN,+2232+08822,Asia/Kolkata,,Canonical,+05:30,+05:30,"Note: Different zones in history, see Time in India."
RU,+5601+09250,Asia/Krasnoyarsk,MSK+04 - Krasnoyarsk area,Canonical,+07:00,+07:00,
MY,+0310+10142,Asia/Kuala_Lumpur,Malaysia (peninsula),Canonical,+08:00,+08:00,
MY,+0133+11020,Asia/Kuching,"Sabah, Sarawak",Canonical,+08:00,+08:00,
KW,+2920+04759,Asia/Kuwait,,Alias,+03:00,+03:00,Link to Asia/Riyadh
MO,+221150+1133230,Asia/Macao,,Deprecated,+08:00,+08:00,Link to Asia/Macau
MO,+221150+1133230,Asia/Macau,,Canonical,+08:00,+08:00,
RU,+5934+15048,Asia/Magadan,MSK+08 - Magadan,Canonical,+11:00,+11:00,
ID,−0507+11924,Asia/Makassar,"Borneo (east, south); Sulawesi/Celebes, Bali, Nusa Tengarra; Timor (west)",Canonical,+08:00,+08:00,
PH,+1435+12100,Asia/Manila,,Canonical,+08:00,+08:00,
OM,+2336+05835,Asia/Muscat,,Alias,+04:00,+04:00,Link to Asia/Dubai
CY,+3510+03322,Asia/Nicosia,Cyprus (most areas),Canonical,+02:00,+03:00,
RU,+5345+08707,Asia/Novokuznetsk,MSK+04 - Kemerovo,Canonical,+07:00,+07:00,
RU,+5502+08255,Asia/Novosibirsk,MSK+04 - Novosibirsk,Canonical,+07:00,+07:00,
RU,+5500+07324,Asia/Omsk,MSK+03 - Omsk,Canonical,+06:00,+06:00,
KZ,+5113+05121,Asia/Oral,West Kazakhstan,Canonical,+05:00,+05:00,
KH,+1133+10455,Asia/Phnom_Penh,,Alias,+07:00,+07:00,Link to Asia/Bangkok
ID,−0002+10920,Asia/Pontianak,"Borneo (west, central)",Canonical,+07:00,+07:00,
KP,+3901+12545,Asia/Pyongyang,,Canonical,+09:00,+09:00,
QA,+2517+05132,Asia/Qatar,,Canonical,+03:00,+03:00,
KZ,+5312+06337,Asia/Qostanay,Qostanay/Kostanay/Kustanay,Canonical,+06:00,+06:00,
KZ,+4448+06528,Asia/Qyzylorda,Qyzylorda/Kyzylorda/Kzyl-Orda,Canonical,+05:00,+05:00,
MM,,Asia/Rangoon,,Deprecated,+06:30,+06:30,Link to Asia/Yangon
SA,+2438+04643,Asia/Riyadh,,Canonical,+03:00,+03:00,
VN,,Asia/Saigon,,Deprecated,+07:00,+07:00,Link to Asia/Ho_Chi_Minh
RU,+4658+14242,Asia/Sakhalin,MSK+08 - Sakhalin Island,Canonical,+11:00,+11:00,
UZ,+3940+06648,Asia/Samarkand,Uzbekistan (west),Canonical,+05:00,+05:00,
KR,+3733+12658,Asia/Seoul,,Canonical,+09:00,+09:00,
CN,+3114+12128,Asia/Shanghai,Beijing Time,Canonical,+08:00,+08:00,
SG,+0117+10351,Asia/Singapore,,Canonical,+08:00,+08:00,
RU,+6728+15343,Asia/Srednekolymsk,MSK+08 - Sakha (E); North Kuril Is,Canonical,+11:00,+11:00,
TW,+2503+12130,Asia/Taipei,,Canonical,+08:00,+08:00,
UZ,+4120+06918,Asia/Tashkent,Uzbekistan (east),Canonical,+05:00,+05:00,
GE,+4143+04449,Asia/Tbilisi,,Canonical,+04:00,+04:00,
IR,+3540+05126,Asia/Tehran,,Canonical,+03:30,+04:30,
IL,,Asia/Tel_Aviv,,Deprecated,+02:00,+03:00,Link to Asia/Jerusalem
BT,+2728+08939,Asia/Thimbu,,Deprecated,+06:00,+06:00,Link to Asia/Thimphu
BT,+2728+08939,Asia/Thimphu,,Canonical,+06:00,+06:00,
JP,+353916+1394441,Asia/Tokyo,,Canonical,+09:00,+09:00,
RU,+5630+08458,Asia/Tomsk,MSK+04 - Tomsk,Canonical,+07:00,+07:00,
ID,,Asia/Ujung_Pandang,,Deprecated,+08:00,+08:00,Link to Asia/Makassar
MN,+4755+10653,Asia/Ulaanbaatar,Mongolia (most areas),Canonical,+08:00,+08:00,
MN,,Asia/Ulan_Bator,,Deprecated,+08:00,+08:00,Link to Asia/Ulaanbaatar
CN,+4348+08735,Asia/Urumqi,Xinjiang Time,Canonical,+06:00,+06:00,The Asia/Urumqi entry in the tz database reflected the use of Xinjiang Time by part of the local population. Consider using Asia/Shanghai for Beijing Time if that is preferred.
RU,+643337+1431336,Asia/Ust-Nera,MSK+07 - Oymyakonsky,Canonical,+10:00,+10:00,
LA,+1758+10236,Asia/Vientiane,,Alias,+07:00,+07:00,Link to Asia/Bangkok
RU,+4310+13156,Asia/Vladivostok,MSK+07 - Amur River,Canonical,+10:00,+10:00,
RU,+6200+12940,Asia/Yakutsk,MSK+06 - Lena River,Canonical,+09:00,+09:00,
MM,+1647+09610,Asia/Yangon,,Canonical,+06:30,+06:30,
RU,+5651+06036,Asia/Yekaterinburg,MSK+02 - Urals,Canonical,+05:00,+05:00,
AM,+4011+04430,Asia/Yerevan,,Canonical,+04:00,+04:00,
PT,+3744−02540,Atlantic/Azores,Azores,Canonical,−01:00,+00:00,
BM,+3217−06446,Atlantic/Bermuda,,Canonical,−04:00,−03:00,
ES,+2806−01524,Atlantic/Canary,Canary Islands,Canonical,+00:00,+01:00,
CV,+1455−02331,Atlantic/Cape_Verde,,Canonical,−01:00,−01:00,
FO,+6201−00646,Atlantic/Faeroe,,Deprecated,+00:00,+01:00,Link to Atlantic/Faroe
FO,+6201−00646,Atlantic/Faroe,,Canonical,+00:00,+01:00,
SJ,,Atlantic/Jan_Mayen,,Deprecated,+01:00,+02:00,Link to Europe/Oslo
PT,+3238−01654,Atlantic/Madeira,Madeira Islands,Canonical,+00:00,+01:00,
IS,+6409−02151,Atlantic/Reykjavik,,Canonical,+00:00,+00:00,
GS,−5416−03632,Atlantic/South_Georgia,,Canonical,−02:00,−02:00,
SH,−1555−00542,Atlantic/St_Helena,,Alias,+00:00,+00:00,Link to Africa/Abidjan
FK,−5142−05751,Atlantic/Stanley,,Canonical,−03:00,−03:00,
AU,,Australia/ACT,,Deprecated,+10:00,+11:00,Link to Australia/Sydney
AU,−3455+13835,Australia/Adelaide,South Australia,Canonical,+09:30,+10:30,
AU,−2728+15302,Australia/Brisbane,Queensland (most areas),Canonical,+10:00,+10:00,
AU,−3157+14127,Australia/Broken_Hill,New South Wales (Yancowinna),Canonical,+09:30,+10:30,
AU,,Australia/Canberra,,Deprecated,+10:00,+11:00,Link to Australia/Sydney
AU,,Australia/Currie,,Deprecated,+10:00,+11:00,Link to Australia/Hobart
AU,−1228+13050,Australia/Darwin,Northern Territory,Canonical,+09:30,+09:30,
AU,−3143+12852,Australia/Eucla,Western Australia (Eucla),Canonical,+08:45,+08:45,
AU,−4253+14719,Australia/Hobart,Tasmania,Canonical,+10:00,+11:00,
AU,,Australia/LHI,,Deprecated,+10:30,+11:00,Link to Australia/Lord_Howe
AU,−2016+14900,Australia/Lindeman,Queensland (Whitsunday Islands),Canonical,+10:00,+10:00,
AU,−3133+15905,Australia/Lord_Howe,Lord Howe Island,Canonical,+10:30,+11:00,This is the only time zone in the world that uses 30-minute DST transitions.
AU,−3749+14458,Australia/Melbourne,Victoria,Canonical,+10:00,+11:00,
AU,,Australia/North,,Deprecated,+09:30,+09:30,Link to Australia/Darwin
AU,,Australia/NSW,,Deprecated,+10:00,+11:00,Link to Australia/Sydney
AU,−3157+11551,Australia/Perth,Western Australia (most areas),Canonical,+08:00,+08:00,
AU,,Australia/Queensland,,Deprecated,+10:00,+10:00,Link to Australia/Brisbane
AU,,Australia/South,,Deprecated,+09:30,+10:30,Link to Australia/Adelaide
AU,−3352+15113,Australia/Sydney,New South Wales (most areas),Canonical,+10:00,+11:00,
AU,,Australia/Tasmania,,Deprecated,+10:00,+11:00,Link to Australia/Hobart
AU,,Australia/Victoria,,Deprecated,+10:00,+11:00,Link to Australia/Melbourne
AU,,Australia/West,,Deprecated,+08:00,+08:00,Link to Australia/Perth
AU,,Australia/Yancowinna,,Deprecated,+09:30,+10:30,Link to Australia/Broken_Hill
BR,,Brazil/Acre,,Deprecated,−05:00,−05:00,Link to America/Rio_Branco
BR,,Brazil/DeNoronha,,Deprecated,−02:00,−02:00,Link to America/Noronha
BR,,Brazil/East,,Deprecated,−03:00,−03:00,Link to America/Sao_Paulo
BR,,Brazil/West,,Deprecated,−04:00,−04:00,Link to America/Manaus
CA,,Canada/Atlantic,,Deprecated,−04:00,−03:00,Link to America/Halifax
CA,,Canada/Central,,Deprecated,−06:00,−05:00,Link to America/Winnipeg
CA,,Canada/Eastern,,Deprecated,−05:00,−04:00,Link to America/Toronto
CA,,Canada/Mountain,,Deprecated,−07:00,−06:00,Link to America/Edmonton
CA,,Canada/Newfoundland,,Deprecated,−03:30,−02:30,Link to America/St_Johns
CA,,Canada/Pacific,,Deprecated,−08:00,−07:00,Link to America/Vancouver
CA,,Canada/Saskatchewan,,Deprecated,−06:00,−06:00,Link to America/Regina
CA,,Canada/Yukon,,Deprecated,−07:00,−07:00,Link to America/Whitehorse
,,CET,,Deprecated,+01:00,+02:00,"Choose a zone that observes CET, such as Europe/Paris."
CL,,Chile/Continental,,Deprecated,−04:00,−03:00,Link to America/Santiago
CL,,Chile/EasterIsland,,Deprecated,−06:00,−05:00,Link to Pacific/Easter
,,CST6CDT,,Deprecated,−06:00,−05:00,"Choose a zone that observes CST with United States daylight saving time rules, such as America/Chicago."
CU,,Cuba,,Deprecated,−05:00,−04:00,Link to America/Havana
,,EET,,Deprecated,+02:00,+03:00,"Choose a zone that observes EET, such as Europe/Sofia."
EG,,Egypt,,Deprecated,+02:00,+02:00,Link to Africa/Cairo
IE,,Eire,,Deprecated,+01:00,+00:00,Link to Europe/Dublin
,,EST,,Deprecated,−05:00,−05:00,"Choose a zone that currently observes EST without daylight saving time, such as America/Cancun."
,,EST5EDT,,Deprecated,−05:00,−04:00,"Choose a zone that observes EST with United States daylight saving time rules, such as America/New_York."
,,Etc/GMT,,Canonical,+00:00,+00:00,
,,Etc/GMT+0,,Alias,+00:00,+00:00,Link to Etc/GMT
,,Etc/GMT+1,,Canonical,−01:00,−01:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+10,,Canonical,−10:00,−10:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+11,,Canonical,−11:00,−11:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+12,,Canonical,−12:00,−12:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+2,,Canonical,−02:00,−02:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+3,,Canonical,−03:00,−03:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+4,,Canonical,−04:00,−04:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+5,,Canonical,−05:00,−05:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+6,,Canonical,−06:00,−06:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+7,,Canonical,−07:00,−07:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+8,,Canonical,−08:00,−08:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT+9,,Canonical,−09:00,−09:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-0,,Alias,+00:00,+00:00,Link to Etc/GMT
,,Etc/GMT-1,,Canonical,+01:00,+01:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-10,,Canonical,+10:00,+10:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-11,,Canonical,+11:00,+11:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-12,,Canonical,+12:00,+12:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-13,,Canonical,+13:00,+13:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-14,,Canonical,+14:00,+14:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-2,,Canonical,+02:00,+02:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-3,,Canonical,+03:00,+03:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-4,,Canonical,+04:00,+04:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-5,,Canonical,+05:00,+05:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-6,,Canonical,+06:00,+06:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-7,,Canonical,+07:00,+07:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-8,,Canonical,+08:00,+08:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT-9,,Canonical,+09:00,+09:00,Sign is intentionally inverted. See the Etc area description.
,,Etc/GMT0,,Alias,+00:00,+00:00,Link to Etc/GMT
,,Etc/Greenwich,,Deprecated,+00:00,+00:00,Link to Etc/GMT
,,Etc/UCT,,Deprecated,+00:00,+00:00,Link to Etc/UTC
,,Etc/Universal,,Deprecated,+00:00,+00:00,Link to Etc/UTC
,,Etc/UTC,,Canonical,+00:00,+00:00,
,,Etc/Zulu,,Deprecated,+00:00,+00:00,Link to Etc/UTC
NL,+5222+00454,Europe/Amsterdam,,Canonical,+01:00,+02:00,
AD,+4230+00131,Europe/Andorra,,Canonical,+01:00,+02:00,
RU,+4621+04803,Europe/Astrakhan,MSK+01 - Astrakhan,Canonical,+04:00,+04:00,
GR,+3758+02343,Europe/Athens,,Canonical,+02:00,+03:00,
GB,,Europe/Belfast,,Deprecated,+00:00,+01:00,Link to Europe/London
RS,+4450+02030,Europe/Belgrade,,Canonical,+01:00,+02:00,
DE,+5230+01322,Europe/Berlin,Germany (most areas),Canonical,+01:00,+02:00,"In 1945, the Trizone did not follow Berlin's switch to DST, see Time in Germany"
SK,+4809+01707,Europe/Bratislava,,Alias,+01:00,+02:00,Link to Europe/Prague
BE,+5050+00420,Europe/Brussels,,Canonical,+01:00,+02:00,
RO,+4426+02606,Europe/Bucharest,,Canonical,+02:00,+03:00,
HU,+4730+01905,Europe/Budapest,,Canonical,+01:00,+02:00,
DE,+4742+00841,Europe/Busingen,Busingen,Alias,+01:00,+02:00,Link to Europe/Zurich
MD,+4700+02850,Europe/Chisinau,,Canonical,+02:00,+03:00,
DK,+5540+01235,Europe/Copenhagen,,Canonical,+01:00,+02:00,
IE,+5320−00615,Europe/Dublin,,Canonical,+01:00,+00:00,
GI,+3608−00521,Europe/Gibraltar,,Canonical,+01:00,+02:00,
GG,+492717−0023210,Europe/Guernsey,,Alias,+00:00,+01:00,Link to Europe/London
FI,+6010+02458,Europe/Helsinki,,Canonical,+02:00,+03:00,
IM,+5409−00428,Europe/Isle_of_Man,,Alias,+00:00,+01:00,Link to Europe/London
TR,+4101+02858,Europe/Istanbul,,Canonical,+03:00,+03:00,
JE,+491101−0020624,Europe/Jersey,,Alias,+00:00,+01:00,Link to Europe/London
RU,+5443+02030,Europe/Kaliningrad,MSK-01 - Kaliningrad,Canonical,+02:00,+02:00,
UA,+5026+03031,Europe/Kiev,Ukraine (most areas),Canonical,+02:00,+03:00,
RU,+5836+04939,Europe/Kirov,MSK+00 - Kirov,Canonical,+03:00,+03:00,
PT,+3843−00908,Europe/Lisbon,Portugal (mainland),Canonical,+00:00,+01:00,
SI,+4603+01431,Europe/Ljubljana,,Alias,+01:00,+02:00,Link to Europe/Belgrade
GB,+513030−0000731,Europe/London,,Canonical,+00:00,+01:00,
LU,+4936+00609,Europe/Luxembourg,,Canonical,+01:00,+02:00,
ES,+4024−00341,Europe/Madrid,Spain (mainland),Canonical,+01:00,+02:00,
MT,+3554+01431,Europe/Malta,,Canonical,+01:00,+02:00,
AX,+6006+01957,Europe/Mariehamn,,Alias,+02:00,+03:00,Link to Europe/Helsinki
BY,+5354+02734,Europe/Minsk,,Canonical,+03:00,+03:00,
MC,+4342+00723,Europe/Monaco,,Canonical,+01:00,+02:00,
RU,+554521+0373704,Europe/Moscow,MSK+00 - Moscow area,Canonical,+03:00,+03:00,
CY,+3510+03322,Europe/Nicosia,,Alias,+02:00,+03:00,Link to Asia/Nicosia
NO,+5955+01045,Europe/Oslo,,Canonical,+01:00,+02:00,
FR,+4852+00220,Europe/Paris,,Canonical,+01:00,+02:00,
ME,+4226+01916,Europe/Podgorica,,Alias,+01:00,+02:00,Link to Europe/Belgrade
CZ,+5005+01426,Europe/Prague,,Canonical,+01:00,+02:00,
LV,+5657+02406,Europe/Riga,,Canonical,+02:00,+03:00,
IT,+4154+01229,Europe/Rome,,Canonical,+01:00,+02:00,
RU,+5312+05009,Europe/Samara,"MSK+01 - Samara, Udmurtia",Canonical,+04:00,+04:00,
SM,+4355+01228,Europe/San_Marino,,Alias,+01:00,+02:00,Link to Europe/Rome
BA,+4352+01825,Europe/Sarajevo,,Alias,+01:00,+02:00,Link to Europe/Belgrade
RU,+5134+04602,Europe/Saratov,MSK+01 - Saratov,Canonical,+04:00,+04:00,
UA,+4457+03406,Europe/Simferopol,Crimea,Canonical,+03:00,+03:00,Disputed - Reflects data in the TZDB.[note 2]
MK,+4159+02126,Europe/Skopje,,Alias,+01:00,+02:00,Link to Europe/Belgrade
BG,+4241+02319,Europe/Sofia,,Canonical,+02:00,+03:00,
SE,+5920+01803,Europe/Stockholm,,Canonical,+01:00,+02:00,
EE,+5925+02445,Europe/Tallinn,,Canonical,+02:00,+03:00,
AL,+4120+01950,Europe/Tirane,,Canonical,+01:00,+02:00,
MD,,Europe/Tiraspol,,Deprecated,+02:00,+03:00,Link to Europe/Chisinau
RU,+5420+04824,Europe/Ulyanovsk,MSK+01 - Ulyanovsk,Canonical,+04:00,+04:00,
UA,+4837+02218,Europe/Uzhgorod,Transcarpathia,Canonical,+02:00,+03:00,
LI,+4709+00931,Europe/Vaduz,,Alias,+01:00,+02:00,Link to Europe/Zurich
VA,+415408+0122711,Europe/Vatican,,Alias,+01:00,+02:00,Link to Europe/Rome
AT,+4813+01620,Europe/Vienna,,Canonical,+01:00,+02:00,
LT,+5441+02519,Europe/Vilnius,,Canonical,+02:00,+03:00,
RU,+4844+04425,Europe/Volgograd,MSK+00 - Volgograd,Canonical,+03:00,+03:00,
PL,+5215+02100,Europe/Warsaw,,Canonical,+01:00,+02:00,
HR,+4548+01558,Europe/Zagreb,,Alias,+01:00,+02:00,Link to Europe/Belgrade
UA,+4750+03510,Europe/Zaporozhye,Zaporozhye and east Lugansk,Canonical,+02:00,+03:00,
CH,+4723+00832,Europe/Zurich,Swiss time,Canonical,+01:00,+02:00,
,,Factory,,Canonical,+00:00,+00:00,
GB,,GB,,Deprecated,+00:00,+01:00,Link to Europe/London
GB,,GB-Eire,,Deprecated,+00:00,+01:00,Link to Europe/London
,,GMT,,Alias,+00:00,+00:00,Link to Etc/GMT
,,GMT+0,,Deprecated,+00:00,+00:00,Link to Etc/GMT
,,GMT-0,,Deprecated,+00:00,+00:00,Link to Etc/GMT
,,GMT0,,Deprecated,+00:00,+00:00,Link to Etc/GMT
,,Greenwich,,Deprecated,+00:00,+00:00,Link to Etc/GMT
HK,+2217+11409,Hongkong,,Deprecated,+08:00,+08:00,Link to Asia/Hong_Kong
,,HST,,Deprecated,−10:00,−10:00,"Choose a zone that currently observes HST without daylight saving time, such as Pacific/Honolulu."
IS,,Iceland,,Deprecated,+00:00,+00:00,Link to Atlantic/Reykjavik
MG,−1855+04731,Indian/Antananarivo,,Alias,+03:00,+03:00,Link to Africa/Nairobi
IO,−0720+07225,Indian/Chagos,,Canonical,+06:00,+06:00,
CX,−1025+10543,Indian/Christmas,,Canonical,+07:00,+07:00,
CC,−1210+09655,Indian/Cocos,,Canonical,+06:30,+06:30,
KM,−1141+04316,Indian/Comoro,,Alias,+03:00,+03:00,Link to Africa/Nairobi
TF,−492110+0701303,Indian/Kerguelen,"Kerguelen, St Paul Island, Amsterdam Island",Canonical,+05:00,+05:00,
SC,−0440+05528,Indian/Mahe,,Canonical,+04:00,+04:00,
MV,+0410+07330,Indian/Maldives,,Canonical,+05:00,+05:00,
MU,−2010+05730,Indian/Mauritius,,Canonical,+04:00,+04:00,
YT,−1247+04514,Indian/Mayotte,,Alias,+03:00,+03:00,Link to Africa/Nairobi
RE,−2052+05528,Indian/Reunion,"Réunion, Crozet, Scattered Islands",Canonical,+04:00,+04:00,
IR,,Iran,,Deprecated,+03:30,+04:30,Link to Asia/Tehran
IL,,Israel,,Deprecated,+02:00,+03:00,Link to Asia/Jerusalem
JM,+175805−0764736,Jamaica,,Deprecated,−05:00,−05:00,Link to America/Jamaica
JP,,Japan,,Deprecated,+09:00,+09:00,Link to Asia/Tokyo
MH,+0905+16720,Kwajalein,,Deprecated,+12:00,+12:00,Link to Pacific/Kwajalein
LY,,Libya,,Deprecated,+02:00,+02:00,Link to Africa/Tripoli
,,MET,,Deprecated,+01:00,+02:00,"Choose a zone that observes MET (same as CET), such as Europe/Paris."
MX,,Mexico/BajaNorte,,Deprecated,−08:00,−07:00,Link to America/Tijuana
MX,,Mexico/BajaSur,,Deprecated,−07:00,−06:00,Link to America/Mazatlan
MX,,Mexico/General,,Deprecated,−06:00,−05:00,Link to America/Mexico_City
,,MST,,Deprecated,−07:00,−07:00,"Choose a zone that currently observes MST without daylight saving time, such as America/Phoenix."
,,MST7MDT,,Deprecated,−07:00,−06:00,"Choose a zone that observes MST with United States daylight saving time rules, such as America/Denver."
US,,Navajo,,Deprecated,−07:00,−06:00,Link to America/Denver
NZ,,NZ,,Deprecated,+12:00,+13:00,Link to Pacific/Auckland
NZ,,NZ-CHAT,,Deprecated,+12:45,+13:45,Link to Pacific/Chatham
WS,−1350−17144,Pacific/Apia,,Canonical,+13:00,+14:00,
NZ,−3652+17446,Pacific/Auckland,New Zealand time,Canonical,+12:00,+13:00,
PG,−0613+15534,Pacific/Bougainville,Bougainville,Canonical,+11:00,+11:00,
NZ,−4357−17633,Pacific/Chatham,Chatham Islands,Canonical,+12:45,+13:45,
FM,+0725+15147,Pacific/Chuuk,"Chuuk/Truk, Yap",Canonical,+10:00,+10:00,
CL,−2709−10926,Pacific/Easter,Easter Island,Canonical,−06:00,−05:00,
VU,−1740+16825,Pacific/Efate,,Canonical,+11:00,+11:00,
KI,−0308−17105,Pacific/Enderbury,Phoenix Islands,Canonical,+13:00,+13:00,
TK,−0922−17114,Pacific/Fakaofo,,Canonical,+13:00,+13:00,
FJ,−1808+17825,Pacific/Fiji,,Canonical,+12:00,+13:00,
TV,−0831+17913,Pacific/Funafuti,,Canonical,+12:00,+12:00,
EC,−0054−08936,Pacific/Galapagos,Galápagos Islands,Canonical,−06:00,−06:00,
PF,−2308−13457,Pacific/Gambier,Gambier Islands,Canonical,−09:00,−09:00,
SB,−0932+16012,Pacific/Guadalcanal,,Canonical,+11:00,+11:00,
GU,+1328+14445,Pacific/Guam,,Canonical,+10:00,+10:00,
US,+211825−1575130,Pacific/Honolulu,Hawaii,Canonical,−10:00,−10:00,
UM,,Pacific/Johnston,,Deprecated,−10:00,−10:00,Link to Pacific/Honolulu
KI,+0152−15720,Pacific/Kiritimati,Line Islands,Canonical,+14:00,+14:00,
FM,+0519+16259,Pacific/Kosrae,Kosrae,Canonical,+11:00,+11:00,
MH,+0905+16720,Pacific/Kwajalein,Kwajalein,Canonical,+12:00,+12:00,
MH,+0709+17112,Pacific/Majuro,Marshall Islands (most areas),Canonical,+12:00,+12:00,
PF,−0900−13930,Pacific/Marquesas,Marquesas Islands,Canonical,−09:30,−09:30,
UM,+2813−17722,Pacific/Midway,Midway Islands,Alias,−11:00,−11:00,Link to Pacific/Pago_Pago
NR,−0031+16655,Pacific/Nauru,,Canonical,+12:00,+12:00,
NU,−1901−16955,Pacific/Niue,,Canonical,−11:00,−11:00,
NF,−2903+16758,Pacific/Norfolk,,Canonical,+11:00,+12:00,
NC,−2216+16627,Pacific/Noumea,,Canonical,+11:00,+11:00,
AS,−1416−17042,Pacific/Pago_Pago,"Samoa, Midway",Canonical,−11:00,−11:00,
PW,+0720+13429,Pacific/Palau,,Canonical,+09:00,+09:00,
PN,−2504−13005,Pacific/Pitcairn,,Canonical,−08:00,−08:00,
FM,+0658+15813,Pacific/Pohnpei,Pohnpei/Ponape,Canonical,+11:00,+11:00,
FM,,Pacific/Ponape,,Deprecated,+11:00,+11:00,Link to Pacific/Pohnpei
PG,−0930+14710,Pacific/Port_Moresby,Papua New Guinea (most areas),Canonical,+10:00,+10:00,
CK,−2114−15946,Pacific/Rarotonga,,Canonical,−10:00,−10:00,
MP,+1512+14545,Pacific/Saipan,,Alias,+10:00,+10:00,Link to Pacific/Guam
WS,,Pacific/Samoa,,Deprecated,−11:00,−11:00,Link to Pacific/Pago_Pago
PF,−1732−14934,Pacific/Tahiti,Society Islands,Canonical,−10:00,−10:00,
KI,+0125+17300,Pacific/Tarawa,Gilbert Islands,Canonical,+12:00,+12:00,
TO,−2110−17510,Pacific/Tongatapu,,Canonical,+13:00,+13:00,
FM,,Pacific/Truk,,Deprecated,+10:00,+10:00,Link to Pacific/Chuuk
UM,+1917+16637,Pacific/Wake,Wake Island,Canonical,+12:00,+12:00,
WF,−1318−17610,Pacific/Wallis,,Canonical,+12:00,+12:00,
FM,,Pacific/Yap,,Deprecated,+10:00,+10:00,Link to Pacific/Chuuk
PL,,Poland,,Deprecated,+01:00,+02:00,Link to Europe/Warsaw
PT,,Portugal,,Deprecated,+00:00,+01:00,Link to Europe/Lisbon
CN,,PRC,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
,,PST8PDT,,Deprecated,−08:00,−07:00,"Choose a zone that observes PST with United States daylight saving time rules, such as America/Los_Angeles."
TW,,ROC,,Deprecated,+08:00,+08:00,Link to Asia/Taipei
KR,,ROK,,Deprecated,+09:00,+09:00,Link to Asia/Seoul
SG,+0117+10351,Singapore,,Deprecated,+08:00,+08:00,Link to Asia/Singapore
TR,,Turkey,,Deprecated,+03:00,+03:00,Link to Europe/Istanbul
,,UCT,,Deprecated,+00:00,+00:00,Link to Etc/UTC
,,Universal,,Deprecated,+00:00,+00:00,Link to Etc/UTC
US,,US/Alaska,,Deprecated,−09:00,−08:00,Link to America/Anchorage
US,,US/Aleutian,,Deprecated,−10:00,−09:00,Link to America/Adak
US,,US/Arizona,,Deprecated,−07:00,−07:00,Link to America/Phoenix
US,,US/Central,,Deprecated,−06:00,−05:00,Link to America/Chicago
US,,US/East-Indiana,,Deprecated,−05:00,−04:00,Link to America/Indiana/Indianapolis
US,,US/Eastern,,Deprecated,−05:00,−04:00,Link to America/New_York
US,,US/Hawaii,,Deprecated,−10:00,−10:00,Link to Pacific/Honolulu
US,,US/Indiana-Starke,,Deprecated,−06:00,−05:00,Link to America/Indiana/Knox
US,,US/Michigan,,Deprecated,−05:00,−04:00,Link to America/Detroit
US,,US/Mountain,,Deprecated,−07:00,−06:00,Link to America/Denver
US,,US/Pacific,,Deprecated,−08:00,−07:00,Link to America/Los_Angeles
WS,,US/Samoa,,Deprecated,−11:00,−11:00,Link to Pacific/Pago_Pago
,,UTC,,Alias,+00:00,+00:00,Link to Etc/UTC
RU,,W-SU,,Deprecated,+03:00,+03:00,Link to Europe/Moscow
,,WET,,Deprecated,+00:00,+01:00,"Choose a zone that observes WET, such as Europe/Lisbon."
,,Zulu,,Deprecated,+00:00,+00:00,Link to Etc/UTC
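
Two conventions in the table above deserve a worked example: the POSIX-style Etc/GMT±N zones intentionally invert their sign (Etc/GMT+5 is five hours behind UTC), and Deprecated or Alias rows simply resolve to the rules of the canonical zone they link to. A minimal Python sketch illustrating both, assuming Python 3.9+ and that tzdata is available to the zoneinfo module (on systems without a bundled database, the tzdata PyPI package provides one):

    from datetime import datetime
    from zoneinfo import ZoneInfo  # Python 3.9+; reads the tz database listed above

    # POSIX-style sign inversion: Etc/GMT+5 is UTC-05:00, not UTC+05:00.
    etc = datetime(2023, 6, 1, 12, 0, tzinfo=ZoneInfo("Etc/GMT+5"))
    print(etc.utcoffset())  # -1 day, 19:00:00  (a timedelta of -05:00)

    # A deprecated name carries the same rules as its canonical target.
    old = datetime(2023, 6, 1, 12, 0, tzinfo=ZoneInfo("US/Eastern"))
    new = datetime(2023, 6, 1, 12, 0, tzinfo=ZoneInfo("America/New_York"))
    print(old.utcoffset() == new.utcoffset())  # True (-04:00 during DST)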
493 GMT Alias +00:00 +00:00 Link to Etc/GMT
494 GMT+0 Deprecated +00:00 +00:00 Link to Etc/GMT
495 GMT-0 Deprecated +00:00 +00:00 Link to Etc/GMT
496 GMT0 Deprecated +00:00 +00:00 Link to Etc/GMT
497 Greenwich Deprecated +00:00 +00:00 Link to Etc/GMT
498 HK +2217+11409 Hongkong Deprecated +08:00 +08:00 Link to Asia/Hong_Kong
499 HST Deprecated −10:00 −10:00 Choose a zone that currently observes HST without daylight saving time, such as Pacific/Honolulu.
500 IS Iceland Deprecated +00:00 +00:00 Link to Atlantic/Reykjavik
501 MG −1855+04731 Indian/Antananarivo Alias +03:00 +03:00 Link to Africa/Nairobi
502 IO −0720+07225 Indian/Chagos Canonical +06:00 +06:00
503 CX −1025+10543 Indian/Christmas Canonical +07:00 +07:00
504 CC −1210+09655 Indian/Cocos Canonical +06:30 +06:30
505 KM −1141+04316 Indian/Comoro Alias +03:00 +03:00 Link to Africa/Nairobi
506 TF −492110+0701303 Indian/Kerguelen Kerguelen, St Paul Island, Amsterdam Island Canonical +05:00 +05:00
507 SC −0440+05528 Indian/Mahe Canonical +04:00 +04:00
508 MV +0410+07330 Indian/Maldives Canonical +05:00 +05:00
509 MU −2010+05730 Indian/Mauritius Canonical +04:00 +04:00
510 YT −1247+04514 Indian/Mayotte Alias +03:00 +03:00 Link to Africa/Nairobi
511 RE −2052+05528 Indian/Reunion Réunion, Crozet, Scattered Islands Canonical +04:00 +04:00
512 IR Iran Deprecated +03:30 +04:30 Link to Asia/Tehran
513 IL Israel Deprecated +02:00 +03:00 Link to Asia/Jerusalem
514 JM +175805−0764736 Jamaica Deprecated −05:00 −05:00 Link to America/Jamaica
515 JP Japan Deprecated +09:00 +09:00 Link to Asia/Tokyo
516 MH +0905+16720 Kwajalein Deprecated +12:00 +12:00 Link to Pacific/Kwajalein
517 LY Libya Deprecated +02:00 +02:00 Link to Africa/Tripoli
518 MET Deprecated +01:00 +02:00 Choose a zone that observes MET (same as CET), such as Europe/Paris.
519 MX Mexico/BajaNorte Deprecated −08:00 −07:00 Link to America/Tijuana
520 MX Mexico/BajaSur Deprecated −07:00 −06:00 Link to America/Mazatlan
521 MX Mexico/General Deprecated −06:00 −05:00 Link to America/Mexico_City
522 MST Deprecated −07:00 −07:00 Choose a zone that currently observes MST without daylight saving time, such as America/Phoenix.
523 MST7MDT Deprecated −07:00 −06:00 Choose a zone that observes MST with United States daylight saving time rules, such as America/Denver.
524 US Navajo Deprecated −07:00 −06:00 Link to America/Denver
525 NZ NZ Deprecated +12:00 +13:00 Link to Pacific/Auckland
526 NZ NZ-CHAT Deprecated +12:45 +13:45 Link to Pacific/Chatham
527 WS −1350−17144 Pacific/Apia Canonical +13:00 +14:00
528 NZ −3652+17446 Pacific/Auckland New Zealand time Canonical +12:00 +13:00
529 PG −0613+15534 Pacific/Bougainville Bougainville Canonical +11:00 +11:00
530 NZ −4357−17633 Pacific/Chatham Chatham Islands Canonical +12:45 +13:45
531 FM +0725+15147 Pacific/Chuuk Chuuk/Truk, Yap Canonical +10:00 +10:00
532 CL −2709−10926 Pacific/Easter Easter Island Canonical −06:00 −05:00
533 VU −1740+16825 Pacific/Efate Canonical +11:00 +11:00
534 KI −0308−17105 Pacific/Enderbury Phoenix Islands Canonical +13:00 +13:00
535 TK −0922−17114 Pacific/Fakaofo Canonical +13:00 +13:00
536 FJ −1808+17825 Pacific/Fiji Canonical +12:00 +13:00
537 TV −0831+17913 Pacific/Funafuti Canonical +12:00 +12:00
538 EC −0054−08936 Pacific/Galapagos Galápagos Islands Canonical −06:00 −06:00
539 PF −2308−13457 Pacific/Gambier Gambier Islands Canonical −09:00 −09:00
540 SB −0932+16012 Pacific/Guadalcanal Canonical +11:00 +11:00
541 GU +1328+14445 Pacific/Guam Canonical +10:00 +10:00
542 US +211825−1575130 Pacific/Honolulu Hawaii Canonical −10:00 −10:00
543 UM Pacific/Johnston Deprecated −10:00 −10:00 Link to Pacific/Honolulu
544 KI +0152−15720 Pacific/Kiritimati Line Islands Canonical +14:00 +14:00
545 FM +0519+16259 Pacific/Kosrae Kosrae Canonical +11:00 +11:00
546 MH +0905+16720 Pacific/Kwajalein Kwajalein Canonical +12:00 +12:00
547 MH +0709+17112 Pacific/Majuro Marshall Islands (most areas) Canonical +12:00 +12:00
548 PF −0900−13930 Pacific/Marquesas Marquesas Islands Canonical −09:30 −09:30
549 UM +2813−17722 Pacific/Midway Midway Islands Alias −11:00 −11:00 Link to Pacific/Pago_Pago
550 NR −0031+16655 Pacific/Nauru Canonical +12:00 +12:00
551 NU −1901−16955 Pacific/Niue Canonical −11:00 −11:00
552 NF −2903+16758 Pacific/Norfolk Canonical +11:00 +12:00
553 NC −2216+16627 Pacific/Noumea Canonical +11:00 +11:00
554 AS −1416−17042 Pacific/Pago_Pago Samoa, Midway Canonical −11:00 −11:00
555 PW +0720+13429 Pacific/Palau Canonical +09:00 +09:00
556 PN −2504−13005 Pacific/Pitcairn Canonical −08:00 −08:00
557 FM +0658+15813 Pacific/Pohnpei Pohnpei/Ponape Canonical +11:00 +11:00
558 FM Pacific/Ponape Deprecated +11:00 +11:00 Link to Pacific/Pohnpei
559 PG −0930+14710 Pacific/Port_Moresby Papua New Guinea (most areas) Canonical +10:00 +10:00
560 CK −2114−15946 Pacific/Rarotonga Canonical −10:00 −10:00
561 MP +1512+14545 Pacific/Saipan Alias +10:00 +10:00 Link to Pacific/Guam
562 WS Pacific/Samoa Deprecated −11:00 −11:00 Link to Pacific/Pago_Pago
563 PF −1732−14934 Pacific/Tahiti Society Islands Canonical −10:00 −10:00
564 KI +0125+17300 Pacific/Tarawa Gilbert Islands Canonical +12:00 +12:00
565 TO −2110−17510 Pacific/Tongatapu Canonical +13:00 +13:00
566 FM Pacific/Truk Deprecated +10:00 +10:00 Link to Pacific/Chuuk
567 UM +1917+16637 Pacific/Wake Wake Island Canonical +12:00 +12:00
568 WF −1318−17610 Pacific/Wallis Canonical +12:00 +12:00
569 FM Pacific/Yap Deprecated +10:00 +10:00 Link to Pacific/Chuuk
570 PL Poland Deprecated +01:00 +02:00 Link to Europe/Warsaw
571 PT Portugal Deprecated +00:00 +01:00 Link to Europe/Lisbon
572 CN PRC Deprecated +08:00 +08:00 Link to Asia/Shanghai
573 PST8PDT Deprecated −08:00 −07:00 Choose a zone that observes PST with United States daylight saving time rules, such as America/Los_Angeles.
574 TW ROC Deprecated +08:00 +08:00 Link to Asia/Taipei
575 KR ROK Deprecated +09:00 +09:00 Link to Asia/Seoul
576 SG +0117+10351 Singapore Deprecated +08:00 +08:00 Link to Asia/Singapore
577 TR Turkey Deprecated +03:00 +03:00 Link to Europe/Istanbul
578 UCT Deprecated +00:00 +00:00 Link to Etc/UTC
579 Universal Deprecated +00:00 +00:00 Link to Etc/UTC
580 US US/Alaska Deprecated −09:00 −08:00 Link to America/Anchorage
581 US US/Aleutian Deprecated −10:00 −09:00 Link to America/Adak
582 US US/Arizona Deprecated −07:00 −07:00 Link to America/Phoenix
583 US US/Central Deprecated −06:00 −05:00 Link to America/Chicago
584 US US/East-Indiana Deprecated −05:00 −04:00 Link to America/Indiana/Indianapolis
585 US US/Eastern Deprecated −05:00 −04:00 Link to America/New_York
586 US US/Hawaii Deprecated −10:00 −10:00 Link to Pacific/Honolulu
587 US US/Indiana-Starke Deprecated −06:00 −05:00 Link to America/Indiana/Knox
588 US US/Michigan Deprecated −05:00 −04:00 Link to America/Detroit
589 US US/Mountain Deprecated −07:00 −06:00 Link to America/Denver
590 US US/Pacific Deprecated −08:00 −07:00 Link to America/Los_Angeles
591 WS US/Samoa Deprecated −11:00 −11:00 Link to Pacific/Pago_Pago
592 UTC Alias +00:00 +00:00 Link to Etc/UTC
593 RU W-SU Deprecated +03:00 +03:00 Link to Europe/Moscow
594 WET Deprecated +00:00 +01:00 Choose a zone that observes WET, such as Europe/Lisbon.
595 Zulu Deprecated +00:00 +00:00 Link to Etc/UTC

View File

@ -0,0 +1,95 @@
# Analysis Workflow Example
!!! info "TL;DR"
- In addition to using RAPIDS to extract behavioral features, create plots, and clean sensor features, you can structure your data analysis within RAPIDS (i.e., creating ML/statistical models and evaluating them)
- We include an analysis example in RAPIDS that covers raw data processing, feature extraction, cleaning, machine learning modeling, and evaluation
- Use this example as a guide to structure your own analysis within RAPIDS
- RAPIDS analysis workflows are compatible with your favorite data science tools and libraries
- RAPIDS analysis workflows are reproducible and we encourage you to publish them along with your research papers
## Why should I integrate my analysis in RAPIDS?
Even though the bulk of RAPIDS' current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced on other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS.
We clarify that to create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS, then, they can use Jupyter notebooks to explore feature cleaning approaches and once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models and once a model is finished, it can also be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is.
In the following sections we share an example of how we structured an analysis workflow in RAPIDS.
## Analysis workflow structure
To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files, and scripts are grouped into Snakemake rules that are described in `.smk` files in the `rules` folder (we point the reader to the relevant rule(s) of each step).
Researchers can use these rules and scripts as a guide to create their own, as every modeling project is expected to have different requirements, data, and goals, but ultimately most follow a similar chained pattern.
!!! hint
The example's config file is `example_profile/example_config.yaml` and its Snakefile is in `example_profile/Snakefile`. The config file is already configured to process the sensor data as explained in [Analysis workflow modules](#analysis-workflow-modules).
## Description of the study modeled in our analysis workflow example
Our example is based on a hypothetical study that recruited 2 participants who underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client on their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10, which we summed up into a daily symptom burden score.
The goal of this workflow is to find out whether we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes, high and low symptom burden, based on whether each day's score is above or below the participant's average. We also want to compare the performance of individual (personalized) models vs. a population model.
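As a rough illustration, the per-participant labeling could look like the following pandas sketch; the column names `pid` and `symptom_burden` are hypothetical stand-ins for the actual target file schema:
```python
# Minimal sketch of per-participant binary labeling (not RAPIDS' actual code).
# "pid" and "symptom_burden" are hypothetical column names.
import pandas as pd

targets = pd.read_csv("data/external/example_workflow/participant_target.csv")
participant_mean = targets.groupby("pid")["symptom_burden"].transform("mean")
targets["target"] = (targets["symptom_burden"] > participant_mean).astype(int)  # 1 = high burden
```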
In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share files with [test data](https://osf.io/wbg23/) in an Open Science Framework repository.
<figure>
<img src="../../img/analysis_workflow.png" max-width="100%" />
<figcaption>Modules of RAPIDS example workflow, from raw data to model evaluation</figcaption>
</figure>
## Configure and run the analysis workflow example
1. [Install](../../setup/installation) RAPIDS
2. Unzip the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/) in `data/external/example_workflow/*.csv`.
3. Create the participant files for this example by running:
```bash
./rapids -j1 create_example_participant_files
```
4. Run the example pipeline with:
```bash
./rapids -j1 --profile example_profile
```
Note that you will see a lot of warning messages; you can ignore them, since they are caused by running ML algorithms on a small fake dataset.
## Modules of our analysis workflow example
??? info "1. Feature extraction"
We extract daily behavioral features for data yield; received and sent messages; missed, incoming, and outgoing calls; resampled fused location data (Doryab provider); activity recognition; battery; Bluetooth; screen; light; applications foreground; conversations; Wi-Fi connected; Wi-Fi visible; Fitbit heart rate summary and intraday data; Fitbit sleep summary data; and Fitbit step summary and intraday data (without excluding sleep periods, with an active bout threshold of 10 steps). In total, we obtained 245 daily sensor features over 12 days per participant.
??? info "2. Extract demographic data."
It is common to have demographic data in addition to mobile and target (ground truth) data. In this example, we include participants' age, gender, and the number of days they spent in the hospital after surgery as features in our model. We extract these three columns from the `data/external/example_workflow/participant_info.csv` file. As these three features remain the same within participants, they are used only in the population model. Refer to the `demographic_features` rule in `rules/models.smk`.
??? info "3. Create target labels."
The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the `data/external/example_workflow/participant_target.csv` file. A new rule/script can be created if further manipulation is necessary. Refer to the `parse_targets` rule in `rules/models.smk`.
??? info "4. Feature merging."
These daily features are stored in a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case, each column represents a feature and each row represents a day). Refer to the `merge_sensor_features_for_individual_participants` and `merge_sensor_features_for_all_participants` rules in `rules/features.smk`.
??? info "5. Data visualization."
At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to `rules/reports.smk` to find the rules that generate these plots.
??? info "6. Feature cleaning."
In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in `example_profile/example_config.yaml`.
After this step, we kept 173 features over 11 days for the individual model of p01, 101 features over 12 days for the individual model of p02, and 117 features over 22 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stop researchers from collecting the same number of sensors as on Android phones.
Feature cleaning for the individual models is done in the `clean_sensor_features_for_individual_participants` rule and for the population model in the `clean_sensor_features_for_all_participants` rule in `rules/models.smk`.
??? info "7. Merge features and targets."
In this step we merge the cleaned features and target labels for our individual models in the `merge_features_and_targets_for_individual_model` rule in `rules/features.smk`. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the `merge_features_and_targets_for_population_model` rule in `rules/features.smk`. These two merged files are the input for our individual and population models.
??? info "8. Modelling."
This stage has three phases: model building, training and evaluation.
In the building phase we impute, normalize, and oversample our dataset. Missing numeric values in each column are imputed with their mean, and missing categorical values are imputed with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, or scikit-learn's robust scaler) and one-hot encode each categorical feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a random oversampler. All these parameters are exposed in `example_profile/example_config.yaml`.
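A minimal sketch of this building phase (not RAPIDS' actual code), assuming a pandas feature matrix and binary labels; it uses min-max scaling and SMOTE, and for brevity mean-imputes only numeric columns:
```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler       # or StandardScaler / RobustScaler
from imblearn.over_sampling import SMOTE             # or RandomOverSampler

def build_dataset(features: pd.DataFrame, labels: pd.Series):
    """Impute, one-hot encode, normalize, and oversample (illustrative only)."""
    X = features.fillna(features.mean(numeric_only=True))  # mean-impute numeric NAs
    X = pd.get_dummies(X)                                  # one-hot encode categorical columns
    X = pd.DataFrame(MinMaxScaler().fit_transform(X),      # min-max normalization
                     columns=X.columns, index=X.index)
    return SMOTE(random_state=0).fit_resample(X, labels)   # oversample the minority class
```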
In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in `example_profile/example_config.yaml`.
Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve, and per-class precision, recall, and F1 score of all folds of the outer cross-validation cycle.
Refer to the `modelling_for_individual_participants` rule for the individual modeling and to the `modelling_for_all_participants` rule for the population modeling, both in `rules/models.smk`.
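A sketch of the nested leave-one-out cross-validation described in the training phase; the model, grid, and random stand-in data are placeholders, not RAPIDS' exact settings:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
X, y = rng.random((20, 5)), rng.integers(0, 2, 20)         # stand-in for merged features/targets

inner = GridSearchCV(                                      # inner cycle: tune hyper-parameters
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 10]},
    scoring="f1_macro",                                    # tuned on Macro F1
    cv=LeaveOneOut(),
)
y_pred = cross_val_predict(inner, X, y, cv=LeaveOneOut())  # outer cycle: predict held-out days
```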
??? info "9. Compute model baselines."
We create three baselines to evaluate our classification models.
First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation by sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features for the individual models, this baseline is only available for the population model.
Our baseline metrics (e.g. accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the `baselines_for_individual_model` rule for the individual model baselines and to the `baselines_for_population_model` rule for population model baselines, both in `rules/models.smk`.
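As an illustration, the first two baselines can be approximated with scikit-learn's `DummyClassifier`; the data here is synthetic and only demonstrates the idea:
```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.random((30, 4)), rng.integers(0, 2, 30)          # stand-in data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, strategy in [("majority", "most_frequent"), ("random weighted", "stratified")]:
    clf = DummyClassifier(strategy=strategy, random_state=0).fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))                  # accuracy of each baseline
```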

View File

@ -0,0 +1,92 @@
Data Cleaning
=============
The goal of this module is to perform basic cleaning tasks on the behavioral features that RAPIDS computes. You might need to do further processing depending on your analysis objectives. This module can clean features at the individual level and at the study level. If you are interested in creating individual models (using each participant's features independently of the others), use [`ALL_CLEANING_INDIVIDUAL`]. If you are interested in creating population models (using everyone's data in the same model), use [`ALL_CLEANING_OVERALL`].
## Clean sensor features for individual participants
!!! info "File Sequence"
```bash
- data/processed/features/{pid}/all_sensor_features.csv
- data/processed/features/{pid}/all_sensor_features_cleaned_{provider_key}.csv
```
### RAPIDS provider
Parameters description for `[ALL_CLEANING_INDIVIDUAL][PROVIDERS][RAPIDS]`:
|Key | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to execute the cleaning tasks described below. You can use the parameters of each task to tweak them or deactivate them|
|`[IMPUTE_SELECTED_EVENT_FEATURES]` | Fill NAs with 0 only for event-based features, see table below
|`[COLS_NAN_THRESHOLD]` | Discard columns with missing value ratios higher than `[COLS_NAN_THRESHOLD]`. Set to 1 to disable
|`[COLS_VAR_THRESHOLD]` | Set to `True` to discard columns with zero variance
|`[ROWS_NAN_THRESHOLD]` | Discard rows with missing value ratios higher than `[ROWS_NAN_THRESHOLD]`. Set to 1 to disable
|`[DATA_YIELD_FEATURE]` | `RATIO_VALID_YIELDED_HOURS` or `RATIO_VALID_YIELDED_MINUTES`
|`[DATA_YIELD_RATIO_THRESHOLD]` | Discard rows with `ratiovalidyieldedhours` or `ratiovalidyieldedminutes` feature less than `[DATA_YIELD_RATIO_THRESHOLD]`. The feature name is determined by `[DATA_YIELD_FEATURE]` parameter. Set to 0 to disable
|`[DROP_HIGHLY_CORRELATED_FEATURES]` | Discard highly correlated features, see table below
Parameters description for `[ALL_CLEANING_INDIVIDUAL][PROVIDERS][RAPIDS][IMPUTE_SELECTED_EVENT_FEATURES]`:
|Parameters | Description |
|-------------------------------------- |----------------------------------------------------------------|
|`[COMPUTE]` | Set to `True` to fill NAs with 0 for phone event-based features
|`[MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` | Any missing (NA) event feature value in a time segment instance with phone data yield > `[MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` will be replaced with a zero. See below for an explanation. |
Parameters description for `[ALL_CLEANING_INDIVIDUAL][PROVIDERS][RAPIDS][DROP_HIGHLY_CORRELATED_FEATURES]`:
|Parameters | Description |
|-------------------------------------- |----------------------------------------------------------------|
|`[COMPUTE]` | Set to `True` to drop highly correlated features
|`[MIN_OVERLAP_FOR_CORR_THRESHOLD]` | Minimum ratio of observations required per pair of columns (features) for their correlation to be considered valid.
|`[CORR_THRESHOLD]` | The absolute values of pair-wise correlations are calculated. If two variables have a valid correlation higher than `[CORR_THRESHOLD]`, we look at the mean absolute correlation of each variable and remove the one with the largest mean absolute correlation.
The steps to clean sensor features for individual participants are described below. Currently, only **phone sensors** are considered.
??? info "1. Fill NA with 0 for the selected event features."
Some event features should be zero instead of NA. In this step, we fill those missing features with 0 when the `phone_data_yield_rapids_ratiovalidyieldedminutes` column is higher than the `[IMPUTE_SELECTED_EVENT_FEATURES][MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` parameter. Plugin sensors such as Activity Recognition are not considered. You can skip this step by setting `[IMPUTE_SELECTED_EVENT_FEATURES][COMPUTE]` to `False`.
Take the phone calls sensor as an example. If there are no call records during a time segment for a participant, then either (1) the calls sensor was not working during that time segment, or (2) the calls sensor was working and the participant did not have any calls during that time segment. To differentiate these two situations, we assume the selected sensors are working when `phone_data_yield_rapids_ratiovalidyieldedminutes > [MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` (see the sketch after the feature list below).
The following phone event-based features are considered currently:
- Application foreground: countevent, countepisode, minduration, maxduration, meanduration, sumduration.
- Battery: all features.
- Calls: count, distinctcontacts, sumduration, minduration, maxduration, meanduration, modeduration.
- Keyboard: sessioncount, averagesessionlength, changeintextlengthlessthanminusone, changeintextlengthequaltominusone, changeintextlengthequaltoone, changeintextlengthmorethanone, maxtextlength, totalkeyboardtouches.
- Messages: count, distinctcontacts.
- Screen: sumduration, maxduration, minduration, avgduration, countepisode.
- WiFi: all connected and visible features.
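A sketch of this imputation, assuming a hypothetical threshold of 0.33 and a simplified prefix-based selection of event features (RAPIDS selects the exact features listed above):
```python
import pandas as pd

features = pd.read_csv("data/processed/features/p01/all_sensor_features.csv")
min_yield = 0.33  # example [MIN_DATA_YIELDED_MINUTES_TO_IMPUTE] value
event_cols = [c for c in features.columns if c.startswith(("phone_calls_", "phone_messages_"))]
sensing = features["phone_data_yield_rapids_ratiovalidyieldedminutes"] > min_yield
# fill NAs with 0 only on rows where the phone was demonstrably sensing
features.loc[sensing, event_cols] = features.loc[sensing, event_cols].fillna(0)
```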
??? info "2. Discard unreliable rows."
Extracted features might not be reliable if the sensor only worked for a short period during a time segment. In this step, we discard rows where the `phone_data_yield_rapids_ratiovalidyieldedminutes` column or the `phone_data_yield_rapids_ratiovalidyieldedhours` column is less than the `[DATA_YIELD_RATIO_THRESHOLD]` parameter. We recommend using the `phone_data_yield_rapids_ratiovalidyieldedminutes` column (set `[DATA_YIELD_FEATURE]` to `RATIO_VALID_YIELDED_MINUTES`) for time segments shorter than two or three hours and `phone_data_yield_rapids_ratiovalidyieldedhours` (set `[DATA_YIELD_FEATURE]` to `RATIO_VALID_YIELDED_HOURS`) for longer segments. We do not recommend skipping this step, but you can do so by setting `[DATA_YIELD_RATIO_THRESHOLD]` to 0.
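Continuing the `features` DataFrame from the sketch above, this step amounts to a row filter (0.75 is an example `[DATA_YIELD_RATIO_THRESHOLD]`):
```python
# keep rows whose data yield meets the threshold (RATIO_VALID_YIELDED_MINUTES here)
features = features[features["phone_data_yield_rapids_ratiovalidyieldedminutes"] >= 0.75]
```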
??? info "3. Discard columns (features) with too many missing values."
In this step, we discard columns with missing value ratios higher than `[COLS_NAN_THRESHOLD]`. We do not recommend skipping this step, but you can do so by setting `[COLS_NAN_THRESHOLD]` to 1.
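A sketch of this column filter, continuing from above, with 0.3 as an example `[COLS_NAN_THRESHOLD]`:
```python
cols_nan_threshold = 0.3  # example [COLS_NAN_THRESHOLD] value
features = features.loc[:, features.isna().mean() <= cols_nan_threshold]  # keep denser columns
```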
??? info "4. Discard columns (features) with zero variance."
In this step, we discard columns with zero variance. We do not recommend skipping this step, but you can do so by setting `[COLS_VAR_THRESHOLD]` to `False`.
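A sketch of the zero-variance filter on the numeric columns of the same DataFrame:
```python
numeric = features.select_dtypes("number")
features = features.drop(columns=numeric.columns[numeric.var() == 0])  # drop constant features
```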
??? info "5. Drop highly correlated features."
As highly correlated features might not bring additional information and will increase the complexity of a model, we drop them in this step. The absolute values of pair-wise correlations are calculated. Each correlation vector between two variables is regarded as valid only if the ratio of valid value pairs (i.e. non NA pairs) is greater than or equal to `[DROP_HIGHLY_CORRELATED_FEATURES][MIN_OVERLAP_FOR_CORR_THRESHOLD]`. If two variables have a correlation coefficient higher than `[DROP_HIGHLY_CORRELATED_FEATURES][CORR_THRESHOLD]`, we look at the mean absolute correlation of each variable and remove the variable with the largest mean absolute correlation. This step can be skipped by setting `[DROP_HIGHLY_CORRELATED_FEATURES][COMPUTE]` to False.
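A sketch of this procedure; `0.95` and the 50% overlap requirement are example values for `[CORR_THRESHOLD]` and `[MIN_OVERLAP_FOR_CORR_THRESHOLD]`, not RAPIDS' defaults:
```python
min_overlap = int(0.5 * len(features))               # example [MIN_OVERLAP_FOR_CORR_THRESHOLD]
corr = features.corr(min_periods=min_overlap, numeric_only=True).abs()
mean_corr = corr.mean()                              # mean absolute correlation per feature
to_drop = set()
for i, col_a in enumerate(corr.columns):
    for col_b in corr.columns[i + 1:]:
        if corr.loc[col_a, col_b] > 0.95:            # example [CORR_THRESHOLD]
            # drop the member of the pair with the largest mean absolute correlation
            to_drop.add(col_a if mean_corr[col_a] >= mean_corr[col_b] else col_b)
features = features.drop(columns=to_drop)
```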
??? info "6. Discard rows with too many missing values."
In this step, we discard rows with missing value ratios higher than `[ROWS_NAN_THRESHOLD]`. In other words, we discard time segments (e.g., days) that did not have enough data to be considered reliable. This step is similar to step 2, except the ratio is computed based on NA values instead of a phone data yield threshold. We do not recommend skipping this step, but you can do so by setting `[ROWS_NAN_THRESHOLD]` to 1.
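A sketch of the row filter, with 0.3 as an example `[ROWS_NAN_THRESHOLD]`:
```python
rows_nan_threshold = 0.3  # example [ROWS_NAN_THRESHOLD] value
features = features[features.isna().mean(axis=1) <= rows_nan_threshold]  # drop sparse rows
```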
## Clean sensor features for all participants
!!! info "File Sequence"
```bash
- data/processed/features/all_participants/all_sensor_features.csv
- data/processed/features/all_participants/all_sensor_features_cleaned_{provider_key}.csv
```
### RAPIDS provider
Parameters description and the steps are the same as the above [RAPIDS provider](#rapids-provider) section for individual participants.

View File

@ -0,0 +1,153 @@
Minimal Working Example
=======================
This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming `call` features for `daily` (`00:00:00` to `23:59:59`) and `night` (`00:00:00` to `05:59:59`) time segments of every day of data from one participant who was monitored on the US East Coast with an Android smartphone.
1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
2. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
3. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) step (we provide an example of what the relevant sections in your `config.yaml` will look like after you are done)
??? info "Required configuration changes (*click to expand*)"
1. **Supported [data streams](../../setup/configuration#supported-data-streams).**
Based on the docs, we decided to use the `aware_csv` data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.
2. **Create your [participants file](../../setup/configuration#participant-files).**
Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml` in `data/external/participant_files`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration#automatic-creation-of-participant-files)
1. Add `p01` to `[PIDS]` in `config.yaml`
2. Create a file in `data/external/participant_files/p01.yaml` with the following content:
```yaml
PHONE:
DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
PLATFORMS: [android] # or ios
LABEL: MyTestP01 # any string
START_DATE: 2020-01-01 # this can also be empty
END_DATE: 2021-01-01 # this can also be empty
```
3. **Select what [time segments](../../setup/configuration#time-segments) you want to extract features on.**
1. Set `[TIME_SEGMENTS][FILE]` to `data/external/timesegments_periodic.csv`
2. Create a file in `data/external/timesegments_periodic.csv` with the following content
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
```
4. **Choose the [timezone of your study](../../setup/configuration#timezone-of-your-study).**
We will use the default time zone settings since this example is processing data collected on the US East Coast (`America/New_York`)
```yaml
TIMEZONE:
TYPE: SINGLE
SINGLE:
TZCODE: America/New_York
```
5. **Modify your [device data stream configuration](../../setup/configuration#data-stream-configuration)**
1. Set `[PHONE_DATA_STREAMS][USE]` to `aware_csv`.
2. We will use the default value for `[PHONE_DATA_STREAMS][aware_csv][FOLDER]` since we already stored the test calls CSV file there.
6. **Select what [sensors and features](../../setup/configuration#sensor-and-features-to-process) you want to process.**
1. Set `[PHONE_CALLS][CONTAINER]` to `calls.csv` in the `config.yaml` file.
1. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.
!!! example "Example of the `config.yaml` sections after the changes outlined above"
This will be your `config.yaml` after following the instructions above. Click on the numbered markers to know more.
``` { .yaml .annotate }
PIDS: [p01] # (1)
TIMEZONE:
TYPE: SINGLE # (2)
SINGLE:
TZCODE: America/New_York
# ... other irrelevant sections
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC # (3)
FILE: "data/external/timesegments_periodic.csv" # (4)
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
PHONE_DATA_STREAMS:
USE: aware_csv # (5)
aware_csv:
FOLDER: data/external/aware_csv # (6)
# ... other irrelevant sections
############## PHONE ###########################################################
################################################################################
# ... other irrelevant sections
# Communication call features config, TYPES and FEATURES keys need to match
PHONE_CALLS:
CONTAINER: calls.csv # (7)
PROVIDERS:
RAPIDS:
COMPUTE: True # (8)
CALL_TYPES: ...
```
1. We added `p01` to PIDS after creating the participant file:
```bash
data/external/participant_files/p01.yaml
```
With the following content:
```yaml
PHONE:
DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
PLATFORMS: [android] # or ios
LABEL: MyTestP01 # any string
START_DATE: 2020-01-01 # this can also be empty
END_DATE: 2021-01-01 # this can also be empty
```
2. We use the default `SINGLE` time zone.
3. We use the default `PERIODIC` time segment `[TYPE]`
4. We created this time segments file with these lines:
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
```
5. We set `[USE]` to `aware_csv` to tell RAPIDS to process sensor data collected with the AWARE Framework and stored in CSV files.
6. We used the default `[FOLDER]` for `aware_csv` since we already stored our test `calls.csv` file there.
7. We changed `[CONTAINER]` to `calls.csv` to process our test call data.
8. We flipped `[COMPUTE]` to `True` to extract call behavioral features using the `RAPIDS` feature provider.
4. Run RAPIDS
```bash
./rapids -j1
```
5. The call features for the `daily` and `night` time segments will be in
```
data/processed/features/all_participants/all_sensor_features.csv
```

View File

@ -1,5 +1,143 @@
# Change Log
## v1.8.0
- Add data stream for AWARE Micro server
- Fix the NA bug in PHONE_LOCATIONS BARNETT provider
- Fix the bug of data type for call_duration field
- Fix the index bug of heatmap_sensors_per_minute_per_time_segment
## v1.7.1
- Update docs for Git Flow section
- Update RAPIDS paper information
## v1.7.0
- Add firststeptime and laststeptime features to FITBIT_STEPS_INTRADAY RAPIDS provider
- Update tests for Fitbit steps intraday features
- Add tests for phone battery features
- Add a data cleaning module to replace NAs with 0 in selected event-based features, discard unreliable rows and columns, discard columns with zero variance, and discard highly correlated columns
## v1.6.0
- Refactor PHONE_CALLS RAPIDS provider to compute features based on call episodes or events
- Refactor PHONE_LOCATIONS DORYAB provider to compute features based on location episodes
- Temporary revert PHONE_LOCATIONS BARNETT provider to use R script
- Update the default IGNORE_EPISODES_LONGER_THAN to be 6 hours for screen RAPIDS provider
- Fix the bug of step intraday features when INCLUDE_ZERO_STEP_ROWS is False
## v1.5.0
- Update Barnett location features with faster Python implementation
- Fix rounding bug in data yield features
- Add tests for data yield, Fitbit and accelerometer features
- Small fixes of documentation
## v1.4.1
- Update home page
- Add PHONE_MESSAGES tests
## v1.4.0
- Add new Application Foreground episode features and tests
- Update VSCode setup instructions for our Docker container
- Add tests for phone calls features
- Add tests for WiFI features and fix a bug that incorrectly counted the most scanned device within the current time segment instances instead of globally
- Add tests for phone conversation features
- Add tests for Bluetooth features and choose the most scanned device alphabetically when ties exist
- Add tests for Activity Recognition features and fix iOS unknown activity parsing
- Fix Fitbit bug that parsed date-times with the current time zone in rare cases
- Update the visualizations to be more precise and robust with different time segments.
- Fix regression crash of the example analysis workflow
## v1.3.0
- Refactor PHONE_LOCATIONS DORYAB provider. Fix bugs and faster execution up to 30x
- New PHONE_KEYBOARD features
- Add a new strategy to infer home location that can handle multiple homes for the same participant
- Add module to exclude sleep episodes from steps intraday features
- Fix PID matching when joining data from multiple participants. Now, we can handle PIDS with an arbitrary format.
- Fix bug that did not correctly parse participants with more than 2 phones or more than 1 wearable
- Fix crash when no phone data yield is needed to process location data (ALL & GPS location providers)
- Remove location rows with the same timestamp based on their accuracy
- Fix PHONE_CONVERSATION bug that produced inaccurate ratio features when time segments were not daily.
- Other minor bug fixes
## v1.2.0
- Sleep summary and intraday features are more consistent.
- Add wake and bedtime features for sleep summary data.
- Fix bugs with sleep PRICE features.
- Update home page
- Add contributing guide
## v1.1.1
- Fix length of periodic segments on days with DST
- Fix crash when scraping data for an app that does not exist
- Add tests for phone screen data
## v1.1.0
- Add Fitbit calories intraday features
## v1.0.1
- Fix crash in `chunk_episodes` of `utils.py` for multi time zone data
- Fix crash in BT Doryab provider when the number of clusters is 2
- Fix Fitbit multi time zone inference from phone data (simplify)
- Fix missing columns when the input for phone data yield is empty
- Fix wrong date time labels for event segments for multi time zone data (all labels are computed based on a single tz)
- Fix periodic segment crash when there are no segments to assign (only affects wday, mday, qday, or yday)
- Fix crash in Analysis Workflow with new suffix in segments' labels
## v1.0.0
- Add a new [Overview](../setup/overview/) page.
- You can [extend](../datastreams/add-new-data-streams/) RAPIDS with your own [data streams](../datastreams/data-streams-introduction/). Data streams are data collected with other sensing apps besides AWARE (like Beiwe, mindLAMP), and stored in other data containers (databases, files) besides MySQL.
- Support to analyze Empatica wearable data (thanks to Joe Kim and Brinnae Bent from the [DBDP](https://dbdp.org/))
- Support to analyze AWARE data stored in [CSV files](../datastreams/aware-csv/) and [InfluxDB](../datastreams/aware-influxdb/) databases
- Support to analyze data collected over [multiple time zones](../setup/configuration/#multiple-timezones)
- Support for [sleep intraday features](../features/fitbit-sleep-intraday/) from the core team and also from the community (thanks to Stephen Price)
- Users can comment on the documentation (powered by utterances).
- `SCR_SCRIPT` and `SRC_LANGUAGE` are replaced by `SRC_SCRIPT`.
- Add RAPIDS new logo
- Move Citation and Minimal Example page to the Setup section
- Add `config.yaml` validation schema and documentation. Now it's more difficult to modify the `config.yaml` file with invalid values.
- Add new `time at home` Doryab location feature
- Add home coordinates to the location data file so location providers can build features based on them.
- If you are migrating from RAPIDS 0.4.3 or older, check this [guide](../migrating-from-old-versions/#migrating-from-rapids-04x-or-older)
## v0.4.3
- Fix bug when any of the rows from any sensor do not belong to a time segment
## v0.4.2
- Update battery testing
- Fix location processing bug when certain columns don't exist
- Fix HR intraday bug when minutesonZONE features were 0
- Update FAQs
- Fix HR summary bug when restinghr=0 (ignore those rows)
- Fix ROG, location entropy and normalized entropy in Doryab location provider
- Remove sampling frequency dependance in Doryab location provider
- Update documentation of Doryab location provider
- Add new `FITBIT_DATA_YIELD` `RAPIDS` provider
- Deprecate Doryab circadian movement feature until it is fixed
## v0.4.1
- Fix bug when no error message was displayed for an empty `[PHONE_DATA_YIELD][SENSORS]` when resampling location data
## v0.4.0
- Add four new phone sensors that can be used for PHONE_DATA_YIELD
- Add code so new feature providers can be added for the new four sensors
- Add new clustering algorithm (OPTICS) for Doryab features
- Update default EPS parameter for Doryab location clustering
- Add clearer error message for invalid phone data yield sensors
- Add ALL_RESAMPLED flag and accuracy limit for location features
- Add FAQ about null characters in phone tables
- Reactivate light and wifi tests and update testing docs
- Fix bug when parsing Fitbit steps data
- Fix bugs when merging features from empty time segments
- Fix minor issues in the documentation
## v0.3.2
- Update docker and linux instructions to use the RSPM binary repo for faster installation
- Update CI to create a release on a tagged push that passes the tests
- Clarify in DB credential configuration that we only support MySQL
- Add Windows installation instructions
- Fix bugs in the create_participants_file script
- Fix bugs in Fitbit data parsing
- Fix the clustering context of Doryab location features
- Fix the wrong shifting while calculating distance in Doryab location features
- Refactor the haversine function
## v0.3.1
- Update installation docs for RAPIDS' docker container
- Fix example analysis use of accelerometer data in a plot
- Update FAQ
- Update minimal example documentation
- Minor doc updates
## v0.3.0
- Update R and Python virtual environments
- Add GH actions CI support for tests and docker
- Add release and test badges to README
## v0.2.6
- Fix old versions banner on nested pages
## v0.2.5
- Fix docs deploy typo
## v0.2.4
- Fix broken links in landing page and docs deploy
## v0.2.3
- Fix participant IDS in the example analysis workflow
## v0.2.2
- Fix readme link to docs
## v0.2.1
@ -24,4 +162,4 @@
- Update [virtual environment](../developers/virtual-environments) guide
- Update analysis workflow [example](../workflow-examples/analysis)
- Add a [Code of Conduct](../code_of_conduct)
- Update [Team](../team) page

View File

@ -5,14 +5,18 @@
## RAPIDS
If you used RAPIDS, please cite [this paper](https://www.frontiersin.org/article/10.3389/fdgth.2021.769823).
!!! cite "RAPIDS et al. citation"
Vega, J., Li, M., Aguillera, K., Goel, N., Joshi, E., Khandekar, K., ... & Low, C. A. (2021). Reproducible Analysis Pipeline for Data Streams (RAPIDS): Open-Source Software to Process Data Collected with Mobile Devices. Frontiers in Digital Health, 168.
## DBDP (all Empatica sensors)
If you computed features using the provider `[DBDP]` of any of the Empatica sensors (accelerometer, heart rate, temperature, EDA, BVP, IBI, tags) cite [this paper](https://www.cambridge.org/core/journals/journal-of-clinical-and-translational-science/article/digital-biomarker-discovery-pipeline-an-open-source-software-platform-for-the-development-of-digital-biomarkers-using-mhealth-and-wearables-data/A6696CEF138247077B470F4800090E63) in addition to RAPIDS.
!!! cite "Bent et al. citation"
Bent, B., Wang, K., Grzesiak, E., Jiang, C., Qi, Y., Jiang, Y., Cho, P., Zingler, K., Ogbeide, F.I., Zhao, A., Runge, R., Sim, I., Dunn, J. (2020). The Digital Biomarker Discovery Pipeline: An open source software platform for the development of digital biomarkers using mHealth and wearables data. Journal of Clinical and Translational Science, 1-28. doi:10.1017/cts.2020.511
## Panda (accelerometer)
@ -47,10 +51,13 @@ If you computed locations features using the provider `[PHONE_LOCATIONS][BARNETT
## Doryab (locations)
If you computed locations features using the provider `[PHONE_LOCATIONS][DORYAB]` cite [this paper](https://arxiv.org/abs/1812.10394) and [this paper](https://doi.org/10.1145/2750858.2805845) in addition to RAPIDS. In addition, if you used the `SUN_LI_VEGA_STRATEGY` strategy, cite [this paper](https://www.jmir.org/2020/9/e19992/) as well.
!!! cite "Doryab et al. citation"
Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
!!! cite "Canzian et al. citation"
Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI:https://doi.org/10.1145/2750858.2805845
!!! cite "Sun et al. citation"
Sun S, Folarin AA, Ranjan Y, Rashid Z, Conde P, Stewart C, Cummins N, Matcham F, Dalla Costa G, Simblett S, Leocani L, Lamers F, Sørensen PS, Buron M, Zabalza A, Guerrero Pérez AI, Penninx BW, Siddi S, Haro JM, Myin-Germeys I, Rintala A, Wykes T, Narayan VA, Comi G, Hotopf M, Dobson RJ, RADAR-CNS Consortium. Using Smartphones and Wearable Devices to Monitor Behavioral Changes During COVID-19. J Med Internet Res 2020;22(9):e19992

View File

@ -0,0 +1,263 @@
# Common Errors
## Cannot connect to your MySQL server
???+ failure "Problem"
```bash
Error in .local(drv, ...) :
Failed to connect to database: Error:
Can't initialize character set unknown (path: compiled_in) :
Calls: dbConnect -> dbConnect -> .local -> .Call
Execution halted
[Tue Mar 10 19:40:15 2020]
Error in rule download_dataset:
jobid: 531
output: data/raw/p60/locations_raw.csv
RuleException:
CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
```
???+ done "Solution"
Please make sure the `DATABASE_GROUP` in `config.yaml` matches your DB credentials group in `.env`.
---
## Cannot start MySQL on Linux via `brew services start mysql`
???+ failure "Problem"
Cannot start MySQL on Linux via `brew services start mysql`
???+ done "Solution"
Use `mysql.server start`
---
## Every time I force the download_dataset rule, all rules are executed
???+ failure "Problem"
When running `snakemake -j1 -R pull_phone_data` or `./rapids -j1 -R pull_phone_data`, all the rules and files are re-computed
???+ done "Solution"
This is expected behavior. The advantage of using `snakemake` under the hood is that every time a file containing data is modified, every rule that depends on that file is re-executed to update its results. In this case, since `download_dataset` updates all the raw data and you are forcing the rule with the flag `-R`, every single rule that depends on those raw files will be executed.
---
## Error `Table XXX doesn't exist` while running the `download_phone_data` or `download_fitbit_data` rule.
???+ failure "Problem"
```bash
Error in .local(conn, statement, ...) :
could not run statement: Table 'db_name.table_name' doesn't exist
Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted
```
???+ done "Solution"
Please make sure the sensors listed in `[PHONE_VALID_SENSED_BINS][PHONE_SENSORS]` and the `[CONTAINER]` of each sensor you activated in `config.yaml` match your database tables or files.
---
## How do I install RAPIDS on Ubuntu 16.04
???+ done "Solution"
1. Install dependencies (and Homebrew, if it is not installed):
- `sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev`
- Install [brew](https://docs.brew.sh/Homebrew-on-Linux) for linux and add the following line to `~/.bashrc`: `export PATH=$HOME/.linuxbrew/bin:$PATH`
- `source ~/.bashrc`
2. Install MySQL
- `brew install mysql`
- `brew services start mysql`
3. Install R, pandoc, and rmarkdown:
- `brew install r`
- `brew install gcc@6` (needed due to this [bug](https://github.com/Homebrew/linuxbrew-core/issues/17812))
- `HOMEBREW_CC=gcc-6 brew install pandoc`
4. Install miniconda using these [instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
5. Clone our repo:
- `git clone https://github.com/carissalow/rapids`
6. Create a Python virtual environment:
- `cd rapids`
- `conda env create -f environment.yml -n MY_ENV_NAME`
- `conda activate MY_ENV_NAME`
7. Install R packages and the R virtual environment:
- `snakemake renv_install`
- `snakemake renv_init`
- `snakemake renv_restore`
This step could take several minutes to complete. Please be patient and let it run until completion.
---
## `mysql.h` cannot be found
???+ failure "Problem"
```bash
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: mysql.h: No such file or directory
compilation terminated.
-----------------------------------------------------------------------
ERROR: configuration failed for package 'RMySQL'
```
???+ done "Solution"
```bash
sudo apt install libmariadbclient-dev
```
---
## No package `libcurl` found
???+ failure "Problem"
`libcurl` cannot be found
???+ done "Solution"
Install `libcurl`
```bash
sudo apt install libcurl4-openssl-dev
```
---
## Configuration failed because `openssl` was not found.
???+ failure "Problem"
`openssl` cannot be found
???+ done "Solution"
Install `openssl`
```bash
sudo apt install libssl-dev
```
---
## Configuration failed because `libxml-2.0` was not found
???+ failure "Problem"
`libxml-2.0` cannot be found
???+ done "Solution"
Install `libxml-2.0`
```bash
sudo apt install libxml2-dev
```
---
## SSL connection error when running RAPIDS
???+ failure "Problem"
You are getting the following error message when running RAPIDS:
```bash
Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol.
```
???+ done "Solution"
This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option `--ssl-mode=DISABLED`, but we can't do this from the R connector.
If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace `mysql-client` and `libmysqlclient-dev` with `mariadb-client` and `libmariadbclient-dev` and reinstall renv. More info about this issue [here](https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1872541)
---
## `DB_TABLES` key not found
???+ failure "Problem"
If you get the following error `KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS'`, it means that the indentation of the key `[PHONE_SENSORS]` is not matching the other child elements of `PHONE_VALID_SENSED_BINS`
???+ done "Solution"
You need to add or remove any leading whitespaces as needed on that line.
```yaml
PHONE_VALID_SENSED_BINS:
COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
BIN_SIZE: &bin_size 5 # (in minutes)
PHONE_SENSORS: []
```
---
## Error while updating your conda environment in Ubuntu
???+ failure "Problem"
You get the following error:
```bash
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
appears to be corrupted. The path 'include/mysqlStubs.h'
specified in the package manifest cannot be found.
ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
path: 'lib/libiomp5.so'
```
???+ done "Solution"
Reinstall conda
## Embedded nul in string
???+ failure "Problem"
You get the following error when downloading sensor data:
```bash
Error in result_fetch(res@ptr, n = n) :
embedded nul in string:
```
???+ done "Solution"
This problem is due to the way `RMariaDB` handles a mismatch between data types in R and MySQL (see [this issue](https://github.com/r-dbi/RMariaDB/issues/121)). Since it seems this problem won't be handled by `RMariaDB`, you have two options:
1. Remove the null character from the conflicting table cell(s). You can adapt the following query on a MySQL server 8.0 or newer:
```sql
update YOUR_TABLE set YOUR_COLUMN = regexp_replace(YOUR_COLUMN, '\0', '');
```
2. If it's not feasible to modify your data, you can try swapping `RMariaDB` for `RMySQL`. Just keep in mind you might have problems connecting to modern MySQL servers running on Linux:
- Add `RMySQL` to the renv environment by running the following command in a terminal open on RAPIDS root folder
```bash
R -e 'renv::install("RMySQL")'
```
- Go to `src/data/streams/pull_phone_data.R` or `src/data/streams/pull_fitbit_data.R` and replace `library(RMariaDB)` with `library(RMySQL)`
- In the same file(s) replace `dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)` with `dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)`
## There is no package called `RMariaDB`
???+ failure "Problem"
You get the following error when executing RAPIDS:
```bash
Error in library(RMariaDB) : there is no package called 'RMariaDB'
Execution halted
```
???+ done "Solution"
In RAPIDS v0.1.0 we replaced the `RMySQL` R package with `RMariaDB`. This error means your R virtual environment is out of date; to update it, run `snakemake -j1 renv_restore`
## Unrecognized output timezone "America/New_York"
???+ failure "Problem"
When running RAPIDS with R 4.0.3 on macOS on an Apple Silicon (M1) Mac, lubridate may throw an error associated with the timezone.
```bash
Error in C_force_tz(time, tz = tzone, roll):
CCTZ: Unrecognized output timezone: "America/New_York"
Calls: get_timestamp_filter ... .parse_date_time -> .strptime -> force_tz -> C_force_tz
```
???+ done "Solution"
This is because R's timezone database path is not set. Add `Sys.setenv("TZDIR" = file.path(R.home(), "share", "zoneinfo"))` to `renv/activate.R` to set the timezone library. For further details on how to test whether `TZDIR` is properly set, see [this lubridate issue comment](https://github.com/tidyverse/lubridate/issues/928#issuecomment-720059233).
## Unimplemented MAX_NO_FIELD_TYPES
???+ failure "Problem"
You get the following error when downloading Fitbit data:
```bash
Error: Unimplemented MAX_NO_FIELD_TYPES
Execution halted
```
???+ done "Solution"
At the moment, RMariaDB [cannot handle](https://github.com/r-dbi/RMariaDB/issues/127) MySQL columns of JSON type. Change the type of your Fitbit data column to `longtext` (note that the content will not change; it will still be a JSON object, just interpreted as a string).
## Running RAPIDS on Apple Silicon M1 Mac
???+ failure "Problem"
You get the following error when installing pandoc or running rapids:
```bash
MoSHI/rapids/renv/staging/1/00LOCK-KernSmooth/00new/KernSmooth/libs/KernSmooth.so: mach-o, but wrong architecture
```
???+ done "Solution"
As of February 2021, on M1 Macs R needs to be installed via brew under Rosetta (x86 arch) due to an incompatibility with some R libraries. To do this, run your terminal [via Rosetta](https://www.youtube.com/watch?v=nv2ylxro7rM&t=138s), then proceed with the usual brew installation command. x86 homebrew should be installed in `/usr/local/bin/brew`; you can check which brew you are using by typing `which brew`. Then use x86 homebrew to install R and restore RAPIDS' packages (`renv_restore`).


@ -0,0 +1,56 @@
# Contributing
Thank you for taking the time to contribute!
All changes, small or big, are welcome, and regardless of who you are, we are always happy to work together to make your contribution as strong as possible. We follow the [Covenant Code of Conduct](../code_of_conduct), so we ask you to uphold it. Be kind to everyone in the community, and please report unacceptable behavior to moshiresearch@gmail.com.
## Questions, Feature Requests, and Discussions
Post any questions, feature requests, or discussions in our [GitHub Discussions tab](https://github.com/carissalow/rapids/discussions).
## Bug Reports
Report any bugs in our [GitHub issue tracker](https://github.com/carissalow/rapids/issues), keeping in mind to:
- Debug and simplify the problem to create a minimal example. For example, reduce the problem to a single participant, sensor, and a few rows of data.
- Provide a clear and succinct description of the problem (expected behavior vs. actual behavior).
- Attach your `config.yaml`, time segments file, and time zones file if appropriate.
- Attach test data if possible and any screenshots or extra resources that will help us debug the problem.
- Share the commit you are running: `git rev-parse --short HEAD`
- Share your OS version (e.g., Windows 10)
- Share the device/sensor you are processing (e.g., phone accelerometer)
## Documentation Contributions
If you want to fix a typo or make any other minor change, you can edit the file online by clicking on the pencil icon at the top right of any page and opening a pull request using [Github's website](https://docs.github.com/en/github/managing-files-in-a-repository/editing-files-in-your-repository).
If your changes are more complex, clone RAPIDS' repository, setup the dev environment for our documentation with this [tutorial](../developers/documentation), and submit any changes on a new *feature branch* following our [git flow](../developers/git-flow).
## Code Contributions
!!! hint "Hints for any code changes"
- To submit any new code, use a new *feature branch* following our [git flow](../developers/git-flow).
- If you need a new Python or R package in RAPIDS' virtual environments, follow this [tutorial](../developers/virtual-environments/)
- If you need to change the `config.yaml`, you will need to update its validation schema with this [tutorial](../developers/validation-schema-config/)
### New Data Streams
*New data containers.* If you want to process data from a device RAPIDS supports ([see this table](../datastreams/data-streams-introduction/)) but it's stored in a database engine or file type we don't support yet, [implement a new data stream container and format](../datastreams/add-new-data-streams/). You can copy and paste the `format.yaml` of one of the other streams of the device you are targeting.
*New sensing apps.* If you want to add support for new smartphone sensing apps like Beiwe, [implement a new data stream container and format](../datastreams/add-new-data-streams/).
*New wearable devices.* If you want to add support for a new wearable, open a [Github discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
### New Behavioral Features
If you want to add new [behavioral features](../features/feature-introduction/) for mobile sensors RAPIDS already supports, follow this [tutorial](../features/add-new-features/). A sensor is supported if it has a configuration section in `config.yaml`.
If you want to add new [behavioral features](../features/feature-introduction/) for mobile sensors RAPIDS does not support yet, open a [Github discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
### New Tests
If you want to add new tests for existent behavioral features, follow this [tutorial](../developers/testing).
### New Visualizations
Open a [Github discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.


@ -0,0 +1,350 @@
# Add New Data Streams
A data stream is a set of sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. RAPIDS is agnostic to data streams' formats and containers; see the [Data Streams Introduction](../data-streams-introduction) for a list of supported streams.
**A container** is queried with an R or Python script that connects to the database, API or file where your stream's raw data is stored.
**A format** is described using a `format.yaml` file that specifies how to map and mutate your stream's raw data to match the data and format RAPIDS needs.
The most common cases when you would want to implement a new data stream are:
- You collected data with a mobile sensing app RAPIDS does not support yet. For example, [Beiwe](https://www.beiwe.org/) data stored in MySQL. You will need to define a new format file and a new container script.
- You collected data with a mobile sensing app RAPIDS supports, but this data is stored in a container that RAPIDS can't connect to yet. For example, AWARE data stored in PostgreSQL. In this case, you can reuse the format file of the `aware_mysql` stream, but you will need to implement a new container script.
!!! hint
Both the `container.[R|py]` and the `format.yaml` are stored in `./src/data/streams/[stream_name]` where `[stream_name]` can be `aware_mysql` for example.
## Implement a Container
The `container` script of a data stream can be implemented in R (strongly recommended) or Python. This script must have two functions if you are implementing a stream for phone data, or one function otherwise. The script can contain other auxiliary functions.
First of all, add any parameters your script might need in `config.yaml` under `(device)_DATA_STREAMS`. These parameters will be available in the `stream_parameters` argument of the one or two functions you implement. For example, if you are adding support for `Beiwe` data stored in `PostgreSQL` and your container needs a set of credentials to connect to a database, your new data stream configuration would be:
```yaml hl_lines="7 8"
PHONE_DATA_STREAMS:
USE: aware_python
# AVAILABLE:
aware_mysql:
DATABASE_GROUP: MY_GROUP
beiwe_postgresql:
DATABASE_GROUP: MY_GROUP # users define this group (user, password, host, etc.) in credentials.yaml
```
Then implement one or both of the following functions:
=== "pull_data"
This function returns the data columns for a specific sensor and participant. It has the following parameters:
| Param | Description |
|--------------------|-------------------------------------------------------------------------------------------------------|
| stream_parameters | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` |
| sensor_container | The value set by the user in any `[DEVICE_SENSOR][CONTAINER]` key of `config.yaml`. It can be a table, file path, or whatever data source you want to support that contains the **data from a single sensor for all participants**. For example, `[PHONE_ACCELEROMETER][CONTAINER]`|
| device | The device id that you need to get the data for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device id is a uuid|
| columns | A list of the columns that you need to get from `sensor_container`. You specify these columns in your stream's `format.yaml`|
!!! example
This is the `pull_data` function we implemented for `aware_mysql`. Note that we can `message`, `warn` or `stop` the user during execution.
```r
pull_data <- function(stream_parameters, device, sensor_container, columns){
# get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
# Letting the user know what we are doing
message(paste0("Executing the following query to download data: ", query))
sensor_data <- dbGetQuery(dbEngine, query)
dbDisconnect(dbEngine)
if(nrow(sensor_data) == 0)
warning(paste("The device '", device,"' did not have data in ", sensor_container))
return(sensor_data)
}
```
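If you would rather implement your container in Python, a rough equivalent of the function above might look like the following sketch. The `beiwe_postgresql` context and the `DATABASE_URI` stream parameter are hypothetical; adapt them to whatever keys you defined in `config.yaml`:
```python
import pandas as pd
from sqlalchemy import create_engine, text

def pull_data(stream_parameters, device, sensor_container, columns):
    # DATABASE_URI is an assumed stream parameter, e.g. "postgresql://user:pass@host/db"
    engine = create_engine(stream_parameters["DATABASE_URI"])
    query = text(f"SELECT {', '.join(columns)} FROM {sensor_container} WHERE device_id = :device")
    # Letting the user know what we are doing
    print(f"Executing the following query to download data: {query}")
    sensor_data = pd.read_sql(query, engine, params={"device": device})
    if sensor_data.empty:
        print(f"Warning: the device '{device}' did not have data in {sensor_container}")
    return sensor_data
```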
=== "infer_device_os"
!!! warning
This function is only necessary for phone data streams.
RAPIDS allows users to use the keyword `infer` (previously `multiple`) to [automatically infer](../../setup/configuration/#structure-of-participants-files) the mobile operating system a phone was running.
If you have a way to infer the OS of a device id, implement this function. For example, for AWARE data we use the `aware_device` table.
If you don't have a way to infer the OS, call `stop("Error Message")` so other users know they can't use `infer` or the inference failed, and they have to assign the OS manually in the participant file.
This function returns the operating system (`android` or `ios`) for a specific phone device id. It has the following parameters:
| Param | Description |
|--------------------|-------------------------------------------------------------------------------------------------------|
| stream_parameters | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` |
| device | The device id that you need to infer the OS for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device id is a uuid|
!!! example
This is the `infer_device_os` function we implemented for `aware_mysql`. Note that we can `message`, `warn` or `stop` the user during execution.
```r
infer_device_os <- function(stream_parameters, device){
# get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
group <- stream_parameters$DATABASE_GROUP
dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)
query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
message(paste0("Executing the following query to infer phone OS: ", query))
os <- dbGetQuery(dbEngine, query)
dbDisconnect(dbEngine)
if(nrow(os) > 0)
return(os %>% mutate(os = ifelse(brand == "iPhone", "ios", "android")) %>% pull(os))
else
stop(paste("We cannot infer the OS of the following device id because it does not exist in the aware_device table:", device))
}
```
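A Python counterpart of `infer_device_os` could follow the same shape. As above, the `DATABASE_URI` parameter is an assumption, and querying an `aware_device` table only applies if your stream stores one:
```python
import pandas as pd
from sqlalchemy import create_engine, text

def infer_device_os(stream_parameters, device):
    engine = create_engine(stream_parameters["DATABASE_URI"])  # assumed stream parameter
    query = text("SELECT device_id, brand FROM aware_device WHERE device_id = :device")
    print(f"Executing the following query to infer phone OS: {query}")
    os_data = pd.read_sql(query, engine, params={"device": device})
    if os_data.empty:
        raise ValueError(f"We cannot infer the OS of device id {device} because it does not exist in the aware_device table")
    return "ios" if os_data["brand"].iloc[0] == "iPhone" else "android"
```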
## Implement a Format
A format file `format.yaml` describes the mapping between your stream's raw data and the data that RAPIDS needs. This file has a section per sensor (e.g. `PHONE_ACCELEROMETER`), and each section has two attributes (keys):
1. `RAPIDS_COLUMN_MAPPINGS` are mappings between the columns RAPIDS needs and the columns your raw data already has.
    - The reserved keyword `FLAG_TO_MUTATE` flags columns that RAPIDS requires but that are not initially present in your container (database, CSV file). These columns have to be created by your mutation scripts.
2. `MUTATION`. Sometimes your raw data needs to be transformed to match the format RAPIDS can handle (including creating the columns marked as `FLAG_TO_MUTATE`):
    - `COLUMN_MAPPINGS` are mappings between the columns a mutation `SCRIPT` needs and the columns your raw data has.
    - `SCRIPTS` are a collection of R or Python scripts that transform one or more raw data columns into the format RAPIDS needs.
!!! hint
`[RAPIDS_COLUMN_MAPPINGS]` and `[MUTATION][COLUMN_MAPPINGS]` have a `key` (left-hand side string) and a `value` (right-hand side string). The `values` are the names used to pull columns from a container (e.g., columns in a database table). All `values` are renamed to their `keys` in lower case. The renamed columns are sent to every mutation script within the `data` argument, and the final output is the input that RAPIDS processes further.
For example, let's assume we are implementing `beiwe_mysql` and defining the following format for `PHONE_FAKESENSOR`:
```yaml
PHONE_FAKESENSOR:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: beiwe_timestamp
DEVICE_ID: beiwe_deviceID
MAGNITUDE_SQUARED: FLAG_TO_MUTATE
MUTATION:
COLUMN_MAPPINGS:
MAGNITUDE: beiwe_value
SCRIPTS:
- src/data/streams/mutations/phone/square_magnitude.py
```
RAPIDS will:
1. Download `beiwe_timestamp`, `beiwe_deviceID`, and `beiwe_value` from the container of `beiwe_mysql` (MySQL DB)
2. Rename these columns to `timestamp`, `device_id`, and `magnitude`, respectively.
3. Execute `square_magnitude.py` with a data frame as an argument containing the renamed columns. This script will square `magnitude` and rename it to `magnitude_squared`
4. Verify the data frame returned by `square_magnitude.py` has the columns RAPIDS needs `timestamp`, `device_id`, and `magnitude_squared`.
5. Use this data frame as the input to be processed in the pipeline.
Note that although `RAPIDS_COLUMN_MAPPINGS` and `[MUTATION][COLUMN_MAPPINGS]` keys are in capital letters for readability (e.g. `MAGNITUDE_SQUARED`), the names of the final columns you mutate in your scripts should be lower case.
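For illustration, a minimal version of the hypothetical `square_magnitude.py` referenced above could look like this sketch (the script only exists in this example, so its body is our assumption):
```python
import pandas as pd

def main(data, stream_parameters):
    # data arrives with the renamed, lower-case columns: timestamp, device_id, magnitude
    data["magnitude_squared"] = data["magnitude"] ** 2
    # drop the intermediate column; the output keeps only what format.yaml declares
    return data.drop(columns=["magnitude"])
```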
Let's explain this column mapping in more depth with some examples.
### Name mapping
The mapping for some sensors is straightforward. For example, accelerometer data most of the time has a timestamp, three axes (x,y,z), and a device id that produced it. AWARE and a different sensing app like Beiwe likely logged accelerometer data in the same way but with different column names. In this case, we only need to match Beiwe data columns to RAPIDS columns one-to-one:
```yaml hl_lines="4 5 6 7 8"
PHONE_ACCELEROMETER:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: beiwe_timestamp
DEVICE_ID: beiwe_deviceID
DOUBLE_VALUES_0: beiwe_x
DOUBLE_VALUES_1: beiwe_y
DOUBLE_VALUES_2: beiwe_z
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS: # it's ok if this is empty
```
### Value mapping
For some sensors, we need to map column names and values. For example, screen data has ON and OFF events; let's suppose Beiwe represents an ON event with the number `1`, but RAPIDS identifies ON events with the number `2`. In this case, we need to mutate the raw data coming from Beiwe and replace all `1`s with `2`s.
We do this by listing one or more R or Python scripts in `[MUTATION][SCRIPTS]` that will be executed in order. We usually store all mutation scripts under `src/data/streams/mutations/[device]/[platform]/`, and they can be reused across data streams.
```yaml hl_lines="10"
PHONE_SCREEN:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: beiwe_timestamp
DEVICE_ID: beiwe_deviceID
EVENT: beiwe_event
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS:
- src/data/streams/mutations/phone/beiwe/beiwe_screen_map.py
```
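Following the hypothetical `1` to `2` mapping above, `beiwe_screen_map.py` could be as small as this sketch (the mapping values are this example's assumption, not Beiwe's real encoding):
```python
import pandas as pd

def main(data, stream_parameters):
    # replace Beiwe's assumed ON code (1) with the code RAPIDS expects (2)
    data["event"] = data["event"].replace({1: 2})
    return data
```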
!!! hint
- A `MUTATION_SCRIPT` can also be used to clean/preprocess your data before extracting behavioral features.
- A mutation script has to have a `main` function that receives two arguments, `data` and `stream_parameters`.
- The `stream_parameters` argument contains the `config.yaml` key/values of your data stream (this is the same argument that your `container.[py|R]` script receives, see [Implement a Container](#implement-a-container)).
=== "python"
Example of a python mutation script
```python
import pandas as pd
def main(data, stream_parameters):
# mutate data
return(data)
```
=== "R"
Example of an R mutation script
```r
source("renv/activate.R") # needed to use RAPIDS renv environment
library(dplyr)
main <- function(data, stream_parameters){
# mutate data
return(data)
}
```
### Complex mapping
Sometimes, your raw data doesn't even have the same columns RAPIDS expects for a sensor. For example, let's pretend Beiwe stores `PHONE_ACCELEROMETER` axis data in a single column called `acc_col` instead of three. You have to create a `MUTATION_SCRIPT` to split `acc_col` into three columns `x`, `y`, and `z`.
For this, you mark the three axes columns RAPIDS needs in `[RAPIDS_COLUMN_MAPPINGS]` with the word `FLAG_TO_MUTATE`, map `acc_col` in `[MUTATION][COLUMN_MAPPINGS]`, and list a Python script under `[MUTATION][SCRIPTS]` with the code to split `acc_col`. See an example below.
RAPIDS expects that every column mapped as `FLAG_TO_MUTATE` will be generated by your mutation script, so it won't try to retrieve them from your container (database, CSV file, etc.).
In our example, `acc_col` will be fetched from the stream's container and renamed to `JOINED_AXES` because `beiwe_split_acc.py` will split it into `double_values_0`, `double_values_1`, and `double_values_2`.
```yaml hl_lines="6 7 8 11 13"
PHONE_ACCELEROMETER:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: beiwe_timestamp
DEVICE_ID: beiwe_deviceID
DOUBLE_VALUES_0: FLAG_TO_MUTATE
DOUBLE_VALUES_1: FLAG_TO_MUTATE
DOUBLE_VALUES_2: FLAG_TO_MUTATE
MUTATION:
COLUMN_MAPPINGS:
JOINED_AXES: acc_col
SCRIPTS:
- src/data/streams/mutations/phone/beiwe/beiwe_split_acc.py
```
This is a draft of the `beiwe_split_acc.py` `MUTATION_SCRIPT` (for this draft we assume `acc_col` stores the three axes as a comma-separated string such as `"0.1,9.8,0.3"`):
```python
import pandas as pd

def main(data, stream_parameters):
    # data has the acc_col
    # split acc_col into three columns: double_values_0, double_values_1, double_values_2 to match RAPIDS format
    axes = ["double_values_0", "double_values_1", "double_values_2"]
    data[axes] = data["acc_col"].str.split(",", expand=True).astype(float)
    # remove acc_col since we don't need it anymore
    return data.drop(columns=["acc_col"])
```
### OS complex mapping
There is a special case for a complex mapping scenario for smartphone data streams. The Android and iOS sensor APIs return data in different formats for certain sensors (like screen, activity recognition, battery, among others).
In case you didn't notice, the examples we have used so far are grouped under an `ANDROID` key, which means they will be applied to data collected by Android phones. Additionally, each sensor has an `IOS` key for a similar purpose. We use the complex mapping described above to transform iOS data into an Android format (it's always iOS to Android and any new phone data stream must do the same).
For example, this is the `format.yaml` key for `PHONE_ACTIVITY_RECOGNITION`. Note that the `ANDROID` mapping is simple (one-to-one), but the `IOS` mapping is complex, with three `FLAG_TO_MUTATE` columns, two `[MUTATION][COLUMN_MAPPINGS]` mappings, and one `[MUTATION][SCRIPTS]` script.
```yaml hl_lines="16 17 18 21 22 24"
PHONE_ACTIVITY_RECOGNITION:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: timestamp
DEVICE_ID: device_id
ACTIVITY_TYPE: activity_type
ACTIVITY_NAME: activity_name
CONFIDENCE: confidence
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS:
IOS:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: timestamp
DEVICE_ID: device_id
ACTIVITY_TYPE: FLAG_TO_MUTATE
ACTIVITY_NAME: FLAG_TO_MUTATE
CONFIDENCE: FLAG_TO_MUTATE
MUTATION:
COLUMN_MAPPINGS:
ACTIVITIES: activities
CONFIDENCE: confidence
SCRIPTS:
- "src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R"
```
??? "Example activity_recogniton_ios_unification.R"
In this `MUTATION_SCRIPT` we create `ACTIVITY_NAME` and `ACTIVITY_TYPE` based on `activities`, and map `confidence` iOS values to Android values.
```R
source("renv/activate.R")
library("dplyr", warn.conflicts = F)
library(stringr)
clean_ios_activity_column <- function(ios_gar){
ios_gar <- ios_gar %>%
mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))
existent_multiple_activities <- ios_gar %>%
filter(str_detect(activities, ",")) %>%
group_by(activities) %>%
summarise(multiple_activities = unique(activities), .groups = "drop_last") %>%
pull(multiple_activities)
known_multiple_activities <- c("stationary,automotive")
unknown_multiple_activities <- setdiff(existent_multiple_activities, known_multiple_activities)
if(length(unknown_multiple_activities) > 0){
stop(paste0("There are unknown combinations of iOS activities, you need to implement the decision of which ones to keep: ", unknown_multiple_activities))
}
ios_gar <- ios_gar %>%
mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))
return(ios_gar)
}
unify_ios_activity_recognition <- function(ios_gar){
# We only need to unify Google Activity Recognition data for iOS
# discard rows where activities column is blank
ios_gar <- ios_gar %>% filter(activities != "") # safe even when no rows are blank (negative indexing with an empty which() drops all rows)
# clean "activities" column of ios_gar
ios_gar <- clean_ios_activity_column(ios_gar)
# make it compatible with android version: generate "activity_name" and "activity_type" columns
ios_gar <- ios_gar %>%
mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
activities == "cycling" ~ "on_bicycle",
activities == "walking" ~ "walking",
activities == "running" ~ "running",
activities == "stationary" ~ "still"),
activity_type = case_when(activities == "automotive" ~ 0,
activities == "cycling" ~ 1,
activities == "walking" ~ 7,
activities == "running" ~ 8,
activities == "stationary" ~ 3,
activities == "unknown" ~ 4),
confidence = case_when(confidence == 0 ~ 0,
confidence == 1 ~ 50,
confidence == 2 ~ 100)
) %>%
select(-activities)
return(ios_gar)
}
main <- function(data, stream_parameters){
return(unify_ios_activity_recognition(data))
}
```


@ -0,0 +1,32 @@
# `aware_csv`
This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in CSV files.
!!! warning
The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.
See examples in the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/)
??? example "Example of a valid CSV file"
```csv
"_id","timestamp","device_id","activities","confidence","stationary","walking","running","automotive","cycling","unknown","label"
1,1587528000000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,""
2,1587528060000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
3,1587528120000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
4,1587528180000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
5,1587528240000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
6,1587528300000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
7,1587528360000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
```
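If you are exporting your own data into this dialect, pandas can both read and produce compliant files. A short sketch (the file names are hypothetical):
```python
import csv
import pandas as pd

# read a file that uses backslash escapes instead of doubled quotes
df = pd.read_csv("phone_screen.csv", escapechar="\\", doublequote=False)

# write a compliant file: comma separator, strings wrapped in ", backslash escapes
df.to_csv("phone_screen_clean.csv", index=False,
          quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", doublequote=False)
```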
## Container
A CSV file per sensor, each containing the data for all participants.
The script to connect and download data from this container is at:
```bash
src/data/streams/aware_csv/container.R
```
## Format
--8<---- "docs/snippets/aware_format.md"


@ -0,0 +1,18 @@
# `aware_influxdb (beta)`
!!! warning
This data stream is being released in beta while we test it thoroughly.
This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in an InfluxDB database.
## Container
An InfluxDB database with a table per sensor, each containing the data for all participants.
The script to connect and download data from this container is at:
```bash
src/data/streams/aware_influxdb/container.R
```
## Format
--8<---- "docs/snippets/aware_format.md"


@ -0,0 +1,15 @@
# `aware_micro_mysql`
This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework's](https://awareframework.com/) [AWARE Micro](https://github.com/denzilferreira/aware-micro) server and stored in a MySQL database.
## Container
A MySQL database with a table per sensor, each containing the data for all participants. Sensor data is stored in a JSON field within each table called `data`
The script to connect and download data from this container is at:
```bash
src/data/streams/aware_micro_mysql/container.R
```
## Format
--8<---- "docs/snippets/aware_format.md"


@ -0,0 +1,15 @@
# `aware_mysql`
This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in a MySQL database.
## Container
A MySQL database with a table per sensor, each containing the data for all participants. This is the default database created by the old PHP AWARE server (as opposed to the new JavaScript Micro server).
The script to connect and download data from this container is at:
```bash
src/data/streams/aware_mysql/container.R
```
## Format
--8<---- "docs/snippets/aware_format.md"


@ -0,0 +1,26 @@
# Data Streams Introduction
A data stream is a set of sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**.
For example, the `aware_mysql` data stream handles smartphone data (**device**) collected with the [AWARE Framework](https://awareframework.com/) (**format**) stored in a MySQL database (**container**). Similarly, smartphone data collected with [Beiwe](https://www.beiwe.org/) will have a different format and could be stored in a container like a PostgreSQL database or a CSV file.
If you want to process a data stream using RAPIDS, make sure that your data is stored in a supported **format** and **container** (see table below).
If RAPIDS doesn't support your data stream yet (e.g. Beiwe data stored in PostgreSQL, or AWARE data stored in SQLite), you can always [implement a new data stream](../add-new-data-streams). If it's something you think other people might be interested in, we will be happy to include your new data stream in RAPIDS, so get in touch!
!!! hint
Currently, you can add new data streams for smartphones, Fitbit, and Empatica devices. If you need RAPIDS to process data from **other devices**, like Oura Rings or Actigraph wearables, get in touch. It is a more complicated process that could take a couple of days to implement for someone familiar with R or Python, but we would be happy to work on it together.
For reference, these are the data streams we currently support:
| Data Stream | Device | Format | Container | Docs
|--|--|--|--|--|
| `aware_mysql`| Phone | AWARE app | MySQL | [link](../aware-mysql)
| `aware_micro_mysql`| Phone | AWARE Micro server | MySQL | [link](../aware-micro-mysql)
| `aware_csv`| Phone | AWARE app | CSV files | [link](../aware-csv)
| `aware_influxdb` (beta)| Phone | AWARE app | InfluxDB | [link](../aware-influxdb)
| `fitbitjson_mysql`| Fitbit | JSON (per [Fitbit's API](https://dev.fitbit.com/build/reference/web-api/)) | MySQL | [link](../fitbitjson-mysql)
| `fitbitjson_csv`| Fitbit | JSON (per [Fitbit's API](https://dev.fitbit.com/build/reference/web-api/)) | CSV files | [link](../fitbitjson-csv)
| `fitbitparsed_mysql`| Fitbit | Parsed (parsed API data) | MySQL | [link](../fitbitparsed-mysql)
| `fitbitparsed_csv`| Fitbit | Parsed (parsed API data) | CSV files | [link](../fitbitparsed-csv)
| `empatica_zip`| Empatica | [E4 Connect](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) | ZIP files | [link](../empatica-zip)


@ -0,0 +1,136 @@
# `empatica_zip`
This [data stream](../../datastreams/data-streams-introduction) handles Empatica sensor data downloaded as zip files using the [E4 Connect](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-).
## Container
You need to create a subfolder for every participant named after their `device id` inside the folder specified by `[EMPATICA_DATA_STREAMS][empatica_zipfiles][FOLDER]`. You can add one or more Empatica zip files to any subfolder.
The script to connect and download data from this container is at:
```bash
src/data/streams/empatica_zip/container.R
```
## Format
The `format.yaml` maps and transforms columns in your raw data stream to the [mandatory columns RAPIDS needs for Empatica sensors](../mandatory-empatica-format). This file is at:
```bash
src/data/streams/empatica_zip/format.yaml
```
All columns are mutated from the raw data in the zip files so you don't need to modify any column mappings.
??? info "EMPATICA_ACCELEROMETER"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
| TIMESTAMP | timestamp|
| DEVICE_ID | device_id|
| DOUBLE_VALUES_0 | double_values_0|
| DOUBLE_VALUES_1 | double_values_1|
| DOUBLE_VALUES_2 | double_values_2|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)
??? info "EMPATICA_HEARTRATE"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
|TIMESTAMP | timestamp|
|DEVICE_ID | device_id|
|HEARTRATE | heartrate|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)
??? info "EMPATICA_TEMPERATURE"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
|TIMESTAMP | timestamp|
|DEVICE_ID | device_id|
|TEMPERATURE | temperature|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)
??? info "EMPATICA_ELECTRODERMAL_ACTIVITY"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
|TIMESTAMP | timestamp|
|DEVICE_ID | device_id|
|ELECTRODERMAL_ACTIVITY | electrodermal_activity|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)
??? info "EMPATICA_BLOOD_VOLUME_PULSE"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
|TIMESTAMP | timestamp|
|DEVICE_ID | device_id|
|BLOOD_VOLUME_PULSE | blood_volume_pulse|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)
??? info "EMPATICA_INTER_BEAT_INTERVAL"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
|TIMESTAMP | timestamp|
|DEVICE_ID | device_id|
|INTER_BEAT_INTERVAL | inter_beat_interval|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)
??? info "EMPATICA_EMPATICA_TAGS"
**RAPIDS_COLUMN_MAPPINGS**
| RAPIDS column | Stream column |
|-----------------|-----------------|
|TIMESTAMP | timestamp|
|DEVICE_ID | device_id|
|TAGS | tags|
**MUTATION**
- **COLUMN_MAPPINGS** (None)
- **SCRIPTS** (None)


@ -0,0 +1,23 @@
# `fitbitjson_csv`
This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/) and stored in a CSV file. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your sensor data in a CSV file, RAPIDS can process it.
!!! warning
The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.
??? example "Example of a valid CSV file"
```csv
"timestamp","device_id","label","fitbit_id","fitbit_data_type","fitbit_data"
1587614400000,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524","5S","5ZKN9B","steps","{\"activities-steps\":[{\"dateTime\":\"2020-04-23\",\"value\":\"7881\"}]"
```
## Container
The container should be a CSV file per Fitbit sensor, each containing all participants' data.
The script to connect and download data from this container is at:
```bash
src/data/streams/fitbitjson_csv/container.R
```
## Format
--8<---- "docs/snippets/jsonfitbit_format.md"


@ -0,0 +1,14 @@
# `fitbitjson_mysql`
This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/) and stored in a MySQL database. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your sensor data in a MySQL database, RAPIDS can process it.
## Container
The container should be a MySQL database with a table per sensor, each containing all participants' data.
The script to connect and download data from this container is at:
```bash
src/data/streams/fitbitjson_mysql/container.R
```
## Format
--8<---- "docs/snippets/jsonfitbit_format.md"


@ -0,0 +1,29 @@
# `fitbitparsed_csv`
This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/), **parsed**, and stored in a CSV file. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your parsed sensor data in a CSV file, RAPIDS can process it.
!!! info "What is the difference between JSON and plain data streams"
Most people will only need `fitbitjson_*` because they downloaded and stored their data directly from Fitbit's API. However, if, for some reason, you don't have access to that JSON data and instead only have the parsed data (columns and rows), you can use this data stream.
!!! warning
The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.
??? example "Example of a valid CSV file"
```csv
"device_id","heartrate","heartrate_zone","local_date_time","timestamp"
"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:00:00",0
"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:01:00",0
"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",67,"outofrange","2020-04-23 00:02:00",0
"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:03:00",0
```
## Container
The container should be a CSV file per sensor, each containing all participants' data.
The script to connect and download data from this container is at:
```bash
src/data/streams/fitbitparsed_csv/container.R
```
## Format
--8<---- "docs/snippets/parsedfitbit_format.md"


@ -0,0 +1,17 @@
# `fitbitparsed_mysql`
This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/), **parsed**, and stored in a MySQL database. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your parsed sensor data in a MySQL database, RAPIDS can process it.
!!! info "What is the difference between JSON and plain data streams"
Most people will only need `fitbitjson_*` because they downloaded and stored their data directly from Fitbit's API. However, if, for some reason, you don't have access to that JSON data and instead only have the parsed data (columns and rows), you can use this data stream.
## Container
The container should be a MySQL database with a table per sensor, each containing all participants' data.
The script to connect and download data from this container is at:
```bash
src/data/streams/fitbitparsed_mysql/container.R
```
## Format
--8<---- "docs/snippets/parsedfitbit_format.md"


@ -0,0 +1,61 @@
# Mandatory Empatica Format
This is a description of the format RAPIDS needs to process data for the following Empatica sensors.
??? info "EMPATICA_ACCELEROMETER"
| RAPIDS column | Description |
|-----------------|--------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_VALUES_0 | x axis of acceleration |
| DOUBLE_VALUES_1 | y axis of acceleration |
| DOUBLE_VALUES_2 | z axis of acceleration |
??? info "EMPATICA_HEARTRATE"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| HEARTRATE | Intraday heartrate |
??? info "EMPATICA_TEMPERATURE"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| TEMPERATURE | temperature |
??? info "EMPATICA_ELECTRODERMAL_ACTIVITY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| ELECTRODERMAL_ACTIVITY | electrical conductance |
??? info "EMPATICA_BLOOD_VOLUME_PULSE"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| BLOOD_VOLUME_PULSE | blood volume pulse |
??? info "EMPATICA_INTER_BEAT_INTERVAL"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| INTER_BEAT_INTERVAL | inter beat interval |
??? info "EMPATICA_TAGS"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| DEVICE_ID | A string that uniquely identifies a device |
| TAGS | tags |


@ -0,0 +1,75 @@
# Mandatory Fitbit Format
This is a description of the format RAPIDS needs to process data for the following Fitbit sensors.
??? info "FITBIT_HEARTRATE_SUMMARY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| HEARTRATE_DAILY_RESTINGHR | Daily resting heartrate |
| HEARTRATE_DAILY_CALORIESOUTOFRANGE | Calories spent while heartrate was outside a heartrate [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
| HEARTRATE_DAILY_CALORIESFATBURN | Calories spent while heartrate was inside the fat burn [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
| HEARTRATE_DAILY_CALORIESCARDIO | Calories spent while heartrate was inside the cardio [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
| HEARTRATE_DAILY_CALORIESPEAK | Calories spent while heartrate was inside the peak [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
??? info "FITBIT_HEARTRATE_INTRADAY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| HEARTRATE | Intraday heartrate |
| HEARTRATE_ZONE | Heartrate [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) that HEARTRATE belongs to. It is based on the heartrate zone ranges of each device |
??? info "FITBIT_SLEEP_SUMMARY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd 00:00:00`, the date is the same as the start date of a daily sleep episode if its time is after SLEEP_SUMMARY_LAST_NIGHT_END, otherwise it is the day before the start date of that sleep episode |
| LOCAL_START_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` representing the start of a daily sleep episode |
| LOCAL_END_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` representing the end of a daily sleep episode|
| DEVICE_ID | A string that uniquely identifies a device |
| EFFICIENCY | Sleep efficiency computed by Fitbit as time asleep / (total time in bed - time to fall asleep)|
| MINUTES_AFTER_WAKEUP | Minutes the participant spent in bed after waking up|
| MINUTES_ASLEEP | Minutes the participant was asleep |
| MINUTES_AWAKE | Minutes the participant was awake |
| MINUTES_TO_FALL_ASLEEP | Minutes the participant spent in bed before falling asleep|
| MINUTES_IN_BED | Minutes the participant spent in bed across the sleep episode|
| IS_MAIN_SLEEP | 0 if this episode is a nap, or 1 if it is a main sleep episode|
| TYPE | stages or classic [sleep data](https://dev.fitbit.com/build/reference/web-api/sleep/)|
??? info "FITBIT_SLEEP_INTRADAY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS)|
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss`; this is either a copy of LOCAL_START_DATE_TIME or LOCAL_END_DATE_TIME depending on which column is used to assign an episode to a specific day|
| DEVICE_ID | A string that uniquely identifies a device |
| TYPE_EPISODE_ID | An id for each unique main or nap episode. Main and nap episodes have different levels; each row in this table is one such level, so multiple rows can have the same TYPE_EPISODE_ID|
| DURATION | Duration of the episode level in minutes|
| IS_MAIN_SLEEP | 0 if this episode level belongs to a nap, or 1 if it belongs to a main sleep episode|
| TYPE | type of level: stages or classic [sleep data](https://dev.fitbit.com/build/reference/web-api/sleep/)|
| LEVEL | For stages levels one of `wake`, `deep`, `light`, or `rem`. For classic levels one of `awake`, `restless`, and `asleep`|
??? info "FITBIT_STEPS_SUMMARY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| STEPS | Daily step count |
??? info "FITBIT_STEPS_INTRADAY"
| RAPIDS column | Description |
|-----------------|-----------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
| LOCAL_DATE_TIME | Date time string with format `yyyy-mm-dd hh:mm:ss` |
| DEVICE_ID | A string that uniquely identifies a device |
| STEPS | Intraday step count (usually every minute)|


@ -0,0 +1,202 @@
# Mandatory Phone Format
This is a description of the format RAPIDS needs to process data for the following PHONE sensors.
See examples in the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/)
??? info "PHONE_ACCELEROMETER"
| RAPIDS column | Description |
|-----------------|--------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_VALUES_0 | x axis of acceleration |
| DOUBLE_VALUES_1 | y axis of acceleration |
| DOUBLE_VALUES_2 | z axis of acceleration |
??? info "PHONE_ACTIVITY_RECOGNITION"
| RAPIDS column | Description |
|-----------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| ACTIVITY_NAME | A string that denotes the current activity name: `in_vehicle`, `on_bicycle`, `on_foot`, `still`, `unknown`, `tilting`, `walking` or `running` |
| ACTIVITY_TYPE | An integer (ranging from 0 to 8) that denotes the current activity type |
| CONFIDENCE | An integer (ranging from 0 to 100) that denotes the prediction accuracy |
??? info "PHONE_APPLICATIONS_CRASHES"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | Application's package name |
| APPLICATION_NAME | Application's localized name |
| APPLICATION_VERSION| Application's version code |
| ERROR_SHORT | Short description of the error |
| ERROR_LONG | More verbose version of the error description |
| ERROR_CONDITION | 1 = code error; 2 = non-responsive (ANR error) |
| IS_SYSTEM_APP | Whether the application is pre-installed on the device |
??? info "PHONE_APPLICATIONS_FOREGROUND"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | Application's package name |
| APPLICATION_NAME | Application's localized name |
| IS_SYSTEM_APP | Whether the application is pre-installed on the device |
??? info "PHONE_APPLICATIONS_NOTIFICATIONS"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | Application's package name |
| APPLICATION_NAME | Application's localized name |
| TEXT | Notification's header text, not the content |
| SOUND | Notification's sound source (if applicable) |
| VIBRATE | Notification's vibration pattern (if applicable) |
| DEFAULTS | Whether the notification was delivered according to the device's default settings |
| FLAGS | An integer that denotes an [Android notification flag](https://developer.android.com/reference/android/app/Notification.html) |
??? info "PHONE_BATTERY"
| RAPIDS column | Description |
|----------------------|------------------------------------------------------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| BATTERY_STATUS | An integer that denotes battery status: 0 or 1 = unknown, 2 = charging, 3 = discharging, 4 = not charging, 5 = full |
| BATTERY_LEVEL | An integer that denotes battery level, between 0 and `BATTERY_SCALE` |
| BATTERY_SCALE | An integer that denotes the maximum battery level |
??? info "PHONE_BLUETOOTH"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| BT_ADDRESS | MAC address of the device's Bluetooth sensor |
| BT_NAME | User-assigned name of the device's Bluetooth sensor |
| BT_RSSI | The RSSI dB to the scanned device |
??? info "PHONE_CALLS"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| CALL_TYPE | An integer that denotes call type: 1 = incoming, 2 = outgoing, 3 = missed |
| CALL_DURATION | Length of the call session |
| TRACE | SHA-1 one-way source/target of the call |
??? info "PHONE_CONVERSATION"
| RAPIDS column | Description |
|----------------------|--------------------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_ENERGY | A number that denotes the amplitude of an audio sample (L2-norm of the audio frame) |
| INFERENCE | An integer (ranging from 0 to 3) that denotes the type of an audio sample: 0 = silence, 1 = noise, 2 = voice, 3 = unknown |
| DOUBLE_CONVO_START | UNIX timestamp (13 digits) of the beginning of a conversation |
| DOUBLE_CONVO_END | UNIX timestamp (13 digits) of the end of a conversation |
??? info "PHONE_KEYBOARD"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| PACKAGE_NAME | The application's package name of the keyboard interaction |
| BEFORE_TEXT | The previous keyboard input (empty if password) |
| CURRENT_TEXT | The current keyboard input (empty if password) |
| IS_PASSWORD | An integer: 0 = not password; 1 = password |
??? info "PHONE_LIGHT"
| RAPIDS column | Description |
|--------------------|----------------------------------------------------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_LIGHT_LUX | The ambient luminance in lux units |
| ACCURACY | An integer that denotes the sensor's accuracy level: 3 = maximum accuracy, 2 = medium accuracy, 1 = low accuracy |
??? info "PHONE_LOCATIONS"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| DOUBLE_LATITUDE | The location's latitude, in degrees |
| DOUBLE_LONGITUDE | The location's longitude, in degrees |
| DOUBLE_BEARING | The location's bearing, in degrees |
| DOUBLE_SPEED | The speed if available, in meters/second over ground |
| DOUBLE_ALTITUDE | The altitude if available, in meters above sea level |
| PROVIDER | A string that denotes the provider: `gps`, `fused` or `network` |
| ACCURACY | The estimated location accuracy |
??? info "PHONE_LOG"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| LOG_MESSAGE | A string that denotes the log message |
??? info "PHONE_MESSAGES"
| RAPIDS column | Description |
|--------------------|---------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| MESSAGE_TYPE | An integer that denotes message type: 1 = received, 2 = sent |
| TRACE | SHA-1 one-way source/target of the message |
??? info "PHONE_SCREEN"
| RAPIDS column | Description |
|--------------------|-----------------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| SCREEN_STATUS | An integer that denotes screen status: 0 = off, 1 = on, 2 = locked, 3 = unlocked |
??? info "PHONE_WIFI_CONNECTED"
| RAPIDS column | Description |
|--------------------|-----------------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| MAC_ADDRESS | Device's MAC address |
| SSID | Currently connected access point network name |
| BSSID | Currently connected access point MAC address |
??? info "PHONE_WIFI_VISIBLE"
| RAPIDS column | Description |
|--------------------|-----------------------------------------------------------------------------------|
| TIMESTAMP | A UNIX timestamp (13 digits) when a row of data was logged |
| DEVICE_ID | A string that uniquely identifies a device |
| SSID | Detected access point network name |
| BSSID | Detected access point MAC address |
| SECURITY | Active security protocols |
| FREQUENCY | Wi-Fi band frequency (e.g., 2427, 5180), in MHz |
| RSSI | RSSI dB to the scanned device |


@ -5,33 +5,47 @@ We use the `develop/master` variation of the [OneFlow](https://www.endoflineblog
## Add New Features
We use feature (topic) branches to implement new features
1. Pull the latest develop
```bash
git checkout develop
git pull
```
1. Create your feature branch
```bash
git checkout -b feature/feature1
```
1. Add, modify or delete the necessary files to add your new feature
1. Update the [change log](../../change-log) (`docs/change-log.md`)
2. Stage and commit your changes using VS Code git GUI or the following commands
```bash
git add modified-file1 modified-file2
git commit -m "Add my new feature" # use a concise description
```
1. Integrate your new feature to `develop`
=== "Internal Developer"
You are an internal developer if you have writing permissions to the repository.
Most feature branches are never pushed to the repo; only do so if you expect its development to take days (to avoid losing your work if your computer is damaged). Otherwise, follow these instructions to locally rebase your feature branch onto `develop` and push those rebased changes online.
**Starting your feature branch**
1. Pull the latest develop
```bash
git checkout develop
git pull
```
1. Create your feature branch
```bash
git checkout -b feature/feature1
```
1. Add, modify or delete the necessary files to add your new feature
1. Update the [change log](../../change-log) (`docs/change-log.md`)
2. Stage and commit your changes using VS Code git GUI or the following commands
```bash
git add modified-file1 modified-file2
git commit -m "Add my new feature" # use a concise description
```
**Merging back your feature branch**
If your changes took time to be implemented, it is possible that there are new commits in our `develop` branch, so we need to rebase your feature branch.
1. Fetch the latest changes to develop
```bash
git fetch origin develop
```
1. Rebase your feature branch
```bash
git checkout feature/feature1
git rebase -i origin/develop
```
1. Integrate your new feature to `develop`
```bash
git checkout develop
git merge --no-ff feature/feature1 # (use the default merge message)
git push origin develop
@ -41,11 +55,50 @@ git commit -m "Add my new feature" # use a concise description
=== "External Developer"
You are an external developer if you do NOT have writing permissions to the repository.
**Starting your feature branch**
1. Fork and clone our repository on Github
1. Switch to the latest develop
```bash
git checkout develop
```
1. Create your feature branch
```bash
git checkout -b feature/external-test
```
1. Add, modify or delete the necessary files to add your new feature
2. Stage and commit your changes using VS Code git GUI or the following commands
```bash
git add modified-file1 modified-file2
git commit -m "Add my new feature" # use a concise description
```
**Merging back your feature branch**
If your changes took time to be implemented, it is possible that there are new commits in our `develop` branch, so we need to rebase your feature branch.
1. Add our repo as another `remote`
```bash
git remote add upstream https://github.com/carissalow/rapids/
```
1. Fetch the latest changes to develop
```bash
git fetch upstream develop
```
1. Rebase your feature branch
```bash
git checkout feature/external-test
git rebase -i develop
```
1. Push your feature branch online
```bash
git push --set-upstream origin feature/external-test
```
Then open a pull request to the `develop` branch using Github's GUI
1. Open a pull request to the `develop` branch using Github's GUI
## Release a New Version
@ -74,9 +127,9 @@ git branch -d release/v[NEW_RELEASE]
```
git checkout master
git merge --ff-only develop
git push
git push # Unlock the master branch before merging
```
1. Go to [GitHub](https://github.com/carissalow/rapids/tags) and create a new release based on the newest tag `v[NEW_RELEASE]` (remember to add the change log)
1. Release happens automatically after passing the tests
## Release a Hotfix
1. Pull the latest master
@ -103,6 +156,6 @@ git branch -d hotfix/v[NEW_HOTFIX]
```
git checkout master
git merge --ff-only v[NEW_HOTFIX]
git push
git push # Unlock the master branch before merging
```
1. Go to [GitHub](https://github.com/carissalow/rapids/tags) and create a new release based on the newest tag `v[NEW_HOTFIX]` (remember to add the change log)
1. Release happens automatically after passing the tests

View File

@ -3,12 +3,13 @@
We use the Live Share extension of Visual Studio Code to debug bugs when sharing data or database credentials is not possible.
1. Install [Visual Studio Code](https://code.visualstudio.com/)
2. Open you RAPIDS root folder in a new VSCode window
3. Open a new Terminal `Terminal > New terminal`
2. Open your RAPIDS root folder in a new VSCode window
3. Open a new terminal in Visual Studio Code `Terminal > New terminal`
4. Install the [Live Share extension pack](https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare-pack)
5. Press ++ctrl+p++ or ++cmd+p++ and run this command:
```bash
>live share: start collaboration session
```
6. Follow the instructions and share the session link you receive

View File

@ -4,182 +4,584 @@ Along with the continued development and the addition of new sensors and feature
The following is a list of the sensors for which testing is currently available.
| Sensor | Provider | Periodic | Frequency | Event |
|-------------------------------|----------|----------|-----------|-------|
| Phone Accelerometer | Panda | Y | Y | Y |
| Phone Accelerometer | RAPIDS | Y | Y | Y |
| Phone Activity Recognition | RAPIDS | Y | Y | Y |
| Phone Applications Foreground | RAPIDS | Y | Y | Y |
| Phone Battery | RAPIDS | Y | Y | Y |
| Phone Bluetooth | Doryab | Y | Y | Y |
| Phone Bluetooth | RAPIDS | Y | Y | Y |
| Phone Calls | RAPIDS | Y | Y | Y |
| Phone Conversation | RAPIDS | Y | Y | Y |
| Phone Data Yield | RAPIDS | Y | Y | Y |
| Phone Light | RAPIDS | Y | Y | Y |
| Phone Locations | Doryab | Y | Y | Y |
| Phone Locations | Barnett | N | N | N |
| Phone Messages | RAPIDS | Y | Y | Y |
| Phone Screen | RAPIDS | Y | Y | Y |
| Phone WiFi Connected | RAPIDS | Y | Y | Y |
| Phone WiFi Visible | RAPIDS | Y | Y | Y |
| Fitbit Calories Intraday | RAPIDS | Y | Y | Y |
| Fitbit Data Yield | RAPIDS | Y | Y | Y |
| Fitbit Heart Rate Summary | RAPIDS | Y | Y | Y |
| Fitbit Heart Rate Intraday | RAPIDS | Y | Y | Y |
| Fitbit Sleep Summary | RAPIDS | Y | Y | Y |
| Fitbit Sleep Intraday | RAPIDS | Y | Y | Y |
| Fitbit Sleep Intraday | PRICE | Y | Y | Y |
| Fitbit Steps Summary | RAPIDS | Y | Y | Y |
| Fitbit Steps Intraday | RAPIDS | Y | Y | Y |
## Accelerometer
Description
- The raw accelerometer data file, `phone_accelerometer_raw.csv`, contains data for 4 separate days
- One episode for each daily segment (night, morning, afternoon and evening)
- Two episodes located in the same 30-min segment (`Fri 00:15:00` and `Fri 00:21:21`)
- Two episodes located in the same daily segment (`Fri 00:15:00` and `Fri 18:12:00`)
- One episode before the time switch (`Sun 00:02:00`) and one episode after the time switch (`Sun 04:18:00`)
- Multiple episodes within one minute, which causes variance in magnitude (`Fri 00:10:25`, `Fri 00:10:27` and `Fri 00:10:46`)
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android, ios|
|morning|OK|OK|android, ios|
|daily|OK|OK|android, ios|
|threeday|OK|OK|android, ios|
|weekend|OK|OK|android, ios|
|beforeMarchEvent|OK|OK|android, ios|
|beforeNovemberEvent|OK|OK|android, ios|
## Messages (SMS)
- The raw message data file contains data for 2 separate days.
- The data for the first day contains 5 records for every `epoch`.
- The second day's data contains 6 records for each of only 2 `epoch` (currently `morning` and `evening`)
- The raw message data contains records for both `message_types` (i.e. `received` and `sent`) in both days in all epochs. The number of records with each `message_types` per epoch is randomly distributed. There is at least one record with each `message_types` per epoch.
- There is one raw message data file each, as described above, for testing both iOS and Android data.
- There is also an additional empty data file for both android and iOS for testing empty data files
Description
- The raw message data file, `phone_messages_raw.csv`, contains data for 4 separate days
- One episode for each daily segment (night, morning, afternoon and evening)
- Two `sent` episodes located in the same 30-min segment (`Fri 16:08:03.000` and `Fri 16:19:35.000`)
- Two `received` episodes located in the same 30-min segment (`Sat 06:45:05.000` and `Fri 06:45:05.000`)
- Two episodes located in the same daily segment (`Fri 11:57:56.385` and `Sat 10:54:10.000`)
- One episode before the time switch (`Sun 00:48:01.000`) and one episode after the time switch (`Sun 06:21:01.000`)
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## Calls
Due to the difference in the format of the raw call data for iOS and Android, the following are the expected results in `calls_with_datetime_unified.csv`. This gives a better idea of the use cases being tested, since `calls_with_datetime_unified.csv` makes both the iOS and Android data comparable.
Due to the difference in the format of the raw data for iOS and Android, the following are the expected results in
`phone_calls.csv`.
- The call data contains data for 2 days.
- The data for the first day contains 6 records for every `epoch`.
- The second day's data contains 6 records for each of only 2 `epoch` (currently `morning` and `evening`)
- The call data contains records for all `call_types` (i.e. `incoming`, `outgoing` and `missed`) in both days in all epochs. The number of records with each of the `call_types` per epoch is randomly distributed. There is at least one record with each `call_types` per epoch.
- There is one call data file each, as described above, for testing both iOS and Android data.
- There is also an additional empty data file for both android and iOS for testing empty data files
Description
- One missed episode, one outgoing episode and one incoming episode on Friday night, morning, afternoon and evening
- There is at least one episode of each type of phone calls on each day
- One incoming episode crossing two 30-min segments
- One outgoing episode crossing two 30-min segments
- One missed episode before, during and after the `event`
- There is one incoming episode before, during or after the `event`
- There is one outgoing episode before, during or after the `event`
- There is one missed episode before, during or after the `event`
Data format
| Device | Missed | Outgoing | Incoming |
|-|-|-|-|
|android| 3 | 2 | 1 |
|ios| 1,4 or 3,4 | 3,2,4 | 1,2,4 |
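A minimal sketch of mapping the raw Android call type codes from the table above to their labels (the helper name is illustrative; iOS uses composite codes and is not covered here):
```python
# Android call type codes per the data format table above
ANDROID_CALL_TYPES = {1: "incoming", 2: "outgoing", 3: "missed"}

def label_android_call(call_type: int) -> str:
    return ANDROID_CALL_TYPES.get(call_type, "unknown")

print(label_android_call(3))  # missed
```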
Note
When generating test data, all traces for iOS devices need to be unique; otherwise, episodes with duplicate traces will be dropped
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android, iOS|
|morning|OK|OK|android, iOS|
|daily|OK|OK|android, iOS|
|threeday|OK|OK|android, iOS|
|weekend|OK|OK|android, iOS|
|beforeMarchEvent|OK|OK|android, iOS|
|beforeNovemberEvent|OK|OK|android, iOS|
## Screen
Due to the difference in the format of the raw screen data for iOS and Android, the following are the expected results in `screen_deltas.csv`. This gives a better idea of the use cases being tested, since `screen_deltas.csv` makes both the iOS and Android data comparable. These files are used to calculate the features for the screen sensor.
Due to the difference in the format of the raw screen data for iOS and Android, the following are the expected results in `phone_screen.csv`.
- The screen delta data file contains data for 1 day.
- The screen delta data contains 1 record to represent an `unlock`
Description
- The screen data file contains data for 4 days.
- The screen data contains 1 record to represent an `unlock`
episode that falls within an `epoch` for every `epoch`.
- The screen delta data contains 1 record to represent an `unlock`
- The screen data contains 1 record to represent an `unlock`
episode that falls across the boundary of 2 epochs. Namely the
`unlock` episode starts in one epoch and ends in the next, thus
there is a record for `unlock` episodes that fall across `night`
to `morning`, `morning` to `afternoon` and finally `afternoon` to
`night`
- The testing is done for the `unlock` `episode_type`.
- There is one screen data file each for testing both iOS and
Android data formats.
- There is also an additional empty data file for both android and
iOS for testing empty data files
- One episode that crosses two `30-min` segments
Data format
| Device | unlock |
|-|-|
| Android | 3, 0|
| iOS | 3, 2|
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android, iOS|
|morning|OK|OK|android, iOS|
|daily|OK|OK|android, iOS|
|threeday|OK|OK|android, iOS|
|weekend|OK|OK|android, iOS|
|beforeMarchEvent|OK|OK|android, iOS|
|beforeNovemberEvent|OK|OK|android, iOS|
## Battery
Due to the difference in the format of the raw battery data for iOS and Android, as well as across iOS versions, the following are the expected results in `battery_deltas.csv`. This gives a better idea of the use cases being tested, since `battery_deltas.csv` makes both the iOS and Android data comparable. These files are used to calculate the features for the battery sensor.
Description
- The battery delta data file contains data for 1 day.
- The battery delta data contains 1 record each for a `charging` and
`discharging` episode that falls within an `epoch` for every
`epoch`. Thus, for the `daily` epoch there would be multiple
`charging` and `discharging` episodes
- Since either a `charging` episode or a `discharging` episode (but
not both) can occur across epochs, the battery delta data contains,
in order to test episodes that occur across epochs, alternating
`charging` and `discharging` episodes that fall across `night` to
`morning`, `morning` to `afternoon` and finally `afternoon` to
`night`. This starts with a `discharging` episode that begins in
`night` and ends in `morning`.
- There is one battery data file each, for testing both iOS and
Android data formats.
- There is also an additional empty data file for both android and
iOS for testing empty data files
- The 4-day raw data is contained in `phone_battery_raw.csv`
- One discharge episode crossing two 30-min time segments (`Fri 05:57:30.123` to `Fri 06:04:32.456`)
- One charging episode crossing two 30-min time segments (`Fri 11:55:58.416` to `Fri 12:08:07.876`)
- One discharge episode and one charging episode located within the same 30-min time segment (`Fri 21:30:00` to `Fri 22:00:00`)
- One episode before the time switch (`Sun 00:24:00.000`) and one episode after the time switch (`Sun 21:58:00`)
- Two episodes locate in the same daily segment
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## Bluetooth
- The raw Bluetooth data file contains data for 1 day.
- The raw Bluetooth data contains at least 2 records for each
`epoch`. Each `epoch` has a record with a `timestamp` for the
beginning boundary for that `epoch` and a record with a
`timestamp` for the ending boundary for that `epoch`. (e.g. For
the `morning` epoch there is a record with a `timestamp` for
`6:00AM` and another record with a `timestamp` for `11:59:59AM`.
These are to test edge cases)
- A selection of 5 Bluetooth devices is randomly distributed
throughout the data records.
- There is one raw Bluetooth data file each, for testing both iOS
and Android data formats.
- There is also an additional empty data file for both android and
iOS for testing empty data files.
Description
- The 4-day raw data is contained in `phone_bluetooth_raw.csv`
- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
- Two episodes located in the same 30-min segment (`Fri 23:38:45.789` and `Fri 23:59:59.465`)
- Two episodes located in the same daily segment (`Fri 00:00:00.798` and `Fri 00:49:04.132`)
- One episode before the time switch (`Sun 00:24:00.000`) and one episode after the time switch (`Sun 17:32:00.000`)
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## WIFI
- There are 2 data files (`wifi_raw.csv` and `sensor_wifi_raw.csv`)
for each fake participant for each phone platform.
- The raw WIFI data files contain data for 1 day.
- The `sensor_wifi_raw.csv` data contains at least 2 records for
each `epoch`. Each `epoch` has a record with a `timestamp` for the
beginning boundary for that `epoch` and a record with a
`timestamp` for the ending boundary for that `epoch`. (e.g. For
the `morning` epoch there is a record with a `timestamp` for
`6:00AM` and another record with a `timestamp` for `11:59:59AM`.
These are to test edge cases)
- The `wifi_raw.csv` data contains 3 records with random timestamps
for each `epoch` to represent visible broadcasting WIFI network.
This file is empty for the iOS phone testing data.
- A selection of 10 access point devices is randomly distributed
throughout the data records, 5 each for `sensor_wifi_raw.csv` and
`wifi_raw.csv`.
- There are data files for testing both iOS and Android data formats.
- There are also additional empty data files for both android and
iOS for testing empty data files.
There are two wifi features (`phone wifi connected` and `phone wifi visible`). The raw test data are stored separately in `phone_wifi_connected_raw.csv` and `phone_wifi_visible_raw.csv`.
Description
- One episode for each `epoch` (`night`, `morning`, `afternoon` and `evening`)
- Two episodes in the same time segment (`daily` and `30-min`)
- Two episodes around the transition of `epochs` (e.g. one at the end of `night` and one at the beginning of `morning`)
- One episode before and after the time switch on Sunday
phone wifi connected
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android, iOS|
|morning|OK|OK|android, iOS|
|daily|OK|OK|android, iOS|
|threeday|OK|OK|android, iOS|
|weekend|OK|OK|android, iOS|
|beforeMarchEvent|OK|OK|android, iOS|
|beforeNovemberEvent|OK|OK|android, iOS|
phone wifi visible
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## Light
- The raw light data file contains data for 1 day.
- The raw light data contains 3 or 4 rows of data for each `epoch`
except `night`. The single row of data for `night` is for testing
features with single-value inputs (for example, testing the standard
deviation of one input value)
- Since light is only available for Android there is only one file
that contains data for Android. All other files (i.e. for iPhone)
are empty data files.
Description
- The 4-day raw light data is contained in `phone_light_raw.csv`
- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
- Two episodes located in the same 30-min segment (`Fri 00:07:27.000` and `Fri 00:12:00.000`)
- Two episodes located in the same daily segment (`Fri 01:00:00` and `Fri 03:59:59.654`)
- One episode before the time switch (`Sun 00:08:00.000`) and one episode after the time switch (`Sun 05:36:00.000`)
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## Locations
Description
- The participant's home location is (latitude=1, longitude=1).
- From Sat 10:56:00 to Sat 11:04:00, the center of the cluster is (latitude=-100, longitude=-100).
- From Sun 03:30:00 to Sun 03:47:00, the center of the cluster is (latitude=1, longitude=1). Home location is extracted from this period.
- From Sun 11:30:00 to Sun 11:38:00, the center of the cluster is (latitude=100, longitude=100).
## Application Foreground
- The raw application foreground data file contains data for 1 day.
- The raw application foreground data contains 7 - 9 rows of data
for each `epoch`. The records for each `epoch` contain apps that
are randomly selected from a list of apps that are from the
`MULTIPLE_CATEGORIES` and `SINGLE_CATEGORIES` (See
[testing_config.yaml]()). There are also records in each epoch
that have apps randomly selected from a list of apps that are from
the `EXCLUDED_CATEGORIES` and `EXCLUDED_APPS`. This is to test
that these apps are actually being excluded from the calculations
of features. There are also records to test `SINGLE_APPS`
calculations.
- Since application foreground is only available for Android there
is only one file that contains data for Android. All other files
(i.e. for iPhone) are empty data files.
- The 4-day raw application data is contained in `phone_applications_foreground_raw.csv`
- One episode for each daily segment (night, morning, afternoon and evening)
- Two episodes located in the same 30-min segment (`Fri 10:12:56.385` and `Fri 10:18:48.895`)
- Two episodes located in the same daily segment (`Fri 11:57:56.385` and `Fri 12:02:56.385`)
- One episode before the time switch (`Sun 00:07:48.001`) and one episode after the time switch (`Sun 05:10:30.001`)
- Two custom-category (`Dating`) episodes, one at `Fri 06:05:10.385` and another at `Fri 11:53:00.385`
Checklist:
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## Activity Recognition
- The raw Activity Recognition data file contains data for 1 day.
- For each `epoch` period, the raw Activity Recognition data contains
rows that record 2 - 5 different `activity_types`, so that
durations of activities can be tested. Additionally, there
are records that mimic the duration of an activity over the time
boundary of neighboring epochs. (For example, there is a set of
records that mimic the participant `in_vehicle` from `afternoon`
into `evening`)
- There is one file each with raw Activity Recognition data for
testing both iOS and Android data formats.
(`plugin_google_activity_recognition_raw.csv` for android and
`plugin_ios_activity_recognition_raw.csv` for iOS)
- There is also an additional empty data file for both android and
iOS for testing empty data files.
Description
- The 4-day raw activity data is contained in `plugin_google_activity_recognition_raw.csv` and `plugin_ios_activity_recognition_raw.csv`.
- Two episodes located in the same 30-min segment (`Fri 04:01:54` and `Fri 04:13:52`)
- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
- Two episodes located in the same daily segment (`Fri 05:03:09` and `Fri 05:50:36`)
- Two episodes with a time difference of less than the `5 mins` threshold (`Fri 07:14:21` and `Fri 07:18:50`)
- One episode before the time switch (`Sun 00:46:00`) and one episode after the time switch (`Sun 03:42:00`)
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android, iOS|
|morning|OK|OK|android, iOS|
|daily|OK|OK|android, iOS|
|threeday|OK|OK|android, iOS|
|weekend|OK|OK|android, iOS|
|beforeMarchEvent|OK|OK|android, iOS|
|beforeNovemberEvent|OK|OK|android, iOS|
## Conversation
- The raw conversation data file contains data for 2 days.
- The raw conversation data contains records with a sample of both
`datatypes` (i.e. `voice/noise` = `0`, and `conversation` = `2` )
as well as rows with samples of each of the `inference` values
(i.e. `silence` = `0`, `noise` = `1`, `voice` = `2`, and `unknown`
= `3`) for each `epoch`. The different `datatype` and `inference`
records are randomly distributed throughout the `epoch`.
- Additionally there are 2 - 5 records for conversations (`datatype`
= 2, and `inference` = -1) in each `epoch` and for each `epoch`
except night, there is a conversation record that has a
`double_convo_start` `timestamp` that is from the previous
`epoch`. This is to test the calculations of features across
`epochs`.
- There is a raw conversation data file for both android and iOS
platforms (`plugin_studentlife_audio_android_raw.csv` and
`plugin_studentlife_audio_raw.csv` respectively).
- Finally, there are also additional empty data files for both
android and iOS for testing empty data files
The 4-day raw conversation data is contained in `phone_conversation_raw.csv`. The different `inference` records are
randomly distributed throughout the `epoch`.
Description
- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`) on each day
- Two episodes near the transition of the daily segment, one starts at the end of the afternoon, `Fri 17:10:00` and another one starts at the beginning of the evening, `Fri 18:01:00`
- One episode across two segments, `daily` and `30-min` (from `Fri 05:55:00` to `Fri 06:00:41`)
- Two episodes locate in the same daily segment (`Sat 12:45:36` and `Sat 16:48:22`)
- One episode before the time switch, `Sun 00:15:06`, and one episode after the time switch, `Sun 06:01:00`
Data format
| inference | type |
| - | - |
| 0 | silence |
| 1 | noise |
| 2 | voice |
| 3 | unknown |
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android|
|morning|OK|OK|android|
|daily|OK|OK|android|
|threeday|OK|OK|android|
|weekend|OK|OK|android|
|beforeMarchEvent|OK|OK|android|
|beforeNovemberEvent|OK|OK|android|
## Keyboard
- The raw keyboard data file contains data for 4 days.
- The raw keyboard data contains records with differences in `timestamp` ranging from
milliseconds to seconds.
- A difference of more than 5 seconds between consecutive records creates separate
sessions within the usage of the same app. This helps to verify the case where sessions have to be different (see the sketch after this list).
- The raw keyboard data contains records where the difference in time is less
than 5 seconds, which would make them one session, but a new session starts because the app
changes. This edge case verifies the behaviour both within a particular app
and within the 5-second threshold.
- The raw keyboard data also contains records where the length of `current_text` varies between consecutive rows. This helps us test the cases where input text is entered by auto-suggest
or auto-correct operations.
- One three-minute episode with a 1-minute row on Sun 08:59:54.65 and 09:00:00, and another on Sun 12:01:02, that are considered a single episode in multi-timezone event segments, to showcase how
inferring time zone data for Keyboard from phone data can produce inaccurate results around the tz change. This happens because the device was on LA time until 11:59 and switched to NY time at 12pm; in terms of actual time, 09 am LA and 12 pm NY represent the same moment in time, so 09:00 LA and 12:01 NY are consecutive minutes.
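A minimal sketch of the session-splitting rule these test cases exercise, assuming illustrative column names (`timestamp` in milliseconds and `package_name` for the app): a new session starts when more than 5 seconds pass between consecutive rows or when the app changes.
```python
import pandas as pd

SESSION_GAP_MS = 5000  # the 5-second threshold described above

def assign_sessions(df: pd.DataFrame) -> pd.DataFrame:
    # sort by time, then start a new session on a long gap OR an app change
    df = df.sort_values("timestamp").reset_index(drop=True)
    gap = df["timestamp"].diff() > SESSION_GAP_MS
    app_change = df["package_name"] != df["package_name"].shift()
    df["session_id"] = (gap | app_change).cumsum()
    return df

rows = pd.DataFrame({
    "timestamp": [0, 800, 1600, 9000, 9500],
    "package_name": ["app.a", "app.a", "app.b", "app.b", "app.b"],
})
print(assign_sessions(rows))  # app.b rows are split by the 7.4-second gap
```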
## Application Episodes
- The feature requires raw application foreground data file and raw phone screen data file
- The raw data files contain data for 4 days.
- The raw data contains records with differences in `timestamp` ranging from milliseconds to minutes.
- An app episode starts when an app is launched and ends when another app is launched (marking the end of the first app's episode)
or when the screen locks. Thus, we take the screen unlock episodes into account (see the sketch after this list).
- There are multiple app usages within each screen unlock episode to verify the creation of different app episodes in each
screen unlock session. In the screen unlock episodes starting from Fri 05:56:51, Fri 10:00:24, Sat 17:48:01, Sun 22:02:00, and Mon 21:05:00 we have multiple apps, both system and non-system apps, to check this.
- The 22-minute chunk starting from Fri 10:03:56 checks app episodes for system apps only.
- The screen unlock episodes starting from Mon 21:05:00 and Sat 17:48:01 check whether the screen lock marks the end of the episode for the particular app that was launched a few milliseconds to 8 mins before the screen lock.
- Finally, since application foreground is only available for Android devices, this feature is also for Android devices only. All other files are empty data files.
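A hedged sketch of the app-episode rule described above (helper and argument names are illustrative): an episode runs from an app launch until the next launch or the next screen lock, whichever comes first.
```python
def app_episodes(launches, lock_times):
    """launches: sorted (timestamp, app) pairs; lock_times: sorted lock timestamps."""
    episodes = []
    for (start, app), nxt in zip(launches, launches[1:] + [(None, None)]):
        end = nxt[0]  # tentatively end at the next app launch
        lock = next((t for t in lock_times if t > start), None)
        if lock is not None and (end is None or lock < end):
            end = lock  # the screen locked before the next launch
        if end is not None:
            episodes.append((app, start, end))
    return episodes

print(app_episodes([(0, "maps"), (300, "mail")], lock_times=[250, 900]))
# [('maps', 0, 250), ('mail', 300, 900)]
```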
## Data Yield
Description
- Two sensors were picked for testing, `phone_screen` and `phone_light`. `phone_screen` is event-based and `phone_light` is sampled at a regular frequency
- A 31-min episode (from `Fri 01:00:00` to `Fri 01:30:00`) in the phone_light data, which counts toward `validyieldedhours` (see the sketch below)
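A hedged sketch of the `validyieldedhours` idea: an hour counts as valid yielded when the fraction of its minutes containing any sensor data meets a threshold (mirroring the `MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS` parameter; the function below is illustrative, not RAPIDS' implementation).
```python
def valid_yielded_hours(minutes_with_data_per_hour, minute_ratio_threshold=0.5):
    # count hours whose ratio of minutes with data meets the threshold
    return sum(
        1 for minutes in minutes_with_data_per_hour
        if minutes / 60 >= minute_ratio_threshold
    )

# the 31-minute episode above yields 31/60 >= 0.5, so its hour is valid
print(valid_yielded_hours([31, 10, 0]))  # 1
```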
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android, ios|
|morning|OK|OK|android, ios|
|daily|OK|OK|android, ios|
|threeday|OK|OK|android, ios|
|weekend|OK|OK|android, ios|
|beforeMarchEvent|OK|OK|android, ios|
|beforeNovemberEvent|OK|OK|android, ios|
## Fitbit Calories Intraday
Description
- A five-minute sedentary episode on Fri 11:00:00
- A one-minute sedentary episode on Sun 02:00:00. It exists in November but not in February in STZ
- A five-minute sedentary episode on Fri 11:58:00. It is split across two 30-min segments and the morning segment
- A three-minute lightly active episode on Fri 11:10:00, a one-minute episode at 11:18:00 and a one-minute episode at 11:24:00. These check for start and end times of the first/last/longest episode
- A three-minute fairly active episode on Fri 11:40:00, a one-minute episode at 11:48:00 and a one-minute episode at 11:54:00. These check for start and end times of the first/last/longest episode
- A three-minute very active episode on Fri 12:10:00, a one-minute episode at 12:18:00 and a one-minute episode at 12:24:00. These check for start and end times of the first/last/longest episode
- An eight-minute MVPA episode with intertwined fairly and very active rows on Fri 12:30:00
- The above episodes contain six highmet (>= 3 MET) episodes and nine lowmet episodes.
- One two-minute sedentary episode with a 1-minute row on Sun 09:00:00 and another on Sun 12:01:01 that are considering a single episode in multi-timezone event segments to showcase how inferring time zone data for Fitbit from phone data can produce inaccurate results around the tz change. This happens because the device was on LA time until 11:59 and switched to NY time at 12pm, in terms of actual time 09 am LA and 12 pm NY represent the same moment in time so 09:00 LA and 12:01 NY are consecutive minutes.
- A three-minute sedentary episode on Sat 08:59 that will be ignored for multi-timezone event segments.
- A three-minute sedentary episode on Sat 12:59 of which the first minute will be ignored for multi-timezone event segments since the test segment starts at 13:00
- A three-minute sedentary episode on Sat 16:00
- A four-minute sedentary episode on Sun 10:01 that will be ignored for November's multi-timezone event segments since the test segment ends at 10am on that weekend.
- A three-minute very active episode on Sat 16:03. This episode and the one at 16:00 are counted as one for lowmet episodes
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Heartrate Intraday
Description:
- The 4-day raw heartrate data is contained in `fitbit_heartrate_intraday_raw.csv`
- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
- Two episodes located in the same 30-min segment (`Fri 00:49:00` and `Fri 00:52:00`)
- Two different types of heartrate zone episodes located in the same 30-min segment (`Fri 05:49:00 outofrange` and `Fri 05:57:00 fatburn`)
- Two episodes located in the same daily segment (`Fri 12:02:00` and `Fri 19:38:00`)
- One episode before the time switch, `Sun 00:08:00`, and one episode after the time switch, `Sun 07:28:00`
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Sleep Summary
Description
- A main sleep episode that starts on Fri 20:00:00 and ends on Sat 02:00:00. This episode starts after 11am (Last Night End) which will be considered as today's (Fri) data.
- A nap that starts on Sat 04:00:00 and ends on Sat 06:00:00. This episode starts before 11am (Last Night End) which will be considered as yesterday's (Fri) data.
- A nap that starts on Sat 13:00:00 and ends on Sat 15:00:00. This episode starts after 11am (Last Night End) which will be considered as today's (Sat) data.
- A main sleep that starts on Sun 01:00:00 and ends on Sun 12:00:00. This episode starts before 11am (Last Night End) which will be considered as yesterday's (Sat) data.
- A main sleep that starts on Sun 23:00:00 and ends on Mon 07:00:00. This episode starts after 11am (Last Night End) which will be considered as today's (Sun) data.
- Any segment shorter than one day will be ignored for sleep RAPIDS features.
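A minimal sketch of the "Last Night End" rule used above, assuming the 11:00 cutoff: episodes that start before 11am are attributed to the previous day.
```python
from datetime import date, datetime, timedelta

LAST_NIGHT_END = 11  # assumed 11:00 cutoff, per the description above

def assigned_day(start: datetime) -> date:
    if start.hour < LAST_NIGHT_END:
        return (start - timedelta(days=1)).date()  # yesterday's data
    return start.date()  # today's data

print(assigned_day(datetime(2020, 3, 7, 4, 0)))   # Sat 04:00 nap -> Fri
print(assigned_day(datetime(2020, 3, 7, 13, 0)))  # Sat 13:00 nap -> Sat
```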
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Sleep Intraday
Description
- A five-minute main sleep episode with asleep-classic level on Fri 11:00:00.
- An eight-hour main sleep episode on Fri 17:00:00. It is split into 2 parts for the daily segment: a seven-hour sleep episode on Fri 17:00:00 and a one-hour sleep episode on Sat 00:00:00 (see the sketch after this list).
- A two-hour nap on Sat 01:00:00 that will be ignored for main sleep features.
- A one-hour nap on Sat 13:00:00 that will be ignored for main sleep features.
- An eight-hour main sleep episode on Sat 22:00:00. This episode ends on Sun 08:00:00 (NY) for March and Sun 06:00:00 (NY) for November due to daylight saving. It will be considered for the `beforeMarchEvent` segment and ignored for the `beforeNovemberEvent` segment.
- A nine-hour main sleep episode on Sun 11:00:00. The start time will be assigned the NY time zone and converted to 14:00:00.
- A seven-hour main sleep episode on Mon 06:00:00. This episode will be split into two parts: a five-hour sleep episode on Mon 06:00:00 and a two-hour sleep episode on Mon 11:00:00. The first part will be discarded as it is before 11am (Last Night End).
- Any segment shorter than one day will be ignored for sleep PRICE features.
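A minimal sketch of splitting an episode at local midnight for daily segments, as happens to the eight-hour Fri 17:00:00 episode above (simplified; it ignores time zones and daylight saving):
```python
from datetime import datetime, timedelta

def split_at_midnight(start: datetime, end: datetime):
    # cut [start, end) at every midnight it crosses
    parts, cur = [], start
    while cur.date() < end.date():
        nxt = datetime.combine(cur.date() + timedelta(days=1), datetime.min.time())
        parts.append((cur, nxt))
        cur = nxt
    parts.append((cur, end))
    return parts

print(split_at_midnight(datetime(2020, 3, 6, 17), datetime(2020, 3, 7, 1)))
# [(Fri 17:00, Sat 00:00), (Sat 00:00, Sat 01:00)]
```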
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Heartrate Summary
Description
- The 4-day raw heartrate summary data is contained in `fitbit_heartrate_summary_raw.csv`.
- As heartrate summary data is periodic, it only generates results for periodic segments; there will be no results for frequency or event segments.
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Steps Intraday
Description
- The 4-day raw steps intraday data is contained in `fitbit_steps_intraday_raw.csv`
- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`) on each day
- Two episodes within the same 30-min segment (`Fri 05:58:00` and `Fri 05:59:00`)
- A one-min episode at `2020-03-07 09:00:00` that will be converted to New York time `2020-03-07 12:00:00`
- One episode before the time switch, `Sun 00:19:00`, and one episode after the time switch, `Sun 09:01:00`
- Episodes that cross two 30-min segments (`Fri 11:59:00` and `Fri 12:00:00`)
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Steps Summary
Description
- The 4-day raw steps summary data is contained in `fitbit_steps_summary_raw.csv`.
- As steps summary data is periodic, it only generates results for periodic segments; there will be no results for frequency or event segments.
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|
## Fitbit Data Yield
Checklist
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|fitbit|
|morning|OK|OK|fitbit|
|daily|OK|OK|fitbit|
|threeday|OK|OK|fitbit|
|weekend|OK|OK|fitbit|
|beforeMarchEvent|OK|OK|fitbit|
|beforeNovemberEvent|OK|OK|fitbit|

View File

@ -1,45 +1,177 @@
# Testing
The following is a simple guide to testing RAPIDS. All files necessary for testing are stored in the `/tests` directory
The following is a simple guide to running RAPIDS' tests. All files necessary for testing are stored in the `./tests/` directory
## Steps for Testing
1. To begin testing RAPIDS, place the fake raw input data `csv` files in
`tests/data/raw/`. The fake participant files should be placed in
`tests/data/external/`. The expected output files of RAPIDS after
processing the input data should be placed in
`tests/data/processed/`.
2. The Snakemake rule(s) that are to be tested must be placed in the
`tests/Snakemake` file. The current `tests/Snakemake` is a good
example of how to define them. (At the time of writing this
documentation, the snakefile contains rules for messages (SMS), calls and
screen)
3. Edit the `tests/settings/config.yaml`. Add and/or remove the rules
to be run for testing from the `forcerun` list.
4. Edit the `tests/settings/testing_config.yaml` with the necessary
configuration settings for running the rules to be tested.
5. Add any additional testscripts in `tests/scripts`.
6. Uncomment or comment out lines in the testing shell script
`tests/scripts/run_tests.sh`.
7. Run the testing shell script.
??? check "**Testing Overview**"
1. You have to create a single four day test dataset for the sensor you are working on.
2. You will adjust your dataset with `tests/scripts/assign_test_timestamps.py` to fit `Fri March 6th 2020 - Mon March 9th 2020` and `Fri Oct 30th 2020 - Mon Nov 2nd 2020`. We test daylight saving times with these dates.
2. We have one test participant per platform (`pids`: `android`, `ios`, `fitbit`, `empatica`, `empty`). The data `device_id` should be equal to the `pid`.
2. We will run this test dataset against six test pipelines, three for `frequency`, `periodic`, and `event` time segments in a `single` time zone, and the same three in `multiple` time zones.
3. You will have to create your test data to cover as many corner cases as possible. These cases depend on the sensor you are working on.
4. The time segments and time zones to be tested are:
??? example "Frequency"
- 30 minutes (`30min,30`)
??? example "Periodic"
- morning (`morning,06:00:00,5H 59M 59S,every_day,0`)
- daily (`daily,00:00:00,23H 59M 59S,every_day,0`)
- three-day segments that repeat every day (`threeday,00:00:00,71H 59M 59S,every_day,0`)
- three-day segments that repeat every Friday (`weekend,00:00:00,71H 59M 59S,wday,5`)
??? example "Event"
- A segment that starts 3 hours before an event (Sat Mar 07 2020 19:00:00 EST) and lasts for 22 hours. Note that the last part of this segment will happen during a daylight saving change on Sunday at 2am when the clock moves forward and the period 2am-3am does not exist. In this case, the segment would start on Sat Mar 07 2020 16:00:00 EST (timestamp: 1583614800000) and end on Sun Mar 08 2020 15:00:00 EST (timestamp: 1583694000000). (`beforeMarchEvent,1583625600000,22H,3H,-1,android`)
- A segment that starts 3 hours before an event (Sat Oct 31 2020 19:00:00 EST) and lasts for 22 hours. Note that the last part of this segment will happen during a daylight saving change on Sunday at 2am when the clock moves back and the period 1am-2am exists twice. In this case, the segment would start on Sat Oct 31 2020 16:00:00 EST (timestamp: 1604174400000) and end on Sun Nov 01 2020 13:00:00 EST (timestamp: 1604253600000). (`beforeNovemberEvent,1604185200000,22H,3H,-1,android`) (see the sketch below)
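A minimal sketch of how an event segment line is interpreted (fields: label, event timestamp, length, shift, shift direction, device). It reproduces the `beforeMarchEvent` bounds quoted above:
```python
MS_PER_HOUR = 3600 * 1000

def event_segment_bounds(event_ts_ms, length_h, shift_h, shift_direction):
    # shift the start away from the event, then extend by the segment length
    start = event_ts_ms + shift_direction * shift_h * MS_PER_HOUR
    end = start + length_h * MS_PER_HOUR
    return start, end

# beforeMarchEvent,1583625600000,22H,3H,-1,android
print(event_segment_bounds(1583625600000, 22, 3, -1))
# (1583614800000, 1583694000000)
```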
??? example "Single time zone to test"
America/New_York
??? example "Multi time zones to test"
- America/New_York starting at `0`
- America/Los_Angeles starting at `1583600400000` (Sat Mar 07 2020 12:00:00 EST)
- America/New_York starting at `1583683200000` (Sun Mar 08 2020 12:00:00 EST)
- America/Los_Angeles starting at `1604160000000` (Sat Oct 31 2020 12:00:00 EST)
- America/New_York starting at `1604250000000` (Sun Nov 01 2020 12:00:00 EST)
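A minimal sketch of how this multi-timezone list is interpreted: each timezone applies from its starting timestamp until the next entry's starting timestamp.
```python
import bisect

TZ_CHANGES = [  # (start timestamp in ms, timezone), from the list above
    (0,             "America/New_York"),
    (1583600400000, "America/Los_Angeles"),
    (1583683200000, "America/New_York"),
    (1604160000000, "America/Los_Angeles"),
    (1604250000000, "America/New_York"),
]

def timezone_at(ts_ms: int) -> str:
    starts = [start for start, _ in TZ_CHANGES]
    i = bisect.bisect_right(starts, ts_ms) - 1  # last change at or before ts
    return TZ_CHANGES[i][1]

print(timezone_at(1583650000000))  # between the two March changes -> LA
```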
??? hint "Understanding event segments with multi timezones"
<figure>
<img src="../../img/testing_eventsegments_mtz.png" max-width="100%" />
</figure>
??? check "**Document your tests**"
- Before you start implementing any test data you need to document your tests.
- The documentation of your tests should be added to `docs/developers/test-cases.md` under the corresponding sensor.
- You will need to add two subsections `Description` and the `Checklist`
- The amount of data you need depends on each sensor but you can be efficient by creating data that covers corner cases in more than one time segment. For example, a battery episode from 11am to 1pm, covers the case when an episode has to be split for 30min frequency segments and for morning segments.
- As a rule of thumb think about corner cases for 30min segments as they will give you the most flexibility.
- Only add tests for iOS if the raw data format is different than Android's (for example for screen)
- Create specific tests for Sunday before and after 02:00. These will test daylight saving switches: in March, 02:00 to 02:59 does not exist, and in November, 01:00 to 01:59 exists twice (read below how `tests/scripts/assign_test_timestamps.py` handles this)
??? example "Example of Description"
`Description` is a list and every item describes the different scenarios your test data is covering. For example, if we are testing PHONE_BATTERY:
```
- We test 24 discharge episodes, 24 charge episodes and 2 episodes with a 0 discharge rate
- One episode is shorter than 30 minutes (`start timestamp` to `end timestamp`)
- One episode is 120 minutes long from 11:00 to 13:00 (`start timestamp` to `end timestamp`). This one covers the case when an episode has to be chunked for 30min frequency segments and for morning segments
- One episode is 60 minutes long from 23:30 to 00:30 (`start timestamp` to `end timestamp`). This one covers the case when an episode has to be chunked for 30min frequency segments and for daily segments (overnight)
- One 0 discharge rate episode 10 minutes long that happens within a 30-minute segment (10:00 to 10:29) (`start timestamp` to `end timestamp`)
- Three discharge episodes that happen during beforeMarchEvent (start/end timestamps of those discharge episodes)
- Three charge episodes that happen during beforeMarchEvent (start/end timestamps of those charge episodes)
- One discharge episode that happens between 00:30 and 04:00 to test for daylight saving times in March and November 2020.
- ... any other test corner cases you can think of
```
Describe your test cases in as much detail as possible so in the future if we find a bug in RAPIDS, we know what test case we did not include and should add.
??? example "Example of Checklist"
`Checklist` is a table where you confirm you have verified the output of your dataset for the different time segments and time zones
|time segment| single tz | multi tz|platform|
|-|-|-|-|
|30min|OK|OK|android and iOS|
|morning|OK|OK|android and iOS|
|daily|OK|OK|android and iOS|
|threeday|OK|OK|android and iOS|
|weekend|OK|OK|android and iOS|
|beforeMarchEvent|OK|OK|android and iOS|
|beforeNovemberEvent|OK|OK|android and iOS|
??? check "**Add raw input data.**"
1. Add the raw test data to the corresponding sensor CSV file in `tests/data/manual/aware_csv/SENSOR_raw.csv`. Create the CSV if it does not exist.
2. The test data you create will have the same columns as normal raw data except `test_time` replaces `timestamp`. To make your life easier, you can place a test data row in time using the `test_time` column with the following format: `Day HH:MM:SS.XXX`, for example `Fri 22:54:30.597`.
2. You can convert your manual test data to actual raw test data with the following commands:
- For the selected files: (It could be a single file name or multiple file names separated by whitespace(s))
```
python tests/scripts/assign_test_timestamps.py -f file_name_1 file_name_2
```
- For all files under the `tests/data/manual/aware_csv` folder:
```
python tests/scripts/assign_test_timestamps.py -a
```
2. The script `assign_test_timestamps.py` converts your `test_time` column into a `timestamp`. For example, `Fri 22:54:30.597` is converted to `1583553270597` (`Fri Mar 06 2020 22:54:30 GMT-0500`) and to `1604112870597` (`Fri Oct 30 2020 22:54:30 GMT-0400`). Note you can include milliseconds. A sketch of this conversion follows the example below.
2. The `device_id` should be the same as `pid`.
??? example "Example of test data you need to create"
The `test_time` column will be automatically converted to a timestamp that fits our testing periods in March and November by `tests/scripts/assign_test_timestamps.py`
```
test_time,device_id,battery_level,battery_scale,battery_status
Fri 01:00:00.000,ios,90,100,4
Fri 01:00:30.500,ios,89,100,4
Fri 01:01:00.000,ios,80,100,4
Fri 01:01:45.500,ios,79,100,4
...
Sat 08:00:00.000,ios,78,100,4
Sat 08:01:00.000,ios,50,100,4
Sat 08:02:00.000,ios,49,100,4
```
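A hedged sketch (not the actual `assign_test_timestamps.py` implementation) of how a `test_time` such as `Fri 22:54:30.597` can be anchored to the two testing weeks, reproducing the timestamps quoted earlier:
```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

WEEK_STARTS = {  # Friday 00:00 local time for each testing period
    "march": datetime(2020, 3, 6, tzinfo=ZoneInfo("America/New_York")),
    "november": datetime(2020, 10, 30, tzinfo=ZoneInfo("America/New_York")),
}
DAYS = ["Fri", "Sat", "Sun", "Mon"]

def to_timestamp(test_time: str, period: str) -> int:
    day, clock = test_time.split(" ", 1)
    t = datetime.strptime(clock, "%H:%M:%S.%f")
    # wall-clock arithmetic: the UTC offset is recomputed for the new local time
    local = WEEK_STARTS[period] + timedelta(
        days=DAYS.index(day), hours=t.hour, minutes=t.minute,
        seconds=t.second, microseconds=t.microsecond)
    return int(local.timestamp() * 1000)

print(to_timestamp("Fri 22:54:30.597", "march"))     # 1583553270597
print(to_timestamp("Fri 22:54:30.597", "november"))  # 1604112870597
```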
??? check "**Add expected output data.**"
1. Add or update the expected output feature file of the participant and sensor you are testing:
```bash
tests/data/processed/features/{type_of_time_segment}/{pid}/device_sensor.csv
# this example is expected output data for battery tests for periodic segments in a single timezone
tests/data/processed/features/stz_periodic/android/phone_sensor.csv
# this example is expected output data for battery tests for periodic segments in multi timezones
tests/data/processed/features/mtz_periodic/android/phone_sensor.csv
```
??? check "**Edit the config file(s).**"
1. Activate the sensor provider you are testing if it isn't already. Set `[SENSOR][PROVIDER][COMPUTE]` to `TRUE` in the `config.yaml` of the time segments and time zones you are testing:
```yaml
- tests/settings/stz_frequency_config.yaml # For single-timezone frequency time segments
- tests/settings/stz_periodic_config.yaml # For single-timezone periodic time segments
- tests/settings/stz_event_config.yaml # For single-timezone event time segments
- tests/settings/mtz_frequency_config.yaml # For multi-timezone frequency time segments
- tests/settings/mtz_periodic_config.yaml # For multi-timezone periodic time segments
- tests/settings/mtz_event_config.yaml # For multi-timezone event time segments
```
??? check "**Run the pipeline and tests.**"
1. You can run all six segment pipelines and their tests
```bash
bash tests/scripts/run_tests.sh -t all
```
2. You can run only the pipeline of a specific time segment and its tests
```bash
bash tests/scripts/run_tests.sh -t stz_frequency -a both # swap stz_frequency for mtz_frequency, stz_event, mtz_event, etc
```
2. Or, if you are working on your tests and you want to run a pipeline and its tests independently
```bash
bash tests/scripts/run_tests.sh -t stz_frequency -a run
bash tests/scripts/run_tests.sh -t stz_frequency -a test
```
```
??? hint "How does the test execution work?"
This bash script `tests/scripts/run_tests.sh` executes one or all test pipelines for different time segment types (`frequency`, `periodic`, and `events`) and single or multiple timezones.
The python script `tests/scripts/run_tests.py` runs the tests. It parses the involved participants and active sensor providers in the `config.yaml` file of the time segment type and time zone being tested. We test that the output file we expect exists and that its content matches the expected values. For more information on how to implement test scripts and use unittest, see the [Unittest Documentation](https://docs.python.org/3.7/library/unittest.html#command-line-interface)
??? example "Output Example"
The following is a snippet of the output you should see after running your test.
```bash
test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... stz_periodic
ok
test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... stz_periodic
ok
test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... stz_frequency
ok
test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... stz_frequency
FAIL
```
The results above show that for stz_periodic, both `test_sensors_files_exist` and `test_sensors_features_calculations` passed, while for stz_frequency, `test_sensors_files_exist` passed but `test_sensors_features_calculations` failed. Additionally, you should get the traceback of the failure (not shown here).

View File

@ -0,0 +1,175 @@
# Validation schema of `config.yaml`
!!! hint "Why do we need to validate the `config.yaml`?"
Most of the key/values in the `config.yaml` are constrained to a set of possible values or types. For example `[TIME_SEGMENTS][TYPE]` can only be one of `["FREQUENCY", "PERIODIC", "EVENT"]`, and `[TIMEZONE]` has to be a string.
We should show the user an error if that's not the case. We could validate this in Python or R but since we reuse scripts and keys in multiple places, tracking these validations can be time consuming and get out of control. Thus, we do these validations through a schema and check that schema before RAPIDS starts processing any data so the user can see the error right away.
Keep in mind these validations can only cover certain base cases. Some validations that require more complex logic should still be done in the respective script. For example, we can check that a CSV file path actually ends in `.csv` but we can only check that the file actually exists in a Python script.
The structure and values of the `config.yaml` file are validated using a YAML schema stored in `tools/config.schema.yaml`. Each key in `config.yaml`, for example `PIDS`, has a corresponding entry in the schema where we can validate its type, possible values, required properties, min and max values, among other things.
The `config.yaml` is validated against the schema every time RAPIDS runs (see the top of the `Snakefile`):
```python
validate(config, "tools/config.schema.yaml")
```
## Structure of the schema
The schema has three main sections `required`, `definitions`, and `properties`. All of them are just nested key/value YAML pairs, where the value can be a primitive type (`integer`, `string`, `boolean`, `number`) or can be another key/value pair (`object`).
### required
`required` lists `properties` that should be present in the `config.yaml`. We will almost always add every `config.yaml` key to this list (meaning that the user cannot delete any of those keys like `TIMEZONE` or `PIDS`).
### definitions
`definitions` lists key/values that are common to different `properties` so we can reuse them. You can define a key/value under `definitions` and use `$ref` to refer to it in any `property`.
For example, every sensor like `[PHONE_ACCELEROMETER]` has one or more providers like `RAPIDS` and `PANDA`; these providers have some common properties like the `COMPUTE` flag or the `SRC_SCRIPT` string. Therefore, we define a shared provider "template" that is used by every provider and extended with properties exclusive to each one of them. For example:
=== "provider definition (template)"
The `PROVIDER` definition will be used later on different `properties`.
```yaml
PROVIDER:
type: object
required: [COMPUTE, SRC_SCRIPT, FEATURES]
properties:
COMPUTE:
type: boolean
FEATURES:
type: [array, object]
SRC_SCRIPT:
type: string
pattern: "^.*\\.(py|R)$"
```
=== "provider reusing and extending the template"
Notice that `RAPIDS` (a provider) uses and extends the `PROVIDER` template in this example. The `FEATURES` key is overriding the `FEATURES` key from the `#/definitions/PROVIDER` template but is keeping the validation for `COMPUTE` and `SRC_SCRIPT`. For more details about reusing properties, go to this [link](http://json-schema.org/understanding-json-schema/structuring.html#reuse)
```yaml hl_lines="9 10"
PHONE_ACCELEROMETER:
type: object
# .. other properties
PROVIDERS:
type: ["null", object]
properties:
RAPIDS:
allOf:
- $ref: "#/definitions/PROVIDER"
- properties:
FEATURES:
type: array
uniqueItems: True
items:
type: string
enum: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
```
### properties
`properties` are nested key/values that describe the different components of our `config.yaml` file. Values can be of one or more primitive types like `string`, `number`, `array`, `boolean` and `null`. Values can also be another key/value pair (of type `object`) that are similar to a dictionary in Python.
For example, the following property validates the `PIDS` of our `config.yaml`. It checks that `PIDS` is an `array` with unique items of type `string`.
```yaml
PIDS:
type: array
uniqueItems: True
items:
type: string
```
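A minimal sketch, assuming the `jsonschema` and `pyyaml` packages are available (jsonschema is the library Snakemake's `validate()` relies on), of how the `PIDS` property above rejects an invalid value:
```python
import yaml
from jsonschema import ValidationError, validate

schema = yaml.safe_load("""
type: object
properties:
  PIDS:
    type: array
    uniqueItems: True
    items:
      type: string
""")

validate(instance={"PIDS": ["p01", "p02"]}, schema=schema)  # passes silently
try:
    validate(instance={"PIDS": ["p01", "p01"]}, schema=schema)  # duplicate items
except ValidationError as e:
    print(e.message)  # explains which constraint failed
```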
## Modifying the schema
!!! hint "Validating the `config.yaml` during development"
If you updated the schema and want to check the `config.yaml` is compliant, you can run the command `snakemake --list-params-changes`. You will see `Building DAG of jobs...` if there are no problems or an error message otherwise (try setting any `COMPUTE` flag to a string like `test` instead of `False/True`).
You can use this command without having to configure RAPIDS to process any participants or sensors.
You can validate different aspects of each key/value in our `config.yaml` file:
=== "number/integer"
Including min and max values
```yaml
MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS:
type: number
minimum: 0
maximum: 1
FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD:
type: integer
exclusiveMinimum: 0
```
=== "string"
Including valid values (`enum`)
```yaml
items:
type: string
enum: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
```
=== "boolean"
```yaml
MINUTES_DATA_USED:
type: boolean
```
=== "array"
Including whether or not it should have unique values, the type of the array's elements (`strings`, `numbers`) and valid values (`enum`).
```yaml
MESSAGES_TYPES:
type: array
uniqueItems: True
items:
type: string
enum: ["received", "sent"]
```
=== "object"
`PARENT` is an object that has two properties. `KID1` is one of those properties; it is, in turn, another object that reuses the `"#/definitions/PROVIDER"` `definition` **AND** also includes (extends) two extra properties: `GRAND_KID1` of type `array` and `GRAND_KID2` of type `number`. `KID2` is another property of `PARENT` of type `boolean`.
The schema validation looks like this
```yaml
PARENT:
type: object
properties:
KID1:
allOf:
- $ref: "#/definitions/PROVIDER"
- properties:
GRAND_KID1:
type: array
uniqueItems: True
GRAND_KID2:
type: number
KID2:
type: boolean
```
The `config.yaml` key that the previous schema validates looks like this:
```yaml
PARENT:
KID1:
# These three come from the `PROVIDER` definition (template)
COMPUTE: False
FEATURES: [x, y] # an array
SRC_SCRIPT: "a path to a py or R script"
# These two come from the extension
GRAND_KID1: [a, b] # an array
GRAND_KID2: 5.1 # a number
KID2: True # a boolean
```
## Verifying the schema is correct
We recommend that before you start modifying the schema you modify the `config.yaml` key that you want to validate with an invalid value. For example, if you want to validate that `COMPUTE` is boolean, you set `COMPUTE: 123`. Then create your validation, run `snakemake --list-params-changes` and make sure your validation fails (123 is not `boolean`), and then set the key to the correct value. In other words, make sure it's broken first so that you know that your validation works.
!!! warning
**Be careful**. You can check that the schema `config.schema.yaml` has a valid format by running `python tools/check_schema.py`. You will see this message if its structure is correct: `Schema is OK`. However, we don't have a way to detect typos; for example, `allOf` will work but `allOF` won't (capital `F`) and it won't show any error. That's why we recommend starting with an invalid key/value in your `config.yaml` so that you can be sure the schema validation finds the problem.
## Useful resources
Read the following links to learn more about what we can validate with schemas. They are based on `JSON` instead of `YAML` schemas but the same concepts apply.
- [Understanding JSON Schemas](http://json-schema.org/understanding-json-schema/index.html)
- [Specification of the JSON schema we use](https://tools.ietf.org/html/draft-handrews-json-schema-01)

View File

@ -1,17 +1,21 @@
## Python Virtual Environment
### Add new packages
Try to install any new package using `conda install -c CHANNEL PACKAGE_NAME` (you can use `pip` if the package is only available there). Make sure your Python virtual environment is active (`conda activate YOUR_ENV`).
### Remove packages
Uninstall packages using the same manager you used to install them `conda remove PACKAGE_NAME` or `pip uninstall PACKAGE_NAME`
### Updating all packages
Make sure your Python virtual environment is active (`conda activate YOUR_ENV`), then run
```bash
conda update --all
```
### Update your conda `environment.yaml`
After installing or removing a package you can use the following command in your terminal to update your `environment.yaml` before publishing your pipeline. Note that we ignore the package version for `libfortran` and `mkl` to keep compatibility with Linux:
```bash
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
```
## R Virtual Environment
@@ -26,6 +30,10 @@ conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' > env
2. Run `R` to open an R interactive session
3. Run `renv::remove("PACKAGE_NAME")`
### Updating all packages
1. Open your terminal and navigate to RAPIDS' root folder
2. Run `R` to open an R interactive session
3. Run `renv::update()`
### Update your R `renv.lock`
After installing or removing a package you can use the following command in your terminal to update your `renv.lock` before publishing your pipeline.


@@ -1,216 +0,0 @@
# Frequently Asked Questions
## Cannot connect to your MySQL server
???+ failure "Problem"
```bash
Error in .local(drv, ...) :
Failed to connect to database: Error: Can't initialize character set unknown (path: compiled_in) :
Calls: dbConnect -> dbConnect -> .local -> .Call
Execution halted
[Tue Mar 10 19:40:15 2020]
Error in rule download_dataset:
jobid: 531
output: data/raw/p60/locations_raw.csv
RuleException:
CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
Command 'set -euo pipefail; Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
```
???+ done "Solution"
Please make sure the `DATABASE_GROUP` in `config.yaml` matches your DB credentials group in `.env`.
---
## Cannot start mysql in linux via `brew services start mysql`
???+ failure "Problem"
Cannot start mysql in linux via `brew services start mysql`
???+ done "Solution"
Use `mysql.server start`
---
## Every time I force the `download_dataset` rule, all rules are executed
???+ failure "Problem"
When running `snakemake -j1 -R download_phone_data` or `./rapids -j1 -R download_phone_data` all the rules and files are re-computed
???+ done "Solution"
This is expected behavior. The advantage of using `snakemake` under the hood is that every time a file containing data is modified, every rule that depends on that file is re-executed to update its results. In this case, since `download_dataset` updates all the raw data and you are forcing the rule with the flag `-R`, every single rule that depends on those raw files will be executed.
---
## Error `Table XXX doesn't exist` while running the `download_phone_data` or `download_fitbit_data` rule.
???+ failure "Problem"
```bash
Error in .local(conn, statement, ...) :
could not run statement: Table 'db_name.table_name' doesn't exist
Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
Execution halted
```
???+ done "Solution"
Please make sure the sensors listed in `[PHONE_VALID_SENSED_BINS][PHONE_SENSORS]` and the `[TABLE]` of each sensor you activated in `config.yaml` match your database tables.
---
## How do I install RAPIDS on Ubuntu 16.04
???+ done "Solution"
1. Install dependencies (Homebrew - if not installed):
- `sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev`
- Install [brew](https://docs.brew.sh/Homebrew-on-Linux) for Linux and add the following line to `~/.bashrc`: `export PATH=$HOME/.linuxbrew/bin:$PATH`
- `source ~/.bashrc`
1. Install MySQL
- `brew install mysql`
- `brew services start mysql`
2. Install R, pandoc and rmarkdown:
- `brew install r`
- `brew install gcc@6` (needed due to this [bug](https://github.com/Homebrew/linuxbrew-core/issues/17812))
- `HOMEBREW_CC=gcc-6 brew install pandoc`
3. Install miniconda using these [instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
4. Clone our repo:
- `git clone https://github.com/carissalow/rapids`
5. Create a python virtual environment:
- `cd rapids`
- `conda env create -f environment.yml -n MY_ENV_NAME`
- `conda activate MY_ENV_NAME`
6. Install R packages and virtual environment:
- `snakemake renv_install`
- `snakemake renv_init`
- `snakemake renv_restore`
This step could take several minutes to complete. Please be patient and let it run until completion.
---
## `mysql.h` cannot be found
???+ failure "Problem"
```bash
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: mysql.h: No such file or directory
compilation terminated.
-----------------------------------------------------------------------
ERROR: configuration failed for package 'RMySQL'
```
???+ done "Solution"
```bash
sudo apt install libmariadbclient-dev
```
---
## No package `libcurl` found
???+ failure "Problem"
`libcurl` cannot be found
???+ done "Solution"
Install `libcurl`
```bash
sudo apt install libcurl4-openssl-dev
```
---
## Configuration failed because `openssl` was not found.
???+ failure "Problem"
`openssl` cannot be found
???+ done "Solution"
Install `openssl`
```bash
sudo apt install libssl-dev
```
---
## Configuration failed because `libxml-2.0` was not found
???+ failure "Problem"
`libxml-2.0` cannot be found
???+ done "Solution"
Install `libxml-2.0`
```bash
sudo apt install libxml2-dev
```
---
## SSL connection error when running RAPIDS
???+ failure "Problem"
You are getting the following error message when running RAPIDS:
```bash
Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol.
```
???+ done "Solution"
This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option `--ssl-mode=DISABLED`, but we can't do this from the R connector.
If you can't update your server, the quickest solution is to import your database into another server or a local environment. Alternatively, you could replace `mysql-client` and `libmysqlclient-dev` with `mariadb-client` and `libmariadbclient-dev` and reinstall renv. More info about this issue [here](https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1872541)
---
## `PHONE_SENSORS` key not found
???+ failure "Problem"
If you get the following error `KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS'`, it means that the indentation of the key `[PHONE_SENSORS]` is not matching the other child elements of `PHONE_VALID_SENSED_BINS`
???+ done "Solution"
You need to add or remove any leading whitespaces as needed on that line.
```yaml
PHONE_VALID_SENSED_BINS:
    COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
    BIN_SIZE: &bin_size 5 # (in minutes)
    PHONE_SENSORS: []
```
---
## Error while updating your conda environment in Ubuntu
???+ failure "Problem"
You get the following error:
```bash
CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
appears to be corrupted. The path 'include/mysqlStubs.h'
specified in the package manifest cannot be found.
ClobberError: This transaction has incompatible packages due to a shared path.
packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
path: 'lib/libiomp5.so'
```
???+ done "Solution"
Reinstall conda
## Embedded nul in string
???+ failure "Problem"
You get the following error when downloading sensor data:
```bash
Error in result_fetch(res@ptr, n = n) :
embedded nul in string:
```
???+ done "Solution"
This problem is due to the way `RMariaDB` handles a mismatch between data types in R and MySQL (see [this issue](https://github.com/r-dbi/RMariaDB/issues/121)). Since it seems this problem won't be handled by `RMariaDB`, you have two options:
1. If it's only a few rows that are causing this problem, remove the null character from the conflicting table cell.
2. If it's not feasible to modify your data, you can try swapping `RMariaDB` for `RMySQL`. Just keep in mind that you might have problems connecting to modern MySQL servers running on Linux:
- Add `RMySQL` to the renv environment by running the following command in a terminal opened in RAPIDS' root folder
```bash
R -e 'renv::install("RMySQL")'
```
- Go to `src/data/download_phone_data.R` and replace `library(RMariaDB)` with `library(RMySQL)`
- In the same file replace `dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)` with `dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)`


@@ -1,67 +1,87 @@
# Add New Features
!!! hint
We recommend reading the [Behavioral Features Introduction](../feature-introduction/) before reading this page
!!! hint
You won't have to deal with time zones, dates, times, data cleaning or preprocessing. The data that RAPIDS pipes to your feature extraction code is ready to process.
- We recommend reading the [Behavioral Features Introduction](../feature-introduction/) before reading this page.
- You can implement new features in Python or R scripts.
- You won't have to deal with time zones, dates, times, data cleaning, or preprocessing. The data that RAPIDS pipes to your feature extraction code are ready to process.
## New Features for Existing Sensors
You can add new features to any existing sensors (see list below) by adding a new provider in three steps:
1. [Modify](#modify-the-configyaml-file) the `config.yaml` file
2. [Create](#create-a-provider-folder-script-and-function) a provider folder, script and function
2. [Create](#create-a-feature-provider-script) your feature provider script
3. [Implement](#implement-your-feature-extraction-code) your features extraction code
As a tutorial, we will add a new provider for `PHONE_ACCELEROMETER` called `VEGA` that extracts `feature1`, `feature2`, `feature3` in Python and that it requires a parameter from the user called `MY_PARAMETER`.
As a tutorial, we will add a new provider for `PHONE_ACCELEROMETER` called `VEGA` that extracts `feature1`, `feature2`, `feature3` with a Python script that requires a parameter from the user called `MY_PARAMETER`.
??? info "Existing Sensors"
An existing sensor is any of the phone or Fitbit sensors with a configuration entry in `config.yaml`:
An existing sensor is any sensor of a supported device with a configuration entry in `config.yaml`:
Smartphone (AWARE)
- Phone Accelerometer
- Phone Activity Recognition
- Phone Applications Crashes
- Phone Applications Foreground
- Phone Applications Notifications
- Phone Battery
- Phone Bluetooth
- Phone Calls
- Phone Conversation
- Phone Data Yield
- Phone Keyboard
- Phone Light
- Phone Locations
- Phone Log
- Phone Messages
- Phone Screen
- Phone WiFi Connected
- Phone WiFi Visible
Fitbit
- Fitbit Data Yield
- Fitbit Heart Rate Summary
- Fitbit Heart Rate Intraday
- Fitbit Sleep Summary
- Fitbit Sleep Intraday
- Fitbit Steps Summary
- Fitbit Steps Intraday
Empatica
- Empatica Accelerometer
- Empatica Heart Rate
- Empatica Temperature
- Empatica Electrodermal Activity
- Empatica Blood Volume Pulse
- Empatica Inter Beat Interval
- Empatica Tags
### Modify the `config.yaml` file
In this step you need to add your provider configuration section under the relevant sensor in `config.yaml`. See our example for our tutorial's `VEGA` provider for `PHONE_ACCELEROMETER`:
In this step, you need to add your provider configuration section under the relevant sensor in `config.yaml`. See our example for our tutorial's `VEGA` provider for `PHONE_ACCELEROMETER`:
??? example "Example configuration for a new accelerometer provider `VEGA`"
```yaml
```yaml hl_lines="12 13 14 15 16"
PHONE_ACCELEROMETER:
  TABLE: accelerometer
  CONTAINER: accelerometer
  PROVIDERS:
    RAPIDS:
    RAPIDS: # this is a feature provider
      COMPUTE: False
      ...
    PANDA:
    PANDA: # this is another feature provider
      COMPUTE: False
      ...
    VEGA:
    VEGA: # this is our new feature provider
      COMPUTE: False
      FEATURES: ["feature1", "feature2", "feature3"]
      MY_PARAMETER: a_string
      SRC_FOLDER: "vega"
      SRC_LANGUAGE: "python"
      SRC_SCRIPT: src/features/phone_accelerometer/vega/main.py
```
@@ -69,68 +89,70 @@ In this step you need to add your provider configuration section under the relev
|---|---|
|`[COMPUTE]`| Flag to activate/deactivate your provider
|`[FEATURES]`| List of features your provider supports. Your provider code should only return the features on this list
|`[MY_PARAMTER]`| An arbitrary parameter that our example provider `VEGA` needs. This can be a boolean, integer, float, string or an array of any of such types.
|`[SRC_LANGUAGE]`| The programming language of your provider script, it can be `python` or `r`, in our example `python`
|`[SRC_FOLDER]`| The name of your provider in lower case, in our example `vega` (this will be the name of your folder in the next step)
|`[MY_PARAMETER]`| An arbitrary parameter that our example provider `VEGA` needs. This can be a boolean, integer, float, string, or an array of any of such types.
|`[SRC_SCRIPT]`| The relative path from RAPIDS' root folder to a script that computes the features for this provider. It can be implemented in R or Python.
### Create a provider folder, script and function
### Create a feature provider script
In this step you need to add a folder, script and function for your provider.
5. Create your provider **folder** under `src/feature/DEVICE_SENSOR/YOUR_PROVIDER`, in our example `src/feature/phone_accelerometer/vega` (same as `[SRC_FOLDER]` in the step above).
6. Create your provider **script** inside your provider folder, it can be a Python file called `main.py` or an R file called `main.R`.
7. Add your provider **function** in your provider script. The name of such function should be `[providername]_features`, in our example `vega_features`
!!! info "Python function"
```python
def [providername]_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
```
!!! info "R function"
```r
[providername]_features <- function(sensor_data, time_segment, provider)
```
Create your Python or R feature script called `main.py` or `main.R` in the correct folder, `src/features/[sensorname]/[providername]/`. RAPIDS automatically loads and executes it based on the config key `[SRC_SCRIPT]` you added in the last step. For our example, this script is:
```bash
src/features/phone_accelerometer/vega/main.py
```
### Implement your feature extraction code
Every feature script (`main.[py|R]`) needs a `[providername]_features` function with specific parameters. RAPIDS calls this function with the sensor data ready to process and with other functions and arguments you will need.
The provider function that you created in the step above will receive the following parameters:
=== "Python function"
```python
def [providername]_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
    # empty for now
    return(your_features_df)
```
=== "R function"
```r
[providername]_features <- function(sensor_data, time_segment, provider){
    # empty for now
    return(your_features_df)
}
```
| Parameter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description
|---|---|
|`sensor_data_files`| Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the `[PIDS]` array in `config.yaml`)
|`time_segment`| The label of the time segment that should be processed.
|`provider`| The parameters you configured for your provider in `config.yaml` will be available in this variable as a dictionary in Python or a list in R. In our example this dictionary contains `{MY_PARAMETER:"a_string"}`
|`filter_data_by_segment`| Python only. A function that you will use to filter your data. In R this function is already available in the environment.
|`provider`| The parameters you configured for your provider in `config.yaml` will be available in this variable as a dictionary in Python or a list in R. In our example, this dictionary contains `{MY_PARAMETER:"a_string"}`
|`filter_data_by_segment`| Python only. A function that you will use to filter your data. In R, this function is already available in the environment.
|`*args`| Python only. Not used for now
|`**kwargs`| Python only. Not used for now
The code to extract your behavioral features should be implemented in your provider function and in general terms it will have three stages:
The next step is to implement the code that computes your behavioral features in your provider script's function. As with any other script, this function can call other auxiliary methods, but in general terms, it should have three stages:
??? info "1. Read a participant's data by loading the CSV data stored in the file pointed to by `sensor_data_files`"
``` python
acc_data = pd.read_csv(sensor_data_files["sensor_data"])
```
Note that phone's battery, screen, and activity recognition data is given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on)
Note that the phone's battery, screen, and activity recognition data are given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on)
??? info "2. Filter your data to process only those rows that belong to `time_segment`"
This step is only one line of code, but to undersand why we need it, keep reading.
This step is only one line of code, but keep reading to understand why we need it.
```python
acc_data = filter_data_by_segment(acc_data, time_segment)
```
You should use the `filter_data_by_segment()` function to process and group those rows that belong to each of the [time segments RAPIDS could be configured with](../../setup/configuration/#time-segments).
Let's understand the `filter_data_by_segment()` function with an example. A RAPIDS user can extract features on any arbitrary [time segment](../../setup/configuration/#time-segments). A time segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for `p01`. The labels are arbritrary and the instances depend on the days a participant was monitored for:
Let's understand the `filter_data_by_segment()` function with an example. A RAPIDS user can extract features on any arbitrary [time segment](../../setup/configuration/#time-segments). A time segment is a period that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and weekend basis for `p01`. The labels are arbitrary, and the instances depend on the days a participant was monitored for:
- the daily segment could be named `my_days` and if `p01` was monitored for 14 days, it would have 14 instances
- the weekly segment could be named `my_weeks` and if `p01` was monitored for 14 days, it would have 2 instances.
- the weekend segment could be named `my_weekends` and if `p01` was monitored for 14 days, it would have 2 instances.
For this example, RAPIDS will call your provider function three times for `p01`, once where `time_segment` is `my_days`, once where `time_segment` is `my_weeks` and once where `time_segment` is `my_weekends`. In this example not every row in `p01`'s data needs to take part in the feature computation for either segment **and** the rows need to be grouped differently.
For this example, RAPIDS will call your provider function three times for `p01`, once where `time_segment` is `my_days`, once where `time_segment` is `my_weeks`, and once where `time_segment` is `my_weekends`. In this example, not every row in `p01`'s data needs to take part in the feature computation for either segment **and** the rows need to be grouped differently.
Thus `filter_data_by_segment()` comes in handy: it returns a data frame that contains the rows that were logged during a time segment plus an extra column called `local_segment`. This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our `p01`'s `my_days`, `my_weeks`, and `my_weekends` examples). After filtering, **you should group the data frame by this column and compute any desired features**, for example:
@@ -138,54 +160,24 @@ The code to extract your behavioral features should be implemented in your provi
acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
```
The reason RAPIDS does not filter the participant's data set for you is because your code might need to compute something based on a participant's complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from this number.
The reason RAPIDS does not filter the participant's data set for you is because your code might need to compute something based on a participant's complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from that number.
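As a hypothetical sketch of that pattern (the `trace` column and the `callsfromtopcaller` feature name are illustrative assumptions, not RAPIDS' actual call schema):
```python
# Hypothetical sketch: use the complete dataset first, then filter by segment.
import pandas as pd

def example_features(call_data, time_segment, provider, filter_data_by_segment, *args, **kwargs):
    # 1) computed over the participant's COMPLETE dataset, before any filtering
    top_caller = call_data["trace"].mode()[0]

    # 2) only now restrict the rows to the time segment being processed
    call_data = filter_data_by_segment(call_data, time_segment)

    # 3) one row per segment instance: calls received from the top caller
    features = pd.DataFrame()
    features["callsfromtopcaller"] = (
        call_data[call_data["trace"] == top_caller]
        .groupby("local_segment")["trace"]
        .count()
    )
    return features.reset_index()
```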
??? info "3. Return a data frame with your features"
After filtering, grouping your data, and computing your features, your provider function should return a data frame that has:
- One row per time segment instance (e.g. 14 our `p01`'s `my_days` example)
- One row per time segment instance (e.g., 14 in our `p01`'s `my_days` example)
- The `local_segment` column added by `filter_data_by_segment()`
- One column per feature. By convention the name of your features should only contain letters or numbers (`feature1`). RAPIDS will automatically add the right sensor and provider prefix (`phone_accelerometr_vega_`)
- One column per feature. By convention, the name of your features should only contain letters or numbers (`feature1`). RAPIDS automatically adds the correct sensor and provider prefix; in our example, this prefix is `phone_accelerometer_vega_`.
??? example "`PHONE_ACCELEROMETER` Provider Example"
For your reference, this a short example of our own provider (`RAPIDS`) for `PHONE_ACCELEROMETER` that computes five acceleration features
For your reference, this is our own provider (`RAPIDS`) for `PHONE_ACCELEROMETER` that computes five acceleration features
```python
def rapids_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
    acc_data = pd.read_csv(sensor_data_files["sensor_data"])
    requested_features = provider["FEATURES"]
    # name of the features this function can compute
    base_features_names = ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
    # the subset of requested features this function can compute
    features_to_compute = list(set(requested_features) & set(base_features_names))
--8<---- "src/features/phone_accelerometer/rapids/main.py"
    acc_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
    if not acc_data.empty:
        acc_data = filter_data_by_segment(acc_data, time_segment)
        if not acc_data.empty:
            acc_features = pd.DataFrame()
            # get magnitude related features: magnitude = sqrt(x^2+y^2+z^2)
            magnitude = acc_data.apply(lambda row: np.sqrt(row["double_values_0"] ** 2 + row["double_values_1"] ** 2 + row["double_values_2"] ** 2), axis=1)
            acc_data = acc_data.assign(magnitude = magnitude.values)
            if "maxmagnitude" in features_to_compute:
                acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
            if "minmagnitude" in features_to_compute:
                acc_features["minmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].min()
            if "avgmagnitude" in features_to_compute:
                acc_features["avgmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].mean()
            if "medianmagnitude" in features_to_compute:
                acc_features["medianmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].median()
            if "stdmagnitude" in features_to_compute:
                acc_features["stdmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].std()
            acc_features = acc_features.reset_index()
    return acc_features
```
## New Features for Non-Existing Sensors
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [contact us](../../team) or request it on [Slack](http://awareframework.com:3000/) and we can add the necessary code so you can follow the instructions above.
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [open a new discussion](https://github.com/carissalow/rapids/discussions) on GitHub, and we can add the necessary code so you can follow the instructions above.


@@ -0,0 +1,42 @@
# Empatica Accelerometer
Sensor parameters description for `[EMPATICA_ACCELEROMETER]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing accelerometer data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
## DBDP provider
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/empatica_accelerometer_raw.csv
- data/raw/{pid}/empatica_accelerometer_with_datetime.csv
- data/interim/{pid}/empatica_accelerometer_features/empatica_accelerometer_{language}_{provider_key}.csv
- data/processed/features/{pid}/empatica_accelerometer.csv
```
Parameters description for `[EMPATICA_ACCELEROMETER][PROVIDERS][DBDP]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `EMPATICA_ACCELEROMETER` features from the `DBDP` provider|
|`[FEATURES]` | Features to be computed, see table below
Features description for `[EMPATICA_ACCELEROMETER][PROVIDERS][DBDP]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|maxmagnitude |m/s^2^ |The maximum magnitude of acceleration ($\|acceleration\| = \sqrt{x^2 + y^2 + z^2}$).
|minmagnitude |m/s^2^ |The minimum magnitude of acceleration.
|avgmagnitude |m/s^2^ |The average magnitude of acceleration.
|medianmagnitude |m/s^2^ |The median magnitude of acceleration.
|stdmagnitude |m/s^2^ |The standard deviation of acceleration.
!!! note "Assumptions/Observations"
1. Analyzing accelerometer data is a memory-intensive task. If RAPIDS crashes, it is likely because the accelerometer dataset for a participant is too big to fit in memory. We are considering different alternatives to overcome this problem; if this is something you need, get in touch, and we can discuss how to implement it.
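If you run into this limit in the meantime, a common stopgap is to stream the raw CSV in chunks so that only a slice of the data is in memory at once. A minimal sketch, not part of RAPIDS; the file name and `x`/`y`/`z` column names are illustrative:
```python
# Sketch (not RAPIDS code): compute a max-magnitude value in bounded memory
# by streaming the raw CSV in chunks. File and column names are illustrative.
import numpy as np
import pandas as pd

partial_maxima = []
for chunk in pd.read_csv("empatica_accelerometer_raw.csv", chunksize=1_000_000):
    magnitude = np.sqrt(chunk["x"] ** 2 + chunk["y"] ** 2 + chunk["z"] ** 2)
    partial_maxima.append(magnitude.max())

print(max(partial_maxima))  # max magnitude across the whole file
```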


@@ -0,0 +1,46 @@
# Empatica Blood Volume Pulse
Sensor parameters description for `[EMPATICA_BLOOD_VOLUME_PULSE]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing blood volume pulse data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
## DBDP provider
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/empatica_blood_volume_pulse_raw.csv
- data/raw/{pid}/empatica_blood_volume_pulse_with_datetime.csv
- data/interim/{pid}/empatica_blood_volume_pulse_features/empatica_blood_volume_pulse_{language}_{provider_key}.csv
- data/processed/features/{pid}/empatica_blood_volume_pulse.csv
```
Parameters description for `[EMPATICA_BLOOD_VOLUME_PULSE][PROVIDERS][DBDP]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `EMPATICA_BLOOD_VOLUME_PULSE` features from the `DBDP` provider|
|`[FEATURES]` | Features to be computed from blood volume pulse intraday data, see table below |
Features description for `[EMPATICA_BLOOD_VOLUME_PULSE][PROVIDERS][DBDP]`:
|Feature |Units |Description|
|-------------------------- |-------------- |---------------------------|
|maxbvp |- |The maximum blood volume pulse during a time segment.
|minbvp |- |The minimum blood volume pulse during a time segment.
|avgbvp |- |The average blood volume pulse during a time segment.
|medianbvp |- |The median of blood volume pulse during a time segment.
|modebvp |- |The mode of blood volume pulse during a time segment.
|stdbvp |- |The standard deviation of blood volume pulse during a time segment.
|diffmaxmodebvp |- |The difference between the maximum and mode blood volume pulse during a time segment.
|diffminmodebvp |- |The difference between the mode and minimum blood volume pulse during a time segment.
|entropybvp                 |nats           |Shannon's entropy measurement based on blood volume pulse during a time segment.
!!! note "Assumptions/Observations"
For more information about BVP read [this](https://support.empatica.com/hc/en-us/articles/360029719792-E4-data-BVP-expected-signal).
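The entropy features above (here and for the other Empatica sensors) are reported in nats, i.e., Shannon entropy computed with the natural logarithm. A rough sketch of how such a value can be estimated from a continuous signal, assuming SciPy and an illustrative 50-bin discretization (the provider's exact binning may differ):
```python
# Sketch: Shannon entropy in nats for a continuous signal such as BVP.
# The 50-bin histogram discretization is an assumption for illustration.
import numpy as np
from scipy.stats import entropy

bvp = np.random.normal(size=1000)       # stand-in for one segment's BVP values
counts, _ = np.histogram(bvp, bins=50)  # discretize the signal
print(entropy(counts))                  # scipy normalizes counts; log base e -> nats
```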


@@ -0,0 +1,46 @@
# Empatica Electrodermal Activity
Sensor parameters description for `[EMPATICA_ELECTRODERMAL_ACTIVITY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing electrodermal activity data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
## DBDP provider
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/empatica_electrodermal_activity_raw.csv
- data/raw/{pid}/empatica_electrodermal_activity_with_datetime.csv
- data/interim/{pid}/empatica_electrodermal_activity_features/empatica_electrodermal_activity_{language}_{provider_key}.csv
- data/processed/features/{pid}/empatica_electrodermal_activity.csv
```
Parameters description for `[EMPATICA_ELECTRODERMAL_ACTIVITY][PROVIDERS][DBDP]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `EMPATICA_ELECTRODERMAL_ACTIVITY` features from the `DBDP` provider|
|`[FEATURES]` | Features to be computed from electrodermal activity intraday data, see table below |
Features description for `[EMPATICA_ELECTRODERMAL_ACTIVITY][PROVIDERS][DBDP]`:
|Feature |Units |Description|
|-------------------------- |-------------- |---------------------------|
|maxeda |microsiemens |The maximum electrical conductance during a time segment.
|mineda |microsiemens |The minimum electrical conductance during a time segment.
|avgeda |microsiemens |The average electrical conductance during a time segment.
|medianeda |microsiemens |The median of electrical conductance during a time segment.
|modeeda |microsiemens |The mode of electrical conductance during a time segment.
|stdeda |microsiemens |The standard deviation of electrical conductance during a time segment.
|diffmaxmodeeda |microsiemens |The difference between the maximum and mode electrical conductance during a time segment.
|diffminmodeeda |microsiemens |The difference between the mode and minimum electrical conductance during a time segment.
|entropyeda                 |nats           |Shannon's entropy measurement based on electrical conductance during a time segment.
!!! note "Assumptions/Observations"
None


@@ -0,0 +1,46 @@
# Empatica Heart Rate
Sensor parameters description for `[EMPATICA_HEARTRATE]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing heart rate data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
## DBDP provider
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/empatica_heartrate_raw.csv
- data/raw/{pid}/empatica_heartrate_with_datetime.csv
- data/interim/{pid}/empatica_heartrate_features/empatica_heartrate_{language}_{provider_key}.csv
- data/processed/features/{pid}/empatica_heartrate.csv
```
Parameters description for `[EMPATICA_HEARTRATE][PROVIDERS][DBDP]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `EMPATICA_HEARTRATE` features from the `DBDP` provider|
|`[FEATURES]` | Features to be computed from heart rate intraday data, see table below |
Features description for `[EMPATICA_HEARTRATE][PROVIDERS][DBDP]`:
|Feature |Units |Description|
|-------------------------- |-------------- |---------------------------|
|maxhr                      |beats/min      |The maximum heart rate during a time segment.
|minhr                      |beats/min      |The minimum heart rate during a time segment.
|avghr                      |beats/min      |The average heart rate during a time segment.
|medianhr                   |beats/min      |The median heart rate during a time segment.
|modehr                     |beats/min      |The mode of heart rate during a time segment.
|stdhr                      |beats/min      |The standard deviation of heart rate during a time segment.
|diffmaxmodehr              |beats/min      |The difference between the maximum and mode heart rate during a time segment.
|diffminmodehr              |beats/min      |The difference between the mode and minimum heart rate during a time segment.
|entropyhr                  |nats           |Shannon's entropy measurement based on heart rate during a time segment.
!!! note "Assumptions/Observations"
We extract the previous features based on the average heart rate values computed in [10-second windows](https://support.empatica.com/hc/en-us/articles/360029469772-E4-data-HR-csv-explanation).


@@ -0,0 +1,46 @@
# Empatica Inter Beat Interval
Sensor parameters description for `[EMPATICA_INTER_BEAT_INTERVAL]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing inter beat interval data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
## DBDP provider
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/empatica_inter_beat_interval_raw.csv
- data/raw/{pid}/empatica_inter_beat_interval_with_datetime.csv
- data/interim/{pid}/empatica_inter_beat_interval_features/empatica_inter_beat_interval_{language}_{provider_key}.csv
- data/processed/features/{pid}/empatica_inter_beat_interval.csv
```
Parameters description for `[EMPATICA_INTER_BEAT_INTERVAL][PROVIDERS][DBDP]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `EMPATICA_INTER_BEAT_INTERVAL` features from the `DBDP` provider|
|`[FEATURES]` | Features to be computed from inter beat interval intraday data, see table below |
Features description for `[EMPATICA_INTER_BEAT_INTERVAL][PROVIDERS][DBDP]`:
|Feature |Units |Description|
|-------------------------- |-------------- |---------------------------|
|maxibi |seconds |The maximum inter beat interval during a time segment.
|minibi |seconds |The minimum inter beat interval during a time segment.
|avgibi |seconds |The average inter beat interval during a time segment.
|medianibi |seconds |The median of inter beat interval during a time segment.
|modeibi |seconds |The mode of inter beat interval during a time segment.
|stdibi |seconds |The standard deviation of inter beat interval during a time segment.
|diffmaxmodeibi |seconds |The difference between the maximum and mode inter beat interval during a time segment.
|diffminmodeibi |seconds |The difference between the mode and minimum inter beat interval during a time segment.
|entropyibi                 |seconds        |Shannon's entropy measurement based on inter beat interval during a time segment.
!!! note "Assumptions/Observations"
For more information about IBI read [this](https://support.empatica.com/hc/en-us/articles/360030058011-E4-data-IBI-expected-signal).


@@ -0,0 +1,11 @@
# Empatica Tags
Sensor parameters description for `[EMPATICA_TAGS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing tags data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
!!! note
- No feature providers have been implemented for this sensor yet; however, you can [implement your own features](../add-new-features).
- To learn more about tags, read [this](https://support.empatica.com/hc/en-us/articles/204578699-Event-Marking-with-the-E4-wristband).


@@ -0,0 +1,46 @@
# Empatica Temperature
Sensor parameters description for `[EMPATICA_TEMPERATURE]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Name of the CSV file containing temperature data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
## DBDP provider
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/empatica_temperature_raw.csv
- data/raw/{pid}/empatica_temperature_with_datetime.csv
- data/interim/{pid}/empatica_temperature_features/empatica_temperature_{language}_{provider_key}.csv
- data/processed/features/{pid}/empatica_temperature.csv
```
Parameters description for `[EMPATICA_TEMPERATURE][PROVIDERS][DBDP]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `EMPATICA_TEMPERATURE` features from the `DBDP` provider|
|`[FEATURES]` | Features to be computed from temperature intraday data, see table below |
Features description for `[EMPATICA_TEMPERATURE][PROVIDERS][DBDP]`:
|Feature |Units |Description|
|-------------------------- |-------------- |---------------------------|
|maxtemp |degrees C |The maximum temperature during a time segment.
|mintemp |degrees C |The minimum temperature during a time segment.
|avgtemp |degrees C |The average temperature during a time segment.
|mediantemp |degrees C |The median of temperature during a time segment.
|modetemp |degrees C |The mode of temperature during a time segment.
|stdtemp |degrees C |The standard deviation of temperature during a time segment.
|diffmaxmodetemp |degrees C |The difference between the maximum and mode temperature during a time segment.
|diffminmodetemp |degrees C |The difference between the mode and minimum temperature during a time segment.
|entropytemp                |nats           |Shannon's entropy measurement based on temperature during a time segment.
!!! note "Assumptions/Observations"
None


@@ -1,57 +1,45 @@
# Behavioral Features Introduction
Every phone or Fitbit sensor has a corresponding config section in `config.yaml`, these sections follow a similar structure and we'll use `PHONE_ACCELEROMETER` as an example to explain this structure.
A behavioral feature is a metric computed from raw sensor data that quantifies a participant's behavior; for example, the time spent at home computed from location data. These are also known as digital biomarkers.
RAPIDS' `config.yaml` has a section for each supported device/sensor (e.g., `PHONE_ACCELEROMETER`, `FITBIT_STEPS`, `EMPATICA_HEARTRATE`). These sections follow a similar structure and can have one or more feature `PROVIDERS` that compute one or more behavioral features. You will modify the parameters of these `PROVIDERS` to obtain features from different mobile sensors. We'll use `PHONE_ACCELEROMETER` as an example to explain this further.
!!! hint
- We recommend reading this page if you are using RAPIDS for the first time
- All computed sensor features are stored under `/data/processed/features` in files per sensor, per participant, and per study (all participants).
- Every time you change any sensor parameters, provider parameters or provider features, all the necessary files will be updated as soon as you execute RAPIDS.
- In short, to extract features offered by a provider, you need to set its `[COMPUTE]` flag to `TRUE`, configure any of its parameters, and [execute](../../setup/execution) RAPIDS.
!!! example "Config section example for `PHONE_ACCELEROMETER`"
### Explaining the config.yaml sensor sections with an example
```yaml
# 1) Config section
PHONE_ACCELEROMETER:
# 2) Parameters for PHONE_ACCELEROMETER
TABLE: accelerometer
Each sensor section follows the same structure. Click on the numbered markers to know more.
# 3) Providers for PHONE_ACCELEROMETER
PROVIDERS:
# 4) RAPIDS provider
RAPIDS:
# 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER
COMPUTE: False
# 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
# 5) PANDA provider
PANDA:
# 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER
COMPUTE: False
VALID_SENSED_MINUTES: False
# 5.2) Features of PANDA provider of PHONE_ACCELEROMETER
FEATURES:
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
```
``` { .yaml .annotate }
PHONE_ACCELEROMETER: # (1)
## Sensor Parameters
Each sensor configuration section has a "parameters" subsection (see `#2` in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The `TABLE` parameter exists for every sensor, but some sensors will have extra parameters like [`[PHONE_LOCATIONS]`](../phone-locations/). We explain these parameters in a table at the top of each sensor documentation page.
CONTAINER: accelerometer # (2)
## Sensor Providers
Each sensor configuration section can have zero, one or more behavioral feature **providers** (see `#3` in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see `#4`) and PANDA (see `#5`).
PROVIDERS: # (3)
RAPIDS:
COMPUTE: False # (4)
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
### Provider Parameters
Each provider has parameters that affect the computation of the behavioral features it offers (see `#4.1` or `#5.1` in the example). These parameters will include at least a `[COMPUTE]` flag that you switch to `True` to extract a provider's behavioral features.
SRC_SCRIPT: src/features/phone_accelerometer/rapids/main.py
PANDA:
COMPUTE: False
VALID_SENSED_MINUTES: False
FEATURES: # (5)
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
We explain every provider's parameter in a table under the `Parameters description` heading on each provider documentation page.
# (6)
SRC_SCRIPT: src/features/phone_accelerometer/panda/main.py
```
### Provider Features
Each provider offers a set of behavioral features (see `#4.2` or `#5.2` in the example). For some providers these features are grouped in an array (like those for `RAPIDS` provider in `#4.2`) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for `PANDAS` provider in `#5.2`). In either case, you can delete the features you are not interested in and they will not be included in the sensor's output feature file.
--8<--- "docs/snippets/feature_introduction_example.md"
We explain each behavioral feature in a table under the `Features description` heading on each provider documentation page.
These are the descriptions of each marker for accessibility:
--8<--- "docs/snippets/feature_introduction_example.md"


@@ -0,0 +1,68 @@
# Fitbit Calories Intraday
Sensor parameters description for `[FITBIT_CALORIES_INTRADAY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Container where your calories intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
## RAPIDS provider
!!! info "Available time segments"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_calories_intraday_raw.csv
- data/raw/{pid}/fitbit_calories_intraday_with_datetime.csv
- data/interim/{pid}/fitbit_calories_intraday_features/fitbit_calories_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_calories_intraday.csv
```
Parameters description for `[FITBIT_CALORIES_INTRADAY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_CALORIES_INTRADAY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from calories intraday data, see table below |
|`[EPISODE_TYPE]` | RAPIDS will compute features for any episode type in this list. There are seven types of episodes defined as consecutive appearances of a label. Four are based on the activity level labels provided by Fitbit: `sedentary`, `lightly active`, `fairly active`, and `very active`. One is defined by RAPIDS as moderate to vigorous physical activity (`MVPA`) episodes, based on all `fairly active` and `very active` labels. Two are defined by the user based on a MET (metabolic equivalent) threshold that separates low from high MET episodes. |
|`[EPISODE_TIME_THRESHOLD]` | Any consecutive rows of the same `[EPISODE_TYPE]` will be considered a single episode if the time difference between them is less than or equal to this threshold in minutes|
|`[EPISODE_MET_THRESHOLD]` | Any 1-minute calorie data chunk with a MET value equal to or higher than this threshold will be considered a high MET episode, and a low MET episode otherwise. The default value is 3|
|`[EPISODE_MVPA_CATEGORIES]` | The Fitbit level labels that are considered part of a moderate to vigorous physical activity episode. One or more of `sedentary`, `lightly active`, `fairly active`, and `very active`. The default are `fairly active` and `very active`|
|`[EPISODE_REFERENCE_TIME]` | Reference time for the start/end time features. `MIDNIGHT` sets the reference time to 00:00 of each day, `START_OF_THE_SEGMENT` sets the reference time to the start of the time segment (useful when a segment is shorter than a day or spans multiple days)|
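To make the episode logic above concrete, here is a hypothetical sketch of how consecutive 1-minute rows could be merged into episodes under `[EPISODE_TIME_THRESHOLD]` (the `local_date_time` and `episode_type` column names are illustrative; RAPIDS' actual implementation may differ):
```python
# Hypothetical sketch: merge consecutive 1-minute rows of the same episode
# type into episodes, honoring a time threshold given in minutes.
import pandas as pd

def build_episodes(df, time_threshold=5):
    # df: one row per minute with local_date_time (datetime) and episode_type
    df = df.sort_values("local_date_time").reset_index(drop=True)
    # minutes elapsed since the previous row
    gap = df["local_date_time"].diff().dt.total_seconds().div(60)
    # start a new episode when the label changes or the gap exceeds the threshold
    new_episode = (df["episode_type"] != df["episode_type"].shift()) | (gap > time_threshold)
    df["episode_id"] = new_episode.cumsum()
    return df.groupby(["episode_id", "episode_type"], as_index=False).agg(
        start=("local_date_time", "min"), end=("local_date_time", "max")
    )
```
The duration, start/end time, and MET/calorie statistics in the table below would then be computed per episode group.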
Features description for `[FITBIT_CALORIES_INTRADAY][PROVIDERS][RAPIDS]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |Units |Description|
|-------------------------- |---------- |---------------------------|
|starttimefirstepisode`EPISODE_TYPE` |minutes |Start time of the first episode of type `[EPISODE_TYPE]`
|endtimefirstepisode`EPISODE_TYPE` |minutes |End time of the first episode of type `[EPISODE_TYPE]`
|starttimelastepisode`EPISODE_TYPE` |minutes |Start time of the last episode of type `[EPISODE_TYPE]`
|endtimelastepisode`EPISODE_TYPE` |minutes |End time of the last episode of type `[EPISODE_TYPE]`
|starttimelongestepisode`EPISODE_TYPE` |minutes |Start time of the longest episode of type `[EPISODE_TYPE]`
|endtimelongestepisode`EPISODE_TYPE` |minutes |End time of the longest episode of type `[EPISODE_TYPE]`
|countepisode`EPISODE_TYPE` |episodes |The number of episodes of type `[EPISODE_TYPE]`
|sumdurationepisode`EPISODE_TYPE` |minutes |The sum of the duration of episodes of type `[EPISODE_TYPE]`
|avgdurationepisode`EPISODE_TYPE` |minutes |The average of the duration of episodes of type `[EPISODE_TYPE]`
|maxdurationepisode`EPISODE_TYPE` |minutes |The maximum of the duration of episodes of type `[EPISODE_TYPE]`
|mindurationepisode`EPISODE_TYPE` |minutes |The minimum of the duration of episodes of type `[EPISODE_TYPE]`
|stddurationepisode`EPISODE_TYPE` |minutes |The standard deviation of the duration of episodes of type `[EPISODE_TYPE]`
|summet`EPISODE_TYPE` |METs |The sum of all METs during episodes of type `[EPISODE_TYPE]`
|avgmet`EPISODE_TYPE` |METs |The average of all METs during episodes of type `[EPISODE_TYPE]`
|maxmet`EPISODE_TYPE` |METs |The maximum of all METs during episodes of type `[EPISODE_TYPE]`
|minmet`EPISODE_TYPE` |METs |The minimum of all METs during episodes of type `[EPISODE_TYPE]`
|stdmet`EPISODE_TYPE` |METs |The standard deviation of all METs during episodes of type `[EPISODE_TYPE]`
|sumcalories`EPISODE_TYPE` |calories |The sum of all calories during episodes of type `[EPISODE_TYPE]`
|avgcalories`EPISODE_TYPE` |calories |The average of all calories during episodes of type `[EPISODE_TYPE]`
|maxcalories`EPISODE_TYPE` |calories |The maximum of all calories during episodes of type `[EPISODE_TYPE]`
|mincalories`EPISODE_TYPE` |calories |The minimum of all calories during episodes of type `[EPISODE_TYPE]`
|stdcalories`EPISODE_TYPE` |calories |The standard deviation of all calories during episodes of type `[EPISODE_TYPE]`
!!! note "Assumptions/Observations"
- These features are based on intraday calories data that is usually obtained in 1-minute chunks from Fitbit's API.
- The MET value returned by Fitbit comes multiplied by 10, so it is divided by 10 before computing these features
- Take into account that the [intraday data returned by Fitbit](https://dev.fitbit.com/build/reference/web-api/activity/#get-activity-intraday-time-series) can contain time series for calories burned inclusive of BMR, tracked activity, and manually logged activities.


@@ -0,0 +1,62 @@
# Fitbit Data Yield
We use Fitbit **heart rate intraday** data to extract data yield features. Fitbit data yield features can be used to remove rows ([time segments](../../setup/configuration/#time-segments)) that do not contain enough Fitbit data. You should decide what your "enough" threshold is depending on the time a participant was supposed to be wearing their Fitbit, the length of your study, and the rates of missing data that your analysis can handle.
!!! hint "Why is Fitbit data yield important?"
Imagine that you want to extract `FITBIT_STEPS_SUMMARY` features on daily segments (`00:00` to `23:59`). Let's say that on day 1 the Fitbit logged 6k as the total step count and the heart rate sensor logged 24 hours of data, while on day 2 it logged 101 as the total step count and the heart rate sensor logged only 2 hours of data. It's very likely that on day 2 you walked during the other 22 hours, so including this day in your analysis could bias your results.
Sensor parameters description for `[FITBIT_DATA_YIELD]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[SENSORS]`| The Fitbit sensor we consider for calculating the Fitbit data yield features. We only support `FITBIT_HEARTRATE_INTRADAY` since sleep data is commonly collected only overnight, and step counts are 0 even when not wearing the Fitbit device.
## RAPIDS provider
Before explaining the data yield features, let's define the following relevant concepts:
- A valid minute is any 60-second window during which the Fitbit heart rate intraday sensor logged at least 1 row of data
- A valid hour is any 60-minute window with at least X valid minutes. The threshold X is given by `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]`
!!! info "Available time segments and platforms"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
- data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv
- data/interim/{pid}/fitbit_data_yield_features/fitbit_data_yield_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_data_yield.csv
```
Parameters description for `[FITBIT_DATA_YIELD][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `FITBIT_DATA_YIELD` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` | The proportion `[0.0, 1.0]` of valid minutes in a 60-minute window necessary to flag that window as valid.
Features description for `[FITBIT_DATA_YIELD][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|ratiovalidyieldedminutes |- | The ratio between the number of valid minutes and the duration in minutes of a time segment.
|ratiovalidyieldedhours |- | The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1.
!!! note "Assumptions/Observations"
1. We recommend using `ratiovalidyieldedminutes` on time segments that are shorter than two or three hours and `ratiovalidyieldedhours` for longer segments. This is because relying on yielded minutes alone can be misleading when a big chunk of those missing minutes is clustered together.
For example, let's assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur:
<ol type="A">
<li>the 12 missing hours are from the beginning of the segment or </li>
<li>30 minutes could be missing from every hour (24 * 30 minutes = 12 hours).</li>
</ol>
`ratiovalidyieldedminutes` would be 0.5 for both `a` and `b` (hinting that the missing circumstances are similar). However, `ratiovalidyieldedhours` would be 0.5 for `a` and 1.0 for `b` if `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` is between [0.0 and 0.49], hinting that the missing circumstances might be more favorable for `b`. In other words, sensed data for `b` is more evenly spread compared to `a`. A sketch of both ratios follows this note.
2. We assume your Fitbit intraday data was sampled (requested from the Fitbit API) at 1-minute intervals. If the interval is longer, for example 15 minutes, take into account that the valid minutes and valid hours ratios are going to be small (you would have at most 4 "minutes" of data per hour because you would have four 15-minute windows), so you should adjust your thresholds to include and exclude rows accordingly. If you are in this situation, get in touch with us; we could implement this use case, but we are not sure there is enough demand for it at the moment since you can control the sampling rate of the data you request from the Fitbit API.
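As a rough illustration of both ratios, here is a small pandas sketch. It is an approximation under simplifying assumptions, not the RAPIDS implementation: 1-minute sampling, hour windows aligned to the clock, a heart rate dataframe with a datetime column named `local_date_time`, and no handling of edge cases such as segments shorter than one hour.
```python
import pandas as pd

def data_yield(hr, start, end, minute_ratio_threshold=0.75):
    """Sketch of ratiovalidyieldedminutes/hours for one segment [start, end)."""
    minutes = pd.date_range(start, end, freq="min", inclusive="left")
    logged = hr["local_date_time"].dt.floor("min").unique()
    valid_minute = pd.Series(minutes.isin(logged), index=minutes)

    # A valid minute has at least one heart rate row.
    ratiovalidyieldedminutes = valid_minute.mean()

    # A valid hour is a 60-minute window whose share of valid minutes reaches
    # [MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS].
    per_hour = valid_minute.groupby(valid_minute.index.floor("h")).mean()
    ratiovalidyieldedhours = (per_hour >= minute_ratio_threshold).mean()
    return ratiovalidyieldedminutes, ratiovalidyieldedhours
```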

View File

@ -4,29 +4,7 @@ Sensor parameters description for `[FITBIT_HEARTRATE_INTRADAY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the heart rate intraday data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects, note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-07","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1200.6102,"max":88,"min":31,"minutes":1058,"name":"Out of Range"},{"caloriesOut":760.3020,"max":120,"min":86,"minutes":366,"name":"Fat Burn"},{"caloriesOut":15.2048,"max":146,"min":120,"minutes":2,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":72}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":68},{"time":"00:01:00","value":67},{"time":"00:02:00","value":67},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-08","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1100.1120,"max":89,"min":30,"minutes":921,"name":"Out of Range"},{"caloriesOut":660.0012,"max":118,"min":82,"minutes":361,"name":"Fat Burn"},{"caloriesOut":23.7088,"max":142,"min":108,"minutes":3,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":70}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":77},{"time":"00:01:00","value":75},{"time":"00:02:00","value":73},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-09","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":750.3615,"max":77,"min":30,"minutes":851,"name":"Out of Range"},{"caloriesOut":734.1516,"max":107,"min":77,"minutes":550,"name":"Fat Burn"},{"caloriesOut":131.8579,"max":130,"min":107,"minutes":29,"name":"Cardio"},{"caloriesOut":0,"max":220,"min":130,"minutes":0,"name":"Peak"}],"restingHeartRate":69}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":90},{"time":"00:01:00","value":89},{"time":"00:02:00","value":88},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |heartrate |heartrate_zone |
|-------------------------------------- |---------------------- |--------- |--------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:00:00 |68 |outofrange |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:01:00 |67 |outofrange |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:02:00 |67 |outofrange |
|`[CONTAINER]`| Container where your heart rate intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
## RAPIDS provider
@ -37,8 +15,7 @@ We provide examples of the input format that RAPIDS expects, note that both exam
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
- data/raw/{pid}/fitbit_heartrate_intraday_parsed.csv
- data/raw/{pid}/fitbit_heartrate_intraday_parsed_with_datetime.csv
- data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv
- data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_heartrate_intraday.csv
```

View File

@ -4,29 +4,7 @@ Sensor parameters description for `[FITBIT_HEARTRATE_SUMMARY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the heart rate summary data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects, note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-07","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1200.6102,"max":88,"min":31,"minutes":1058,"name":"Out of Range"},{"caloriesOut":760.3020,"max":120,"min":86,"minutes":366,"name":"Fat Burn"},{"caloriesOut":15.2048,"max":146,"min":120,"minutes":2,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":72}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":68},{"time":"00:01:00","value":67},{"time":"00:02:00","value":67},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-08","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":1100.1120,"max":89,"min":30,"minutes":921,"name":"Out of Range"},{"caloriesOut":660.0012,"max":118,"min":82,"minutes":361,"name":"Fat Burn"},{"caloriesOut":23.7088,"max":142,"min":108,"minutes":3,"name":"Cardio"},{"caloriesOut":0,"max":221,"min":148,"minutes":0,"name":"Peak"}],"restingHeartRate":70}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":77},{"time":"00:01:00","value":75},{"time":"00:02:00","value":73},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"activities-heart":[{"dateTime":"2020-10-09","value":{"customHeartRateZones":[],"heartRateZones":[{"caloriesOut":750.3615,"max":77,"min":30,"minutes":851,"name":"Out of Range"},{"caloriesOut":734.1516,"max":107,"min":77,"minutes":550,"name":"Fat Burn"},{"caloriesOut":131.8579,"max":130,"min":107,"minutes":29,"name":"Cardio"},{"caloriesOut":0,"max":220,"min":130,"minutes":0,"name":"Peak"}],"restingHeartRate":69}}],"activities-heart-intraday":{"dataset":[{"time":"00:00:00","value":90},{"time":"00:01:00","value":89},{"time":"00:02:00","value":88},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |heartrate_daily_restinghr |heartrate_daily_caloriesoutofrange |heartrate_daily_caloriesfatburn |heartrate_daily_caloriescardio |heartrate_daily_caloriespeak |
|-------------------------------------- |----------------- |------- |-------------- |------------- |------------ |-------|
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 |72 |1200.6102 |760.3020 |15.2048 |0 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-08 |70 |1100.1120 |660.0012 |23.7088 |0 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-09 |69 |750.3615 |734.1516 |131.8579 |0 |
|`[CONTAINER]`| Container where your heart rate summary data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
## RAPIDS provider
@ -37,8 +15,7 @@ We provide examples of the input format that RAPIDS expects, note that both exam
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_heartrate_summary_raw.csv
- data/raw/{pid}/fitbit_heartrate_summary_parsed.csv
- data/raw/{pid}/fitbit_heartrate_summary_parsed_with_datetime.csv
- data/raw/{pid}/fitbit_heartrate_summary_with_datetime.csv
- data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_heartrate_summary.csv
```

View File

@ -0,0 +1,156 @@
# Fitbit Sleep Intraday
Sensor parameters description for `[FITBIT_SLEEP_INTRADAY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Container where your sleep intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
## RAPIDS provider
!!! hint "Understanding RAPIDS features"
[This diagram](../../img/sleep_intraday_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments for the RAPIDS provider.
!!! info "Available time segments"
- Available for all time segments
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_sleep_intraday_raw.csv
- data/raw/{pid}/fitbit_sleep_intraday_with_datetime.csv
- data/interim/{pid}/fitbit_sleep_intraday_episodes.csv
- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv
- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv
- data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_sleep_intraday.csv
```
Parameters description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_SLEEP_INTRADAY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from sleep intraday data, see table below |
|`[SLEEP_LEVELS]` | Fitbit's sleep API Version 1 only provides `CLASSIC` records. However, Version 1.2 provides 2 types of records: `CLASSIC` and `STAGES`. `STAGES` is only available in devices with a heart rate sensor, and even those devices will fail to report it if the battery is low or the device is not worn tightly enough. While `CLASSIC` contains 3 sleep levels (`awake`, `restless`, and `asleep`), `STAGES` contains 4 sleep levels (`wake`, `deep`, `light`, `rem`). To make them consistent, RAPIDS groups them into 2 `UNIFIED` sleep levels: `awake` (`CLASSIC`: `awake` and `restless`; `STAGES`: `wake`) and `asleep` (`CLASSIC`: `asleep`; `STAGES`: `deep`, `light`, and `rem`). In this section, there is a boolean flag named `INCLUDE_ALL_GROUPS` that, if set to `True`, computes `LEVELS_AND_TYPES` features grouping all levels together in a single `all` category.
|`[SLEEP_TYPES]` | Types of sleep to be included in the feature extraction computation. There are three sleep types: `main`, `nap`, and `all`. The `all` type means both main sleep and naps are considered.
Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS][LEVELS_AND_TYPES]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |Units |Description |
|------------------------------- |-------------- |-------------------------------------------------------------|
|countepisode`[LEVEL][TYPE]` |episodes |Number of `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
|sumduration`[LEVEL][TYPE]` |minutes |Total duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
|maxduration`[LEVEL][TYPE]` |minutes |Longest duration of any `[LEVEL][TYPE]` sleep episode. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
|minduration`[LEVEL][TYPE]` |minutes |Shortest duration of any `[LEVEL][TYPE]` sleep episode. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
|avgduration`[LEVEL][TYPE]` |minutes |Average duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
|medianduration`[LEVEL][TYPE]` |minutes |Median duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
|stdduration`[LEVEL][TYPE]` |minutes |Standard deviation of the duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[ACROSS_LEVELS]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |Units |Description |
|-------------------------- |-------------- |-------------------------------------------------------------|
|ratiocount`[LEVEL]` |-|Ratio between the **count** of episodes of a single sleep `[LEVEL]` and the **count** of all episodes of all levels during both `main` and `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` episodes were `rem`? (e.g., $countepisode[remstages][all] / countepisode[all][all]$)
|ratioduration`[LEVEL]` |-|Ratio between the **duration** of episodes of a single sleep `[LEVEL]` and the **duration** of all episodes of all levels during both `main` and `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` time was `rem`? (e.g., $sumduration[remstages][all] / sumduration[all][all]$)
Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[ACROSS_TYPES]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |Units |Description |
|-------------------------- |-------------- |-------------------------------------------------------------|
|ratiocountmain |- |Ratio between the **count** of all `main` episodes (independently of the levels inside) and the **count** of all `main` and `nap` episodes. This answers the question: what percentage of all sleep episodes (`main` and `nap`) were `main`? We do not provide the ratio for `nap` because it is complementary. ($countepisode[all][main] / countepisode[all][all]$)
|ratiodurationmain |- |Ratio between the **duration** of all `main` episodes (independently of the levels inside) and the **duration** of all `main` and `nap` episodes. This answers the question: what percentage of all sleep time (`main` and `nap`) was `main`? We do not provide the ratio for `nap` because it is complementary. ($sumduration[all][main] / sumduration[all][all]$)
Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[WITHIN_LEVELS]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |Units |Description |
|--------------------------------- |-------------- |-------------------------------------------------------------|
|ratiocountmainwithin`[LEVEL]` |- |Ratio between the **count** of episodes of a single sleep `[LEVEL]` during `main` sleep and the **count** of episodes of that same `[LEVEL]` during `main` **and** `nap`. This answers the question: are `rem` episodes more frequent during `main` than `nap` sleep? We do not provide the ratio for `nap` because it is complementary. ($countepisode[remstages][main] / countepisode[remstages][all]$)
|ratiodurationmainwithin`[LEVEL]` |- |Ratio between the **duration** of episodes of a single sleep `[LEVEL]` during `main` sleep and the **duration** of episodes of that same `[LEVEL]` during `main` **and** `nap`. This answers the question: does more `rem` time occur during `main` than `nap` sleep? We do not provide the ratio for `nap` because it is complementary. ($sumduration[remstages][main] / sumduration[remstages][all]$)
Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[WITHIN_TYPES]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|Units|Description|
| - |- | - |
|ratiocount`[LEVEL]`within`[TYPE]` |-|Ratio between the **count** of episodes of a single sleep `[LEVEL]` and the **count** of all episodes of all levels during either `main` or `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` episodes were `rem` during `main`/`nap` sleep time? (e.g., $countepisode[remstages][main] / countepisode[all][main]$)
|ratioduration`[LEVEL]`within`[TYPE]` |-|Ratio between the **duration** of episodes of a single sleep `[LEVEL]` and the **duration** of all episodes of all levels during either `main` or `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` time was `rem` during `main`/`nap` sleep time? (e.g., $sumduration[remstages][main] / sumduration[all][main]$)
!!! note "Assumptions/Observations"
1. [This diagram](../../img/sleep_intraday_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments for the RAPIDS provider.
2. Features listed in `[LEVELS_AND_TYPES]` are computed for any levels and types listed in `[SLEEP_LEVELS]` or `[SLEEP_TYPES]`. For example, if `STAGES` only contains `[rem, light]`, you will not get `countepisode[wake|deep][TYPE]` or sum, max, min, avg, median, or std `duration`. Levels or types in these lists do not influence `RATIOS` or `ROUTINE` features.
3. Any `[LEVEL]` grouping is done within the elements of each class `CLASSIC`, `STAGES`, and `UNIFIED`. That is, we never combine `CLASSIC` or `STAGES` types to compute features.
4. The categories for `all` levels (when `INCLUDE_ALL_GROUPS` is `True`) and `all` `SLEEP_TYPES` are not considered for `RATIOS` features because they are always 1.
5. These features can be computed in time segments of any length, but only the 1-minute sleep chunks within each segment instance will be used.
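To ground the `LEVELS_AND_TYPES` and `RATIOS` definitions above, here is a small pandas sketch. The episode table layout, with one row per sleep episode, is an assumption for illustration, not the RAPIDS schema.
```python
import pandas as pd

# One row per sleep episode within a time segment instance (assumed layout).
episodes = pd.DataFrame({
    "level": ["rem", "light", "deep", "light", "rem"],
    "type": ["main", "main", "main", "nap", "nap"],
    "duration": [33, 193, 92, 40, 5],  # minutes
})

# LEVELS_AND_TYPES: countepisode[LEVEL][TYPE] and sumduration[LEVEL][TYPE]
levels_and_types = episodes.groupby(["level", "type"])["duration"].agg(
    countepisode="size", sumduration="sum")

# ACROSS_LEVELS: ratiocountrem = countepisode[rem][all] / countepisode[all][all]
ratiocount_rem = (episodes["level"] == "rem").sum() / len(episodes)

# WITHIN_LEVELS: ratiodurationmainwithinrem =
#   sumduration[rem][main] / sumduration[rem][all]
rem = episodes[episodes["level"] == "rem"]
ratioduration_main_within_rem = (
    rem.loc[rem["type"] == "main", "duration"].sum() / rem["duration"].sum())
```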
## PRICE provider
!!! hint "Understanding PRICE features"
[This diagram](../../img/sleep_intraday_price.png) will help you understand how sleep episodes are chunked and grouped within time segments and `LNE-LNE` intervals for the PRICE provider.
!!! info "Available time segments"
- Available for any time segment equal to or longer than one day
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_sleep_intraday_raw.csv
- data/raw/{pid}/fitbit_sleep_intraday_parsed.csv
- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv
- data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv
- data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_sleep_intraday.csv
```
Parameters description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][PRICE]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_SLEEP_INTRADAY` features from the `PRICE` provider |
|`[FEATURES]` | Features to be computed from sleep intraday data, see table below
|`[SLEEP_LEVELS]` | Fitbit's sleep API Version 1 only provides `CLASSIC` records. However, Version 1.2 provides 2 types of records: `CLASSIC` and `STAGES`. `STAGES` is only available in devices with a heart rate sensor, and even those devices will fail to report it if the battery is low or the device is not worn tightly enough. While `CLASSIC` contains 3 sleep levels (`awake`, `restless`, and `asleep`), `STAGES` contains 4 sleep levels (`wake`, `deep`, `light`, `rem`). To make them consistent, RAPIDS groups them into 2 `UNIFIED` sleep levels: `awake` (`CLASSIC`: `awake` and `restless`; `STAGES`: `wake`) and `asleep` (`CLASSIC`: `asleep`; `STAGES`: `deep`, `light`, and `rem`). In this section, there is a boolean flag named `INCLUDE_ALL_GROUPS` that, if set to `True`, computes avgdurationallmain`[DAY_TYPE]` features grouping all levels together in a single `all` category.
|`[DAY_TYPE]` | The features of this provider can be computed using daily averages/standard deviations that were extracted on `WEEKEND` days only, `WEEK` days only, or `ALL` days|
|`[LAST_NIGHT_END]` | Only `main` sleep episodes that start within the `LNE-LNE` interval [`LAST_NIGHT_END`, `LAST_NIGHT_END` + 23H 59M 59S] are taken into account to compute the features described below. `[LAST_NIGHT_END]` is a number ranging from 0 (midnight) to 1439 (23:59). |
Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][PRICE]`:
|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |Units |Description |
|------------------------------------- |----------------- |-------------------------------------------------------------|
|avgduration`[LEVEL]`main`[DAY_TYPE]` |minutes | Average duration of daily sleep chunks of a `LEVEL`. Use the `DAY_TYPE` flag to include daily durations from weekend days only, weekdays, or both. Use `[LEVEL]` to group all levels in a single `all` category.
|avgratioduration`[LEVEL]`withinmain`[DAY_TYPE]` |- | Average of the daily ratio between the duration of sleep chunks of a `LEVEL` and the total duration of all `main` sleep episodes in a day. When `INCLUDE_ALL_GROUPS` is `True` the `all` `LEVEL` is ignored since this feature is always 1. Use the `DAY_TYPE` flag to include daily ratios from weekend days only, weekdays, or both.
|avgstarttimeofepisodemain`[DAY_TYPE]` |minutes | Average of all start times of the first `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include start times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
|avgendtimeofepisodemain`[DAY_TYPE]` |minutes | Average of all end times of the last `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include end times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
|avgmidpointofepisodemain`[DAY_TYPE]` |minutes | Average of all the midpoints between `avgstarttime...` and `avgendtime...` in a time segment. Use the `DAY_TYPE` flag to include midpoints from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
|stdstarttimeofepisodemain`[DAY_TYPE]` |minutes | Standard deviation of all start times of the first `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include start times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
|stdendtimeofepisodemain`[DAY_TYPE]` |minutes | Standard deviation of all end times of the last `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include end times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
|stdmidpointofepisodemain`[DAY_TYPE]` |minutes | Standard deviation of all the midpoints between `avgstarttime...` and `avgendtime...` in a time segment. Use the `DAY_TYPE` flag to include midpoints from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
|socialjetlag |minutes | Difference in minutes between the avgmidpointofepisodemain of weekends and weekdays that belong to each time segment instance. If your time segment does not contain at least one weekday and one weekend day, this feature will be NA.
|rmssdmeanstarttimeofepisodemain |minutes | Square root of the **mean** squared successive difference (RMSSD) between today's and yesterday's `starttimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the mean of how someone's `starttimeofepisodemain` (bedtime) changed from night to night.
|rmssdmeanendtimeofepisodemain |minutes | Square root of the **mean** squared successive difference (RMSSD) between today's and yesterday's `endtimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the mean of how someone's `endtimeofepisodemain` (wake time) changed from night to night.
|rmssdmeanmidpointofepisodemain |minutes | Square root of the **mean** squared successive difference (RMSSD) between today's and yesterday's `midpointofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the mean of how someone's `midpointofepisodemain` (mid time between bedtime and wake time) changed from night to night.
|rmssdmedianstarttimeofepisodemain |minutes | Square root of the **median** squared successive difference (RMSSD) between today's and yesterday's `starttimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the median of how someone's `starttimeofepisodemain` (bedtime) changed from night to night.
|rmssdmedianendtimeofepisodemain |minutes | Square root of the **median** squared successive difference (RMSSD) between today's and yesterday's `endtimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the median of how someone's `endtimeofepisodemain` (wake time) changed from night to night.
|rmssdmedianmidpointofepisodemain |minutes | Square root of the **median** squared successive difference (RMSSD) between today's and yesterday's `midpointofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the median of how someone's `midpointofepisodemain` (average mid time between bedtime and wake time) changed from night to night.
!!! note "Assumptions/Observations"
1. [This diagram](../../img/sleep_intraday_price.png) will help you understand how sleep episodes are chunked and grouped within time segments and `LNE-LNE` intervals for the PRICE provider.
2. We recommend you use periodic segments that start in the morning so RAPIDS can chunk and group sleep episodes overnight. Shifted segments (like any other segments) are labelled based on their start and end date times.
3. `avgstarttime...` and `avgendtime...` are roughly equivalent to an average bed and awake time only if you are using shifted segments.
4. The features of this provider are only available on time segments that are longer than 24 hours because they are based on descriptive statistics computed across daily values.
5. Even though Fitbit provides 2 types of sleep episodes (`main` and `nap`), only `main` sleep episodes are considered.
6. The reference point for all times is 00:00 of the first day in the `LNE-LNE` interval.
7. Sleep episodes are formed by 1-minute chunks that we group overnight, starting from today's LNE and ending on tomorrow's LNE or the end of that segment (whichever comes first).
8. The features `avgstarttime...` and `avgendtime...` are the average of the first and last sleep episode across every `LNE-LNE` interval within a segment (`avgmidtime...` is the midpoint between start and end). Therefore, only segments longer than 24 hours will be averaged across more than one `LNE-LNE` interval.
9. `socialjetlag` is only available on segment instances equal to or longer than 48 hours that contain at least one weekday and one weekend day, for example seven-day (weekly) segments.
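For the `rmssd...` features above, the following numpy sketch shows the computation on a toy series of nightly values; the numbers are made up, and minutes are relative to 00:00 of the first day in each `LNE-LNE` interval.
```python
import numpy as np

# Nightly starttimeofepisodemain values within one segment instance (made up):
# 23:00, 23:45, 22:30, and 00:30 of the next day.
starttimes = np.array([1380.0, 1425.0, 1350.0, 1470.0])

squared_successive_diff = np.diff(starttimes) ** 2
rmssdmeanstarttimeofepisodemain = np.sqrt(squared_successive_diff.mean())
rmssdmedianstarttimeofepisodemain = np.sqrt(np.median(squared_successive_diff))
```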

View File

@ -4,61 +4,21 @@ Sensor parameters description for `[FITBIT_SLEEP_SUMMARY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the sleep summary data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects, note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data with Fitbit's sleep API Version 1"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep": [{"awakeCount": 2, "awakeDuration": 3, "awakeningsCount": 10, "dateOfSleep": "2020-10-07", "duration": 8100000, "efficiency": 91, "endTime": "2020-10-07T18:10:00.000", "isMainSleep": true, "logId": 14147921940, "minuteData": [{"dateTime": "15:55:00", "value": "3"}, {"dateTime": "15:56:00", "value": "3"}, {"dateTime": "15:57:00", "value": "2"},...], "minutesAfterWakeup": 0, "minutesAsleep": 123, "minutesAwake": 12, "minutesToFallAsleep": 0, "restlessCount": 8, "restlessDuration": 9, "startTime": "2020-10-07T15:55:00.000", "timeInBed": 135}, {"awakeCount": 0, "awakeDuration": 0, "awakeningsCount": 1, "dateOfSleep": "2020-10-07", "duration": 3780000, "efficiency": 100, "endTime": "2020-10-07T10:52:30.000", "isMainSleep": false, "logId": 14144903977, "minuteData": [{"dateTime": "09:49:00", "value": "1"}, {"dateTime": "09:50:00", "value": "1"}, {"dateTime": "09:51:00", "value": "1"},...], "minutesAfterWakeup": 1, "minutesAsleep": 62, "minutesAwake": 0, "minutesToFallAsleep": 0, "restlessCount": 1, "restlessDuration": 1, "startTime": "2020-10-07T09:49:00.000", "timeInBed": 63}], "summary": {"totalMinutesAsleep": 185, "totalSleepRecords": 2, "totalTimeInBed": 198}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep": [{"awakeCount": 3, "awakeDuration": 21, "awakeningsCount": 16, "dateOfSleep": "2020-10-08", "duration": 19260000, "efficiency": 89, "endTime": "2020-10-08T06:01:30.000", "isMainSleep": true, "logId": 14150613895, "minuteData": [{"dateTime": "00:40:00", "value": "3"}, {"dateTime": "00:41:00", "value": "3"}, {"dateTime": "00:42:00", "value": "3"},...], "minutesAfterWakeup": 0, "minutesAsleep": 275, "minutesAwake": 33, "minutesToFallAsleep": 0, "restlessCount": 13, "restlessDuration": 25, "startTime": "2020-10-08T00:40:00.000", "timeInBed": 321}], "summary": {"totalMinutesAsleep": 275, "totalSleepRecords": 1, "totalTimeInBed": 321}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep": [{"awakeCount": 1, "awakeDuration": 3, "awakeningsCount": 8, "dateOfSleep": "2020-10-09", "duration": 19320000, "efficiency": 96, "endTime": "2020-10-09T05:57:30.000", "isMainSleep": true, "logId": 14161136803, "minuteData": [{"dateTime": "00:35:30", "value": "2"}, {"dateTime": "00:36:30", "value": "1"}, {"dateTime": "00:37:30", "value": "1"},...], "minutesAfterWakeup": 0, "minutesAsleep": 309, "minutesAwake": 13, "minutesToFallAsleep": 0, "restlessCount": 7, "restlessDuration": 10, "startTime": "2020-10-09T00:35:30.000", "timeInBed": 322}], "summary": {"totalMinutesAsleep": 309, "totalSleepRecords": 1, "totalTimeInBed": 322}}
=== "PLAIN_TEXT"
|device_id |local_start_date_time |local_end_date_time |efficiency |minutes_after_wakeup |minutes_asleep |minutes_awake |minutes_to_fall_asleep |minutes_in_bed |is_main_sleep |type |count_awake |duration_awake |count_awakenings |count_restless |duration_restless |
|-------------------------------------- |---------------------- |---------------------- |----------- |--------------------- |--------------- |-------------- |----------------------- |--------------- |-------------- |-------- |----------- |--------------- |----------------- |--------------- |------------------ |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 15:55:00 |2020-10-07 18:10:00 |91 |0 |123 |12 |0 |135 |1 |classic |2 |3 |10 |8 |9 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 09:49:00 |2020-10-07 10:52:30 |100 |1 |62 |0 |0 |63 |0 |classic |0 |0 |1 |1 |1 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-08 00:40:00 |2020-10-08 06:01:30 |89 |0 |275 |33 |0 |321 |1 |classic |3 |21 |16 |13 |25 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-09 00:35:30 |2020-10-09 05:57:30 |96 |0 |309 |13 |0 |322 |1 |classic |1 |3 |8 |7 |10 |
??? example "Example of the structure of source data with Fitbit's sleep API Version 1.2"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep":[{"dateOfSleep":"2020-10-10","duration":3600000,"efficiency":92,"endTime":"2020-10-10T16:37:00.000","infoCode":2,"isMainSleep":false,"levels":{"data":[{"dateTime":"2020-10-10T15:36:30.000","level":"restless","seconds":60},{"dateTime":"2020-10-10T15:37:30.000","level":"asleep","seconds":660},{"dateTime":"2020-10-10T15:48:30.000","level":"restless","seconds":60},...], "summary":{"asleep":{"count":0,"minutes":56},"awake":{"count":0,"minutes":0},"restless":{"count":3,"minutes":4}}},"logId":26315914306,"minutesAfterWakeup":0,"minutesAsleep":55,"minutesAwake":5,"minutesToFallAsleep":0,"startTime":"2020-10-10T15:36:30.000","timeInBed":60,"type":"classic"},{"dateOfSleep":"2020-10-10","duration":22980000,"efficiency":88,"endTime":"2020-10-10T08:10:00.000","infoCode":0,"isMainSleep":true,"levels":{"data":[{"dateTime":"2020-10-10T01:46:30.000","level":"light","seconds":420},{"dateTime":"2020-10-10T01:53:30.000","level":"deep","seconds":1230},{"dateTime":"2020-10-10T02:14:00.000","level":"light","seconds":360},...], "summary":{"deep":{"count":3,"minutes":92,"thirtyDayAvgMinutes":0},"light":{"count":29,"minutes":193,"thirtyDayAvgMinutes":0},"rem":{"count":4,"minutes":33,"thirtyDayAvgMinutes":0},"wake":{"count":28,"minutes":65,"thirtyDayAvgMinutes":0}}},"logId":26311786557,"minutesAfterWakeup":0,"minutesAsleep":318,"minutesAwake":65,"minutesToFallAsleep":0,"startTime":"2020-10-10T01:46:30.000","timeInBed":383,"type":"stages"}],"summary":{"stages":{"deep":92,"light":193,"rem":33,"wake":65},"totalMinutesAsleep":373,"totalSleepRecords":2,"totalTimeInBed":443}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep":[{"dateOfSleep":"2020-10-11","duration":41640000,"efficiency":89,"endTime":"2020-10-11T11:47:00.000","infoCode":0,"isMainSleep":true,"levels":{"data":[{"dateTime":"2020-10-11T00:12:30.000","level":"wake","seconds":450},{"dateTime":"2020-10-11T00:20:00.000","level":"light","seconds":870},{"dateTime":"2020-10-11T00:34:30.000","level":"wake","seconds":780},...], "summary":{"deep":{"count":4,"minutes":52,"thirtyDayAvgMinutes":62},"light":{"count":32,"minutes":442,"thirtyDayAvgMinutes":364},"rem":{"count":6,"minutes":68,"thirtyDayAvgMinutes":58},"wake":{"count":29,"minutes":132,"thirtyDayAvgMinutes":94}}},"logId":26589710670,"minutesAfterWakeup":1,"minutesAsleep":562,"minutesAwake":132,"minutesToFallAsleep":0,"startTime":"2020-10-11T00:12:30.000","timeInBed":694,"type":"stages"}],"summary":{"stages":{"deep":52,"light":442,"rem":68,"wake":132},"totalMinutesAsleep":562,"totalSleepRecords":1,"totalTimeInBed":694}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |{"sleep":[{"dateOfSleep":"2020-10-12","duration":28980000,"efficiency":93,"endTime":"2020-10-12T09:34:30.000","infoCode":0,"isMainSleep":true,"levels":{"data":[{"dateTime":"2020-10-12T01:31:00.000","level":"wake","seconds":600},{"dateTime":"2020-10-12T01:41:00.000","level":"light","seconds":60},{"dateTime":"2020-10-12T01:42:00.000","level":"deep","seconds":2340},...], "summary":{"deep":{"count":4,"minutes":63,"thirtyDayAvgMinutes":59},"light":{"count":27,"minutes":257,"thirtyDayAvgMinutes":364},"rem":{"count":5,"minutes":94,"thirtyDayAvgMinutes":58},"wake":{"count":24,"minutes":69,"thirtyDayAvgMinutes":95}}},"logId":26589710673,"minutesAfterWakeup":0,"minutesAsleep":415,"minutesAwake":68,"minutesToFallAsleep":0,"startTime":"2020-10-12T01:31:00.000","timeInBed":483,"type":"stages"}],"summary":{"stages":{"deep":63,"light":257,"rem":94,"wake":69},"totalMinutesAsleep":415,"totalSleepRecords":1,"totalTimeInBed":483}}
=== "PLAIN_TEXT"
|device_id |local_start_date_time |local_end_date_time |efficiency |minutes_after_wakeup |minutes_asleep |minutes_awake |minutes_to_fall_asleep |minutes_in_bed |is_main_sleep |type |
|-------------------------------------- |---------------------- |---------------------- |----------- |--------------------- |--------------- |-------------- |----------------------- |--------------- |-------------- |-------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-10 15:36:30 |2020-10-10 16:37:00 |92 |0 |55 |5 |0 |60 |0 |classic |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-10 01:46:30 |2020-10-10 08:10:00 |88 |0 |318 |65 |0 |383 |1 |stages |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-11 00:12:30 |2020-10-11 11:47:00 |89 |1 |562 |132 |0 |694 |1 |stages |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-12 01:31:00 |2020-10-12 09:34:30 |93 |0 |415 |68 |0 |483 |1 |stages |
|`[CONTAINER]`| Container where your sleep summary data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
## RAPIDS provider
!!! hint "Understanding RAPIDS features"
[This diagram](../../img/sleep_summary_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments using `SLEEP_SUMMARY_LAST_NIGHT_END` for the RAPIDS provider.
!!! info "Available time segments"
- Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_sleep_summary_raw.csv
- data/raw/{pid}/fitbit_sleep_summary_parsed.csv
- data/raw/{pid}/fitbit_sleep_summary_parsed_with_datetime.csv
- data/raw/{pid}/fitbit_sleep_summary_with_datetime.csv
- data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_sleep_summary.csv
```
@ -69,14 +29,19 @@ Parameters description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_SLEEP_SUMMARY` features from the `RAPIDS` provider |
|`[SLEEP_TYPES]` | Types of sleep to be included in the feature extraction computation. Fitbit provides 3 types of sleep: `main`, `nap`, `all`. |
|`[SLEEP_TYPES]` | Types of sleep to be included in the feature extraction computation. There are three sleep types: `main`, `nap`, and `all`. The `all` type means both main sleep and naps are considered. |
|`[FEATURES]` | Features to be computed from sleep summary data, see table below |
|`[FITBIT_DATA_STREAMS][data stream][SLEEP_SUMMARY_LAST_NIGHT_END]` | As an exception, the `LAST_NIGHT_END` parameter for this provider is in the data stream configuration section. This parameter controls how sleep episodes are assigned to different days and affects wake and bedtimes.|
Features description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
|Feature |Units |Description |
|------------------------------ |---------- |-------------------------------------------- |
|firstwaketimeTYPE |minutes |First wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode's end time.
|lastwaketimeTYPE |minutes |Last wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode's end time.
|firstbedtimeTYPE |minutes |First bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode's start time.
|lastbedtimeTYPE |minutes |Last bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode's start time.
|countepisodeTYPE |episodes |Number of sleep episodes for a certain sleep type during a time segment.
|avgefficiencyTYPE |scores |Average sleep efficiency for a certain sleep type during a time segment.
|sumdurationafterwakeupTYPE |minutes |Total duration the user stayed in bed after waking up for a certain sleep type during a time segment.
@ -93,10 +58,13 @@ Features description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
!!! note "Assumptions/Observations"
1. There are three sleep types (TYPE): `main`, `nap`, `all`. The `all` type contains both main sleep and naps.
1. [This diagram](../../img/sleep_summary_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments using `LNE` for the RAPIDS provider.
1. There are three sleep types (TYPE): `main`, `nap`, `all`. The `all` type groups both `main` sleep and `naps`. All types are based on Fitbit's labels.
2. There are two versions of Fitbit's sleep API ([version 1](https://dev.fitbit.com/build/reference/web-api/sleep-v1/) and [version 1.2](https://dev.fitbit.com/build/reference/web-api/sleep/)), and each provides raw sleep data in a different format:
- _Count & duration summaries_. `v1` contains `count_awake`, `duration_awake`, `count_awakenings`, `count_restless`, and `duration_restless` fields for every sleep record but `v1.2` does not.
3. _API columns_. Features are computed based on the values provided by Fitbit's API: `efficiency`, `minutes_after_wakeup`, `minutes_asleep`, `minutes_awake`, `minutes_to_fall_asleep`, `minutes_in_bed`, `is_main_sleep` and `type`.
3. _API columns_. Most features are computed based on the values provided by Fitbit's API: `efficiency`, `minutes_after_wakeup`, `minutes_asleep`, `minutes_awake`, `minutes_to_fall_asleep`, `minutes_in_bed`, `is_main_sleep` and `type`.
4. Bed time and sleep duration are based on episodes that started between today's LNE and tomorrow's LNE, while awake time is based on the episodes that started between yesterday's LNE and today's LNE.
5. The reference point for bed/awake times is today's 00:00. You can have bedtimes larger than 24 hours (more than 1440 minutes) and awake times smaller than 0.
6. These features are only available for time segments that span midnight to midnight of the same or different day.
7. We include first and last wake and bedtimes because, when `LAST_NIGHT_END` is 10 am, the first bedtime could match a nap at 2 pm, and the last bedtime could match a main overnight sleep episode that starts at 10 pm.
8. Set the value for `SLEEP_SUMMARY_LAST_NIGHT_END` in the config parameter `[FITBIT_DATA_STREAMS][data stream][SLEEP_SUMMARY_LAST_NIGHT_END]`.
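As a sketch of notes 4 and 5 above, the snippet below converts one sleep episode's start and end into bed and wake times in minutes after a reference midnight chosen via `LNE`. The `LNE` value and the one-line day assignment are simplifying assumptions, not the exact RAPIDS logic.
```python
import pandas as pd

lne_minutes = 17 * 60  # assumed SLEEP_SUMMARY_LAST_NIGHT_END of 17:00

episode_start = pd.Timestamp("2020-10-08 00:40:00")
episode_end = pd.Timestamp("2020-10-08 06:01:30")

# Assign the episode to the day whose [LNE, LNE + 24h) window contains its
# start time; the reference midnight is that day's 00:00 (here 2020-10-07).
today = (episode_start - pd.Timedelta(minutes=lne_minutes)).normalize()

bedtime = (episode_start - today).total_seconds() / 60  # 1480, larger than 24h
waketime = (episode_end - today).total_seconds() / 60   # 1801.5
```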

View File

@ -4,30 +4,8 @@ Sensor parameters description for `[FITBIT_STEPS_INTRADAY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table name or file path where the steps intraday data is stored. The configuration keys in [Device Data Source Configuration](../../setup/configuration/#device-data-source-configuration) control whether this parameter is interpreted as table or file.
The format of the column(s) containing the Fitbit sensor data can be `JSON` or `PLAIN_TEXT`. The data in `JSON` format is obtained directly from the Fitbit API. We support `PLAIN_TEXT` in case you already parsed your data and don't have access to your participants' Fitbit accounts anymore. If your data is in `JSON` format then summary and intraday data come packed together.
We provide examples of the input format that RAPIDS expects, note that both examples for `JSON` and `PLAIN_TEXT` are tabular and the actual format difference comes in the `fitbit_data` column (we truncate the `JSON` example for brevity).
??? example "Example of the structure of source data"
=== "JSON"
|device_id |fitbit_data |
|---------------------------------------- |--------------------------------------------------------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-07","value":"1775"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":5},{"time":"00:01:00","value":3},{"time":"00:02:00","value":0},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-08","value":"3201"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":14},{"time":"00:01:00","value":11},{"time":"00:02:00","value":10},...],"datasetInterval":1,"datasetType":"minute"}}
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |"activities-steps":[{"dateTime":"2020-10-09","value":"998"}],"activities-steps-intraday":{"dataset":[{"time":"00:00:00","value":0},{"time":"00:01:00","value":0},{"time":"00:02:00","value":0},...],"datasetInterval":1,"datasetType":"minute"}}
=== "PLAIN_TEXT"
|device_id |local_date_time |steps |
|-------------------------------------- |---------------------- |--------- |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:00:00 |5 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:01:00 |3 |
|a748ee1a-1d0b-4ae9-9074-279a2b6ba524 |2020-10-07 00:02:00 |0 |
|`[CONTAINER]`| Container where your steps intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
|`[EXCLUDE_SLEEP]` | Step data will be excluded if it was logged during sleep periods when at least one `[EXCLUDE]` flag is set to `True`. Sleep can be delimited by (1) a fixed period that repeats every day if `[TIME_BASED][EXCLUDE]` is True, or (2) by Fitbit summary sleep episodes if `[FITBIT_BASED][EXCLUDE]` is True. If both are True (3), we use all Fitbit sleep episodes as well as the time-based episodes that do not overlap with any Fitbit episodes. If `[FITBIT_BASED][EXCLUDE]` is True, make sure the Fitbit sleep summary container points to a valid table or file.
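The combined rule in case (3) can be sketched as follows; this is an illustration with assumed interval tuples, not the RAPIDS code. Every Fitbit sleep episode becomes an exclusion window, and a time-based period is only added when it overlaps no Fitbit episode.
```python
import pandas as pd

def overlaps(a_start, a_end, b_start, b_end):
    return a_start < b_end and a_end > b_start

# (start, end) exclusion windows; timestamps are made-up examples.
fitbit = [(pd.Timestamp("2020-10-08 00:40"), pd.Timestamp("2020-10-08 06:01"))]
time_based = [(pd.Timestamp("2020-10-07 23:00"), pd.Timestamp("2020-10-08 07:00")),
              (pd.Timestamp("2020-10-09 23:00"), pd.Timestamp("2020-10-10 07:00"))]

# Keep all Fitbit episodes plus the time-based periods overlapping none of them.
exclusion_windows = fitbit + [
    (s, e) for (s, e) in time_based
    if not any(overlaps(s, e, fs, fe) for (fs, fe) in fitbit)
]
```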
## RAPIDS provider
@ -37,8 +15,9 @@ We provide examples of the input format that RAPIDS expects, note that both exam
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_steps_intraday_raw.csv
- data/raw/{pid}/fitbit_steps_intraday_parsed.csv
- data/raw/{pid}/fitbit_steps_intraday_parsed_with_datetime.csv
- data/raw/{pid}/fitbit_steps_intraday_with_datetime.csv
- data/raw/{pid}/fitbit_sleep_summary_raw.csv (Only when [EXCLUDE_SLEEP][EXCLUDE]=True and [EXCLUDE_SLEEP][TYPE]=FITBIT_BASED)
- data/interim/{pid}/fitbit_steps_intraday_with_datetime_exclude_sleep.csv (Only when [EXCLUDE_SLEEP][EXCLUDE]=True)
- data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_steps_intraday.csv
```
@ -50,6 +29,7 @@ Parameters description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]` | Set to `True` to extract `FITBIT_STEPS_INTRADAY` features from the `RAPIDS` provider|
|`[FEATURES]` | Features to be computed from steps intraday data, see table below |
|`[REFERENCE_HOUR]` | The reference point from which `firststeptime` or `laststeptime` is computed; the default is midnight |
|`[THRESHOLD_ACTIVE_BOUT]` | Every minute with Fitbit steps data will be labelled as `sedentary` if its step count is below this threshold, otherwise as `active`. |
|`[INCLUDE_ZERO_STEP_ROWS]` | Whether or not to include time segments with a 0 step count during the whole day. |
@ -63,6 +43,8 @@ Features description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:
|minsteps |steps |The minimum step count during a time segment.
|avgsteps |steps |The average step count during a time segment.
|stdsteps |steps |The standard deviation of step count during a time segment.
|firststeptime |minutes |Minutes from `[REFERENCE_HOUR]` until the first non-zero step count.
|laststeptime |minutes |Minutes from `[REFERENCE_HOUR]` until the last non-zero step count.
|countepisodesedentarybout |bouts |Number of sedentary bouts during a time segment.
|sumdurationsedentarybout |minutes |Total duration of all sedentary bouts during a time segment.
|maxdurationsedentarybout |minutes |The maximum duration of any sedentary bout during a time segment.
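A compact pandas sketch of the active/sedentary labelling and a few of the features above; the 1-minute rows, the threshold of 10 steps, and a `REFERENCE_HOUR` at midnight are illustrative assumptions, not RAPIDS defaults.
```python
import pandas as pd

steps = pd.DataFrame({
    "local_date_time": pd.date_range("2020-10-07 00:00", periods=6, freq="min"),
    "steps": [0, 0, 14, 25, 3, 0],
})
threshold_active_bout = 10  # assumed [THRESHOLD_ACTIVE_BOUT]
reference_hour = 0          # assumed [REFERENCE_HOUR], midnight

# Label each minute: sedentary below the threshold, active otherwise.
steps["label"] = (steps["steps"] >= threshold_active_bout).map(
    {True: "active", False: "sedentary"})

# A bout is a maximal run of consecutive minutes sharing a label.
bout_id = (steps["label"] != steps["label"].shift()).cumsum()
bouts = steps.groupby([bout_id, "label"]).size()
sedentary = bouts.xs("sedentary", level="label")
countepisodesedentarybout = len(sedentary)  # 2
sumdurationsedentarybout = sedentary.sum()  # 4 minutes

# firststeptime: minutes from REFERENCE_HOUR to the first non-zero step count.
first = steps.loc[steps["steps"] > 0, "local_date_time"].iloc[0]
firststeptime = first.hour * 60 + first.minute - reference_hour * 60  # 2
```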

View File

@ -4,29 +4,7 @@ Sensor parameters description for `[FITBIT_STEPS_SUMMARY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Container where your steps summary data is stored; depending on the data stream you are using, this can be a database table, a CSV file, etc. |
## RAPIDS provider
@ -37,8 +15,7 @@ We provide examples of the input format that RAPIDS expects, note that both exam
!!! info "File Sequence"
```bash
- data/raw/{pid}/fitbit_steps_summary_raw.csv
- data/raw/{pid}/fitbit_steps_summary_with_datetime.csv
- data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv
- data/processed/features/{pid}/fitbit_steps_summary.csv
```


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_ACCELEROMETER]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the accelerometer data is stored
## RAPIDS provider


@ -4,8 +4,8 @@ Sensor parameters description for `[PHONE_ACTIVITY_RECOGNITION]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER][ANDROID]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the activity data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS)
|`[CONTAINER][IOS]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the activity data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)
|`[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Difference in minutes between any two rows for them to be considered part of the same activity episode
## RAPIDS provider
@ -18,7 +18,6 @@ Sensor parameters description for `[PHONE_ACTIVITY_RECOGNITION]`:
```bash
- data/raw/{pid}/phone_activity_recognition_raw.csv
- data/raw/{pid}/phone_activity_recognition_with_datetime.csv
- data/interim/{pid}/phone_activity_recognition_episodes.csv
- data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv
- data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv
@ -45,7 +44,7 @@ Features description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:
|count |rows | Number of episodes.
|mostcommonactivity |activity type | The most common activity type (e.g. `still`, `on_foot`, etc.). If there is a tie, the first one is chosen.
|countuniqueactivities |activity type | Number of unique activities.
|durationstationary |minutes | The total duration of `[ACTIVITY_CLASSES][STATIONARY]` episodes of still and tilting activities
|durationmobile |minutes | The total duration of `[ACTIVITY_CLASSES][MOBILE]` episodes of on foot, running, and on bicycle activities
|durationvehicle |minutes | The total duration of `[ACTIVITY_CLASSES][VEHICLE]` episodes of on vehicle activity


@ -0,0 +1,14 @@
# Phone Applications Crashes
Sensor parameters description for `[PHONE_APPLICATIONS_CRASHES]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the applications crashes data is stored
|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-crashes) in `data/external/stachl_application_genre_catalogue.csv`
|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
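A hedged sketch of these keys in `config.yaml`; the container name is hypothetical and the catalogue file path is the default mentioned above:
```yaml
PHONE_APPLICATIONS_CRASHES:
  CONTAINER: applications_crashes  # hypothetical container name
  APPLICATION_CATEGORIES:
    CATALOGUE_SOURCE: FILE  # FILE or GOOGLE
    CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
    UPDATE_CATALOGUE_FILE: False
    SCRAPE_MISSING_CATEGORIES: False
```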
!!! note
No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_APPLICATIONS_CRASHES`) to improve [`PHONE_DATA_YIELD`](../phone-data-yield) or you can [implement your own features](../add-new-features).


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_APPLICATIONS_FOREGROUND]` (these param
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the applications foreground data is stored
|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-foreground) in `data/external/stachl_application_genre_catalogue.csv`
|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
@ -33,25 +33,36 @@ Parameters description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_APPLICATIONS_FOREGROUND` features from the `RAPIDS` provider|
|`[INCLUDE_EPISODE_FEATURES]`| Set to `True` to extract features from application usage episodes using Screen data |
|`[FEATURES]` | Features to be computed, see table below
|`[SINGLE_CATEGORIES]` | An array of app categories to be *included* in the feature extraction computation. The special keyword `all` represents a category with all the apps from each participant. By default, we use the category catalog pointed by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
|`[CUSTOM_CATEGORIES]` | An array of collections representing your own app categories. The key of each element is the name of the custom category, and the value is an array of the package names (apps) included in that category.
|`[MULTIPLE_CATEGORIES]` | An array of collections representing meta-categories (a group of categories). The key of each element is the name of the `meta-category` and the value is an array of member app categories. By default, we use the category catalog pointed by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
|`[SINGLE_APPS]` | An array of apps to be *included* in the feature extraction computation. Use their package name (e.g. `com.google.android.youtube`) or the reserved keyword `top1global` (the most used app by a participant over the whole monitoring study)
|`[EXCLUDED_CATEGORIES]` | An array of app categories to be *excluded* from the feature extraction computation. By default, we use the category catalog pointed by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
|`[EXCLUDED_APPS]` | An array of apps to be excluded from the feature extraction computation. Use their package name, for example: `com.google.android.youtube`
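To illustrate how single, custom, and meta-categories fit together, a sketch of this provider's section in `config.yaml` follows; the category names, package names, and feature lists are illustrative examples (not defaults), and the `APP_EVENTS`/`APP_EPISODES` grouping under `FEATURES` is an assumption based on the event and episode features described below:
```yaml
PHONE_APPLICATIONS_FOREGROUND:
  PROVIDERS:
    RAPIDS:
      COMPUTE: True
      INCLUDE_EPISODE_FEATURES: True
      SINGLE_CATEGORIES: ["all", "email"]
      CUSTOM_CATEGORIES:
        social_media: ["com.facebook.katana", "com.instagram.android"]  # your own grouping of package names
      MULTIPLE_CATEGORIES:
        social: ["socialnetworks", "socialmediatools"]  # a meta-category of catalog genres
      SINGLE_APPS: ["top1global", "com.google.android.youtube"]
      EXCLUDED_CATEGORIES: []
      EXCLUDED_APPS: ["com.fitbit.FitbitMobile"]
      FEATURES:
        APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
        APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
```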
Features description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|countevent |apps | Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to it from another app)
|timeoffirstuse |minutes | The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a `time_segment`
|timeoflastuse |minutes | The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a `time_segment`
|frequencyentropy |nats | The entropy of the used apps within a category during a `time_segment` (each app is seen as a unique event, the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app
|countepisode |apps | Number of usage episodes logged for a single app or apps within a category. In contrast to `countevent`, if an app was used across more than one time segment (for example, across more than one 30-minute segment), `countepisode` counts one episode in each of those time segment instances.
|minduration |minutes | For a `time_segment`, the minimum duration an application was used in minutes
|maxduration |minutes | For a `time_segment`, the maximum duration an application was used in minutes
|meanduration |minutes | For a `time_segment`, the mean duration of all the applications used in minutes
|sumduration |minutes | For a `time_segment`, the sum duration of all the applications used in minutes
!!! note "Assumptions/Observations"
1. Features can be computed by app, by apps grouped under a single category (genre), by your own categories, or by multiple categories grouped together (meta-categories). For example, we can get features for `Facebook` (single app), for `Social Network` apps (a category including Facebook and other social media apps), for `Traditional Social Media` (a custom category that includes Twitter and Facebook), or for `Social` (a meta-category formed by `Social Network` and `Social Media Tools` categories).
2. Apps installed by default, like YouTube, are considered system apps on some phones. We do an exact match to exclude apps where "genre" == `EXCLUDED_CATEGORIES` or "package_name" == `EXCLUDED_APPS`.
3. We provide four ways of classifying an app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalog created by Stachl et al., which we provide in RAPIDS (`data/external/stachl_application_genre_catalogue.csv`), c) by manually creating a personalized catalog, or d) by defining a custom category in `config.yaml`. You can choose a, b, or c by modifying the `[APPLICATION_CATEGORIES]` keys and values (see the first table of this page).
4. We count `episodes` and `events` separately. Events are single app logs (when an app was opened), but episodes span from the time an app was opened until a new app is in the foreground or the screen is locked. Episodes will be chunked across any overlapping time segments. The `top1global` of `episodes` might not be the same as the `top1global` of `events`.
5. The application episodes are calculated using the application foreground and screen unlock episode data. An application episode starts when the application is launched and ends when a new application is launched or the screen is locked.


@ -0,0 +1,14 @@
# Phone Applications Notifications
Sensor parameters description for `[PHONE_APPLICATIONS_NOTIFICATIONS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the applications notifications data is stored
|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-notifications) in `data/external/stachl_application_genre_catalogue.csv`
|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`; if `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
!!! note
No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_APPLICATIONS_NOTIFICATIONS`) to improve [`PHONE_DATA_YIELD`](../phone-data-yield) or you can [implement your own features](../add-new-features).


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_BATTERY]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the battery data is stored
|`[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Difference in minutes between any two rows for them to be considered part of the same battery charge or discharge episode
## RAPIDS provider


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_BLUETOOTH]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the bluetooth data is stored
## RAPIDS provider
@ -86,6 +86,7 @@ Features description for `[PHONE_BLUETOOTH][PROVIDERS][DORYAB]`:
!!! note "Assumptions/Observations"
- Devices are classified as belonging to the participant (`own`) or to other people (`others`) using k-means based on the number of times and the number of days each device was detected across each participant's dataset. See [Doryab et al](../../citation#doryab-bluetooth) for more details.
- If ownership cannot be computed because all devices were detected on only one day, they are all considered as `other`. Thus `all` and `other` features will be equal. The likelihood of this scenario decreases the more days of data you have.
- When searching for the most frequent device across 30-minute segments, the search range is equivalent to the sum of all segments of the same time period. For instance, the `countscansmostfrequentdeviceacrosssegments` for the time segment (`Fri 00:00:00, Fri 00:29:59`) will get the count in that segment of the most frequent device found within all (`00:00:00, 00:29:59`) time segments. To find `countscansmostfrequentdeviceacrosssegments` for `other` devices, the search range needs to filter out all `own` devices, but there is no need to do so for the dataset-wide features: the most frequent device across the dataset stays the same for `countscansmostfrequentdeviceacrossdatasetall`, `countscansmostfrequentdeviceacrossdatasetown`, and `countscansmostfrequentdeviceacrossdatasetother`. The same rule applies to the least frequent device across the dataset.
- The most and least frequent devices will be the same across time segment instances and across the entire dataset when every time segment instance covers every hour of a dataset. For example, daily segments (00:00 to 23:59) fall in this category but morning segments (06:00am to 11:59am) or periodic 30-minute segments don't.
??? info "Example"


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_CALLS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the calls data is stored
## RAPIDS Provider
@ -16,7 +16,6 @@ Sensor parameters description for `[PHONE_CALLS]`:
```bash
- data/raw/{pid}/phone_calls_raw.csv
- data/raw/{pid}/phone_calls_with_datetime.csv
- data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_calls.csv
```
@ -27,6 +26,7 @@ Parameters description for `[PHONE_CALLS][PROVIDERS][RAPIDS]`:
| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|`[COMPUTE]`| Set to `True` to extract `PHONE_CALLS` features from the `RAPIDS` provider|
|`[FEATURES_TYPE]`| Set to `EPISODES` to extract features based on call episodes or `EVENTS` to extract features based on events.|
| `[CALL_TYPES]` | The particular `call_type` that will be analyzed. The options for this parameter are `incoming`, `outgoing`, or `missed`. |
| `[FEATURES]` | Features to be computed for `outgoing`, `incoming`, and `missed` calls. Note that the same features are available for both incoming and outgoing calls, while missed calls have their own set of features. See the tables below. |
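A sketch of the corresponding `config.yaml` section, assuming the episode-based mode; the feature lists are illustrative subsets of the tables below:
```yaml
PHONE_CALLS:
  PROVIDERS:
    RAPIDS:
      COMPUTE: True
      FEATURES_TYPE: EPISODES  # EPISODES or EVENTS
      CALL_TYPES: [missed, incoming, outgoing]
      FEATURES:
        missed: [count, distinctcontacts, timefirstcall, timelastcall]
        incoming: [count, distinctcontacts, meanduration, sumduration]
        outgoing: [count, distinctcontacts, meanduration, sumduration]
```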
@ -61,4 +61,4 @@ Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` missed calls:
!!! note "Assumptions/Observations"
1. Traces for iOS calls are unique even for the same contact calling a participant more than once, which renders `countmostfrequentcontact` meaningless and `distinctcontacts` equal to the total number of traces.
2. `[CALL_TYPES]` and `[FEATURES]` keys in `config.yaml` need to match. For example, `[CALL_TYPES]` `outgoing` matches the `[FEATURES]` key `outgoing`
3. iOS calls data is transformed to match Android calls data format.


@ -4,8 +4,8 @@ Sensor parameters description for `[PHONE_CONVERSATION]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER][ANDROID]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the conversation data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS)
|`[CONTAINER][IOS]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the conversation data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)
## RAPIDS provider
@ -17,7 +17,6 @@ Sensor parameters description for `[PHONE_CONVERSATION]`:
```bash
- data/raw/{pid}/phone_conversation_raw.csv
- data/raw/{pid}/phone_conversation_with_datetime.csv
- data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_conversation.csv
```


@ -9,23 +9,27 @@ Sensor parameters description for `[PHONE_DATA_YIELD]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[SENSORS]`| One or more phone sensor config keys (e.g. `PHONE_MESSAGE`). The more keys you include, the more accurately RAPIDS can approximate the time a smartphone was sensing data. The supported phone sensors you can include in this list are outlined below (**do NOT include Fitbit sensors, ONLY include phone sensors**).
!!! info "Supported phone sensors for `[PHONE_DATA_YIELD][SENSORS]`"
```yaml
PHONE_ACCELEROMETER
PHONE_ACTIVITY_RECOGNITION
PHONE_APPLICATIONS_CRASHES
PHONE_APPLICATIONS_FOREGROUND
PHONE_APPLICATIONS_NOTIFICATIONS
PHONE_BATTERY
PHONE_BLUETOOTH
PHONE_CALLS
PHONE_CONVERSATION
PHONE_KEYBOARD
PHONE_LIGHT
PHONE_LOCATIONS
PHONE_LOG
PHONE_MESSAGES
PHONE_SCREEN
PHONE_WIFI_CONNECTED
PHONE_WIFI_VISIBLE
```
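For example, a study that only deployed a few sensors might configure a subset like the sketch below; any subset of the list above is valid and the chosen keys are illustrative:
```yaml
PHONE_DATA_YIELD:
  SENSORS: [PHONE_ACCELEROMETER, PHONE_BATTERY, PHONE_CALLS, PHONE_SCREEN]
```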
## RAPIDS provider
@ -64,8 +68,8 @@ Features description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|ratiovalidyieldedminutes |- | The ratio between the number of valid minutes and the duration in minutes of a time segment.
|ratiovalidyieldedhours |- | The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1.
!!! note "Assumptions/Observations"


@ -0,0 +1,40 @@
# Phone Keyboard
Sensor parameters description for `[PHONE_KEYBOARD]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the keyboard data is stored
## RAPIDS provider
!!! info "Available time segments and platforms"
- Available for all time segments
- Available for Android only
!!! info "File Sequence"
```bash
- data/raw/{pid}/phone_keyboard_raw.csv
- data/raw/{pid}/phone_keyboard_with_datetime.csv
- data/interim/{pid}/phone_keyboard_features/phone_keyboard_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_keyboard.csv
```
Features description for `[PHONE_KEYBOARD]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|sessioncount | - |Number of typing sessions in a time segment. A session begins with any keypress and ends when 5 seconds have elapsed since the last keypress or when the application the user is typing in changes.
|averagesessionlength | milliseconds | Average length of all sessions in a time segment instance
|averageinterkeydelay |milliseconds |The average time between keystrokes measured in milliseconds.
|changeintextlengthlessthanminusone | | Number of times a keyboard typing or swiping event decreased the length of the current text by more than one character.
|changeintextlengthequaltominusone | | Number of times a keyboard typing or swiping event decreased the length of the current text by exactly one character.
|changeintextlengthequaltoone | | Number of times a keyboard typing or swiping event increased the length of the current text by exactly one character.
|changeintextlengthmorethanone | | Number of times a keyboard typing or swiping event increased the length of the current text by more than one character.
|maxtextlength | | Length in characters of the longest sentence(s) contained in the typing text box of any app during the time segment.
|lastmessagelength | | Length of the last text in characters of the sentence(s) contained in the typing text box of any app during the time segment.
|totalkeyboardtouches | | Average number of typing events across all sessions in a time segment instance.
!!! note
We did not find a reliable way to distinguish between AutoCorrect or AutoComplete changes, since both can be applied with a single touch or swipe event and can decrease or increase the length of the text by an arbitrary number of characters.


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_LIGHT]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the light data is stored
## RAPIDS provider


@ -4,16 +4,29 @@ Sensor parameters description for `[PHONE_LOCATIONS]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the location data is stored
|`[LOCATIONS_TO_USE]`| Type of location data to use, one of `ALL`, `GPS`, `ALL_RESAMPLED` or `FUSED_RESAMPLED`. This filter is based on the `provider` column of the locations table, `ALL` includes every row, `GPS` only includes rows where the provider is gps, `ALL_RESAMPLED` includes all rows after being resampled, and `FUSED_RESAMPLED` only includes rows where the provider is fused after being resampled.
|`[FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD]`| If `ALL_RESAMPLED` or `FUSED_RESAMPLED` is used, the original fused data has to be resampled. A location row is resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less or equal than this threshold (in minutes).
|`[FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION]`| If `ALL_RESAMPLED` or `FUSED_RESAMPLED` is used, the original fused data has to be resampled. A location row is resampled at most for this long (in minutes).
|`[ACCURACY_LIMIT]` | An integer in meters; any location rows with an accuracy higher than or equal to this value are dropped. This number means there's a 68% probability the actual location is within this radius.
!!! note "Assumptions/Observations"
**Types of location data to use**
Android and iOS clients can collect location coordinates through the phone's GPS, the network cellular towers around the phone, or Google's fused location API.
- If you want to use only the GPS provider, set `[LOCATIONS_TO_USE]` to `GPS`
- If you want to use all providers, set `[LOCATIONS_TO_USE]` to `ALL`
- If you collected location data from different providers, including the fused API, use `ALL_RESAMPLED`
- If your mobile client was configured to use fused location only or want to focus only on this provider, set `[LOCATIONS_TO_USE]` to `FUSED_RESAMPLED`.
`ALL_RESAMPLED` and `FUSED_RESAMPLED` take the original location coordinates and replicate each pair forward in time as long as the phone was sensing data as indicated by the joined timestamps of [`[PHONE_DATA_YIELD][SENSORS]`](../phone-data-yield/). This is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one and because GPS and network providers can log data at variable rates.
There are two parameters associated with resampling fused location.
1. `FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD` (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair. For example, if participant A's phone did not collect data between 10:30 am and 10:50 am and between 11:05 am and 11:40 am, the last known coordinate pair is replicated during the first period but not the second. In other words, we assume that we can no longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes.
2. `FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION` (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer than this threshold even if the phone was sensing data continuously. For example, if participant A went home at 9 pm and their phone was sensing data without gaps until 11 am the next morning, the last known location is replicated only until 9 am.
If you have suggestions to modify or improve this resampling, let us know.
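Putting the keys above together, a location configuration might look like the following sketch; the container name is hypothetical, and the two resampling thresholds are the defaults mentioned above:
```yaml
PHONE_LOCATIONS:
  CONTAINER: locations  # hypothetical container name
  LOCATIONS_TO_USE: FUSED_RESAMPLED  # ALL, GPS, ALL_RESAMPLED, or FUSED_RESAMPLED
  FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD: 30      # minutes; max gap to keep replicating a pair
  FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION: 720 # minutes; stop replicating after 12 hours
  ACCURACY_LIMIT: 100  # meters; rows with accuracy >= 100 m are dropped
```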
## BARNETT provider
@ -21,7 +34,7 @@ These features are based on the original open-source implementation by [Barnett
!!! info "Available time segments and platforms"
- Available only for segments that start at 00:00:00 and end at 23:59:59 of the same or a different day (daily, weekly, weekend, etc.)
- Available for Android and iOS
!!! info "File Sequence"
@ -29,6 +42,7 @@ These features are based on the original open-source implementation by [Barnett
- data/raw/{pid}/phone_locations_raw.csv
- data/interim/{pid}/phone_locations_processed.csv
- data/interim/{pid}/phone_locations_processed_with_datetime.csv
- data/interim/{pid}/phone_locations_barnett_daily.csv
- data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_locations.csv
```
@ -36,13 +50,12 @@ These features are based on the original open-source implementation by [Barnett
Parameters description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `BARNETT` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[IF_MULTIPLE_TIMEZONES]` | Currently, `USE_MOST_COMMON` is the only value supported. If the location data for a participant belongs to multiple time zones, we select the most common because Barnett's algorithm can only handle one time zone
|`[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes; the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
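For reference, a hedged sketch of this provider's keys in `config.yaml`; the feature list is a truncated subset of the table below and all values are illustrative:
```yaml
PHONE_LOCATIONS:
  PROVIDERS:
    BARNETT:
      COMPUTE: True
      FEATURES: ["hometime", "disttravelled", "rog", "maxdiam", "maxhomedist"]
      IF_MULTIPLE_TIMEZONES: USE_MOST_COMMON  # currently the only supported value
      MINUTES_DATA_USED: False  # set to True to add a quality-control column
```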
@ -50,9 +63,9 @@ Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]` adapted from [B
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|hometime |minutes | Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am, including any pauses within a 200-meter radius.
|disttravelled |meters | Total distance traveled over a day (flights).
|rog |meters | The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day, and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place.
|maxdiam |meters | The maximum diameter is the largest distance between any two pauses.
|maxhomedist |meters | The maximum distance from home in meters.
|siglocsvisited |locations | The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found by iterating k from 1 to 200, stopping when the centroids of two significant locations are within 400 meters of one another.
@ -61,16 +74,26 @@ Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]` adapted from [B
|avgflightdur |seconds | Mean duration of all flights.
|stdflightdur |seconds | The standard deviation of the duration of all flights.
|probpause | - | The fraction of a day spent in a pause (as opposed to a flight)
|siglocentropy |nats | Shannon's entropy measurement is based on the proportion of time spent at each significant location visited during a day.
|circdnrtn | - | A continuous metric quantifying a person's circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day.
|wkenddayrtn | - | Same as circdnrtn but computed separately for weekends and weekdays.
!!! note "Assumptions/Observations"
**Multi day segment features**
Barnett's features are only available on time segments that span entire days (00:00:00 to 23:59:59). Such segments can be one day long (daily) or multi-day (weekly, for example). Multi-day segment features are computed based on daily features summarized in the following way:
- sum for `hometime`, `disttravelled`, `siglocsvisited`, and `minutes_data_used`
- max for `maxdiam`, and `maxhomedist`
- mean for `rog`, `avgflightlen`, `stdflightlen`, `avgflightdur`, `stdflightdur`, `probpause`, `siglocentropy`, `circdnrtn`, `wkenddayrtn`, and `minsmissing`
**Computation speed**
The process to extract these features can be slow compared to other sensors and providers due to the required simulation.
**How are these features computed?**
These features are based on a Pause-Flight model. A pause is defined as a mobility trace (location pings) within a certain duration and distance (by default, 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See [Barnett et al](../../citation#barnett-locations) for more information. In RAPIDS, we only expose one parameter for these features (accuracy limit). You can change other parameters in `src/features/phone_locations/barnett/library/MobilityFeatures.R`.
**Significant Locations**
Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) count as a visit to that significant location. This description was adapted from the Supplementary Materials of [Barnett et al](../../citation#barnett-locations).
**The Circadian Calculation**
For a detailed description of how this is calculated, see [Canzian et al](../../citation#barnett-locations).
@ -89,54 +112,90 @@ These features are based on the original implementation by [Doryab et al.](../..
- data/raw/{pid}/phone_locations_raw.csv
- data/interim/{pid}/phone_locations_processed.csv
- data/interim/{pid}/phone_locations_processed_with_datetime.csv
- data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes.csv
- data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes_resampled.csv
- data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes_resampled_with_datetime.csv
- data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
- data/processed/features/{pid}/phone_locations.csv
```
Parameters description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `DORYAB` provider|
|`[FEATURES]` | Features to be computed, see table below
| `[DBSCAN_EPS]` | The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
| `[DBSCAN_MINSAMPLES]` | The number of samples (or total weight) in a neighborhood for a point to be considered as a core point of a cluster. This includes the point itself.
| `[THRESHOLD_STATIC]` | The speed threshold in km/hr below which a row is labeled as Static (otherwise Moving).
| `[MAXIMUM_ROW_GAP]` | The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the phone was not sensing. This value must be larger than your GPS sampling interval when `[LOCATIONS_TO_USE]` is `ALL` or `GPS`, otherwise all the stationary-related features will be NA. If `[LOCATIONS_TO_USE]` is `ALL_RESAMPLED` or `FUSED_RESAMPLED`, you can use the default value as every row will be resampled at 1-minute intervals.
| `[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes; the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
| `[CLUSTER_ON]` | Set this flag to `PARTICIPANT_DATASET` to create clusters based on the entire participant's dataset or to `TIME_SEGMENT` to create clusters based on all the instances of the corresponding time segment (e.g. all mornings) or to `TIME_SEGMENT_INSTANCE` to create clusters based on a single instance (e.g. 2020-05-20's morning).
|`[INFER_HOME_LOCATION_STRATEGY]` | The strategy applied to infer home locations. Set to `DORYAB_STRATEGY` to infer one home location for the entire dataset of each participant or to `SUN_LI_VEGA_STRATEGY` to infer one home location per day per participant. See Observations below to know more.
|`[MINIMUM_DAYS_TO_DETECT_HOME_CHANGES]` | The minimum number of consecutive days a new home location candidate has to repeat before it is considered the participant's new home. This parameter will be used only when `[INFER_HOME_LOCATION_STRATEGY]` is set to `SUN_LI_VEGA_STRATEGY`.
| `[CLUSTERING_ALGORITHM]` | The original Doryab et al. implementation uses `DBSCAN`, `OPTICS` is also available with similar (but not identical) clustering results and lower memory consumption.
| `[RADIUS_FOR_HOME]` | All location coordinates within this distance (meters) from the home location coordinates are considered a homestay (see `timeathome` feature).
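A sketch of the `DORYAB` provider configuration with the parameters described above; the feature list is truncated and the values shown are illustrative rather than authoritative defaults:
```yaml
PHONE_LOCATIONS:
  PROVIDERS:
    DORYAB:
      COMPUTE: True
      FEATURES: ["locationvariance", "totaldistance", "avgspeed", "radiusgyration", "timeathome"]
      DBSCAN_EPS: 100       # meters; neighborhood distance for clustering
      DBSCAN_MINSAMPLES: 5
      THRESHOLD_STATIC: 1   # km/hr; below this a row is labeled static
      MAXIMUM_ROW_GAP: 300  # seconds
      MINUTES_DATA_USED: False
      CLUSTER_ON: PARTICIPANT_DATASET  # or TIME_SEGMENT, TIME_SEGMENT_INSTANCE
      INFER_HOME_LOCATION_STRATEGY: DORYAB_STRATEGY  # or SUN_LI_VEGA_STRATEGY
      MINIMUM_DAYS_TO_DETECT_HOME_CHANGES: 3
      CLUSTERING_ALGORITHM: DBSCAN  # or OPTICS
      RADIUS_FOR_HOME: 100  # meters
```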
Features description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:
|Feature |Units |Description|
|-------------------------- |---------- |---------------------------|
|locationvariance |$meters^2$ |The sum of the variances of the latitude and longitude columns.
|loglocationvariance | - | Log of the sum of the variances of the latitude and longitude columns.
|totaldistance |meters |Total distance traveled in a time segment using the haversine formula.
|avgspeed |km/hr |Average speed in a time segment considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment.
|varspeed |km/hr |Speed variance in a time segment considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment.
|{--circadianmovement--} |- | Deprecated, see Observations below. \"It encodes the extent to which a person's location patterns follow a 24-hour circadian cycle.\" [Doryab et al.](../../citation#doryab-locations).
|numberofsignificantplaces |places |Number of significant locations visited. It is calculated using the DBSCAN/OPTICS clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place.
|numberlocationtransitions |transitions |Number of movements between any two clusters in a time segment.
|radiusgyration |meters |Quantifies the area covered by a participant.
|timeattop1location |minutes |Time spent at the most significant location.
|timeattop2location |minutes |Time spent at the 2nd most significant location.
|timeattop3location |minutes |Time spent at the 3rd most significant location.
|movingtostaticratio | - | Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labeled as stationary if its speed (distance/time) to the next coordinate pair is less than 1km/hr. A higher value represents a more stationary routine.
|outlierstimepercent | - | Ratio of the time spent in non-significant clusters to the time spent in all clusters (only stationary samples are clustered). A higher value represents more time spent in non-significant clusters.
|maxlengthstayatclusters |minutes |Maximum time spent in a cluster (significant location).
|minlengthstayatclusters |minutes |Minimum time spent in a cluster (significant location).
|avglengthstayatclusters |minutes |Average time spent in a cluster (significant location).
|stdlengthstayatclusters |minutes |Standard deviation of time spent in a cluster (significant location).
|locationentropy |nats |Shannon Entropy computed over the row count of each cluster (significant location), it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location).
|normalizedlocationentropy |nats |Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters; it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location).
|timeathome |minutes | Time spent at home (see Observations below for a description on how we compute home).
|homelabel |- | An integer that represents a different home location. It will be a constant number (1) for all participants when `[INFER_HOME_LOCATION_STRATEGY]` is set to `DORYAB_STRATEGY` or an incremental index if the strategy is set to `SUN_LI_VEGA_STRATEGY`.
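As a reference for `totaldistance` above, here is a minimal sketch of the haversine formula; the function and column names are illustrative, not RAPIDS' actual implementation:

```python
import numpy as np

def haversine_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    a = (np.sin(np.radians(lat2 - lat1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin(np.radians(lon2 - lon1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

# totaldistance: sum of distances between consecutive rows, assuming
# hypothetical AWARE-style columns double_latitude / double_longitude:
# lat, lon = df["double_latitude"].to_numpy(), df["double_longitude"].to_numpy()
# total = haversine_meters(lat[:-1], lon[:-1], lat[1:], lon[1:]).sum()
```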
!!! note "Assumptions/Observations"
**Significant Locations Identified**
Significant locations are determined using `DBSCAN` or `OPTICS` clustering on locations that a participant visited over the course of the period of data collection. The most significant location is the place where the participant stayed for the longest time.
**Circadian Movement Calculation**
Note (Feb 3, 2021): the current implementation of this feature appears to be incorrect; we suggest not using it until a fix is in place. For a detailed description of how it should be calculated, see [Saeb et al](https://pubmed.ncbi.nlm.nih.gov/28344895/).
**Fine-Tuning Clustering Parameters**
Based on an experiment where we collected fused location data for 7 days with a mean accuracy of 86 (meters) and an SD of 350.874635, we determined that `EPS/MAX_EPS`=100 produced the clustering results closest to reality. Higher values (>100) missed some significant places, such as a short grocery visit, while lower values (<100) picked up traffic lights and stop signs while driving as significant locations. We recommend setting `EPS` based on your location data's accuracy (the more accurate your data, the lower you should be able to set `EPS`).
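As an illustration only, significant places with a meters-based `EPS` could be extracted with scikit-learn's `DBSCAN` as below; the haversine metric works on radians, so the threshold is converted. This is a sketch under those assumptions, not RAPIDS' exact code:

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000

def significant_places(latitudes, longitudes, eps_meters=100, min_samples=5):
    """Label each coordinate pair with a cluster id; -1 marks outliers."""
    coords = np.radians(np.column_stack([latitudes, longitudes]))
    return DBSCAN(
        eps=eps_meters / EARTH_RADIUS_M,  # haversine distances are on the unit sphere
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",
    ).fit_predict(coords)
```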
**Duration Calculation**
To calculate the time duration component for our features, we compute the difference between consecutive rows' timestamps to take into account sampling rate variability. If this time difference is larger than a threshold (300 seconds by default), we replace it with NA and label that row as Moving.
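A minimal pandas sketch of this duration rule, assuming a hypothetical `timestamp` column in milliseconds and the default 300-second `[MAXIMUM_ROW_GAP]`:

```python
import pandas as pd

def row_durations(df: pd.DataFrame, max_gap_seconds: int = 300) -> pd.Series:
    """Seconds attributed to each row: the gap to the next row, with gaps above
    the threshold replaced by NA (periods when the phone was not sensing)."""
    gaps = df["timestamp"].diff().shift(-1) / 1000  # ms -> seconds, gap to next row
    return gaps.where(gaps <= max_gap_seconds)
```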
**Home location**
- `DORYAB_STRATEGY`: home is calculated using all location data of a participant between 12 am and 6 am, applying a clustering algorithm (`DBSCAN` or `OPTICS`) and taking the center of the biggest cluster as that participant's home.
- `SUN_LI_VEGA_STRATEGY`: home is calculated using all location data of a participant between 12 am and 6 am, then applying a clustering algorithm (`DBSCAN` or `OPTICS`). The following steps are then used to infer the home location per day for that participant (a sketch follows this list):
1. if there are records within [03:30:00, 04:30:00] for that night:<br>
&nbsp;&nbsp;&nbsp;&nbsp;we choose the most common cluster during that period as a home candidate for that day.<br>
elif there are records within [midnight, 03:30:00) for that night:<br>
&nbsp;&nbsp;&nbsp;&nbsp;we choose the last valid cluster during that period as a home candidate for that day.<br>
elif there are records within (04:30:00, 06:00:00] for that night:<br>
&nbsp;&nbsp;&nbsp;&nbsp;we choose the first valid cluster during that period as a home candidate for that day.<br>
else:<br>
&nbsp;&nbsp;&nbsp;&nbsp;the home location is NA (missing) for that day.
2. If the count of consecutive days with the same candidate home location cluster label is greater than or equal to `[MINIMUM_DAYS_TO_DETECT_HOME_CHANGES]`,
the candidate is regarded as the home cluster; otherwise, the home cluster is the last valid day's cluster.
If there are no valid clusters before that day, the first home location found in the following days is used.
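A sketch of step 1's candidate selection, assuming a hypothetical pandas DataFrame per night (midnight to 6 am) with a `local_time` column of `datetime.time` values and a `cluster` label column where `-1` marks unclustered rows; names and structure are illustrative:

```python
from datetime import time

import pandas as pd

def home_candidate(night: pd.DataFrame):
    """Return the home cluster candidate for one night, or None (NA).
    Assumes `night` is sorted by time and spans midnight to 06:00."""
    valid = night[night["cluster"] != -1]  # keep only clustered (valid) rows
    core = valid[valid["local_time"].between(time(3, 30), time(4, 30))]
    if not core.empty:
        return core["cluster"].mode().iloc[0]  # most common cluster in [03:30, 04:30]
    before = valid[valid["local_time"] < time(3, 30)]
    if not before.empty:
        return before["cluster"].iloc[-1]      # last valid cluster before 03:30
    after = valid[valid["local_time"] > time(4, 30)]
    if not after.empty:
        return after["cluster"].iloc[0]        # first valid cluster after 04:30
    return None                                # home location is NA for that day
```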
**Clustering algorithms**
[`DBSCAN`](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) and [`OPTICS`](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html#r2c55e37003fe-1) are the clustering algorithms currently available. Duplicated locations are discarded while clustering. The `DBSCAN` algorithm takes the time spent at each location into consideration; however, the `OPTICS` algorithm ignores it, as this is not supported in the current [scikit-learn](https://github.com/scikit-learn/scikit-learn/issues/12394) implementation.
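The duplicate handling and the `OPTICS` alternative can be sketched as follows (again a sketch assuming a haversine metric and a meters-based `MAX_EPS`, not RAPIDS' exact code):

```python
import numpy as np
from sklearn.cluster import OPTICS

EARTH_RADIUS_M = 6_371_000

def optics_places(latitudes, longitudes, max_eps_meters=100, min_samples=5):
    coords = np.unique(np.column_stack([latitudes, longitudes]), axis=0)  # drop duplicates
    return OPTICS(
        max_eps=max_eps_meters / EARTH_RADIUS_M,
        min_samples=min_samples,
        metric="haversine",
    ).fit_predict(np.radians(coords))
```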


@ -0,0 +1,11 @@
# Phone Log
Sensor parameters description for `[PHONE_LOG]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER][ANDROID]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where a data log is stored for Android devices
|`[CONTAINER][IOS]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where a data log is stored for iOS devices
!!! note
No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_LOG`) to improve [`PHONE_DATA_YIELD`](../phone-data-yield) or [implement your own features](../add-new-features).


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_MESSAGES]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the messages data is stored
## RAPIDS provider


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_SCREEN]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the screen data is stored
## RAPIDS provider
@ -16,7 +16,6 @@ Sensor parameters description for `[PHONE_SCREEN]`:
```bash
- data/raw/{pid}/phone_screen_raw.csv
- data/raw/{pid}/phone_screen_with_datetime.csv
- data/interim/{pid}/phone_screen_episodes.csv
- data/interim/{pid}/phone_screen_episodes_resampled.csv
- data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv
@ -33,7 +32,7 @@ Parameters description for `[PHONE_SCREEN][PROVIDERS][RAPIDS]`:
|`[FEATURES]` | Features to be computed, see table below
|`[REFERENCE_HOUR_FIRST_USE]` | The reference point from which `firstuseafter` is to be computed, default is midnight
|`[IGNORE_EPISODES_SHORTER_THAN]` | Ignore episodes that are shorter than this threshold (minutes). Set to 0 to disable this filter.
|`[IGNORE_EPISODES_LONGER_THAN]` | Ignore episodes that are longer than this threshold (minutes), default is 6 hours. Set to 0 to disable this filter.
|`[EPISODE_TYPES]` | Currently we only support `unlock` episodes (from when the phone is unlocked until the screen is off)
@ -47,9 +46,10 @@ Features description for `[PHONE_SCREEN][PROVIDERS][RAPIDS]`:
|avgduration |minutes |Average duration of all unlock episodes.
|stdduration |minutes |Standard deviation duration of all unlock episodes.
|countepisode |episodes |Number of all unlock episodes
|firstuseafter |minutes |Minutes until the first unlock episode.
<!-- |episodepersensedminutes |episodes/minute |The ratio between the total number of episodes in an epoch divided by the total time (minutes) the phone was sensing data. -->
!!! note "Assumptions/Observations"
1. In Android, `lock` events can happen right after an `off` event, after a few seconds of an `off` event, or never happen, depending on the phone's settings; therefore, an `unlock` episode is defined as the time between an `unlock` and an `off` event. In iOS, `on` and `off` events do not exist, so an `unlock` episode is defined as the time between an `unlock` and a `lock` event.
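A sketch of that episode pairing with hypothetical column names (`timestamp` in milliseconds and `event` holding `unlock`/`off`/`lock`), not RAPIDS' actual script:

```python
import pandas as pd

EPISODE_END = {"android": "off", "ios": "lock"}  # event that closes an unlock episode

def unlock_episodes(events: pd.DataFrame, os: str) -> pd.DataFrame:
    """Pair each 'unlock' with the next episode-closing event for the platform."""
    episodes, start = [], None
    for _, row in events.sort_values("timestamp").iterrows():
        if row["event"] == "unlock":
            start = row["timestamp"]
        elif row["event"] == EPISODE_END[os] and start is not None:
            episodes.append((start, row["timestamp"], (row["timestamp"] - start) / 60_000))
            start = None
    return pd.DataFrame(episodes, columns=["start", "end", "duration_minutes"])
```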


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_WIFI_CONNECTED]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the wifi (connected) data is stored
## RAPIDS provider


@ -4,7 +4,7 @@ Sensor parameters description for `[PHONE_WIFI_VISIBLE]`:
|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the wifi (visible) data is stored
## RAPIDS provider


@ -1,20 +0,0 @@
# File Structure
!!! tip
- Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it, go to [Installation](../setup/installation/), then to [Configuration](../setup/configuration/), and then to [Execution](../setup/execution/)
- All paths mentioned in this page are relative to RAPIDS' root folder.
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the [`.env` file](../setup/configuration/#database-credentials), [participants files](../setup/configuration/#participant-files), [time segment files](../setup/configuration/#time-segments), and the `config.yaml` file as instructed in the [Configuration page](../setup/configuration). The `config.yaml` file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
All data is saved in `data/`. The `data/external/` folder stores any data imported or created by the user, `data/raw/` stores sensor data as imported from your database, `data/interim/` has intermediate files necessary to compute behavioral features from raw data, and `data/processed/` has all the final files with the behavioral features in folders per participant and sensor.
RAPIDS source code is saved in `src/`. The `src/data/` folder stores scripts to download, clean and pre-process sensor data, `src/features` has scripts to extract behavioral features organized in their respective sensor subfolders, `src/models/` can host any script to create models or statistical analyses with the behavioral features you extract, and `src/visualization/` has scripts to create plots of the raw and processed data. There are other files and folders, but they are only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.).
In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the `Snakefile` file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.).
<figure>
<img src="../img/files.png" max-width="100%" />
<figcaption>Interaction diagram between the user and important files in RAPIDS</figcaption>
</figure>


@ -0,0 +1,9 @@
"_id","timestamp","device_id","call_type","call_duration","trace"
1,1587663260695,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,14,"d5e84f8af01b2728021d4f43f53a163c0c90000c"
2,1587739118007,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"47c125dc7bd163b8612cdea13724a814917b6e93"
5,1587746544891,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,95,"9cc793ffd6e88b1d850ce540b5d7e000ef5650d4"
6,1587911379859,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,63,"51fb9344e988049a3fec774c7ca622358bf80264"
7,1587992647361,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"2a862a7730cfdfaf103a9487afe3e02935fd6e02"
8,1588020039448,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",1,11,"a2c53f6a086d98622c06107780980cf1bb4e37bd"
11,1588176189024,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,65,"56589df8c830c70e330b644921ed38e08d8fd1f3"
12,1588197745079,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"cab458018a8ed3b626515e794c70b6f415318adc"

Binary image files changed (diffs not shown). Some file diffs were suppressed because one or more lines are too long, and some files were not shown because too many files have changed in this diff.