Compare commits

..

49 Commits

Author SHA1 Message Date
junos 63f5a526fc Bring back requested fields in config.yaml.
Update coding files based on 7e565c34db98265afcda922a337493781fdd8ed5 in supermodule.
2023-04-19 11:07:58 +02:00
junos 1cc7339fc8 Completely remove PACKAGE_NAMES_HASHED and instead provide a differently structured file. 2023-04-18 22:58:42 +02:00
junos 5307c71df0 Missing comma. 2023-04-18 22:45:12 +02:00
junos f261286542 Add package_names_hashed param for rule phone_application_categories. 2023-04-18 22:40:11 +02:00
junos a6bc0a90d1 Do not ignore application categories. 2023-04-18 21:34:59 +02:00
junos f161da41f4 Merge branch 'master' into runner 2023-04-18 21:23:26 +02:00
junos 8ffd934fd3 Categorize applications in config.yaml. 2023-04-18 20:39:57 +02:00
junos cf6af7c9a4 Add a TODO. 2023-04-18 16:11:30 +02:00
junos 4dacb7129d Change targets for 30 before.
Further increase resources for acc.
2023-04-18 10:47:54 +02:00
junos f542a97eab Change targets for 90 before. 2023-04-15 16:29:06 +02:00
junos 5cb2dcfb00 Run 90 before event. 2023-04-15 16:18:55 +02:00
junos 8cef60ba87 Limit memory usage by readable_datetime.
Especially important for accelerometer data.
2023-04-14 16:01:44 +02:00
junos 0d634f3622 Remove deprecated numpy dtype. 2023-04-14 13:43:20 +02:00
junos 00e4f8deae More numeric_only arguments.
See 1d903f3629 for explanation.
2023-04-13 13:04:53 +02:00
junos 03687a1ac2 Fix deprecated attribute. 2023-04-12 18:21:43 +02:00
junos a36da99ccb Catch another possible exception. 2023-04-12 16:37:25 +02:00
junos 1d903f3629 Specify numeric_only for pandas.core.groupby.DataFrameGroupBy.mean.
This parameter used to be None by default, but this usage is deprecated since pandas 2.0.
See [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.mean.html):

> Changed in version 2.0.0: numeric_only no longer accepts None and defaults to False.
2023-04-12 16:05:58 +02:00
junos d678be0641 Extract definition of function. 2023-04-12 16:00:13 +02:00
junos 27b90421bf Add a missing pip dependency. 2023-04-11 19:06:29 +02:00
junos cb006ed0cf Completely overhaul environment.yml. 2023-04-11 19:04:28 +02:00
junos 9ca58ed204 Fix Python libraries. 2023-04-11 17:21:03 +02:00
junos 982fa982f7 Remove Python libraries versions. 2023-04-11 17:16:42 +02:00
junos f8088172e9 Update more R packages. 2023-04-11 15:31:44 +02:00
junos 801fbe1c10 Update R packages. 2023-04-11 15:26:21 +02:00
Primoz 8721b944ca Revert "Set config to testing mode."
This reverts commit e825aa7c89.
2023-02-16 09:56:23 +00:00
Primoz 36651a11c8 Drop all window count related features in cleaning script. 2023-02-15 14:15:56 +00:00
Primoz 8ae5ad0e88 Add missing rapids_columns for speech sensor. 2023-02-15 13:42:44 +00:00
Primoz e825aa7c89 Set config to testing mode. 2023-02-15 13:36:00 +00:00
Primoz 5958948af2 Merge branch 'speech_sensor' 2023-02-15 13:30:43 +00:00
Primoz 7e37eb9067 Change SPEECH sensor place in config 2023-01-23 15:32:52 +00:00
Primoz 4d0497a5e0 Set appropriate calculations for speech sensor. 2023-01-17 14:00:42 +00:00
Primoz 75b054d358 Integrate phone_speech into rapids pipeline. 2023-01-17 14:00:14 +00:00
Primoz e27ec0269f Revert "Add Speech sensor - preparation."
This reverts commit 74fd4dfbd7.
2023-01-17 12:06:53 +00:00
Primoz 9b45188a61 Revert "PHONE SPEECH - continuation ..."
This reverts commit 3e6b34babc.
2023-01-17 12:06:44 +00:00
Primoz 3e6b34babc PHONE SPEECH - continuation ... 2023-01-11 13:05:42 +00:00
Primoz 74fd4dfbd7 Add Speech sensor - preparation. 2023-01-11 12:48:38 +00:00
Primoz 7b8538ce51 Fix a bug and remove sys.exit line from cleaning script. 2022-12-21 10:40:07 +00:00
Primoz 41a17d35f1 Update ERS stress_event logic. 2022-12-19 15:40:40 +00:00
Primoz 7f5a4e6744 Make stress events equal in duration. 2022-12-14 14:52:20 +00:00
Primoz 3ce7f2c2a5 Separate target standardization from the rest of the features. 2022-12-13 15:31:39 +00:00
Primoz e40f0fd8dc Bug fixed (mixed dtype warning). 2022-12-12 11:29:59 +00:00
Primoz 8af3bdf768 Reset PIDS 2022-12-09 16:07:13 +00:00
Primoz 01931b8873 Update README 2022-12-09 16:04:11 +00:00
Primoz 569854ddf5 Merge branch 'master' of https://repo.ijs.si/junoslukan/rapids 2022-12-09 16:01:52 +00:00
Primoz 3b2001f570 Modify the stress_event logic so that it includes cases where stressfulness is 0. 2022-12-09 16:01:46 +00:00
junos 44a87c53eb Clarify runtime vs installation export of TZ. 2022-12-09 15:34:06 +01:00
junos 8da7bd71b2 Merge branch 'master' of https://repo.ijs.si/junoslukan/rapids 2022-12-09 15:26:23 +01:00
junos 788a81d96f Update readme with info from supermodule. 2022-12-09 15:26:12 +01:00
Primoz 87e5209a9f Squashed commit of the following:
commit 8a6b52a97c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 29 11:35:49 2022 +0000

    Switch to 30_before ERS with corresponding targets.

commit 244a053730
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 29 11:19:43 2022 +0000

    Change output files settings to nonstandardized.

commit be0324fd01
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Nov 28 12:44:25 2022 +0000

    Fix some bugs and set categorical columns as categories dtypes.

commit 99c2fab8f9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Nov 16 09:50:18 2022 +0000

    Fix a bug in the making of the individual model (when there is no target in the participants columns).

commit 286de93bfd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 15 11:21:51 2022 +0000

    Fix some bugs and extend ERS and cleaning scripts with multiple stress event targets logic.

commit ab803ee49c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 15 10:14:07 2022 +0000

    Add additional appraisal targets.

commit 621f11b2d9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 15 09:53:31 2022 +0000

    Fix a bug related to wrong user input (duplicated events).

commit bd41f42a5d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Nov 14 15:07:36 2022 +0000

    Rename target_ to segmenting_ method.

commit a543ce372f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Nov 14 15:04:16 2022 +0000

    Add comments for event_related_script understanding.

commit 74b454b07b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 11 09:15:12 2022 +0000

    Apply changes to string answers to make them language-generic.

commit 6ebe83e47e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 12:42:52 2022 +0000

    Improve the ERS extract method with a couple of validations.

commit 00350ef8ca
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 10:32:58 2022 +0000

    Change config for stressfulness event target method.

commit e4985c9121
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 10:29:11 2022 +0000

    Override stressfulness event target with extracted values from csv.

commit a668b6e8da
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 10 09:37:27 2022 +0000

    Extract ERS and stress event targets to csv files (completed).

commit 9199b53ded
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Nov 9 15:11:51 2022 +0000

    Get, join and start processing required ERS stress event data.

commit f3c6a66da9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 15:53:43 2022 +0000

    Begin with stress events in the ERS script.

commit 0b3e9226b3
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 14:44:24 2022 +0000

    Make small corrections in ERS file.

commit 2d83f7ddec
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 11:32:05 2022 +0000

    Begin the ERS logic for 90-minutes events.

commit 1da72a7cbe
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Nov 8 09:45:37 2022 +0000

    Rename targets method in config.

commit 9f441afc16
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 15:09:04 2022 +0000

    Begin ERS logic for 90-minutes events.

commit c1c9f4d05a
Merge: 62f46ea3 7ab0280d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 09:11:58 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 62f46ea376
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 09:11:53 2022 +0000

    Prepare method-based logic for ERS generating.

commit 7ab0280d7e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Nov 4 08:58:08 2022 +0000

    Correctly rename stressful event target variable.

commit eefa9f3f4d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 14:49:54 2022 +0000

    Add new target: stressfulness_event.

commit 5e8174dd41
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 13:52:45 2022 +0000

    Add new target: stressfulness_period.

commit 35c1a762e7
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 13:51:18 2022 +0000

    Improve filtering by esm_session and device_id.

commit 02264b21fd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Nov 3 09:30:12 2022 +0000

    Add logic for target selection in ERS processing.

commit 0ce8723bdb
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Nov 2 14:01:21 2022 +0000

    Extend imputation logic within the cleaning script.

commit 30b38bfc02
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Oct 28 09:00:13 2022 +0000

    Fix the generating procedure of ERS file for participants with multiple devices.

commit cd137af15a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 27 14:20:15 2022 +0000

    Config for 30 minute EMA segments.

commit 3c0585a566
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 27 14:12:56 2022 +0000

    Remove obsolete comments.

commit 6b487fcf7b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 27 14:11:42 2022 +0000

    Set E4 data yield to 1 if it is over 1. Optimize E4 data_yield script.

commit 5d17c92e54
Merge: a31fdd14 0d143e6a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:18:20 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit a31fdd1479
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:18:08 2022 +0000

    Start to test the perceived empatica_data_yield error.

commit 936324d234
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:17:27 2022 +0000

    Switch config for 30 minutes event related segments.

commit da0a4596f8
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:16:25 2022 +0000

    Add additional ESM processing logic for ERS csv extraction.

commit d4d74818e6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 14:14:32 2022 +0000

    Fix a bug - missing time_segment column when df is empty

commit 14ff59914b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 09:59:46 2022 +0000

    Fix to correct dtypes.

commit 6ab0ac5329
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 26 09:57:26 2022 +0000

    Optimize memory consumption with dtype definition while reading csv file.

commit 0d143e6aad
Merge: 8acac501 b92a3aa3
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 15:28:27 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 8acac50125
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 15:26:43 2022 +0000

    Add safety net when features dataframe is empty.

commit b92a3aa37a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 15:25:22 2022 +0000

    Remove unwanted output and other error-producing code.

commit bfd637eb9c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 25 08:53:44 2022 +0000

    Improve strings formatting in straw_events file.

commit 0d81ad5756
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 19 13:35:04 2022 +0000

    Debug assignment of segments to rows

commit cea451d344
Merge: e88bbd54 cf38d9f1
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 18 09:15:06 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit e88bbd548f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 18 09:15:00 2022 +0000

    Add new daily segment and filter by segment in the cleaning script.

commit cf38d9f175
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 17 15:07:33 2022 +0000

    Implement ERS generating logic.

commit f3ca56cdbf
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Oct 14 14:46:28 2022 +0000

    Start with ERS logic integration within Snakemake.

commit 797aa98f4f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 15:51:50 2022 +0000

    Config for ERS testing.

commit 9baff159cd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 15:51:23 2022 +0000

    Changes needed for testing and starting of the Event-Related Segments.

commit 0f21273508
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 12:32:51 2022 +0000

    Bug fixes

commit 55517eb737
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 12 12:23:11 2022 +0000

    Necessary commit before proceeding.

commit de15a52dba
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 11 08:36:23 2022 +0000

    Bug fix

commit 1ad25bb572
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Oct 11 08:26:17 2022 +0000

    Few modifications of some imputation values in cleaning script and feature extraction.

commit 9884b383cf
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 10 16:45:38 2022 +0000

    Testing new data with AutoML.

commit 2dc89c083c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Oct 7 08:52:12 2022 +0000

    Small changes in cleaning overall

commit 001d400729
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 6 14:28:12 2022 +0000

    Clean features and create input files based on all possible targets.

commit 1e38d9bf1e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Oct 6 13:27:38 2022 +0000

    Standardization and correlation visualization in overall cleaning script.

commit a34412a18d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 5 14:16:55 2022 +0000

    E4 data yield corrections. Changes in overall cleaning script - standardization.

commit 437459648f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Oct 5 13:35:05 2022 +0000

    Errors fix: individual script - treat participants missing data.

commit 53f6cc60d5
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 3 13:06:39 2022 +0000

    Config and cleaning script necessary changes ...

commit bbeabeee6f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Oct 3 12:53:31 2022 +0000

    Last changes before processing on the server.

commit 44531c6d94
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 30 10:04:07 2022 +0000

    Code cleaning, reworking cleaning individual based on changes in overall script. Changes in thresholds.

commit 7ac7cd5a37
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 29 14:33:21 2022 +0000

    Preparation of the overall cleaning script.

commit 68fd69dada
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 29 11:55:25 2022 +0000

    Cleaning script for individuals: corrections and comments.

commit a4f0d056a0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 29 11:44:27 2022 +0000

    Fillna for app foreground and activity recognition

commit 6286e7a44c
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 12:47:08 2022 +0000

    firstuseafter column removed from contextual imputation

commit 9b3447febd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 12:40:05 2022 +0000

    Contextual imputation correction

commit d6adda30cf
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 12:37:51 2022 +0000

    Contextual imputation on time(first/last) features.

commit 8af4ef11dc
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 28 10:02:47 2022 +0000

    Contextual imputation by feature type.

commit 536b9494cd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 14:12:08 2022 +0000

    Cleaning script corrections

commit f0b87c9dd0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 09:54:15 2022 +0000

    Debugging of the empatica data yield integration.

commit 7fcdb873fe
Merge: 5c7bb0f4 bd53dc16
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 07:50:29 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 5c7bb0f4c1
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 27 07:48:32 2022 +0000

    Config changes

commit bd53dc1684
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 26 15:54:00 2022 +0000

    Empatica data yield usage in the cleaning script.

commit d9a574c550
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 23 13:24:50 2022 +0000

    Changes in the cleaning script and preparation of empatica data yield method.

commit 19aa8707c0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 22 13:45:51 2022 +0000

    Redefined cleaning steps after revision

commit 247d758cb7
Merge: 90ee99e4 7493aaa6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 21 07:18:01 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit 90ee99e4b9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 21 07:16:00 2022 +0000

    Remove TODO comments

commit 7493aaa643
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 20 12:57:55 2022 +0000

    Small changes in cleaning script and missing values testing.

commit eaf4340afd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 20 08:03:48 2022 +0000

    Small imputation and cleaning corrections.

commit a96ea508c6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 07:34:02 2022 +0000

    Fill NaN of Empatica's SD second order feature (must be tested).

commit 52e11cdcab
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 07:25:54 2022 +0000

    Configurations for new standardization path.

commit 92aff93e65
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 07:25:16 2022 +0000

    Remove standardization script.

commit 18b63127de
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 19 06:16:26 2022 +0000

    Removed all standardization rules and configurations.

commit 62982866cd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 13:24:21 2022 +0000

    Phone wifi visible inspection (WIP)

commit 0ce6da5444
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 11:30:08 2022 +0000

    kNN imputation relocation and execution only on specific columns.

commit e3b78c8a85
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 10:58:57 2022 +0000

    Impute selected phone features with 0.
    Wifi visible, screen, and light.

commit 7d85f75d21
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Sep 16 09:03:30 2022 +0000

    Changes in phone features NaN values script.

commit 385e21409d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 15 14:16:58 2022 +0000

    Changes in NaN values testing script.

commit 18002f59e1
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 15 10:48:59 2022 +0000

    Doryab bluetooth and locations features fill in NaN values.

commit 3cf7ca41aa
Merge: d27a4a71 d5ab5a03
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 14 15:38:32 2022 +0000

    Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning

commit d5ab5a0394
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Sep 14 14:13:03 2022 +0000

    Writing testing scripts to determine the point of manual imputation.

commit dfbb758902
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 13 13:54:06 2022 +0000

    Changes in AutoML params and environment.yml

commit 4ec371ed96
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Sep 13 09:51:03 2022 +0000

    Testing auto-sklearn

commit d27a4a71c8
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Sep 12 13:44:17 2022 +0000

    Reorganisation and reordering of the cleaning script.

commit 15d792089d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 1 10:33:36 2022 +0000

    Changes in cleaning script:
    - target extracted from config to remove rows where target is nan
    - prepared sns.heatmap for further missing values analysis
    - necessary changes in config and participant p01
    - picture of heatmap which shows the values state after cleaning

commit cb351e0ff6
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 1 10:06:57 2022 +0000

    Unnecessary line (rows with no target value will be removed in cleaning script).

commit 86299d346b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Sep 1 09:57:21 2022 +0000

    Impute phone and sms NAs with 0

commit 3f7ec80c18
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Aug 31 10:18:50 2022 +0000

    Preparation a) phone_calls 0 imputation b) remove rows with NaN target
2022-12-08 16:04:39 +00:00
27 changed files with 1950 additions and 421 deletions

3
.gitignore

@ -100,6 +100,9 @@ data/external/*
!/data/external/wiki_tz.csv
!/data/external/main_study_usernames.csv
!/data/external/timezone.csv
!/data/external/play_store_application_genre_catalogue.csv
!/data/external/play_store_categories_count.csv
data/raw/*
!/data/raw/.gitkeep

BIN
NaN.png (binary, 22 KiB before; file not shown)

128
README.md

@ -16,7 +16,7 @@ By [MoSHI](https://www.moshi.pitt.edu/), [University of Pittsburgh](https://www.
For RAPIDS installation refer to the [documentation](https://www.rapids.science/1.8/setup/installation/)
## For the installation of the Docker version
### For the installation of the Docker version
1. Follow the [instructions](https://www.rapids.science/1.8/setup/installation/) to setup RAPIDS via Docker (from scratch).
@ -46,7 +46,7 @@ Type R to go to the interactive R session and then:
```
6. Install cr-features module
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch modifications_for_rapids.
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch master.
Then follow the "cr-features module" section below.
7. Install all required packages from environment.yml; prune also deletes conda packages not present in the environment file.
@ -62,7 +62,7 @@ Then follow the "cr-features module" section below.
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
```
## cr-features module
### cr-features module
This RAPIDS extension uses the cr-features library accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
@ -78,4 +78,124 @@ To use cr-features library:
e.g. pip install ./calculatingfeatures if the folder is copied to main parent directory
cr-features package has to be built and installed every time to get the newest version.
Alternatively, the newest version of the Docker image must be used.
```
```
## Updating RAPIDS
To update RAPIDS, first pull and merge [origin](https://github.com/carissalow/rapids), such as with:
```commandline
git fetch --progress "origin" refs/heads/master
git merge --no-ff origin/master
```
Next, update the conda and R virtual environment.
```bash
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
```
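The conda environment should be updated at the same time. A minimal sketch, assuming the environment is named `rapids` (as declared in `environment.yml`); `--prune` removes conda packages no longer listed in the file, matching the installation step above:
```bash
# Update the existing conda environment from environment.yml;
# --prune drops packages that are no longer listed there.
conda env update --name rapids --file environment.yml --prune
```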
## Custom configuration
### Credentials
As mentioned under [Database in RAPIDS documentation](https://www.rapids.science/1.6/snippets/database/), a `credentials.yaml` file is needed to connect to a database.
It should contain:
```yaml
PSQL_STRAW:
database: staw
host: 212.235.208.113
password: password
port: 5432
user: staw_db
```
where `password` needs to be specified as well.
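Before running the pipeline, the connection can be checked manually. A minimal sketch using the `psql` client (assuming it is installed locally), with the values from the `credentials.yaml` example above:
```bash
# Test the database connection; psql prompts for the password.
psql -h 212.235.208.113 -p 5432 -U staw_db -d staw
```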
## Possible installation issues
### Missing dependencies for RPostgres
To install `RPostgres` R package (used to connect to the PostgreSQL database), an error might occur:
```text
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libpq was not found. Try installing:
* deb: libpq-dev (Debian, Ubuntu, etc)
* rpm: postgresql-devel (Fedora, EPEL)
* rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
* csw: postgresql_dev (Solaris)
* brew: libpq (OSX)
If libpq is already installed, check that either:
(i) 'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
(ii) 'pg_config' is in your PATH.
If neither can detect , you can set INCLUDE_DIR
and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: libpq-fe.h: No such file or directory
compilation terminated.
```
The library requires `libpq` for compiling from source, so install accordingly.
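For example, following the suggestions in the error output above, installing the headers might look like this (package names taken from that message, not verified on every distribution):
```bash
# Debian/Ubuntu
sudo apt install libpq-dev
# Fedora/EPEL
sudo dnf install postgresql-devel
# macOS with Homebrew
brew install libpq
```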
### Timezone environment variable for tidyverse (relevant for WSL2)
One of the R packages, `tidyverse`, might need access to the `TZ` environment variable during installation.
On Ubuntu 20.04 on WSL2, this triggers the following error:
```text
> install.packages('tidyverse')
ERROR: configuration failed for package xml2
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Warning in system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace xml2 1.3.1 is already loaded, but >= 1.3.2 is required
Calls: <Anonymous> ... namespaceImportFrom -> asNamespace -> loadNamespace
Execution halted
ERROR: lazy loading failed for package tidyverse
```
This happens because WSL2 does not use the `timedatectl` service, which provides this variable.
```bash
~$ timedatectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
```
and later
```bash
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Execution halted
```
This can be amended by setting the environment variable manually before attempting to install `tidyverse`:
```bash
export TZ='Europe/Ljubljana'
```
Note: if this is needed to avoid runtime issues, you need to either define this environment variable in each new terminal window or (better) define it in your `~/.bashrc` or `~/.bash_profile`.
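For example, to persist the variable across sessions, the export line can be appended to `~/.bashrc`:
```bash
# Make TZ available in every new shell session
echo "export TZ='Europe/Ljubljana'" >> ~/.bashrc
source ~/.bashrc
```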
## Possible runtime issues
### Unix end of line characters
Upon running rapids, an error might occur:
```bash
/usr/bin/env: python3\r: No such file or directory
```
This is due to Windows-style end-of-line characters.
To amend this, I added a `.gitattributes` file to force `git` to check out `rapids` using Unix EOL characters.
If this still fails, `dos2unix` can be used to change them.
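A minimal `.gitattributes` along these lines (an illustrative sketch, not necessarily the exact file committed here) enforces Unix line endings on checkout:
```text
# Normalize text files and check them out with LF line endings
* text=auto eol=lf
```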
### System has not been booted with systemd as init system (PID 1)
See [the installation issue above](#timezone-environment-variable-for-tidyverse-relevant-for-wsl2).


@ -174,6 +174,15 @@ for provider in config["PHONE_ESM"]["PROVIDERS"].keys():
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"]))
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_SPEECH"]["PROVIDERS"].keys():
if config["PHONE_SPEECH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_raw.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_with_datetime.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_speech_features/phone_speech_{language}_{provider_key}.csv",pid=config["PIDS"],language=get_script_language(config["PHONE_SPEECH"]["PROVIDERS"][provider]["SRC_SCRIPT"]),provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_speech.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# We can delete these if's as soon as we add feature PROVIDERS to any of these sensors
if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict):
for provider in config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"].keys():


@ -3,7 +3,7 @@
########################################################################################################################
# See https://www.rapids.science/latest/setup/configuration/#participant-files
PIDS: ['p03'] #['p031', 'p032', 'p033', 'p034', 'p035', 'p036', 'p037', 'p038', 'p039', 'p040', 'p042', 'p043', 'p044', 'p045', 'p046', 'p049', 'p050', 'p052', 'p053', 'p054', 'p055', 'p057', 'p058', 'p059', 'p060', 'p061', 'p062', 'p064', 'p067', 'p068', 'p069', 'p070', 'p071', 'p072', 'p073', 'p074', 'p075', 'p076', 'p077', 'p078', 'p079', 'p080', 'p081', 'p082', 'p083', 'p084', 'p085', 'p086', 'p088', 'p089', 'p090', 'p091', 'p092', 'p093', 'p106', 'p107']
PIDS: ['p031', 'p032', 'p033', 'p034', 'p035', 'p036', 'p037', 'p038', 'p039', 'p040', 'p042', 'p043', 'p044', 'p045', 'p046', 'p049', 'p050', 'p052', 'p053', 'p054', 'p055', 'p057', 'p058', 'p059', 'p060', 'p061', 'p062', 'p064', 'p067', 'p068', 'p069', 'p070', 'p071', 'p072', 'p073', 'p074', 'p075', 'p076', 'p077', 'p078', 'p079', 'p080', 'p081', 'p082', 'p083', 'p084', 'p085', 'p086', 'p088', 'p089', 'p090', 'p091', 'p092', 'p093', 'p106', 'p107']
# See https://www.rapids.science/latest/setup/configuration/#automatic-creation-of-participant-files
CREATE_PARTICIPANT_FILES:
@ -26,7 +26,9 @@ TIME_SEGMENTS: &time_segments
INCLUDE_PAST_PERIODIC_SEGMENTS: TRUE # Only relevant if TYPE=PERIODIC, see docs
TAILORED_EVENTS: # Only relevant if TYPE=EVENT
COMPUTE: True
PARAMETER_ONE: "something"
SEGMENTING_METHOD: "30_before" # 30_before, 90_before, stress_event
INTERVAL_OF_INTEREST: 10 # duration of event of interest [minutes]
IOI_ERROR_TOLERANCE: 5 # interval of interest error tolerance (before and after IOI) [minutes]
# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
TIMEZONE:
@ -102,9 +104,9 @@ PHONE_APPLICATIONS_CRASHES:
CONTAINER: applications_crashes
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
# See https://www.rapids.science/latest/features/phone-applications-foreground/
@ -112,24 +114,32 @@ PHONE_APPLICATIONS_FOREGROUND:
CONTAINER: applications
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
PACKAGE_NAMES_HASHED: True
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
# Refer to data/external/play_store_categories_count.csv for a list of categories (genres) and their frequency.
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS:
RAPIDS:
COMPUTE: True
INCLUDE_EPISODE_FEATURES: True
SINGLE_CATEGORIES: ["all", "email"]
SINGLE_CATEGORIES: ["Productivity", "Tools", "Communication", "Education", "Social"]
MULTIPLE_CATEGORIES:
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
games: ["Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing"]
social: ["Communication", "Social", "Dating"]
productivity: ["Tools", "Productivity", "Finance", "Education", "News & Magazines", "Business", "Books & Reference"]
health: ["Health & Fitness", "Lifestyle", "Food & Drink", "Sports", "Medical", "Parenting"]
entertainment: ["Shopping", "Music & Audio", "Entertainment", "Travel & Local", "Photography", "Video Players & Editors", "Personalization", "House & Home", "Art & Design", "Auto & Vehicles", "Entertainment,Music & Video",
"Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing" # Add all games.
]
maps_weather: ["Maps & Navigation", "Weather"]
CUSTOM_CATEGORIES:
social_media: ["com.google.android.youtube", "com.snapchat.android", "com.instagram.android", "com.zhiliaoapp.musically", "com.facebook.katana"]
dating: ["com.tinder", "com.relance.happycouple", "com.kiwi.joyride"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: []
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"] # TODO list system apps?
SINGLE_APPS: []
EXCLUDED_CATEGORIES: ["System", "STRAW"]
# Note: A special option here is "is_system_app".
# This excludes applications that have is_system_app = TRUE, which is a separate column in the table.
# However, all of these applications have been assigned the System category.
# I will therefore filter by that category, which is a superset and is more complete. JL
EXCLUDED_APPS: []
FEATURES:
APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
@ -241,7 +251,8 @@ PHONE_ESM:
PROVIDERS:
STRAW:
COMPUTE: True
SCALES: ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support"]
SCALES: ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support",
"appraisal_stressfulness_period", "appraisal_stressfulness_event", "appraisal_threat", "appraisal_challenge"]
FEATURES: [mean]
SRC_SCRIPT: src/features/phone_esm/straw/main.py
@ -326,6 +337,15 @@ PHONE_SCREEN:
EPISODE_TYPES: ["unlock"]
SRC_SCRIPT: src/features/phone_screen/rapids/main.py
# Custom added sensor
PHONE_SPEECH:
CONTAINER: speech
PROVIDERS:
STRAW:
COMPUTE: True
FEATURES: ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
SRC_SCRIPT: src/features/phone_speech/straw/main.py
# See https://www.rapids.science/latest/features/phone-wifi-connected/
PHONE_WIFI_CONNECTED:
CONTAINER: sensor_wifi
@ -710,6 +730,7 @@ ALL_CLEANING_OVERALL:
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95
STANDARDIZATION: True
TARGET_STANDARDIZATION: False
SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py
@ -731,5 +752,7 @@ PARAMS_FOR_ANALYSIS:
TARGET:
COMPUTE: True
LABEL: PANAS_negative_affect_mean
ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, "JCQ_job_demand_mean", "JCQ_job_control_mean", "JCQ_supervisor_support_mean", "JCQ_coworker_support_mean"]
LABEL: appraisal_stressfulness_event_mean
ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, JCQ_coworker_support_mean, appraisal_stressfulness_period_mean]
# PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean,
# JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean

File diff suppressed because it is too large.


@ -0,0 +1,45 @@
genre,n
System,261
Tools,96
Productivity,71
Health & Fitness,60
Finance,54
Communication,39
Music & Audio,39
Shopping,38
Lifestyle,33
Education,28
News & Magazines,24
Maps & Navigation,23
Entertainment,21
Business,18
Travel & Local,18
Books & Reference,16
Social,16
Weather,16
Food & Drink,14
Sports,14
Other,13
Photography,13
Puzzle,13
Video Players & Editors,12
Card,9
Casual,9
Personalization,8
Medical,7
Board,5
Strategy,4
House & Home,3
Trivia,3
Word,3
Adventure,2
Art & Design,2
Auto & Vehicles,2
Dating,2
Role Playing,2
STRAW,2
Simulation,2
"Board,Brain Games",1
"Entertainment,Music & Video",1
Parenting,1
Racing,1


@ -0,0 +1,39 @@
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !


@ -1,165 +1,30 @@
name: rapids
channels:
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- _py-xgboost-mutex=2.0
- appdirs=1.4.4
- arrow=0.16.0
- asn1crypto=1.4.0
- astropy=4.2.1
- attrs=20.3.0
- binaryornot=0.4.4
- blas=1.0
- brotlipy=0.7.0
- bzip2=1.0.8
- ca-certificates=2021.7.5
- certifi=2021.5.30
- cffi=1.14.4
- chardet=3.0.4
- click=7.1.2
- colorama=0.4.4
- cookiecutter=1.6.0
- cryptography=3.3.1
- datrie=0.8.2
- docutils=0.16
- future=0.18.2
- gitdb=4.0.5
- gitdb2=4.0.2
- gitpython=3.1.11
- idna=2.10
- imbalanced-learn=0.6.2
- importlib-metadata=2.0.0
- importlib_metadata=2.0.0
- intel-openmp=2019.4
- jinja2=2.11.2
- jinja2-time=0.2.0
- joblib=1.0.0
- jsonschema=3.2.0
- ld_impl_linux-64=2.36.1
- libblas=3.8.0
- libcblas=3.8.0
- libcxx=10.0.0
- libcxxabi=10.0.0
- libedit=3.1.20191231
- libffi=3.3
- libgcc-ng=11.2.0
- libgfortran
- libgfortran
- libgfortran
- liblapack=3.8.0
- libopenblas=0.3.10
- libstdcxx-ng=11.2.0
- libxgboost=0.90
- libzlib=1.2.11
- lightgbm=3.1.1
- llvm-openmp=10.0.0
- markupsafe=1.1.1
- mkl
- mkl-service=2.3.0
- mkl_fft=1.2.0
- mkl_random=1.1.1
- more-itertools=8.6.0
- ncurses=6.2
- numpy=1.19.2
- numpy-base=1.19.2
- openblas=0.3.4
- openssl=1.1.1k
- pandas=1.1.5
- pbr=5.5.1
- pip=20.3.3
- plotly=4.14.1
- poyo=0.5.0
- psutil=5.7.2
- py-xgboost=0.90
- pycparser=2.20
- pyerfa=1.7.1.1
- pyopenssl=20.0.1
- pysocks=1.7.1
- python=3.7.9
- python-dateutil=2.8.1
- python_abi=3.7
- pytz=2020.4
- pyyaml=5.3.1
- readline=8.0
- requests=2.25.0
- retrying=1.3.3
- setuptools=51.0.0
- six=1.15.0
- smmap=3.0.4
- smmap2=3.0.1
- sqlite=3.33.0
- threadpoolctl=2.1.0
- tk=8.6.10
- tqdm=4.62.0
- urllib3=1.25.11
- wheel=0.36.2
- whichcraft=0.6.1
- wrapt=1.12.1
- xgboost=0.90
- xz=5.2.5
- yaml=0.2.5
- zipp=3.4.0
- zlib=1.2.11
- pip:
- amply==0.1.4
- auto-sklearn==0.14.7
- bidict==0.22.0
- biosppy==0.8.0
- build==0.8.0
- cached-property==1.5.2
- cloudpickle==2.2.0
- configargparse==0.15.1
- configspace==0.4.21
- cr-features==0.2.1
- cycler==0.11.0
- cython==0.29.32
- dask==2022.2.0
- decorator==4.4.2
- distributed==2022.2.0
- distro==1.7.0
- emcee==3.1.2
- fonttools==4.33.2
- fsspec==2022.8.2
- h5py==3.6.0
- heapdict==1.0.1
- hmmlearn==0.2.7
- ipython-genutils==0.2.0
- jupyter-core==4.6.3
- kiwisolver==1.4.2
- liac-arff==2.5.0
- locket==1.0.0
- matplotlib==3.5.1
- msgpack==1.0.4
- nbformat==5.0.7
- opencv-python==4.5.5.64
- packaging==21.3
- partd==1.3.0
- peakutils==1.3.3
- pep517==0.13.0
- pillow==9.1.0
- pulp==2.4
- pynisher==0.6.4
- pyparsing==2.4.7
- pyrfr==0.8.3
- pyrsistent==0.15.5
- pywavelets==1.3.0
- ratelimiter==1.2.0.post0
- scikit-learn==0.24.2
- scipy==1.7.3
- seaborn==0.11.2
- shortuuid==1.0.8
- smac==1.2
- snakemake==5.30.2
- sortedcontainers==2.4.0
- tblib==1.7.0
- tomli==2.0.1
- toolz==0.12.0
- toposort==1.5
- tornado==6.2
- traitlets==4.3.3
- typing-extensions==4.2.0
- zict==2.2.0
prefix: /opt/conda/envs/rapids
- auto-sklearn
- hmmlearn
- imbalanced-learn
- jsonschema
- lightgbm
- matplotlib
- numpy
- pandas
- peakutils
- pip
- plotly
- python-dateutil
- pytz
- pywavelets
- pyyaml
- scikit-learn
- scipy
- seaborn
- setuptools
- bioconda::snakemake
- bioconda::snakemake-minimal
- tqdm
- xgboost
- pip:
- biosppy
- cr_features>=0.2

338
renv.lock

@ -1,6 +1,6 @@
{
"R": {
"Version": "4.1.2",
"Version": "4.2.3",
"Repositories": [
{
"Name": "CRAN",
@ -46,10 +46,10 @@
},
"Hmisc": {
"Package": "Hmisc",
"Version": "4.4-2",
"Version": "5.0-1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "66458e906b2112a8b1639964efd77d7c"
"Repository": "CRAN",
"Hash": "bf9fe82c010a468fb32f913ff56d65e1"
},
"KernSmooth": {
"Package": "KernSmooth",
@ -104,7 +104,7 @@
"Package": "RPostgres",
"Version": "1.4.4",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "c593ecb8dbca9faf3906431be610ca28"
},
"Rcpp": {
@ -181,7 +181,7 @@
"Package": "base64enc",
"Version": "0.1-3",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "543776ae6848fde2f48ff3816d0628bc"
},
"bit": {
@ -221,17 +221,24 @@
},
"broom": {
"Package": "broom",
"Version": "0.7.3",
"Version": "1.0.4",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "5581a5ddc8fe2ac5e0d092ec2de4c4ae"
"Repository": "CRAN",
"Hash": "f62b2504021369a2449c54bbda362d30"
},
"cachem": {
"Package": "cachem",
"Version": "1.0.7",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "cda74447c42f529de601fe4d4050daef"
},
"callr": {
"Package": "callr",
"Version": "3.5.1",
"Version": "3.7.3",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "b7d7f1e926dfcd57c74ce93f5c048e80"
"Repository": "CRAN",
"Hash": "9b2191ede20fa29828139b9900922e51"
},
"caret": {
"Package": "caret",
@ -263,10 +270,10 @@
},
"cli": {
"Package": "cli",
"Version": "2.2.0",
"Version": "3.6.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "3ef298932294b775fa0a3eeaa3a645b0"
"Repository": "CRAN",
"Hash": "89e6d8219950eac806ae0c489052048a"
},
"clipr": {
"Package": "clipr",
@ -286,7 +293,7 @@
"Package": "codetools",
"Version": "0.2-18",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "019388fc48e48b3da0d3a76ff94608a8"
},
"colorspace": {
@ -303,6 +310,13 @@
"Repository": "RSPM",
"Hash": "0f22be39ec1d141fd03683c06f3a6e67"
},
"conflicted": {
"Package": "conflicted",
"Version": "1.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bb097fccb22d156624fd07cd2894ddb6"
},
"corpcor": {
"Package": "corpcor",
"Version": "1.6.9",
@ -319,10 +333,10 @@
},
"cpp11": {
"Package": "cpp11",
"Version": "0.2.4",
"Version": "0.4.3",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "ba66e5a750d39067d888aa7af797fed2"
"Repository": "CRAN",
"Hash": "ed588261931ee3be2c700d22e94a29ab"
},
"crayon": {
"Package": "crayon",
@ -354,10 +368,10 @@
},
"dbplyr": {
"Package": "dbplyr",
"Version": "2.1.1",
"Version": "2.3.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "1f37fa4ab2f5f7eded42f78b9a887182"
"Hash": "d24305b92db333726aed162a2c23a147"
},
"desc": {
"Package": "desc",
@ -382,17 +396,17 @@
},
"dplyr": {
"Package": "dplyr",
"Version": "1.0.5",
"Version": "1.1.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "d0d76c11ec807eb3f000eba4e3eb0f68"
"Repository": "CRAN",
"Hash": "eb5742d256a0d9306d85ea68756d8187"
},
"dtplyr": {
"Package": "dtplyr",
"Version": "1.1.0",
"Version": "1.3.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "1e14e4c5b2814de5225312394bc316da"
"Repository": "CRAN",
"Hash": "54ed3ea01b11e81a86544faaecfef8e2"
},
"e1071": {
"Package": "e1071",
@ -419,7 +433,7 @@
"Package": "evaluate",
"Version": "0.14",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "ec8ca05cffcc70569eaaad8469d2a3a7"
},
"fansi": {
@ -452,10 +466,10 @@
},
"forcats": {
"Package": "forcats",
"Version": "0.5.0",
"Version": "1.0.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "1cb4279e697650f0bd78cd3601ee7576"
"Repository": "CRAN",
"Hash": "1a0a9a3d5083d0d573c4214576f1e690"
},
"foreach": {
"Package": "foreach",
@ -492,6 +506,13 @@
"Repository": "RSPM",
"Hash": "f568ce73d3d59582b0f7babd0eb33d07"
},
"gargle": {
"Package": "gargle",
"Version": "1.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bb3208dcdfeb2e68bf33c87601b3cbe3"
},
"gclus": {
"Package": "gclus",
"Version": "1.3.2",
@ -515,10 +536,10 @@
},
"ggplot2": {
"Package": "ggplot2",
"Version": "3.3.2",
"Version": "3.4.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "4ded8b439797f7b1693bd3d238d0106b"
"Repository": "CRAN",
"Hash": "d494daf77c4aa7f084dbbe6ca5dcaca7"
},
"ggraph": {
"Package": "ggraph",
@ -557,16 +578,30 @@
},
"glue": {
"Package": "glue",
"Version": "1.4.2",
"Version": "1.6.2",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "6efd734b14c6471cfe443345f3e35e29"
"Repository": "CRAN",
"Hash": "4f2596dfb05dac67b9dc558e5c6fba2e"
},
"googledrive": {
"Package": "googledrive",
"Version": "2.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e88ba642951bc8d1898ba0d12581850b"
},
"googlesheets4": {
"Package": "googlesheets4",
"Version": "1.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "fd7b97bd862a14297b0bb7ed28a3dada"
},
"gower": {
"Package": "gower",
"Version": "0.2.2",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "be6a2b3529928bd803d1c437d1d43152"
},
"graphlayouts": {
@ -599,10 +634,10 @@
},
"haven": {
"Package": "haven",
"Version": "2.3.1",
"Version": "2.5.2",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "221d0ad75dfa03ebf17b1a4cc5c31dfc"
"Repository": "CRAN",
"Hash": "8b331e659e67d757db0fcc28e689c501"
},
"highr": {
"Package": "highr",
@ -613,10 +648,10 @@
},
"hms": {
"Package": "hms",
"Version": "1.1.1",
"Version": "1.1.3",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "5b8a2dd0fdbe2ab4f6081e6c7be6dfca"
"Hash": "b59377caa7ed00fa41808342002138f9"
},
"htmlTable": {
"Package": "htmlTable",
@ -648,10 +683,10 @@
},
"httr": {
"Package": "httr",
"Version": "1.4.2",
"Version": "1.4.5",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "a525aba14184fec243f9eaec62fbed43"
"Repository": "CRAN",
"Hash": "f6844033201269bec3ca0097bc6c97b3"
},
"huge": {
"Package": "huge",
@ -660,6 +695,13 @@
"Repository": "RSPM",
"Hash": "a4cde4dd1d2551edb99a3273a4ad34ea"
},
"ids": {
"Package": "ids",
"Version": "1.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "99df65cfef20e525ed38c3d2577f7190"
},
"igraph": {
"Package": "igraph",
"Version": "1.2.6",
@ -704,10 +746,10 @@
},
"jsonlite": {
"Package": "jsonlite",
"Version": "1.7.2",
"Version": "1.8.4",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "98138e0994d41508c7a6b84a0600cfcb"
"Repository": "CRAN",
"Hash": "a4269a09a9b865579b2635c77e572374"
},
"knitr": {
"Package": "knitr",
@ -760,10 +802,10 @@
},
"lifecycle": {
"Package": "lifecycle",
"Version": "1.0.0",
"Version": "1.0.3",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "3471fb65971f1a7b2d4ae7848cf2db8d"
"Repository": "CRAN",
"Hash": "001cecbeac1cff9301bdc3775ee46a86"
},
"listenv": {
"Package": "listenv",
@ -774,17 +816,17 @@
},
"lubridate": {
"Package": "lubridate",
"Version": "1.7.9.2",
"Version": "1.9.2",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "5b5b02f621d39a499def7923a5aee746"
"Repository": "CRAN",
"Hash": "e25f18436e3efd42c7c590a1c4c15390"
},
"magrittr": {
"Package": "magrittr",
"Version": "2.0.1",
"Version": "2.0.3",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "41287f1ac7d28a92f0a286ed507928d3"
"Repository": "CRAN",
"Hash": "7ce2733a9826b3aeb1775d56fd305472"
},
"markdown": {
"Package": "markdown",
@ -800,6 +842,13 @@
"Repository": "RSPM",
"Hash": "67101e7448dfd9add4ac418623060262"
},
"memoise": {
"Package": "memoise",
"Version": "2.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c"
},
"mgcv": {
"Package": "mgcv",
"Version": "1.8-33",
@ -830,10 +879,10 @@
},
"modelr": {
"Package": "modelr",
"Version": "0.1.8",
"Version": "0.1.11",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "9fd59716311ee82cba83dc2826fc5577"
"Repository": "CRAN",
"Hash": "4f50122dc256b1b6996a4703fecea821"
},
"munsell": {
"Package": "munsell",
@ -888,7 +937,7 @@
"Package": "parallelly",
"Version": "1.29.0",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "b5f399c9ce96977e22ef32c20b6cfe87"
},
"pbapply": {
@ -907,10 +956,10 @@
},
"pillar": {
"Package": "pillar",
"Version": "1.4.7",
"Version": "1.9.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "3b3dd89b2ee115a8b54e93a34cd546b4"
"Repository": "CRAN",
"Hash": "15da5a8412f317beeee6175fbc76f4bb"
},
"pkgbuild": {
"Package": "pkgbuild",
@ -977,10 +1026,10 @@
},
"processx": {
"Package": "processx",
"Version": "3.4.5",
"Version": "3.8.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "22aab6098cb14edd0a5973a8438b569b"
"Repository": "CRAN",
"Hash": "a33ee2d9bf07564efb888ad98410da84"
},
"prodlim": {
"Package": "prodlim",
@ -1000,7 +1049,7 @@
"Package": "progressr",
"Version": "0.9.0",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Hash": "ca0d80ecc29903f7579edbabd91f4199"
},
"promises": {
@ -1033,10 +1082,10 @@
},
"purrr": {
"Package": "purrr",
"Version": "0.3.4",
"Version": "1.0.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "97def703420c8ab10d8f0e6c72101e02"
"Repository": "CRAN",
"Hash": "d71c815267c640f17ddbf7f16144b4bb"
},
"qap": {
"Package": "qap",
@ -1052,6 +1101,13 @@
"Repository": "RSPM",
"Hash": "d35964686307333a7121eb41c7dcd4e0"
},
"ragg": {
"Package": "ragg",
"Version": "1.2.5",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "690bc058ea2b1b8a407d3cfe3dce3ef9"
},
"rappdirs": {
"Package": "rappdirs",
"Version": "0.3.3",
@ -1061,17 +1117,17 @@
},
"readr": {
"Package": "readr",
"Version": "1.4.0",
"Version": "2.1.4",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "2639976851f71f330264a9c9c3d43a61"
"Repository": "CRAN",
"Hash": "b5047343b3825f37ad9d3b5d89aa1078"
},
"readxl": {
"Package": "readxl",
"Version": "1.3.1",
"Version": "1.4.2",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "63537c483c2dbec8d9e3183b3735254a"
"Repository": "CRAN",
"Hash": "2e6020b1399d95f947ed867045e9ca17"
},
"recipes": {
"Package": "recipes",
@ -1110,10 +1166,10 @@
},
"reprex": {
"Package": "reprex",
"Version": "0.3.0",
"Version": "2.0.2",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "b06bfb3504cc8a4579fd5567646f745b"
"Repository": "CRAN",
"Hash": "d66fe009d4c20b7ab1927eb405db9ee2"
},
"reshape2": {
"Package": "reshape2",
@ -1138,10 +1194,10 @@
},
"rlang": {
"Package": "rlang",
"Version": "0.4.10",
"Version": "1.1.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "599df23c40a4fce9c7b4764f28c37857"
"Repository": "CRAN",
"Hash": "dc079ccd156cde8647360f473c1fa718"
},
"rmarkdown": {
"Package": "rmarkdown",
@ -1173,24 +1229,24 @@
},
"rstudioapi": {
"Package": "rstudioapi",
"Version": "0.13",
"Version": "0.14",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "06c85365a03fdaf699966cc1d3cf53ea"
"Repository": "CRAN",
"Hash": "690bd2acc42a9166ce34845884459320"
},
"rvest": {
"Package": "rvest",
"Version": "0.3.6",
"Version": "1.0.3",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "a9795ccb2d608330e841998b67156764"
"Repository": "CRAN",
"Hash": "a4a5ac819a467808c60e36e92ddf195e"
},
"scales": {
"Package": "scales",
"Version": "1.1.1",
"Version": "1.2.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "6f76f71042411426ec8df6c54f34e6dd"
"Repository": "CRAN",
"Hash": "906cb23d2f1c5680b8ce439b44c6fa63"
},
"selectr": {
"Package": "selectr",
@ -1236,17 +1292,17 @@
},
"stringi": {
"Package": "stringi",
"Version": "1.5.3",
"Version": "1.7.12",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "a063ebea753c92910a4cca7b18bc1f05"
"Repository": "CRAN",
"Hash": "ca8bd84263c77310739d2cf64d84d7c9"
},
"stringr": {
"Package": "stringr",
"Version": "1.4.0",
"Version": "1.5.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "0759e6b6c0957edb1311028a49a35e76"
"Hash": "671a4d384ae9d32fc47a14e98bfa3dc8"
},
"survival": {
"Package": "survival",
@ -1262,6 +1318,13 @@
"Repository": "RSPM",
"Hash": "b227d13e29222b4574486cfcbde077fa"
},
"systemfonts": {
"Package": "systemfonts",
"Version": "1.0.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "90b28393209827327de889f49935140a"
},
"testthat": {
"Package": "testthat",
"Version": "3.0.1",
@ -1269,12 +1332,19 @@
"Repository": "RSPM",
"Hash": "17826764cb92d8b5aae6619896e5a161"
},
"textshaping": {
"Package": "textshaping",
"Version": "0.3.6",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "1ab6223d3670fac7143202cb6a2d43d5"
},
"tibble": {
"Package": "tibble",
"Version": "3.0.4",
"Version": "3.2.1",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "71dffd8544691c520dd8e41ed2d7e070"
"Repository": "CRAN",
"Hash": "a84e2cc86d07289b3b6f5069df7a004c"
},
"tidygraph": {
"Package": "tidygraph",
@ -1285,24 +1355,24 @@
},
"tidyr": {
"Package": "tidyr",
"Version": "1.1.2",
"Version": "1.3.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "c40b2d5824d829190f4b825f4496dfae"
"Repository": "CRAN",
"Hash": "e47debdc7ce599b070c8e78e8ac0cfcf"
},
"tidyselect": {
"Package": "tidyselect",
"Version": "1.1.0",
"Version": "1.2.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "6ea435c354e8448819627cf686f66e0a"
"Repository": "CRAN",
"Hash": "79540e5fcd9e0435af547d885f184fd5"
},
"tidyverse": {
"Package": "tidyverse",
"Version": "1.3.0",
"Version": "2.0.0",
"Source": "Repository",
"Repository": "RSPM",
"Hash": "bd51be662f359fa99021f3d51e911490"
"Repository": "CRAN",
"Hash": "c328568cd14ea89a83bd4ca7f54ae07e"
},
"timeDate": {
"Package": "timeDate",
@ -1311,6 +1381,13 @@
"Repository": "RSPM",
"Hash": "fde4fc571f5f61978652c229d4713845"
},
"timechange": {
"Package": "timechange",
"Version": "0.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "8548b44f79a35ba1791308b61e6012d7"
},
"tinytex": {
"Package": "tinytex",
"Version": "0.28",
@ -1332,6 +1409,13 @@
"Repository": "RSPM",
"Hash": "fc77eb5297507cccfa3349a606061030"
},
"tzdb": {
"Package": "tzdb",
"Version": "0.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "b2e1cbce7c903eaf23ec05c58e59fb5e"
},
"utf8": {
"Package": "utf8",
"Version": "1.1.4",
@@ -1339,12 +1423,19 @@
"Repository": "RSPM",
"Hash": "4a5081acfb7b81a572e4384a7aaf2af1"
},
- "vctrs": {
- "Package": "vctrs",
- "Version": "0.3.8",
+ "uuid": {
+ "Package": "uuid",
+ "Version": "1.1-0",
"Source": "Repository",
"Repository": "CRAN",
- "Hash": "ecf749a1b39ea72bd9b51b76292261f1"
+ "Hash": "f1cb46c157d080b729159d407be83496"
},
+ "vctrs": {
+ "Package": "vctrs",
+ "Version": "0.6.1",
+ "Source": "Repository",
+ "Repository": "CRAN",
+ "Hash": "06eceb3a5d716fd0654cc23ca3d71a99"
+ },
"viridis": {
"Package": "viridis",
@@ -1360,6 +1451,13 @@
"Repository": "RSPM",
"Hash": "ce4f6271baa94776db692f1cb2055bee"
},
+ "vroom": {
+ "Package": "vroom",
+ "Version": "1.6.1",
+ "Source": "Repository",
+ "Repository": "CRAN",
+ "Hash": "7015a74373b83ffaef64023f4a0f5033"
+ },
"waldo": {
"Package": "waldo",
"Version": "0.2.3",
@@ -1376,10 +1474,10 @@
},
"withr": {
"Package": "withr",
- "Version": "2.3.0",
+ "Version": "2.5.0",
"Source": "Repository",
- "Repository": "RSPM",
- "Hash": "7307d79f58d1885b38c4f4f1a8cb19dd"
+ "Repository": "CRAN",
+ "Hash": "c0e49a9760983e81e55cdd9be92e7182"
},
"xfun": {
"Package": "xfun",
@@ -1390,10 +1488,10 @@
},
"xml2": {
"Package": "xml2",
- "Version": "1.3.2",
+ "Version": "1.3.3",
"Source": "Repository",
- "Repository": "RSPM",
- "Hash": "d4d71a75dd3ea9eb5fa28cc21f9585e2"
+ "Repository": "CRAN",
+ "Hash": "40682ed6a969ea5abfd351eb67833adc"
},
"xtable": {
"Package": "xtable",


@ -345,6 +345,19 @@ rule esm_features:
script:
"../src/features/entry.py"
rule phone_speech_python_features:
input:
sensor_data = "data/raw/{pid}/phone_speech_with_datetime.csv",
time_segments_labels = "data/interim/time_segments/{pid}_time_segments_labels.csv"
params:
provider = lambda wildcards: config["PHONE_SPEECH"]["PROVIDERS"][wildcards.provider_key.upper()],
provider_key = "{provider_key}",
sensor_key = "phone_speech"
output:
"data/interim/{pid}/phone_speech_features/phone_speech_python_{provider_key}.csv"
script:
"../src/features/entry.py"
rule phone_keyboard_python_features:
input:
sensor_data = "data/raw/{pid}/phone_keyboard_with_datetime.csv",


@ -247,6 +247,8 @@ rule empatica_readable_datetime:
include_past_periodic_segments = config["TIME_SEGMENTS"]["INCLUDE_PAST_PERIODIC_SEGMENTS"]
output:
"data/raw/{pid}/empatica_{sensor}_with_datetime.csv"
resources:
mem_mb=50000
script:
"../src/data/datetime/readable_datetime.R"
@ -259,16 +261,19 @@ rule extract_event_information_from_esm:
stage = "extract",
pid = "{pid}"
output:
"data/raw/ers/{pid}_ers.csv"
"data/raw/ers/{pid}_ers.csv",
"data/raw/ers/{pid}_stress_event_targets.csv"
script:
"../src/features/phone_esm/straw/process_user_event_related_segments.py"
rule create_event_related_segments_file:
rule merge_event_related_segments_files:
input:
ers_files = expand("data/raw/ers/{pid}_ers.csv", pid=config["PIDS"])
ers_files = expand("data/raw/ers/{pid}_ers.csv", pid=config["PIDS"]),
se_files = expand("data/raw/ers/{pid}_stress_event_targets.csv", pid=config["PIDS"])
params:
stage = "merge"
output:
"data/external/straw_events.csv"
"data/external/straw_events.csv",
"data/external/stress_event_targets.csv"
script:
"../src/features/phone_esm/straw/process_user_event_related_segments.py"


@ -29,23 +29,16 @@ get_genre <- function(apps){
apps <- read.csv(snakemake@input[[1]], stringsAsFactors = F)
genre_catalogue <- data.frame()
catalogue_source <- snakemake@params[["catalogue_source"]]
package_names_hashed <- snakemake@params[["package_names_hashed"]]
update_catalogue_file <- snakemake@params[["update_catalogue_file"]]
scrape_missing_genres <- snakemake@params[["scrape_missing_genres"]]
apps_with_genre <- data.frame(matrix(ncol=length(colnames(apps)) + 1,nrow=0, dimnames=list(NULL, c(colnames(apps), "genre"))))
if (length(package_names_hashed) == 0) {package_names_hashed <- FALSE}
if(nrow(apps) > 0){
if(catalogue_source == "GOOGLE"){
apps_with_genre <- apps %>% mutate(genre = NA_character_)
} else if(catalogue_source == "FILE"){
genre_catalogue <- read.csv(snakemake@params[["catalogue_file"]], colClasses = c("character", "character"))
if (package_names_hashed) {
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_hash")
} else {
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_name")
}
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_name")
}
if(catalogue_source == "GOOGLE" || (catalogue_source == "FILE" && scrape_missing_genres)){


@ -349,3 +349,24 @@ PHONE_WIFI_VISIBLE:
COLUMN_MAPPINGS:
SCRIPTS: # List any python or r scripts that mutate your raw data
PHONE_SPEECH:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: timestamp
DEVICE_ID: device_id
SPEECH_PROPORTION: speech_proportion
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS: # List any python or r scripts that mutate your raw data
IOS:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: timestamp
DEVICE_ID: device_id
SPEECH_PROPORTION: speech_proportion
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS: # List any python or r scripts that mutate your raw data


@ -136,8 +136,9 @@ def patch_ibi_with_bvp(ibi_data, bvp_data):
# Begin with the cr-features part
try:
ibi_data, ibi_start_timestamp = empatica2d_to_array(ibi_data_file)
except IndexError as e:
except (IndexError, KeyError) as e:
# Checks whether IBI.csv is empty
# It may raise a KeyError if df is empty here: startTimeStamp = df.time[0]
df_test = pd.read_csv(ibi_data_file, names=['timings', 'inter_beat_interval'], header=None)
if df_test.empty:
df_test['timestamp'] = df_test['timings']
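The widened except clause above matters because an empty IBI.csv yields an empty dataframe, and taking the first row of an empty column raises KeyError rather than IndexError. A minimal reproduction with a made-up dataframe, not project code:

import pandas as pd

empty = pd.DataFrame(columns=["time", "value"])
try:
    start_timestamp = empty.time[0]  # no row 0 in an empty Series -> KeyError
except (IndexError, KeyError):
    start_timestamp = None           # fall back, as the patched clause above does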


@ -118,6 +118,11 @@ PHONE_SCREEN:
- DEVICE_ID
- SCREEN_STATUS
PHONE_SPEECH:
- TIMESTAMP
- DEVICE_ID
- SPEECH_PROPORTION
PHONE_WIFI_CONNECTED:
- TIMESTAMP
- DEVICE_ID


@ -14,7 +14,6 @@ from src.features import empatica_data_yield as edy
pd.set_option('display.max_columns', 20)
def straw_cleaning(sensor_data_files, provider):
# TODO (maybe): reorganize the script based on the overall
features = pd.read_csv(sensor_data_files["sensor_data"][0])
@ -28,12 +27,18 @@ def straw_cleaning(sensor_data_files, provider):
# (1) FILTER_OUT THE ROWS THAT DO NOT HAVE THE TARGET COLUMN AVAILABLE
if config['PARAMS_FOR_ANALYSIS']['TARGET']['COMPUTE']:
target = config['PARAMS_FOR_ANALYSIS']['TARGET']['LABEL'] # get target label from config
features = features[features['phone_esm_straw_' + target].notna()].reset_index(drop=True)
if 'phone_esm_straw_' + target in features:
features = features[features['phone_esm_straw_' + target].notna()].reset_index(drop=True)
else:
return features
# (2.1) QUALITY CHECK (DATA YIELD COLUMN) deletes the rows where E4 or phone data is low quality
phone_data_yield_unit = provider["PHONE_DATA_YIELD_FEATURE"].split("_")[3].lower()
phone_data_yield_column = "phone_data_yield_rapids_ratiovalidyielded" + phone_data_yield_unit
if features.empty:
return features
features = edy.calculate_empatica_data_yield(features)
if not phone_data_yield_column in features.columns and not "empatica_data_yield" in features.columns:
@ -115,7 +120,7 @@ def straw_cleaning(sensor_data_files, provider):
esm_cols = features.loc[:, features.columns.str.startswith('phone_esm_straw')]
if provider["COLS_VAR_THRESHOLD"]:
features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
features.drop(features.std(numeric_only=True)[features.std(numeric_only=True) == 0].index.values, axis=1, inplace=True)
fe5 = features.copy()
@ -129,7 +134,7 @@ def straw_cleaning(sensor_data_files, provider):
valid_features = features[numerical_cols].loc[:, features[numerical_cols].isna().sum() < drop_corr_features['MIN_OVERLAP_FOR_CORR_THRESHOLD'] * features[numerical_cols].shape[0]]
corr_matrix = valid_features.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [column for column in upper.columns if any(upper[column] > drop_corr_features["CORR_THRESHOLD"])]
features.drop(to_drop, axis=1, inplace=True)
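The astype(np.bool) to astype(bool) changes here and below are required because the np.bool alias was deprecated in NumPy 1.20 and removed in 1.24; the builtin bool is the drop-in replacement. The surrounding upper-triangle pattern, sketched with random data:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(50, 4), columns=list("abcd"))
corr_matrix = df.corr().abs()
# Keep only the upper triangle so each feature pair is inspected once
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [column for column in upper.columns if any(upper[column] > 0.9)]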
@ -139,20 +144,20 @@ def straw_cleaning(sensor_data_files, provider):
if esm not in features:
features[esm] = esm_cols[esm]
fe6 = features.copy()
# (9) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
if features.isna().any().any():
raise ValueError("There are still some NaNs present in the dataframe. Please check for implementation errors.")
return features
def k_nearest(df):
pd.set_option('display.max_columns', None)
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
def impute(df, method='zero'):
def k_nearest(df):
pd.set_option('display.max_columns', None)
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
return {
'zero': df.fillna(0),
@ -162,6 +167,7 @@ def impute(df, method='zero'):
'knn': k_nearest(df)
}[method]
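One detail worth noting about the dict-based dispatch in impute(): a dict literal evaluates every value eagerly, so the KNN imputation runs even when method='zero'. A lazy variant, sketched with only the strategies visible in this hunk:

import pandas as pd
from sklearn.impute import KNNImputer

def impute_lazy(df, method='zero'):
    def k_nearest(d):
        imputer = KNNImputer(n_neighbors=3)
        return pd.DataFrame(imputer.fit_transform(d), columns=d.columns)
    strategies = {
        'zero': lambda d: d.fillna(0),
        # ... remaining strategies elided, as in the original ...
        'knn': k_nearest,
    }
    return strategies[method](df)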
def graph_bf_af(features, phase_name, plt_flag=False):
if plt_flag:
sns.set(rc={"figure.figsize":(16, 8)})


@ -14,21 +14,41 @@ def straw_cleaning(sensor_data_files, provider, target):
features = pd.read_csv(sensor_data_files["sensor_data"][0])
# features = features[features['local_segment_label'] == 'working_day'] # Filtering of the relevant time segments
# print(features)
# sys.exit()
esm_cols = features.loc[:, features.columns.str.startswith('phone_esm_straw')] # Get target (esm) columns
with open('config.yaml', 'r') as stream:
config = yaml.load(stream, Loader=yaml.FullLoader)
esm_cols = features.loc[:, features.columns.str.startswith('phone_esm_straw')] # Get target (esm) columns
excluded_columns = ['local_segment', 'local_segment_label', 'local_segment_start_datetime', 'local_segment_end_datetime']
graph_bf_af(features, "1target_rows_before")
# (1) FILTER_OUT THE ROWS THAT DO NOT HAVE THE TARGET COLUMN AVAILABLE
# (1.0) OVERRIDE STRESSFULNESS EVENT TARGETS IF ERS SEGMENTING_METHOD IS "STRESS_EVENT"
if config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["SEGMENTING_METHOD"] == "stress_event":
stress_events_targets = pd.read_csv("data/external/stress_event_targets.csv")
if "appraisal_stressfulness_event_mean" in config['PARAMS_FOR_ANALYSIS']['TARGET']['ALL_LABELS']:
features.drop(columns=['phone_esm_straw_appraisal_stressfulness_event_mean'], inplace=True)
features = features.merge(stress_events_targets[["label", "appraisal_stressfulness_event"]] \
.rename(columns={'label': 'local_segment_label'}), on=['local_segment_label'], how='inner') \
.rename(columns={'appraisal_stressfulness_event': 'phone_esm_straw_appraisal_stressfulness_event_mean'})
if "appraisal_threat_mean" in config['PARAMS_FOR_ANALYSIS']['TARGET']['ALL_LABELS']:
features.drop(columns=['phone_esm_straw_appraisal_threat_mean'], inplace=True)
features = features.merge(stress_events_targets[["label", "appraisal_threat"]] \
.rename(columns={'label': 'local_segment_label'}), on=['local_segment_label'], how='inner') \
.rename(columns={'appraisal_threat': 'phone_esm_straw_appraisal_threat_mean'})
if "appraisal_challenge_mean" in config['PARAMS_FOR_ANALYSIS']['TARGET']['ALL_LABELS']:
features.drop(columns=['phone_esm_straw_appraisal_challenge_mean'], inplace=True)
features = features.merge(stress_events_targets[["label", "appraisal_challenge"]] \
.rename(columns={'label': 'local_segment_label'}), on=['local_segment_label'], how='inner') \
.rename(columns={'appraisal_challenge': 'phone_esm_straw_appraisal_challenge_mean'})
esm_cols = features.loc[:, features.columns.str.startswith('phone_esm_straw')] # Get target (esm) columns
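Each of the three overrides above is the same three-step pattern: drop the questionnaire-derived mean column, inner-merge the per-event target keyed on the segment label, and rename the merged column back to the expected feature name. A toy illustration of one round of it, with made-up values:

import pandas as pd

features = pd.DataFrame({"local_segment_label": ["ev_000", "ev_001"],
                         "phone_esm_straw_appraisal_threat_mean": [2.0, 3.0]})
targets = pd.DataFrame({"label": ["ev_000", "ev_001"], "appraisal_threat": [1.0, 4.0]})

features = features.drop(columns=["phone_esm_straw_appraisal_threat_mean"]) \
    .merge(targets.rename(columns={"label": "local_segment_label"}), on="local_segment_label", how="inner") \
    .rename(columns={"appraisal_threat": "phone_esm_straw_appraisal_threat_mean"})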
# (1.1) FILTER_OUT THE ROWS THAT DO NOT HAVE THE TARGET COLUMN AVAILABLE
if config['PARAMS_FOR_ANALYSIS']['TARGET']['COMPUTE']:
features = features[features['phone_esm_straw_' + target].notna()].reset_index(drop=True)
@ -36,7 +56,6 @@ def straw_cleaning(sensor_data_files, provider, target):
return pd.DataFrame(columns=excluded_columns)
graph_bf_af(features, "2target_rows_after")
print("HERE1", target, features["pid"])
# (2) QUALITY CHECK (DATA YIELD COLUMN) drops the rows where E4 or phone data is low quality
phone_data_yield_unit = provider["PHONE_DATA_YIELD_FEATURE"].split("_")[3].lower()
@ -52,26 +71,23 @@ def straw_cleaning(sensor_data_files, provider, target):
# Drop rows where phone data yield is less than the given threshold
if provider["PHONE_DATA_YIELD_RATIO_THRESHOLD"]:
# print("\nThreshold:", provider["PHONE_DATA_YIELD_RATIO_THRESHOLD"])
# print("Phone features data yield stats:", features[phone_data_yield_column].describe(), "\n")
# print(features[phone_data_yield_column].sort_values())
hist = features[phone_data_yield_column].hist(bins=5)
plt.close()
features = features[features[phone_data_yield_column] >= provider["PHONE_DATA_YIELD_RATIO_THRESHOLD"]].reset_index(drop=True)
# Drop rows where empatica data yield is less than the given threshold
if provider["EMPATICA_DATA_YIELD_RATIO_THRESHOLD"]:
# print("\nThreshold:", provider["EMPATICA_DATA_YIELD_RATIO_THRESHOLD"])
# print("E4 features data yield stats:", features["empatica_data_yield"].describe(), "\n")
# print(features["empatica_data_yield"].sort_values())
features = features[features["empatica_data_yield"] >= provider["EMPATICA_DATA_YIELD_RATIO_THRESHOLD"]].reset_index(drop=True)
if features.empty:
return pd.DataFrame(columns=excluded_columns)
graph_bf_af(features, "3data_yield_drop_rows")
if features.empty:
return pd.DataFrame(columns=excluded_columns)
# (3) CONTEXTUAL IMPUTATION
# Impute selected phone features with a high number
@ -93,20 +109,31 @@ def straw_cleaning(sensor_data_files, provider, target):
features[impute_w_sn2] = features[impute_w_sn2].fillna(1) # Special case of imputation - nominal/ordinal value
impute_w_sn3 = [col for col in features.columns if "loglocationvariance" in col]
features[impute_w_sn2] = features[impute_w_sn2].fillna(-1000000) # Special case of imputation - loglocation
features[impute_w_sn3] = features[impute_w_sn3].fillna(-1000000) # Special case of imputation - loglocation
# Impute selected phone features with 0 + impute ESM features with 0
# Impute location features
impute_locations = [col for col in features \
if col.startswith('phone_locations_doryab_') and
'radiusgyration' not in col
]
# Impute selected phone, location, and esm features with 0
impute_zero = [col for col in features if \
col.startswith('phone_applications_foreground_rapids_') or
col.startswith('phone_activity_recognition_') or
col.startswith('phone_battery_rapids_') or
col.startswith('phone_bluetooth_rapids_') or
col.startswith('phone_light_rapids_') or
col.startswith('phone_calls_rapids_') or
col.startswith('phone_messages_rapids_') or
col.startswith('phone_screen_rapids_') or
col.startswith('phone_wifi_visible')]
features[impute_zero+list(esm_cols.columns)] = features[impute_zero+list(esm_cols.columns)].fillna(0)
col.startswith('phone_bluetooth_doryab_') or
col.startswith('phone_wifi_visible')
]
features[impute_zero+impute_locations+list(esm_cols.columns)] = features[impute_zero+impute_locations+list(esm_cols.columns)].fillna(0)
pd.set_option('display.max_rows', None)
graph_bf_af(features, "4context_imp")
@ -119,7 +146,7 @@ def straw_cleaning(sensor_data_files, provider, target):
# (5) REMOVE COLS WHERE VARIANCE IS 0
if provider["COLS_VAR_THRESHOLD"]:
features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
features.drop(features.std(numeric_only=True)[features.std(numeric_only=True) == 0].index.values, axis=1, inplace=True)
graph_bf_af(features, "6variance_drop")
@ -137,15 +164,18 @@ def straw_cleaning(sensor_data_files, provider, target):
if features.empty:
return pd.DataFrame(columns=excluded_columns)
# (7) STANDARDIZATION TODO: exclude nominal features from standardization
# (7) STANDARDIZATION
if provider["STANDARDIZATION"]:
nominal_cols = [col for col in features.columns if "mostcommonactivity" in col or "homelabel" in col] # Excluded nominal features
# Expected warning within this code block
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=RuntimeWarning)
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"])] = \
features.loc[:, ~features.columns.isin(excluded_columns)].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
if provider["TARGET_STANDARDIZATION"]:
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols)] = \
features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols)].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
else:
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols + ['phone_esm_straw_' + target])] = \
features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols + ['phone_esm_straw_' + target])].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
graph_bf_af(features, "8standardization")
@ -170,7 +200,7 @@ def straw_cleaning(sensor_data_files, provider, target):
valid_features = features[numerical_cols].loc[:, features[numerical_cols].isna().sum() < drop_corr_features['MIN_OVERLAP_FOR_CORR_THRESHOLD'] * features[numerical_cols].shape[0]]
corr_matrix = valid_features.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [column for column in upper.columns if any(upper[column] > drop_corr_features["CORR_THRESHOLD"])]
# sns.heatmap(corr_matrix, cmap="YlGnBu")
@ -193,17 +223,35 @@ def straw_cleaning(sensor_data_files, provider, target):
graph_bf_af(features, "10correlation_drop")
# (10) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
# Transform categorical columns to category dtype
cat1 = [col for col in features.columns if "mostcommonactivity" in col]
if cat1: # Transform columns to category dtype (mostcommonactivity)
features[cat1] = features[cat1].astype(int).astype('category')
cat2 = [col for col in features.columns if "homelabel" in col]
if cat2: # Transform columns to category dtype (homelabel)
features[cat2] = features[cat2].astype(int).astype('category')
# (10) DROP ALL WINDOW RELATED COLUMNS
win_count_cols = [col for col in features if "SO_windowsCount" in col]
if win_count_cols:
features.drop(columns=win_count_cols, inplace=True)
# (11) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
if features.isna().any().any():
raise ValueError("There are still some NaNs present in the dataframe. Please check for implementation errors.")
return features
def k_nearest(df):
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
def impute(df, method='zero'):
def k_nearest(df):
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
return {
'zero': df.fillna(0),
@ -213,7 +261,8 @@ def impute(df, method='zero'):
'knn': k_nearest(df)
}[method]
def graph_bf_af(features, phase_name, plt_flag=True):
def graph_bf_af(features, phase_name, plt_flag=False):
if plt_flag:
sns.set(rc={"figure.figsize":(16, 8)})
sns.heatmap(features.isna(), cbar=False) #features.select_dtypes(include=np.number)


@ -15,13 +15,13 @@ def extract_second_order_features(intraday_features, so_features_names, prefix="
so_features = pd.DataFrame()
#print(intraday_features.drop("level_1", axis=1).groupby(["local_segment"]).nsmallest())
if "mean" in so_features_names:
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).mean().add_suffix("_SO_mean")], axis=1)
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).mean(numeric_only=True).add_suffix("_SO_mean")], axis=1)
if "median" in so_features_names:
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).median().add_suffix("_SO_median")], axis=1)
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).median(numeric_only=True).add_suffix("_SO_median")], axis=1)
if "sd" in so_features_names:
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).std().fillna(0).add_suffix("_SO_sd")], axis=1)
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).std(numeric_only=True).fillna(0).add_suffix("_SO_sd")], axis=1)
if "nlargest" in so_features_names: # largest 5 -- maybe there is a faster groupby solution?
for column in intraday_features.loc[:, ~intraday_features.columns.isin(groupby_cols+[prefix+"level_1"])]:
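These numeric_only=True arguments, like the ones in the cleaning scripts above, pin down behaviour that changed in pandas 2.0: groupby aggregations no longer accept numeric_only=None and no longer silently skip non-numeric columns. A small example of the pattern:

import pandas as pd

df = pd.DataFrame({"local_segment": ["a", "a", "b"],
                   "note": ["x", "y", "z"],          # non-numeric column
                   "hr": [60.0, 70.0, 80.0]})
# Without numeric_only=True, pandas >= 2.0 raises on the non-numeric column
so_mean = df.groupby("local_segment").mean(numeric_only=True).add_suffix("_SO_mean")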


@ -2,25 +2,31 @@ import pandas as pd
import numpy as np
from datetime import datetime
import sys
import sys, yaml
def calculate_empatica_data_yield(features):
# Get time segment duration in seconds from dataframe
datetime_start = datetime.strptime(features.loc[0, 'local_segment_start_datetime'], '%Y-%m-%d %H:%M:%S')
datetime_end = datetime.strptime(features.loc[0, 'local_segment_end_datetime'], '%Y-%m-%d %H:%M:%S')
tseg_duration = (datetime_end - datetime_start).total_seconds()
def calculate_empatica_data_yield(features): # TODO
features["acc_data_yield"] = (features['empatica_accelerometer_cr_SO_windowsCount'] * 15) / tseg_duration \
if 'empatica_accelerometer_cr_SO_windowsCount' in features else 0
features["temp_data_yield"] = (features['empatica_temperature_cr_SO_windowsCount'] * 300) / tseg_duration \
if 'empatica_temperature_cr_SO_windowsCount' in features else 0
features["eda_data_yield"] = (features['empatica_electrodermal_activity_cr_SO_windowsCount'] * 60) / tseg_duration \
if 'empatica_electrodermal_activity_cr_SO_windowsCount' in features else 0
features["ibi_data_yield"] = (features['empatica_inter_beat_interval_cr_SO_windowsCount'] * 300) / tseg_duration \
if 'empatica_inter_beat_interval_cr_SO_windowsCount' in features else 0
# Get time segment duration in seconds from all segments in features dataframe
datetime_start = pd.to_datetime(features['local_segment_start_datetime'], format='%Y-%m-%d %H:%M:%S')
datetime_end = pd.to_datetime(features['local_segment_end_datetime'], format='%Y-%m-%d %H:%M:%S')
tseg_duration = (datetime_end - datetime_start).dt.total_seconds()
empatica_data_yield_cols = ['acc_data_yield', 'temp_data_yield', 'eda_data_yield', 'ibi_data_yield']
features["empatica_data_yield"] = features[empatica_data_yield_cols].mean(axis=1).fillna(0)
with open('config.yaml', 'r') as stream:
config = yaml.load(stream, Loader=yaml.FullLoader)
sensors = ["EMPATICA_ACCELEROMETER", "EMPATICA_TEMPERATURE", "EMPATICA_ELECTRODERMAL_ACTIVITY", "EMPATICA_INTER_BEAT_INTERVAL"]
for sensor in sensors:
features[f"{sensor.lower()}_data_yield"] = \
(features[f"{sensor.lower()}_cr_SO_windowsCount"] * config[sensor]["PROVIDERS"]["CR"]["WINDOWS"]["WINDOW_LENGTH"]) / tseg_duration \
if f'{sensor.lower()}_cr_SO_windowsCount' in features else 0
empatica_data_yield_cols = [sensor.lower() + "_data_yield" for sensor in sensors]
pd.set_option('display.max_rows', None)
# Assign 1 to values that are over 1 (in case windows are not fully filled)
features[empatica_data_yield_cols] = features[empatica_data_yield_cols].apply(lambda x: [y if y <= 1 or np.isnan(y) else 1 for y in x])
features["empatica_data_yield"] = features[empatica_data_yield_cols].mean(axis=1, numeric_only=True).fillna(0)
features.drop(empatica_data_yield_cols, axis=1, inplace=True) # Drop the per-sensor yield columns in case the more advanced operations (e.g., a weighted average) are not needed later
return features
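The rewritten yield computation generalizes the old hard-coded constants: for each Empatica sensor, yield = windowsCount x WINDOW_LENGTH (read from config.yaml) / segment duration, capped at 1. The formula for a single sensor, with made-up numbers:

import pandas as pd

window_length = 300  # seconds, e.g. EMPATICA_TEMPERATURE -> PROVIDERS -> CR -> WINDOWS -> WINDOW_LENGTH
df = pd.DataFrame({"empatica_temperature_cr_SO_windowsCount": [10, 14],
                   "tseg_duration": [3600, 3600]})
data_yield = (df["empatica_temperature_cr_SO_windowsCount"] * window_length) / df["tseg_duration"]
data_yield = data_yield.clip(upper=1)  # windows can slightly overrun the segment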


@ -54,8 +54,7 @@ def cr_features(sensor_data_files, time_segment, provider, filter_data_by_segmen
data_types = {'local_timezone': 'str', 'device_id': 'str', 'timestamp': 'int64', 'inter_beat_interval': 'float64', 'timings': 'float64', 'local_date_time': 'str',
'local_date': "str", 'local_time': "str", 'local_hour': "str", 'local_minute': "str", 'assigned_segments': "str"}
temperature_intraday_data = pd.read_csv(sensor_data_files["sensor_data"], dtype=data_types)
ibi_intraday_data = pd.read_csv(sensor_data_files["sensor_data"])
ibi_intraday_data = pd.read_csv(sensor_data_files["sensor_data"], dtype=data_types)
requested_intraday_features = provider["FEATURES"]


@ -42,7 +42,8 @@ def straw_features(sensor_data_files, time_segment, provider, filter_data_by_seg
requested_features = provider["FEATURES"]
# name of the features this function can compute
requested_scales = provider["SCALES"]
base_features_names = ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support"]
base_features_names = ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support",
"appraisal_stressfulness_period", "appraisal_stressfulness_event", "appraisal_threat", "appraisal_challenge"]
#TODO Check valid questionnaire and feature names.
# the subset of requested features this function can compute
features_to_compute = list(set(requested_features) & set(base_features_names))


@ -10,6 +10,15 @@ from esm import classify_sessions_by_completion_time, preprocess_esm
input_data_files = dict(snakemake.input)
def format_timestamp(x):
"""This method formates inputed timestamp into format "HH MM SS". Including spaces. If there is no hours or minutes present
that part is ignored, e.g., "MM SS" or just "SS".
Args:
x (int): unix timestamp in seconds
Returns:
str: formatted timestamp using "HH MM SS" sintax
"""
tstring=""
space = False
if x//3600 > 0:
@ -23,64 +32,229 @@ def format_timestamp(x):
return tstring
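Given that docstring, the helper presumably behaves as sketched below; the hunk truncates the original body, so this is a reimplementation from the description, not the project's code:

def format_timestamp_sketch(x):
    # "HH MM SS" with spaces; leading parts that are zero are dropped
    hours, remainder = divmod(x, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = [hours, minutes, seconds]
    while len(parts) > 1 and parts[0] == 0:
        parts.pop(0)
    return " ".join(str(part) for part in parts)

# format_timestamp_sketch(5442) -> "1 30 42"; (1800) -> "30 0"; (42) -> "42"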
def extract_ers_from_file(esm_df, device_id): # TODO: session_id groupby -> change the segment naming
def extract_ers(esm_df):
"""This method has two major functionalities:
(1) It prepares the STRAW event-related segments file with the use of the esm file. The execution protocol depends on
the segmenting method specified in the config.yaml file.
(2) It prepares and writes a csv with targets and corresponding time segment labels. This is later used
in the overall cleaning script (straw).
Details about each segmenting method are listed below by each corresponding condition. Refer to the RAPIDS documentation for the
ERS file format: https://www.rapids.science/1.9/setup/configuration/#time-segments -> event segments
pd.set_option("display.max_rows", None)
Args:
esm_df (DataFrame): esm file content read for the current participant.
Returns:
extracted_ers (DataFrame): dataframe with all the information necessary to write the event-related segments file
in the correct format.
"""
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", None)
# extracted_ers = pd.DataFrame(columns=["label", "event_timestamp", "length", "shift", "shift_direction", "device_id"])
with open('config.yaml', 'r') as stream:
config = yaml.load(stream, Loader=yaml.FullLoader)
pd.DataFrame(columns=["label"]).to_csv(snakemake.output[1]) # Create an empty stress_events_targets file
# esm_df = clean_up_esm(preprocess_esm(esm_df))
esm_preprocessed = clean_up_esm(preprocess_esm(esm_df))
# Take only during work sessions
# during_work = esm_df[esm_df["esm_trigger"].str.contains("during_work", na=False)]
# esm_trigger_group = esm_df.groupby("esm_session").agg(pd.Series.mode)['esm_trigger'] # Get most frequent esm_trigger within particular session
# esm_filtered_sessions = list(esm_trigger_group[esm_trigger_group == 'during_work'].index) # Take only sessions that contains during work
# Take only ema_completed sessions responses
classified = classify_sessions_by_completion_time(esm_preprocessed)
esm_filtered_sessions = classified[classified["session_response"] == 'ema_completed'].reset_index()['esm_session']
esm_filtered_sessions = classified[classified["session_response"] == 'ema_completed'].reset_index()[['device_id', 'esm_session']]
esm_df = esm_preprocessed.loc[(esm_preprocessed['device_id'].isin(esm_filtered_sessions['device_id'])) & (esm_preprocessed['esm_session'].isin(esm_filtered_sessions['esm_session']))]
esm_df = esm_preprocessed[esm_preprocessed["esm_session"].isin(esm_filtered_sessions)]
segmenting_method = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["SEGMENTING_METHOD"]
if segmenting_method in ["30_before", "90_before"]: # takes 30-minute peroid before the questionnaire + the duration of the questionnaire
""" '30-minutes and 90-minutes before' have the same fundamental logic with couple of deviations that will be explained below.
Both take x-minute period before the questionnaire that is summed with the questionnaire duration.
All questionnaire durations over 15 minutes are excluded from the querying.
"""
# Extract time-relevant information
extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index() # questionnaire length
extracted_ers["label"] = f"straw_event_{segmenting_method}_" + snakemake.params["pid"] + "_" + extracted_ers.index.astype(str).str.zfill(3)
extracted_ers[['event_timestamp', 'device_id']] = esm_df.groupby(["device_id", "esm_session"])['timestamp'].min().reset_index()[['timestamp', 'device_id']]
extracted_ers = extracted_ers[extracted_ers["timestamp"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire anwsering is 15 min
extracted_ers["shift_direction"] = -1
# Extract time-relevant information
extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index() # in rounded up seconds
extracted_ers = extracted_ers[extracted_ers["timestamp"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire anwsering is 15 min
# TODO: Rename "timestamp" column meaningfully.
if segmenting_method == "30_before":
"""The method 30-minutes before simply takes 30 minutes before the questionnaire and sums it with the questionnaire duration.
The timestamps are formatted with the help of format_timestamp() method.
"""
time_before_questionnaire = 30 * 60 # in seconds (30 minutes)
time_before_questionnaire = 30 * 60 # in seconds (30 minutes)
extracted_ers["length"] = (extracted_ers["timestamp"] + time_before_questionnaire).apply(lambda x: format_timestamp(x))
extracted_ers["shift"] = time_before_questionnaire
extracted_ers["shift"] = extracted_ers["shift"].apply(lambda x: format_timestamp(x))
elif segmenting_method == "90_before":
"""The method 90-minutes before has an important condition. If the time between the current and the previous questionnaire is
longer then 90 minutes it takes 90 minutes, otherwise it takes the original time difference between the questionnaires.
"""
time_before_questionnaire = 90 * 60 # in seconds (90 minutes)
extracted_ers["label"] = "straw_event_" + snakemake.params["pid"] + "_" + extracted_ers.index.astype(str).str.zfill(3)
extracted_ers["event_timestamp"] = esm_df.groupby("esm_session")['timestamp'].min().reset_index()['timestamp']
extracted_ers["length"] = (extracted_ers["timestamp"] + time_before_questionnaire).apply(lambda x: format_timestamp(x))
# TODO: Think about adding questionnaire duration.
extracted_ers["shift"] = time_before_questionnaire
extracted_ers["shift"] = extracted_ers["shift"].apply(lambda x: format_timestamp(x))
extracted_ers["shift_direction"] = -1
extracted_ers["device_id"] = device_id
extracted_ers[['end_event_timestamp', 'device_id']] = esm_df.groupby(["device_id", "esm_session"])['timestamp'].max().reset_index()[['timestamp', 'device_id']]
extracted_ers['diffs'] = extracted_ers['event_timestamp'].astype('int64') - extracted_ers['end_event_timestamp'].shift(1, fill_value=0).astype('int64')
extracted_ers.loc[extracted_ers['diffs'] > time_before_questionnaire * 1000, 'diffs'] = time_before_questionnaire * 1000
extracted_ers["diffs"] = (extracted_ers["diffs"] / 1000).apply(lambda x: math.ceil(x))
extracted_ers["length"] = (extracted_ers["timestamp"] + extracted_ers["diffs"]).apply(lambda x: format_timestamp(x))
extracted_ers["shift"] = extracted_ers["diffs"].apply(lambda x: format_timestamp(x))
elif segmenting_method == "stress_event":
"""
TODO: update documentation for this condition
This is a special case of the method as it consists of two important parts:
(1) Generating the ERS file (same as the methods above) and
(2) Generating the targets file together with the correct time segment labels.
This extracts event-related segments, dependent on the event time and duration specified by the participant in the next
questionnaire. Additionally, the 5 minutes before the specified start time of this event are taken to account for the
possibility of the participant not remembering the start time precisely => this parameter can be manipulated with the variable
"time_before_event" which is defined below.
In case the participant marked that no stressful event happened, the default of 30 minutes before the event is chosen.
In this case, se_threat and se_challenge are NaN.
By default, this method also excludes all events that are longer than 2.5 hours so that the segments are easily comparable.
"""
ioi = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["INTERVAL_OF_INTEREST"] * 60 # interval of interest in seconds
ioi_error_tolerance = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["IOI_ERROR_TOLERANCE"] * 60 # interval of interest error tolerance in seconds
# Get and join required data
extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index().rename(columns={'timestamp': 'session_length'}) # questionnaire length
extracted_ers = extracted_ers[extracted_ers["session_length"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire answering is 15 min
session_start_timestamp = esm_df.groupby(['device_id', 'esm_session'])['timestamp'].min().to_frame().rename(columns={'timestamp': 'session_start_timestamp'}) # questionnaire start timestamp
session_end_timestamp = esm_df.groupby(['device_id', 'esm_session'])['timestamp'].max().to_frame().rename(columns={'timestamp': 'session_end_timestamp'}) # questionnaire end timestamp
# Users' answers for the stressfulness event (se) start times and durations
se_time = esm_df[esm_df.questionnaire_id == 90.].set_index(['device_id', 'esm_session'])['esm_user_answer'].to_frame().rename(columns={'esm_user_answer': 'se_time'})
se_duration = esm_df[esm_df.questionnaire_id == 91.].set_index(['device_id', 'esm_session'])['esm_user_answer'].to_frame().rename(columns={'esm_user_answer': 'se_duration'})
# Make se_durations to the appropriate lengths
# Extract 3 targets that will be transferred via the csv file to the cleaning script.
se_stressfulness_event_tg = esm_df[esm_df.questionnaire_id == 87.].set_index(['device_id', 'esm_session'])['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_stressfulness_event'})
se_threat_tg = esm_df[esm_df.questionnaire_id == 88.].groupby(["device_id", "esm_session"]).mean(numeric_only=True)['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_threat'})
se_challenge_tg = esm_df[esm_df.questionnaire_id == 89.].groupby(["device_id", "esm_session"]).mean(numeric_only=True)['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_challenge'})
# All relevant features are joined by an inner join to remove standalone columns (e.g., the stressfulness event target has a larger count)
extracted_ers = extracted_ers.join(session_start_timestamp, on=['device_id', 'esm_session'], how='inner') \
.join(session_end_timestamp, on=['device_id', 'esm_session'], how='inner') \
.join(se_stressfulness_event_tg, on=['device_id', 'esm_session'], how='inner') \
.join(se_time, on=['device_id', 'esm_session'], how='left') \
.join(se_duration, on=['device_id', 'esm_session'], how='left') \
.join(se_threat_tg, on=['device_id', 'esm_session'], how='left') \
.join(se_challenge_tg, on=['device_id', 'esm_session'], how='left')
# Filter out the sessions that are not useful. Because of the ambiguity this excludes:
# (1) straw event times that are marked as "0 - I don't remember"
extracted_ers = extracted_ers[~extracted_ers.se_time.astype(str).str.startswith("0 - ")]
extracted_ers.reset_index(drop=True, inplace=True)
extracted_ers.loc[extracted_ers.se_duration.astype(str).str.startswith("0 - "), 'se_duration'] = 0
# Add a default duration in case the participant answered that no stressful event occurred
extracted_ers["se_duration"] = extracted_ers["se_duration"].fillna(int((ioi + 2*ioi_error_tolerance) * 1000))
# Prepare data to fit the data structure in the CSV file ...
# Use the questionnaire start time as the event time if no stress event occurred
extracted_ers['se_time'] = extracted_ers['se_time'].fillna(extracted_ers['session_start_timestamp'])
# The type can be an int (timestamp in ms), which stays the same, or a datetime str, which is converted to a timestamp in milliseconds
extracted_ers['event_timestamp'] = extracted_ers['se_time'].apply(lambda x: x if isinstance(x, int) else pd.to_datetime(x).timestamp() * 1000).astype('int64')
extracted_ers['shift_direction'] = -1
""">>>>> begin section (could be optimized) <<<<<"""
# Checks whether the duration is marked with "1 - It's still ongoing", in which case the end of the current questionnaire
# is taken as the end time of the segment. Otherwise, the user-input duration is taken.
extracted_ers['se_duration'] = \
np.where(
extracted_ers['se_duration'].astype(str).str.startswith("1 - "),
extracted_ers['session_end_timestamp'] - extracted_ers['event_timestamp'],
extracted_ers['se_duration']
)
# This converts the rows holding timestamps in milliseconds, and the rows holding datetime strings, to durations in seconds.
extracted_ers['se_duration'] = \
extracted_ers['se_duration'].apply(lambda x: math.ceil(x / 1000) if isinstance(x, int) else (pd.to_datetime(x).hour * 60 + pd.to_datetime(x).minute) * 60)
# Check explicitly whether the min duration is at least 0. This will eliminate rows that would be investigated after the end of the questionnaire.
extracted_ers = extracted_ers[extracted_ers['session_end_timestamp'] - extracted_ers['event_timestamp'] >= 0]
# Double-check whether the min se_duration is at least 0 and filter out the rest. Negative values are considered invalid.
extracted_ers = extracted_ers[extracted_ers["se_duration"] >= 0].reset_index(drop=True)
""">>>>> end section <<<<<"""
# Simply override all durations to be of an equal amount
extracted_ers['se_duration'] = ioi + 2*ioi_error_tolerance
# If target is 0 then shift by the total stress event duration, otherwise shift it by ioi_tolerance
extracted_ers['shift'] = \
np.where(
extracted_ers['appraisal_stressfulness_event'] == 0,
extracted_ers['se_duration'],
ioi_error_tolerance
)
extracted_ers['shift'] = extracted_ers['shift'].apply(lambda x: format_timestamp(int(x)))
extracted_ers['length'] = extracted_ers['se_duration'].apply(lambda x: format_timestamp(int(x)))
# Drop event_timestamp duplicates in case the user is referencing the same event over multiple questionnaires
extracted_ers.drop_duplicates(subset=["event_timestamp"], keep='first', inplace=True)
extracted_ers.reset_index(drop=True, inplace=True)
extracted_ers["label"] = f"straw_event_{segmenting_method}_" + snakemake.params["pid"] + "_" + extracted_ers.index.astype(str).str.zfill(3)
# Write the csv of extracted ERS labels with targets related to stressfulness event
extracted_ers[["label", "appraisal_stressfulness_event", "appraisal_threat", "appraisal_challenge"]].to_csv(snakemake.output[1], index=False)
else:
raise Exception("Please select correct target method for the event-related segments.")
extracted_ers = pd.DataFrame(columns=["label", "event_timestamp", "length", "shift", "shift_direction", "device_id"])
return extracted_ers[["label", "event_timestamp", "length", "shift", "shift_direction", "device_id"]]
# TODO: check how the lookup via device_id is performed -> the ERS extraction process also relies on this
if snakemake.params["stage"] == "extract": # TODO: first check that this is placed appropriately within the RAPIDS pipeline
"""
Here the code is executed - this .py file is used both for extracting the STRAW time_segments file for the individual
participant, and for merging all participants' files into one combined file, which is later used to assign the time segments
to all sensors.
There are two files involved (see the rules extract_event_information_from_esm and merge_event_related_segments_files in preprocessing.smk):
(1) the ERS file, which contains all the information about the time segment timings, and
(2) the targets file, which holds the corresponding target value for each segment label and is later merged with other features in the cleaning script.
For more information, see the comment in the method above.
"""
if snakemake.params["stage"] == "extract":
esm_df = pd.read_csv(input_data_files['esm_raw_input'])
with open(input_data_files['pid_file'], 'r') as stream:
pid_file = yaml.load(stream, Loader=yaml.FullLoader)
extracted_ers = extract_ers_from_file(esm_df, pid_file["PHONE"]["DEVICE_IDS"][0])
extracted_ers = extract_ers(esm_df)
extracted_ers.to_csv(snakemake.output[0], index=False)
elif snakemake.params["stage"] == "merge":
input_data_files = dict(snakemake.input)
straw_events = pd.DataFrame(columns=["label", "event_timestamp", "length", "shift", "shift_direction", "device_id"])
stress_events_targets = pd.DataFrame(columns=["label", "appraisal_stressfulness_event", "appraisal_threat", "appraisal_challenge"])
for input_file in input_data_files["ers_files"]:
ers_df = pd.read_csv(input_file)
straw_events = pd.concat([straw_events, ers_df], axis=0, ignore_index=True)
straw_events.to_csv(snakemake.output[0], index=False)
for input_file in input_data_files["se_files"]:
se_df = pd.read_csv(input_file)
stress_events_targets = pd.concat([stress_events_targets, se_df], axis=0, ignore_index=True)
stress_events_targets.to_csv(snakemake.output[1], index=False)


@ -115,7 +115,7 @@ cluster_on = provider["CLUSTER_ON"]
strategy = provider["INFER_HOME_LOCATION_STRATEGY"]
days_threshold = provider["MINIMUM_DAYS_TO_DETECT_HOME_CHANGES"]
if not location_data.timestamp.is_monotonic:
if not location_data.timestamp.is_monotonic_increasing:
location_data.sort_values(by=["timestamp"], inplace=True)
location_data["duration_in_seconds"] = -1 * location_data.timestamp.diff(-1) / 1000


@ -0,0 +1,30 @@
import pandas as pd
def straw_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
speech_data = pd.read_csv(sensor_data_files["sensor_data"])
requested_features = provider["FEATURES"]
# names of the features this function can compute
base_features_names = ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
features_to_compute = list(set(requested_features) & set(base_features_names))
speech_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
if not speech_data.empty:
speech_data = filter_data_by_segment(speech_data, time_segment)
if not speech_data.empty:
speech_features = pd.DataFrame()
if "meanspeech" in features_to_compute:
speech_features["meanspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].mean()
if "stdspeech" in features_to_compute:
speech_features["stdspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].std()
if "nlargest" in features_to_compute:
speech_features["nlargest"] = speech_data.groupby(["local_segment"])['speech_proportion'].apply(lambda x: x.nlargest(5).mean())
if "nsmallest" in features_to_compute:
speech_features["nsmallest"] = speech_data.groupby(["local_segment"])['speech_proportion'].apply(lambda x: x.nsmallest(5).mean())
if "medianspeech" in features_to_compute:
speech_features["medianspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].median()
speech_features = speech_features.reset_index()
return speech_features
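The new provider computes simple aggregates of speech_proportion per time segment. A usage sketch with toy data, assuming straw_features above is importable and stubbing out filter_data_by_segment, which RAPIDS normally injects:

import pandas as pd

pd.DataFrame({"local_segment": ["s1", "s1", "s2"],
              "speech_proportion": [0.2, 0.4, 0.9]}).to_csv("phone_speech_toy.csv", index=False)
provider = {"FEATURES": ["meanspeech", "medianspeech"]}
identity_filter = lambda data, segment: data  # stub; the real pipeline filters rows by time segment

# straw_features as defined in the provider script above
features = straw_features({"sensor_data": "phone_speech_toy.csv"}, "daily", provider, identity_filter)
# -> one row per local_segment with meanspeech and medianspeech columns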


@ -1,5 +1,6 @@
import pandas as pd
import sys
import warnings
def retain_target_column(df_input: pd.DataFrame, target_variable_name: str):
column_names = df_input.columns
@ -8,9 +9,9 @@ def retain_target_column(df_input: pd.DataFrame, target_variable_name: str):
esm_names = column_names[esm_names_index]
target_variable_index = esm_names.str.contains(target_variable_name)
if all(~target_variable_index):
raise ValueError("The requested target (", target_variable_name,
")cannot be found in the dataset.",
"Please check the names of phone_esm_ columns in z_all_sensor_features_cleaned_straw_py.csv")
warnings.warn(f"The requested target (, {target_variable_name} ,)cannot be found in the dataset. Please check the names of phone_esm_ columns in cleaned python file")
return None
sensor_features_plus_target = df_input.drop(esm_names, axis=1)
sensor_features_plus_target["target"] = df_input[esm_names[target_variable_index]]
# We will only keep one column related to phone_esm and that will be our target variable.


@ -7,4 +7,7 @@ target_variable_name = snakemake.params["target_variable"]
model_input = retain_target_column(cleaned_sensor_features, target_variable_name)
model_input.to_csv(snakemake.output[0], index=False)
if model_input is None:
pd.DataFrame().to_csv(snakemake.output[0])
else:
model_input.to_csv(snakemake.output[0], index=False)