Compare commits


100 Commits

Author SHA1 Message Date
Primoz 8a6b52a97c Switch to 30_before ERS with corresponding targets. 2022-11-29 11:35:49 +00:00
Primoz 244a053730 Change output files settings to nonstandardized. 2022-11-29 11:19:43 +00:00
Primoz be0324fd01 Fix some bugs and set categorical columns as categories dtypes. 2022-11-28 12:44:25 +00:00
Primoz 99c2fab8f9 Fix a bug in the making of the individual model (when there is no target in the participants columns). 2022-11-16 09:50:18 +00:00
Primoz 286de93bfd Fix some bugs and extend ERS and cleaning scripts with multiple stress event targets logic. 2022-11-15 11:21:51 +00:00
Primoz ab803ee49c Add additional appraisal targets. 2022-11-15 10:14:07 +00:00
Primoz 621f11b2d9 Fix a bug related to wrong user input (duplicated events). 2022-11-15 09:53:31 +00:00
Primoz bd41f42a5d Rename target_ to segmenting_ method. 2022-11-14 15:07:36 +00:00
Primoz a543ce372f Add comments for event_related_script understanding. 2022-11-14 15:04:16 +00:00
Primoz 74b454b07b Apply changes to string answers to make them language-generic. 2022-11-11 09:15:12 +00:00
Primoz 6ebe83e47e Improve the ERS extract method with a couple of validations. 2022-11-10 12:42:52 +00:00
Primoz 00350ef8ca Change config for stressfulness event target method. 2022-11-10 10:32:58 +00:00
Primoz e4985c9121 Override stressfulness event target with extracted values from csv. 2022-11-10 10:29:11 +00:00
Primoz a668b6e8da Extract ERS and stress event targets to csv files (completed). 2022-11-10 09:37:27 +00:00
Primoz 9199b53ded Get, join and start processing required ERS stress event data. 2022-11-09 15:11:51 +00:00
Primoz f3c6a66da9 Begin with stress events in the ERS script. 2022-11-08 15:53:43 +00:00
Primoz 0b3e9226b3 Make small corrections in ERS file. 2022-11-08 14:44:24 +00:00
Primoz 2d83f7ddec Begin the ERS logic for 90-minutes events. 2022-11-08 11:32:05 +00:00
Primoz 1da72a7cbe Rename targets method in config. 2022-11-08 09:45:37 +00:00
Primoz 9f441afc16 Begin ERS logic for 90-minutes events. 2022-11-04 15:09:04 +00:00
Primoz c1c9f4d05a Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-11-04 09:11:58 +00:00
Primoz 62f46ea376 Prepare method-based logic for ERS generating. 2022-11-04 09:11:53 +00:00
Primoz 7ab0280d7e Correctly rename stressful event target variable. 2022-11-04 08:58:08 +00:00
Primoz eefa9f3f4d Add new target: stressfulness_event. 2022-11-03 14:49:54 +00:00
Primoz 5e8174dd41 Add new target: stressfulness_period. 2022-11-03 13:52:45 +00:00
Primoz 35c1a762e7 Improve filtering by esm_session and device_id. 2022-11-03 13:51:18 +00:00
Primoz 02264b21fd Add logic for target selection in ERS processing. 2022-11-03 09:30:12 +00:00
Primoz 0ce8723bdb Extend imputation logic within the cleaning script. 2022-11-02 14:01:21 +00:00
Primoz 30b38bfc02 Fix the generating procedure of ERS file for participants with multiple devices. 2022-10-28 09:00:13 +00:00
Primoz cd137af15a Config for 30 minute EMA segments. 2022-10-27 14:20:15 +00:00
Primoz 3c0585a566 Remove obsolete comments. 2022-10-27 14:12:56 +00:00
Primoz 6b487fcf7b Set E4 data yield to 1 if it is over 1. Optimize E4 data_yield script. 2022-10-27 14:11:42 +00:00
Primoz 5d17c92e54 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-10-26 14:18:20 +00:00
Primoz a31fdd1479 Start to test empatica_data_yield precieved error. 2022-10-26 14:18:08 +00:00
Primoz 936324d234 Switch config for 30 minutes event related segments. 2022-10-26 14:17:27 +00:00
Primoz da0a4596f8 Add additional ESM processing logic for ERS csv extraction. 2022-10-26 14:16:25 +00:00
Primoz d4d74818e6 Fix a bug - missing time_segment column when df is empty 2022-10-26 14:14:32 +00:00
Primoz 14ff59914b Fix to correct dtypes. 2022-10-26 09:59:46 +00:00
Primoz 6ab0ac5329 Optimize memory consumption with dtype definition while reading csv file. 2022-10-26 09:57:26 +00:00
Primoz 0d143e6aad Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-10-25 15:28:27 +00:00
Primoz 8acac50125 Add safenet when features dataframe is empty. 2022-10-25 15:26:43 +00:00
Primoz b92a3aa37a Remove unwanted output or other error producing code. 2022-10-25 15:25:22 +00:00
Primoz bfd637eb9c Improve strings formatting in straw_events file. 2022-10-25 08:53:44 +00:00
Primoz 0d81ad5756 Debug assignment of segments to rows 2022-10-19 13:35:04 +00:00
Primoz cea451d344 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-10-18 09:15:06 +00:00
Primoz e88bbd548f Add new daily segment and filter by segment in the cleaning script. 2022-10-18 09:15:00 +00:00
Primoz cf38d9f175 Implement ERS generating logic. 2022-10-17 15:07:33 +00:00
Primoz f3ca56cdbf Start with ERS logic integration within Snakemake. 2022-10-14 14:46:28 +00:00
Primoz 797aa98f4f Config for ERS testing. 2022-10-12 15:51:50 +00:00
Primoz 9baff159cd Changes needed for testing and starting of the Event-Related Segments. 2022-10-12 15:51:23 +00:00
Primoz 0f21273508 Bugs fix 2022-10-12 12:32:51 +00:00
Primoz 55517eb737 Necessary commit before proceeding. 2022-10-12 12:23:11 +00:00
Primoz de15a52dba Bug fix 2022-10-11 08:36:23 +00:00
Primoz 1ad25bb572 Few modifications of some imputation values in cleaning script and feature extraction. 2022-10-11 08:26:17 +00:00
Primoz 9884b383cf Testing new data with AutoML. 2022-10-10 16:45:38 +00:00
Primoz 2dc89c083c Small changes in cleaning overall 2022-10-07 08:52:12 +00:00
Primoz 001d400729 Clean features and create input files based on all possible targets. 2022-10-06 14:28:12 +00:00
Primoz 1e38d9bf1e Standardization and correlation visualization in overall cleaning script. 2022-10-06 13:27:38 +00:00
Primoz a34412a18d E4 data yield corrections. Changes in overal cs - standardization. 2022-10-05 14:16:55 +00:00
Primoz 437459648f Errors fix: individual script - treat participants missing data. 2022-10-05 13:35:05 +00:00
Primoz 53f6cc60d5 Config and cleaning script necessary changes ... 2022-10-03 13:06:39 +00:00
Primoz bbeabeee6f Last changes before processing on the server. 2022-10-03 12:53:31 +00:00
Primoz 44531c6d94 Code cleaning, reworking cleaning individual based on changes in overall script. Changes in thresholds. 2022-09-30 10:04:07 +00:00
Primoz 7ac7cd5a37 Preparation of the overall cleaning script. 2022-09-29 14:33:21 +00:00
Primoz 68fd69dada Cleaning script for individuals: corrections and comments. 2022-09-29 11:55:25 +00:00
Primoz a4f0d056a0 Fillna for app foreground and activity recognition 2022-09-29 11:44:27 +00:00
Primoz 6286e7a44c firstuseafter column removed from contextual imputation 2022-09-28 12:47:08 +00:00
Primoz 9b3447febd Contextual imputation correction 2022-09-28 12:40:05 +00:00
Primoz d6adda30cf Contextual imputation on time(first/last) features. 2022-09-28 12:37:51 +00:00
Primoz 8af4ef11dc Contextual imputation by feature type. 2022-09-28 10:02:47 +00:00
Primoz 536b9494cd Cleaning script corrections 2022-09-27 14:12:08 +00:00
Primoz f0b87c9dd0 Debugging of the empatica data yield integration. 2022-09-27 09:54:15 +00:00
Primoz 7fcdb873fe Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-09-27 07:50:29 +00:00
Primoz 5c7bb0f4c1 Config changes 2022-09-27 07:48:32 +00:00
Primoz bd53dc1684 Empatica data yield usage in the cleaning script. 2022-09-26 15:54:00 +00:00
Primoz d9a574c550 Changes in the cleaning script and preparation of empatica data yield method. 2022-09-23 13:24:50 +00:00
Primoz 19aa8707c0 Redefined cleaning steps after revision 2022-09-22 13:45:51 +00:00
Primoz 247d758cb7 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-09-21 07:18:01 +00:00
Primoz 90ee99e4b9 Remove TODO comments 2022-09-21 07:16:00 +00:00
Primoz 7493aaa643 Small changes in cleaning scrtipt and missing vals testing. 2022-09-20 12:57:55 +00:00
Primoz eaf4340afd Small imputation and cleaning corrections. 2022-09-20 08:03:48 +00:00
Primoz a96ea508c6 Fill NaN of Empatica's SD second order feature (must be tested). 2022-09-19 07:34:02 +00:00
Primoz 52e11cdcab Configurations for new standardization path. 2022-09-19 07:25:54 +00:00
Primoz 92aff93e65 Remove standardization script. 2022-09-19 07:25:16 +00:00
Primoz 18b63127de Removed all standardizaton rules and configurations. 2022-09-19 06:16:26 +00:00
Primoz 62982866cd Phone wifi visible inspection (WIP) 2022-09-16 13:24:21 +00:00
Primoz 0ce6da5444 kNN imputation relocation and execution only on specific columns. 2022-09-16 11:30:08 +00:00
Primoz e3b78c8a85 Impute selected phone features with 0.
Wifi visible, screen, and light.
2022-09-16 10:58:57 +00:00
Primoz 7d85f75d21 Changes in phone features NaN values script. 2022-09-16 09:03:30 +00:00
Primoz 385e21409d Changes in NaN values testing script. 2022-09-15 14:16:58 +00:00
Primoz 18002f59e1 Doryab bluetooth and locations features fill in NaN values. 2022-09-15 10:48:59 +00:00
Primoz 3cf7ca41aa Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-09-14 15:38:32 +00:00
Primoz d5ab5a0394 Writing testing scripts to determine the point of manual imputation. 2022-09-14 14:13:03 +00:00
Primoz dfbb758902 Changes in AutoML params and environment.yml 2022-09-13 13:54:06 +00:00
Primoz 4ec371ed96 Testing auto-sklearn 2022-09-13 09:51:03 +00:00
Primoz d27a4a71c8 Reorganisation and reordering of the cleaning script. 2022-09-12 13:44:17 +00:00
Primoz 15d792089d Changes in cleaning script:
- target extracted from config to remove rows where target is nan
- prepared sns.heatmap for further missing values analysis
- necessary changes in config and participant p01
- picture of heatmap which shows the values state after cleaning
2022-09-01 10:33:36 +00:00
Primoz cb351e0ff6 Unnecessary line (rows with no target value will be removed in cleaning script). 2022-09-01 10:06:57 +00:00
Primoz 86299d346b Impute phone and sms NAs with 0 2022-09-01 09:57:21 +00:00
Primoz 3f7ec80c18 Preparation a) phone_calls 0 imputation b) remove rows with NaN target 2022-08-31 10:18:50 +00:00
23 changed files with 361 additions and 1703 deletions

.gitignore vendored (3 changed lines)

@ -100,9 +100,6 @@ data/external/*
!/data/external/wiki_tz.csv !/data/external/wiki_tz.csv
!/data/external/main_study_usernames.csv !/data/external/main_study_usernames.csv
!/data/external/timezone.csv !/data/external/timezone.csv
!/data/external/play_store_application_genre_catalogue.csv
!/data/external/play_store_categories_count.csv
data/raw/* data/raw/*
!/data/raw/.gitkeep !/data/raw/.gitkeep

README.md (126 changed lines)

@ -16,7 +16,7 @@ By [MoSHI](https://www.moshi.pitt.edu/), [University of Pittsburgh](https://www.
For RAPIDS installation refer to to the [documentation](https://www.rapids.science/1.8/setup/installation/) For RAPIDS installation refer to to the [documentation](https://www.rapids.science/1.8/setup/installation/)
### For the installation of the Docker version ## For the installation of the Docker version
1. Follow the [instructions](https://www.rapids.science/1.8/setup/installation/) to setup RAPIDS via Docker (from scratch). 1. Follow the [instructions](https://www.rapids.science/1.8/setup/installation/) to setup RAPIDS via Docker (from scratch).
@ -46,7 +46,7 @@ Type R to go to the interactive R session and then:
``` ```
6. Install cr-features module 6. Install cr-features module
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch master. From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch modifications_for_rapids.
Then follow the "cr-features module" section below. Then follow the "cr-features module" section below.
7. Install all required packages from environment.yml, prune also deletes conda packages not present in environment file. 7. Install all required packages from environment.yml, prune also deletes conda packages not present in environment file.
@ -62,7 +62,7 @@ Then follow the "cr-features module" section below.
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
``` ```
### cr-features module ## cr-features module
This RAPIDS extension uses cr-features library accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures). This RAPIDS extension uses cr-features library accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
@ -79,123 +79,3 @@ To use cr-features library:
cr-features package has to be built and installed everytime to get the newest version. cr-features package has to be built and installed everytime to get the newest version.
Or an the newest version of the docker image must be used. Or an the newest version of the docker image must be used.
``` ```
## Updating RAPIDS
To update RAPIDS, first pull and merge [origin](https://github.com/carissalow/rapids), such as with:
```commandline
git fetch --progress "origin" refs/heads/master
git merge --no-ff origin/master
```
Next, update the conda and R virtual environments.
```bash
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
```
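For the conda environment, an update along these lines should work (a sketch, assuming the environment is named `rapids`, as declared in `environment.yml`):
```bash
# Update the existing conda environment from environment.yml;
# --prune removes packages that are no longer listed in the file.
conda env update --name rapids --file environment.yml --prune
```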
## Custom configuration
### Credentials
As mentioned under [Database in RAPIDS documentation](https://www.rapids.science/1.6/snippets/database/), a `credentials.yaml` file is needed to connect to a database.
It should contain:
```yaml
PSQL_STRAW:
database: staw
host: 212.235.208.113
password: password
port: 5432
user: staw_db
```
where the `password` value needs to be replaced with the actual database password.
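To verify the credentials before running the pipeline, one option (a sketch, assuming the `DBI` and `RPostgres` packages are already installed) is a one-off connection from the shell:
```bash
# Hypothetical connectivity check; the values come from credentials.yaml above,
# with the real password substituted for "...".
R -e 'con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "staw", host = "212.235.208.113", port = 5432, user = "staw_db", password = "..."); DBI::dbDisconnect(con)'
```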
## Possible installation issues
### Missing dependencies for RPostgres
When installing the `RPostgres` R package (used to connect to the PostgreSQL database), an error might occur:
```text
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libpq was not found. Try installing:
* deb: libpq-dev (Debian, Ubuntu, etc)
* rpm: postgresql-devel (Fedora, EPEL)
* rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
* csw: postgresql_dev (Solaris)
* brew: libpq (OSX)
If libpq is already installed, check that either:
(i) 'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
(ii) 'pg_config' is in your PATH.
If neither can detect , you can set INCLUDE_DIR
and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: libpq-fe.h: No such file or directory
compilation terminated.
```
The library requires `libpq` for compiling from source, so install accordingly.
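On Debian or Ubuntu (including WSL2), for example, the missing headers can be installed with apt, matching the `deb:` suggestion in the message above:
```bash
# Install the PostgreSQL client headers needed to build RPostgres from source
sudo apt-get update
sudo apt-get install -y libpq-dev
```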
### Timezone environment variable for tidyverse (relevant for WSL2)
One of the R packages, `tidyverse`, might need access to the `TZ` environment variable during installation.
On Ubuntu 20.04 on WSL2 this triggers the following error:
```text
> install.packages('tidyverse')
ERROR: configuration failed for package xml2
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Warning in system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace xml2 1.3.1 is already loaded, but >= 1.3.2 is required
Calls: <Anonymous> ... namespaceImportFrom -> asNamespace -> loadNamespace
Execution halted
ERROR: lazy loading failed for package tidyverse
```
This happens because WSL2 does not use the `timedatectl` service, which provides this variable.
```bash
~$ timedatectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
```
and later, during the installation:
```bash
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Execution halted
```
This can be amended by setting the environment variable manually before attempting to install `tidyverse`:
```bash
export TZ='Europe/Ljubljana'
```
Note: if this is needed to avoid runtime issues, you need to either define this environment variable in each new terminal window or (better) define it in your `~/.bashrc` or `~/.bash_profile`.
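For example, to persist the variable (using the Ljubljana timezone from above):
```bash
# Persist TZ for future shells and load it into the current one
echo "export TZ='Europe/Ljubljana'" >> ~/.bashrc
source ~/.bashrc
```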
## Possible runtime issues
### Unix end of line characters
Upon running rapids, an error might occur:
```bash
/usr/bin/env: python3\r: No such file or directory
```
This is due to Windows-style end-of-line characters.
To amend this, I added a `.gitattributes` file to force `git` to check out `rapids` using Unix EOL characters.
If this still fails, `dos2unix` can be used to change them.
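The entries below are only illustrative (the repository's actual `.gitattributes` may differ), and the `dos2unix` path is a placeholder:
```bash
# Illustrative .gitattributes rule forcing Unix (LF) line endings on checkout
echo '* text=auto eol=lf' >> .gitattributes

# Convert an already checked-out file in place (placeholder path)
dos2unix path/to/offending_script.py
```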
### System has not been booted with systemd as init system (PID 1)
See [the installation issue above](#Timezone-environment-variable-for-tidyverse-(relevant-for-WSL2)).

Snakefile

@ -174,15 +174,6 @@ for provider in config["PHONE_ESM"]["PROVIDERS"].keys():
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"])) # files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"]))
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv") # files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_SPEECH"]["PROVIDERS"].keys():
if config["PHONE_SPEECH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_raw.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_with_datetime.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_speech_features/phone_speech_{language}_{provider_key}.csv",pid=config["PIDS"],language=get_script_language(config["PHONE_SPEECH"]["PROVIDERS"][provider]["SRC_SCRIPT"]),provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_speech.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# We can delete these if's as soon as we add feature PROVIDERS to any of these sensors # We can delete these if's as soon as we add feature PROVIDERS to any of these sensors
if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict): if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict):
for provider in config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"].keys(): for provider in config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"].keys():

config.yaml

@ -27,8 +27,6 @@ TIME_SEGMENTS: &time_segments
TAILORED_EVENTS: # Only relevant if TYPE=EVENT TAILORED_EVENTS: # Only relevant if TYPE=EVENT
COMPUTE: True COMPUTE: True
SEGMENTING_METHOD: "30_before" # 30_before, 90_before, stress_event SEGMENTING_METHOD: "30_before" # 30_before, 90_before, stress_event
INTERVAL_OF_INTEREST: 10 # duration of event of interest [minutes]
IOI_ERROR_TOLERANCE: 5 # interval of interest erorr tolerance (before and after IOI) [minutes]
# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study # See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
TIMEZONE: TIMEZONE:
@ -104,9 +102,9 @@ PHONE_APPLICATIONS_CRASHES:
CONTAINER: applications_crashes CONTAINER: applications_crashes
APPLICATION_CATEGORIES: APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scrapped from the Play Store) CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scrapped from the Play Store)
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv" CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
# See https://www.rapids.science/latest/features/phone-applications-foreground/ # See https://www.rapids.science/latest/features/phone-applications-foreground/
@ -114,32 +112,24 @@ PHONE_APPLICATIONS_FOREGROUND:
CONTAINER: applications CONTAINER: applications
APPLICATION_CATEGORIES: APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scrapped from the Play Store) CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scrapped from the Play Store)
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv" CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
# Refer to data/external/play_store_categories_count.csv for a list of categories (genres) and their frequency. PACKAGE_NAMES_HASHED: True
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS: PROVIDERS:
RAPIDS: RAPIDS:
COMPUTE: True COMPUTE: True
INCLUDE_EPISODE_FEATURES: True INCLUDE_EPISODE_FEATURES: True
SINGLE_CATEGORIES: ["Productivity", "Tools", "Communication", "Education", "Social"] SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES: MULTIPLE_CATEGORIES:
games: ["Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing"] social: ["socialnetworks", "socialmediatools"]
social: ["Communication", "Social", "Dating"] entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
productivity: ["Tools", "Productivity", "Finance", "Education", "News & Magazines", "Business", "Books & Reference"]
health: ["Health & Fitness", "Lifestyle", "Food & Drink", "Sports", "Medical", "Parenting"]
entertainment: ["Shopping", "Music & Audio", "Entertainment", "Travel & Local", "Photography", "Video Players & Editors", "Personalization", "House & Home", "Art & Design", "Auto & Vehicles", "Entertainment,Music & Video",
"Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing" # Add all games.
]
maps_weather: ["Maps & Navigation", "Weather"]
CUSTOM_CATEGORIES: CUSTOM_CATEGORIES:
SINGLE_APPS: [] social_media: ["com.google.android.youtube", "com.snapchat.android", "com.instagram.android", "com.zhiliaoapp.musically", "com.facebook.katana"]
EXCLUDED_CATEGORIES: ["System", "STRAW"] dating: ["com.tinder", "com.relance.happycouple", "com.kiwi.joyride"]
# Note: A special option here is "is_system_app". SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
# This excludes applications that have is_system_app = TRUE, which is a separate column in the table. EXCLUDED_CATEGORIES: []
# However, all of these applications have been assigned System category. EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"] # TODO list system apps?
# I will therefore filter by that category, which is a superset and is more complete. JL
EXCLUDED_APPS: []
FEATURES: FEATURES:
APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"] APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"] APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
@ -337,15 +327,6 @@ PHONE_SCREEN:
EPISODE_TYPES: ["unlock"] EPISODE_TYPES: ["unlock"]
SRC_SCRIPT: src/features/phone_screen/rapids/main.py SRC_SCRIPT: src/features/phone_screen/rapids/main.py
# Custom added sensor
PHONE_SPEECH:
CONTAINER: speech
PROVIDERS:
STRAW:
COMPUTE: True
FEATURES: ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
SRC_SCRIPT: src/features/phone_speech/straw/main.py
# See https://www.rapids.science/latest/features/phone-wifi-connected/ # See https://www.rapids.science/latest/features/phone-wifi-connected/
PHONE_WIFI_CONNECTED: PHONE_WIFI_CONNECTED:
CONTAINER: sensor_wifi CONTAINER: sensor_wifi
@ -729,8 +710,7 @@ ALL_CLEANING_OVERALL:
COMPUTE: True COMPUTE: True
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5 MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95 CORR_THRESHOLD: 0.95
STANDARDIZATION: True STANDARDIZATION: False
TARGET_STANDARDIZATION: False
SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py
@ -753,6 +733,7 @@ PARAMS_FOR_ANALYSIS:
TARGET: TARGET:
COMPUTE: True COMPUTE: True
LABEL: appraisal_stressfulness_event_mean LABEL: appraisal_stressfulness_event_mean
ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, JCQ_coworker_support_mean, appraisal_stressfulness_period_mean] ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean,
JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean]
# PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, # PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean,
# JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean # JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean

File diff suppressed because it is too large.

data/external/play_store_categories_count.csv

@ -1,45 +0,0 @@
genre,n
System,261
Tools,96
Productivity,71
Health & Fitness,60
Finance,54
Communication,39
Music & Audio,39
Shopping,38
Lifestyle,33
Education,28
News & Magazines,24
Maps & Navigation,23
Entertainment,21
Business,18
Travel & Local,18
Books & Reference,16
Social,16
Weather,16
Food & Drink,14
Sports,14
Other,13
Photography,13
Puzzle,13
Video Players & Editors,12
Card,9
Casual,9
Personalization,8
Medical,7
Board,5
Strategy,4
House & Home,3
Trivia,3
Word,3
Adventure,2
Art & Design,2
Auto & Vehicles,2
Dating,2
Role Playing,2
STRAW,2
Simulation,2
"Board,Brain Games",1
"Entertainment,Music & Video",1
Parenting,1
Racing,1


@ -1,39 +0,0 @@
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !

environment.yml

@ -1,30 +1,165 @@
name: rapids name: rapids
channels: channels:
- conda-forge - conda-forge
- defaults
dependencies: dependencies:
- auto-sklearn - _libgcc_mutex=0.1
- hmmlearn - _openmp_mutex=4.5
- imbalanced-learn - _py-xgboost-mutex=2.0
- jsonschema - appdirs=1.4.4
- lightgbm - arrow=0.16.0
- matplotlib - asn1crypto=1.4.0
- numpy - astropy=4.2.1
- pandas - attrs=20.3.0
- peakutils - binaryornot=0.4.4
- pip - blas=1.0
- plotly - brotlipy=0.7.0
- python-dateutil - bzip2=1.0.8
- pytz - ca-certificates=2021.7.5
- pywavelets - certifi=2021.5.30
- pyyaml - cffi=1.14.4
- scikit-learn - chardet=3.0.4
- scipy - click=7.1.2
- seaborn - colorama=0.4.4
- setuptools - cookiecutter=1.6.0
- bioconda::snakemake - cryptography=3.3.1
- bioconda::snakemake-minimal - datrie=0.8.2
- tqdm - docutils=0.16
- xgboost - future=0.18.2
- gitdb=4.0.5
- gitdb2=4.0.2
- gitpython=3.1.11
- idna=2.10
- imbalanced-learn=0.6.2
- importlib-metadata=2.0.0
- importlib_metadata=2.0.0
- intel-openmp=2019.4
- jinja2=2.11.2
- jinja2-time=0.2.0
- joblib=1.0.0
- jsonschema=3.2.0
- ld_impl_linux-64=2.36.1
- libblas=3.8.0
- libcblas=3.8.0
- libcxx=10.0.0
- libcxxabi=10.0.0
- libedit=3.1.20191231
- libffi=3.3
- libgcc-ng=11.2.0
- libgfortran
- libgfortran
- libgfortran
- liblapack=3.8.0
- libopenblas=0.3.10
- libstdcxx-ng=11.2.0
- libxgboost=0.90
- libzlib=1.2.11
- lightgbm=3.1.1
- llvm-openmp=10.0.0
- markupsafe=1.1.1
- mkl
- mkl-service=2.3.0
- mkl_fft=1.2.0
- mkl_random=1.1.1
- more-itertools=8.6.0
- ncurses=6.2
- numpy=1.19.2
- numpy-base=1.19.2
- openblas=0.3.4
- openssl=1.1.1k
- pandas=1.1.5
- pbr=5.5.1
- pip=20.3.3
- plotly=4.14.1
- poyo=0.5.0
- psutil=5.7.2
- py-xgboost=0.90
- pycparser=2.20
- pyerfa=1.7.1.1
- pyopenssl=20.0.1
- pysocks=1.7.1
- python=3.7.9
- python-dateutil=2.8.1
- python_abi=3.7
- pytz=2020.4
- pyyaml=5.3.1
- readline=8.0
- requests=2.25.0
- retrying=1.3.3
- setuptools=51.0.0
- six=1.15.0
- smmap=3.0.4
- smmap2=3.0.1
- sqlite=3.33.0
- threadpoolctl=2.1.0
- tk=8.6.10
- tqdm=4.62.0
- urllib3=1.25.11
- wheel=0.36.2
- whichcraft=0.6.1
- wrapt=1.12.1
- xgboost=0.90
- xz=5.2.5
- yaml=0.2.5
- zipp=3.4.0
- zlib=1.2.11
- pip: - pip:
- biosppy - amply==0.1.4
- cr_features>=0.2 - auto-sklearn==0.14.7
- bidict==0.22.0
- biosppy==0.8.0
- build==0.8.0
- cached-property==1.5.2
- cloudpickle==2.2.0
- configargparse==0.15.1
- configspace==0.4.21
- cr-features==0.2.1
- cycler==0.11.0
- cython==0.29.32
- dask==2022.2.0
- decorator==4.4.2
- distributed==2022.2.0
- distro==1.7.0
- emcee==3.1.2
- fonttools==4.33.2
- fsspec==2022.8.2
- h5py==3.6.0
- heapdict==1.0.1
- hmmlearn==0.2.7
- ipython-genutils==0.2.0
- jupyter-core==4.6.3
- kiwisolver==1.4.2
- liac-arff==2.5.0
- locket==1.0.0
- matplotlib==3.5.1
- msgpack==1.0.4
- nbformat==5.0.7
- opencv-python==4.5.5.64
- packaging==21.3
- partd==1.3.0
- peakutils==1.3.3
- pep517==0.13.0
- pillow==9.1.0
- pulp==2.4
- pynisher==0.6.4
- pyparsing==2.4.7
- pyrfr==0.8.3
- pyrsistent==0.15.5
- pywavelets==1.3.0
- ratelimiter==1.2.0.post0
- scikit-learn==0.24.2
- scipy==1.7.3
- seaborn==0.11.2
- shortuuid==1.0.8
- smac==1.2
- snakemake==5.30.2
- sortedcontainers==2.4.0
- tblib==1.7.0
- tomli==2.0.1
- toolz==0.12.0
- toposort==1.5
- tornado==6.2
- traitlets==4.3.3
- typing-extensions==4.2.0
- zict==2.2.0
prefix: /opt/conda/envs/rapids

renv.lock (334 changed lines)

@ -1,6 +1,6 @@
{ {
"R": { "R": {
"Version": "4.2.3", "Version": "4.1.2",
"Repositories": [ "Repositories": [
{ {
"Name": "CRAN", "Name": "CRAN",
@ -46,10 +46,10 @@
}, },
"Hmisc": { "Hmisc": {
"Package": "Hmisc", "Package": "Hmisc",
"Version": "5.0-1", "Version": "4.4-2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "bf9fe82c010a468fb32f913ff56d65e1" "Hash": "66458e906b2112a8b1639964efd77d7c"
}, },
"KernSmooth": { "KernSmooth": {
"Package": "KernSmooth", "Package": "KernSmooth",
@ -104,7 +104,7 @@
"Package": "RPostgres", "Package": "RPostgres",
"Version": "1.4.4", "Version": "1.4.4",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "c593ecb8dbca9faf3906431be610ca28" "Hash": "c593ecb8dbca9faf3906431be610ca28"
}, },
"Rcpp": { "Rcpp": {
@ -181,7 +181,7 @@
"Package": "base64enc", "Package": "base64enc",
"Version": "0.1-3", "Version": "0.1-3",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "543776ae6848fde2f48ff3816d0628bc" "Hash": "543776ae6848fde2f48ff3816d0628bc"
}, },
"bit": { "bit": {
@ -221,24 +221,17 @@
}, },
"broom": { "broom": {
"Package": "broom", "Package": "broom",
"Version": "1.0.4", "Version": "0.7.3",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "f62b2504021369a2449c54bbda362d30" "Hash": "5581a5ddc8fe2ac5e0d092ec2de4c4ae"
},
"cachem": {
"Package": "cachem",
"Version": "1.0.7",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "cda74447c42f529de601fe4d4050daef"
}, },
"callr": { "callr": {
"Package": "callr", "Package": "callr",
"Version": "3.7.3", "Version": "3.5.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "9b2191ede20fa29828139b9900922e51" "Hash": "b7d7f1e926dfcd57c74ce93f5c048e80"
}, },
"caret": { "caret": {
"Package": "caret", "Package": "caret",
@ -270,10 +263,10 @@
}, },
"cli": { "cli": {
"Package": "cli", "Package": "cli",
"Version": "3.6.1", "Version": "2.2.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "89e6d8219950eac806ae0c489052048a" "Hash": "3ef298932294b775fa0a3eeaa3a645b0"
}, },
"clipr": { "clipr": {
"Package": "clipr", "Package": "clipr",
@ -293,7 +286,7 @@
"Package": "codetools", "Package": "codetools",
"Version": "0.2-18", "Version": "0.2-18",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "019388fc48e48b3da0d3a76ff94608a8" "Hash": "019388fc48e48b3da0d3a76ff94608a8"
}, },
"colorspace": { "colorspace": {
@ -310,13 +303,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "0f22be39ec1d141fd03683c06f3a6e67" "Hash": "0f22be39ec1d141fd03683c06f3a6e67"
}, },
"conflicted": {
"Package": "conflicted",
"Version": "1.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bb097fccb22d156624fd07cd2894ddb6"
},
"corpcor": { "corpcor": {
"Package": "corpcor", "Package": "corpcor",
"Version": "1.6.9", "Version": "1.6.9",
@ -333,10 +319,10 @@
}, },
"cpp11": { "cpp11": {
"Package": "cpp11", "Package": "cpp11",
"Version": "0.4.3", "Version": "0.2.4",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "ed588261931ee3be2c700d22e94a29ab" "Hash": "ba66e5a750d39067d888aa7af797fed2"
}, },
"crayon": { "crayon": {
"Package": "crayon", "Package": "crayon",
@ -368,10 +354,10 @@
}, },
"dbplyr": { "dbplyr": {
"Package": "dbplyr", "Package": "dbplyr",
"Version": "2.3.2", "Version": "2.1.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "CRAN",
"Hash": "d24305b92db333726aed162a2c23a147" "Hash": "1f37fa4ab2f5f7eded42f78b9a887182"
}, },
"desc": { "desc": {
"Package": "desc", "Package": "desc",
@ -396,17 +382,17 @@
}, },
"dplyr": { "dplyr": {
"Package": "dplyr", "Package": "dplyr",
"Version": "1.1.1", "Version": "1.0.5",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "eb5742d256a0d9306d85ea68756d8187" "Hash": "d0d76c11ec807eb3f000eba4e3eb0f68"
}, },
"dtplyr": { "dtplyr": {
"Package": "dtplyr", "Package": "dtplyr",
"Version": "1.3.1", "Version": "1.1.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "54ed3ea01b11e81a86544faaecfef8e2" "Hash": "1e14e4c5b2814de5225312394bc316da"
}, },
"e1071": { "e1071": {
"Package": "e1071", "Package": "e1071",
@ -433,7 +419,7 @@
"Package": "evaluate", "Package": "evaluate",
"Version": "0.14", "Version": "0.14",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "ec8ca05cffcc70569eaaad8469d2a3a7" "Hash": "ec8ca05cffcc70569eaaad8469d2a3a7"
}, },
"fansi": { "fansi": {
@ -466,10 +452,10 @@
}, },
"forcats": { "forcats": {
"Package": "forcats", "Package": "forcats",
"Version": "1.0.0", "Version": "0.5.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "1a0a9a3d5083d0d573c4214576f1e690" "Hash": "1cb4279e697650f0bd78cd3601ee7576"
}, },
"foreach": { "foreach": {
"Package": "foreach", "Package": "foreach",
@ -506,13 +492,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "f568ce73d3d59582b0f7babd0eb33d07" "Hash": "f568ce73d3d59582b0f7babd0eb33d07"
}, },
"gargle": {
"Package": "gargle",
"Version": "1.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bb3208dcdfeb2e68bf33c87601b3cbe3"
},
"gclus": { "gclus": {
"Package": "gclus", "Package": "gclus",
"Version": "1.3.2", "Version": "1.3.2",
@ -536,10 +515,10 @@
}, },
"ggplot2": { "ggplot2": {
"Package": "ggplot2", "Package": "ggplot2",
"Version": "3.4.1", "Version": "3.3.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "d494daf77c4aa7f084dbbe6ca5dcaca7" "Hash": "4ded8b439797f7b1693bd3d238d0106b"
}, },
"ggraph": { "ggraph": {
"Package": "ggraph", "Package": "ggraph",
@ -578,30 +557,16 @@
}, },
"glue": { "glue": {
"Package": "glue", "Package": "glue",
"Version": "1.6.2", "Version": "1.4.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "4f2596dfb05dac67b9dc558e5c6fba2e" "Hash": "6efd734b14c6471cfe443345f3e35e29"
},
"googledrive": {
"Package": "googledrive",
"Version": "2.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e88ba642951bc8d1898ba0d12581850b"
},
"googlesheets4": {
"Package": "googlesheets4",
"Version": "1.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "fd7b97bd862a14297b0bb7ed28a3dada"
}, },
"gower": { "gower": {
"Package": "gower", "Package": "gower",
"Version": "0.2.2", "Version": "0.2.2",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "be6a2b3529928bd803d1c437d1d43152" "Hash": "be6a2b3529928bd803d1c437d1d43152"
}, },
"graphlayouts": { "graphlayouts": {
@ -634,10 +599,10 @@
}, },
"haven": { "haven": {
"Package": "haven", "Package": "haven",
"Version": "2.5.2", "Version": "2.3.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "8b331e659e67d757db0fcc28e689c501" "Hash": "221d0ad75dfa03ebf17b1a4cc5c31dfc"
}, },
"highr": { "highr": {
"Package": "highr", "Package": "highr",
@ -648,10 +613,10 @@
}, },
"hms": { "hms": {
"Package": "hms", "Package": "hms",
"Version": "1.1.3", "Version": "1.1.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "CRAN",
"Hash": "b59377caa7ed00fa41808342002138f9" "Hash": "5b8a2dd0fdbe2ab4f6081e6c7be6dfca"
}, },
"htmlTable": { "htmlTable": {
"Package": "htmlTable", "Package": "htmlTable",
@ -683,10 +648,10 @@
}, },
"httr": { "httr": {
"Package": "httr", "Package": "httr",
"Version": "1.4.5", "Version": "1.4.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "f6844033201269bec3ca0097bc6c97b3" "Hash": "a525aba14184fec243f9eaec62fbed43"
}, },
"huge": { "huge": {
"Package": "huge", "Package": "huge",
@ -695,13 +660,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "a4cde4dd1d2551edb99a3273a4ad34ea" "Hash": "a4cde4dd1d2551edb99a3273a4ad34ea"
}, },
"ids": {
"Package": "ids",
"Version": "1.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "99df65cfef20e525ed38c3d2577f7190"
},
"igraph": { "igraph": {
"Package": "igraph", "Package": "igraph",
"Version": "1.2.6", "Version": "1.2.6",
@ -746,10 +704,10 @@
}, },
"jsonlite": { "jsonlite": {
"Package": "jsonlite", "Package": "jsonlite",
"Version": "1.8.4", "Version": "1.7.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "a4269a09a9b865579b2635c77e572374" "Hash": "98138e0994d41508c7a6b84a0600cfcb"
}, },
"knitr": { "knitr": {
"Package": "knitr", "Package": "knitr",
@ -802,10 +760,10 @@
}, },
"lifecycle": { "lifecycle": {
"Package": "lifecycle", "Package": "lifecycle",
"Version": "1.0.3", "Version": "1.0.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "001cecbeac1cff9301bdc3775ee46a86" "Hash": "3471fb65971f1a7b2d4ae7848cf2db8d"
}, },
"listenv": { "listenv": {
"Package": "listenv", "Package": "listenv",
@ -816,17 +774,17 @@
}, },
"lubridate": { "lubridate": {
"Package": "lubridate", "Package": "lubridate",
"Version": "1.9.2", "Version": "1.7.9.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "e25f18436e3efd42c7c590a1c4c15390" "Hash": "5b5b02f621d39a499def7923a5aee746"
}, },
"magrittr": { "magrittr": {
"Package": "magrittr", "Package": "magrittr",
"Version": "2.0.3", "Version": "2.0.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "7ce2733a9826b3aeb1775d56fd305472" "Hash": "41287f1ac7d28a92f0a286ed507928d3"
}, },
"markdown": { "markdown": {
"Package": "markdown", "Package": "markdown",
@ -842,13 +800,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "67101e7448dfd9add4ac418623060262" "Hash": "67101e7448dfd9add4ac418623060262"
}, },
"memoise": {
"Package": "memoise",
"Version": "2.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c"
},
"mgcv": { "mgcv": {
"Package": "mgcv", "Package": "mgcv",
"Version": "1.8-33", "Version": "1.8-33",
@ -879,10 +830,10 @@
}, },
"modelr": { "modelr": {
"Package": "modelr", "Package": "modelr",
"Version": "0.1.11", "Version": "0.1.8",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "4f50122dc256b1b6996a4703fecea821" "Hash": "9fd59716311ee82cba83dc2826fc5577"
}, },
"munsell": { "munsell": {
"Package": "munsell", "Package": "munsell",
@ -937,7 +888,7 @@
"Package": "parallelly", "Package": "parallelly",
"Version": "1.29.0", "Version": "1.29.0",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "b5f399c9ce96977e22ef32c20b6cfe87" "Hash": "b5f399c9ce96977e22ef32c20b6cfe87"
}, },
"pbapply": { "pbapply": {
@ -956,10 +907,10 @@
}, },
"pillar": { "pillar": {
"Package": "pillar", "Package": "pillar",
"Version": "1.9.0", "Version": "1.4.7",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "15da5a8412f317beeee6175fbc76f4bb" "Hash": "3b3dd89b2ee115a8b54e93a34cd546b4"
}, },
"pkgbuild": { "pkgbuild": {
"Package": "pkgbuild", "Package": "pkgbuild",
@ -1026,10 +977,10 @@
}, },
"processx": { "processx": {
"Package": "processx", "Package": "processx",
"Version": "3.8.0", "Version": "3.4.5",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "a33ee2d9bf07564efb888ad98410da84" "Hash": "22aab6098cb14edd0a5973a8438b569b"
}, },
"prodlim": { "prodlim": {
"Package": "prodlim", "Package": "prodlim",
@ -1049,7 +1000,7 @@
"Package": "progressr", "Package": "progressr",
"Version": "0.9.0", "Version": "0.9.0",
"Source": "Repository", "Source": "Repository",
"Repository": "RSPM", "Repository": "CRAN",
"Hash": "ca0d80ecc29903f7579edbabd91f4199" "Hash": "ca0d80ecc29903f7579edbabd91f4199"
}, },
"promises": { "promises": {
@ -1082,10 +1033,10 @@
}, },
"purrr": { "purrr": {
"Package": "purrr", "Package": "purrr",
"Version": "1.0.1", "Version": "0.3.4",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "d71c815267c640f17ddbf7f16144b4bb" "Hash": "97def703420c8ab10d8f0e6c72101e02"
}, },
"qap": { "qap": {
"Package": "qap", "Package": "qap",
@ -1101,13 +1052,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "d35964686307333a7121eb41c7dcd4e0" "Hash": "d35964686307333a7121eb41c7dcd4e0"
}, },
"ragg": {
"Package": "ragg",
"Version": "1.2.5",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "690bc058ea2b1b8a407d3cfe3dce3ef9"
},
"rappdirs": { "rappdirs": {
"Package": "rappdirs", "Package": "rappdirs",
"Version": "0.3.3", "Version": "0.3.3",
@ -1117,17 +1061,17 @@
}, },
"readr": { "readr": {
"Package": "readr", "Package": "readr",
"Version": "2.1.4", "Version": "1.4.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "b5047343b3825f37ad9d3b5d89aa1078" "Hash": "2639976851f71f330264a9c9c3d43a61"
}, },
"readxl": { "readxl": {
"Package": "readxl", "Package": "readxl",
"Version": "1.4.2", "Version": "1.3.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "2e6020b1399d95f947ed867045e9ca17" "Hash": "63537c483c2dbec8d9e3183b3735254a"
}, },
"recipes": { "recipes": {
"Package": "recipes", "Package": "recipes",
@ -1166,10 +1110,10 @@
}, },
"reprex": { "reprex": {
"Package": "reprex", "Package": "reprex",
"Version": "2.0.2", "Version": "0.3.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "d66fe009d4c20b7ab1927eb405db9ee2" "Hash": "b06bfb3504cc8a4579fd5567646f745b"
}, },
"reshape2": { "reshape2": {
"Package": "reshape2", "Package": "reshape2",
@ -1194,10 +1138,10 @@
}, },
"rlang": { "rlang": {
"Package": "rlang", "Package": "rlang",
"Version": "1.1.0", "Version": "0.4.10",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "dc079ccd156cde8647360f473c1fa718" "Hash": "599df23c40a4fce9c7b4764f28c37857"
}, },
"rmarkdown": { "rmarkdown": {
"Package": "rmarkdown", "Package": "rmarkdown",
@ -1229,24 +1173,24 @@
}, },
"rstudioapi": { "rstudioapi": {
"Package": "rstudioapi", "Package": "rstudioapi",
"Version": "0.14", "Version": "0.13",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "690bd2acc42a9166ce34845884459320" "Hash": "06c85365a03fdaf699966cc1d3cf53ea"
}, },
"rvest": { "rvest": {
"Package": "rvest", "Package": "rvest",
"Version": "1.0.3", "Version": "0.3.6",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "a4a5ac819a467808c60e36e92ddf195e" "Hash": "a9795ccb2d608330e841998b67156764"
}, },
"scales": { "scales": {
"Package": "scales", "Package": "scales",
"Version": "1.2.1", "Version": "1.1.1",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "906cb23d2f1c5680b8ce439b44c6fa63" "Hash": "6f76f71042411426ec8df6c54f34e6dd"
}, },
"selectr": { "selectr": {
"Package": "selectr", "Package": "selectr",
@ -1292,17 +1236,17 @@
}, },
"stringi": { "stringi": {
"Package": "stringi", "Package": "stringi",
"Version": "1.7.12", "Version": "1.5.3",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "ca8bd84263c77310739d2cf64d84d7c9" "Hash": "a063ebea753c92910a4cca7b18bc1f05"
}, },
"stringr": { "stringr": {
"Package": "stringr", "Package": "stringr",
"Version": "1.5.0", "Version": "1.4.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "CRAN",
"Hash": "671a4d384ae9d32fc47a14e98bfa3dc8" "Hash": "0759e6b6c0957edb1311028a49a35e76"
}, },
"survival": { "survival": {
"Package": "survival", "Package": "survival",
@ -1318,13 +1262,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "b227d13e29222b4574486cfcbde077fa" "Hash": "b227d13e29222b4574486cfcbde077fa"
}, },
"systemfonts": {
"Package": "systemfonts",
"Version": "1.0.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "90b28393209827327de889f49935140a"
},
"testthat": { "testthat": {
"Package": "testthat", "Package": "testthat",
"Version": "3.0.1", "Version": "3.0.1",
@ -1332,19 +1269,12 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "17826764cb92d8b5aae6619896e5a161" "Hash": "17826764cb92d8b5aae6619896e5a161"
}, },
"textshaping": {
"Package": "textshaping",
"Version": "0.3.6",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "1ab6223d3670fac7143202cb6a2d43d5"
},
"tibble": { "tibble": {
"Package": "tibble", "Package": "tibble",
"Version": "3.2.1", "Version": "3.0.4",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "a84e2cc86d07289b3b6f5069df7a004c" "Hash": "71dffd8544691c520dd8e41ed2d7e070"
}, },
"tidygraph": { "tidygraph": {
"Package": "tidygraph", "Package": "tidygraph",
@ -1355,24 +1285,24 @@
}, },
"tidyr": { "tidyr": {
"Package": "tidyr", "Package": "tidyr",
"Version": "1.3.0", "Version": "1.1.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "e47debdc7ce599b070c8e78e8ac0cfcf" "Hash": "c40b2d5824d829190f4b825f4496dfae"
}, },
"tidyselect": { "tidyselect": {
"Package": "tidyselect", "Package": "tidyselect",
"Version": "1.2.0", "Version": "1.1.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "79540e5fcd9e0435af547d885f184fd5" "Hash": "6ea435c354e8448819627cf686f66e0a"
}, },
"tidyverse": { "tidyverse": {
"Package": "tidyverse", "Package": "tidyverse",
"Version": "2.0.0", "Version": "1.3.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "c328568cd14ea89a83bd4ca7f54ae07e" "Hash": "bd51be662f359fa99021f3d51e911490"
}, },
"timeDate": { "timeDate": {
"Package": "timeDate", "Package": "timeDate",
@ -1381,13 +1311,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "fde4fc571f5f61978652c229d4713845" "Hash": "fde4fc571f5f61978652c229d4713845"
}, },
"timechange": {
"Package": "timechange",
"Version": "0.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "8548b44f79a35ba1791308b61e6012d7"
},
"tinytex": { "tinytex": {
"Package": "tinytex", "Package": "tinytex",
"Version": "0.28", "Version": "0.28",
@ -1409,13 +1332,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "fc77eb5297507cccfa3349a606061030" "Hash": "fc77eb5297507cccfa3349a606061030"
}, },
"tzdb": {
"Package": "tzdb",
"Version": "0.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "b2e1cbce7c903eaf23ec05c58e59fb5e"
},
"utf8": { "utf8": {
"Package": "utf8", "Package": "utf8",
"Version": "1.1.4", "Version": "1.1.4",
@ -1423,19 +1339,12 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "4a5081acfb7b81a572e4384a7aaf2af1" "Hash": "4a5081acfb7b81a572e4384a7aaf2af1"
}, },
"uuid": {
"Package": "uuid",
"Version": "1.1-0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "f1cb46c157d080b729159d407be83496"
},
"vctrs": { "vctrs": {
"Package": "vctrs", "Package": "vctrs",
"Version": "0.6.1", "Version": "0.3.8",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "CRAN",
"Hash": "06eceb3a5d716fd0654cc23ca3d71a99" "Hash": "ecf749a1b39ea72bd9b51b76292261f1"
}, },
"viridis": { "viridis": {
"Package": "viridis", "Package": "viridis",
@ -1451,13 +1360,6 @@
"Repository": "RSPM", "Repository": "RSPM",
"Hash": "ce4f6271baa94776db692f1cb2055bee" "Hash": "ce4f6271baa94776db692f1cb2055bee"
}, },
"vroom": {
"Package": "vroom",
"Version": "1.6.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "7015a74373b83ffaef64023f4a0f5033"
},
"waldo": { "waldo": {
"Package": "waldo", "Package": "waldo",
"Version": "0.2.3", "Version": "0.2.3",
@ -1474,10 +1376,10 @@
}, },
"withr": { "withr": {
"Package": "withr", "Package": "withr",
"Version": "2.5.0", "Version": "2.3.0",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "c0e49a9760983e81e55cdd9be92e7182" "Hash": "7307d79f58d1885b38c4f4f1a8cb19dd"
}, },
"xfun": { "xfun": {
"Package": "xfun", "Package": "xfun",
@ -1488,10 +1390,10 @@
}, },
"xml2": { "xml2": {
"Package": "xml2", "Package": "xml2",
"Version": "1.3.3", "Version": "1.3.2",
"Source": "Repository", "Source": "Repository",
"Repository": "CRAN", "Repository": "RSPM",
"Hash": "40682ed6a969ea5abfd351eb67833adc" "Hash": "d4d71a75dd3ea9eb5fa28cc21f9585e2"
}, },
"xtable": { "xtable": {
"Package": "xtable", "Package": "xtable",


@ -345,19 +345,6 @@ rule esm_features:
script: script:
"../src/features/entry.py" "../src/features/entry.py"
rule phone_speech_python_features:
input:
sensor_data = "data/raw/{pid}/phone_speech_with_datetime.csv",
time_segments_labels = "data/interim/time_segments/{pid}_time_segments_labels.csv"
params:
provider = lambda wildcards: config["PHONE_SPEECH"]["PROVIDERS"][wildcards.provider_key.upper()],
provider_key = "{provider_key}",
sensor_key = "phone_speech"
output:
"data/interim/{pid}/phone_speech_features/phone_speech_python_{provider_key}.csv"
script:
"../src/features/entry.py"
rule phone_keyboard_python_features: rule phone_keyboard_python_features:
input: input:
sensor_data = "data/raw/{pid}/phone_keyboard_with_datetime.csv", sensor_data = "data/raw/{pid}/phone_keyboard_with_datetime.csv",


@ -247,8 +247,6 @@ rule empatica_readable_datetime:
include_past_periodic_segments = config["TIME_SEGMENTS"]["INCLUDE_PAST_PERIODIC_SEGMENTS"] include_past_periodic_segments = config["TIME_SEGMENTS"]["INCLUDE_PAST_PERIODIC_SEGMENTS"]
output: output:
"data/raw/{pid}/empatica_{sensor}_with_datetime.csv" "data/raw/{pid}/empatica_{sensor}_with_datetime.csv"
resources:
mem_mb=50000
script: script:
"../src/data/datetime/readable_datetime.R" "../src/data/datetime/readable_datetime.R"


@@ -29,17 +29,24 @@ get_genre <- function(apps){
apps <- read.csv(snakemake@input[[1]], stringsAsFactors = F)
genre_catalogue <- data.frame()
catalogue_source <- snakemake@params[["catalogue_source"]]
+package_names_hashed <- snakemake@params[["package_names_hashed"]]
update_catalogue_file <- snakemake@params[["update_catalogue_file"]]
scrape_missing_genres <- snakemake@params[["scrape_missing_genres"]]
apps_with_genre <- data.frame(matrix(ncol=length(colnames(apps)) + 1,nrow=0, dimnames=list(NULL, c(colnames(apps), "genre"))))
+if (length(package_names_hashed) == 0) {package_names_hashed <- FALSE}
if(nrow(apps) > 0){
if(catalogue_source == "GOOGLE"){
apps_with_genre <- apps %>% mutate(genre = NA_character_)
} else if(catalogue_source == "FILE"){
genre_catalogue <- read.csv(snakemake@params[["catalogue_file"]], colClasses = c("character", "character"))
+if (package_names_hashed) {
+apps_with_genre <- left_join(apps, genre_catalogue, by = "package_hash")
+} else {
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_name")
}
+}
if(catalogue_source == "GOOGLE" || (catalogue_source == "FILE" && scrape_missing_genres)){
apps_without_genre <- (apps_with_genre %>% filter(is.na(genre)) %>% distinct(package_name))$package_name


@@ -349,24 +349,3 @@ PHONE_WIFI_VISIBLE:
      COLUMN_MAPPINGS:
      SCRIPTS: # List any python or r scripts that mutate your raw data
-PHONE_SPEECH:
-  ANDROID:
-    RAPIDS_COLUMN_MAPPINGS:
-      TIMESTAMP: timestamp
-      DEVICE_ID: device_id
-      SPEECH_PROPORTION: speech_proportion
-    MUTATION:
-      COLUMN_MAPPINGS:
-      SCRIPTS: # List any python or r scripts that mutate your raw data
-  IOS:
-    RAPIDS_COLUMN_MAPPINGS:
-      TIMESTAMP: timestamp
-      DEVICE_ID: device_id
-      SPEECH_PROPORTION: speech_proportion
-    MUTATION:
-      COLUMN_MAPPINGS:
-      SCRIPTS: # List any python or r scripts that mutate your raw data


@@ -136,9 +136,8 @@ def patch_ibi_with_bvp(ibi_data, bvp_data):
# Begin with the cr-features part
try:
ibi_data, ibi_start_timestamp = empatica2d_to_array(ibi_data_file)
-except (IndexError, KeyError) as e:
+except IndexError as e:
# Checks whether IBI.csv is empty
-# It may raise a KeyError if df is empty here: startTimeStamp = df.time[0]
df_test = pd.read_csv(ibi_data_file, names=['timings', 'inter_beat_interval'], header=None)
if df_test.empty:
df_test['timestamp'] = df_test['timings']
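A minimal, self-contained sketch of the guard pattern this hunk touches, assuming an IBI.csv with the two unnamed columns read above; the load_ibi helper is illustrative only and is not part of the pipeline:

import pandas as pd

def load_ibi(ibi_data_file):
    # Read the raw Empatica inter-beat-interval export; an empty file must not
    # crash the downstream array conversion.
    df = pd.read_csv(ibi_data_file, names=['timings', 'inter_beat_interval'], header=None)
    try:
        # Stand-in for the cr-features conversion step: touching the first row
        # raises IndexError on an empty frame (KeyError in some access paths).
        start_timestamp = df['timings'].iloc[0]
    except (IndexError, KeyError):
        return pd.DataFrame(columns=['timestamp', 'inter_beat_interval']), None
    return df, start_timestamp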


@@ -118,11 +118,6 @@ PHONE_SCREEN:
    - DEVICE_ID
    - SCREEN_STATUS
-PHONE_SPEECH:
-    - TIMESTAMP
-    - DEVICE_ID
-    - SPEECH_PROPORTION
PHONE_WIFI_CONNECTED:
    - TIMESTAMP
    - DEVICE_ID


@@ -36,9 +36,6 @@ def straw_cleaning(sensor_data_files, provider):
phone_data_yield_unit = provider["PHONE_DATA_YIELD_FEATURE"].split("_")[3].lower()
phone_data_yield_column = "phone_data_yield_rapids_ratiovalidyielded" + phone_data_yield_unit
-if features.empty:
-return features
features = edy.calculate_empatica_data_yield(features)
if not phone_data_yield_column in features.columns and not "empatica_data_yield" in features.columns:
@@ -120,7 +117,7 @@ def straw_cleaning(sensor_data_files, provider):
esm_cols = features.loc[:, features.columns.str.startswith('phone_esm_straw')]
if provider["COLS_VAR_THRESHOLD"]:
-features.drop(features.std(numeric_only=True)[features.std(numeric_only=True) == 0].index.values, axis=1, inplace=True)
+features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
fe5 = features.copy()
@@ -134,7 +131,7 @@ def straw_cleaning(sensor_data_files, provider):
valid_features = features[numerical_cols].loc[:, features[numerical_cols].isna().sum() < drop_corr_features['MIN_OVERLAP_FOR_CORR_THRESHOLD'] * features[numerical_cols].shape[0]]
corr_matrix = valid_features.corr().abs()
-upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
+upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
to_drop = [column for column in upper.columns if any(upper[column] > drop_corr_features["CORR_THRESHOLD"])]
features.drop(to_drop, axis=1, inplace=True)
@@ -150,15 +147,13 @@ def straw_cleaning(sensor_data_files, provider):
return features
-def impute(df, method='zero'):
def k_nearest(df):
pd.set_option('display.max_columns', None)
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
+def impute(df, method='zero'):
return {
'zero': df.fillna(0),
'high_number': df.fillna(1500),
@@ -167,7 +162,6 @@ def impute(df, method='zero'):
'knn': k_nearest(df)
}[method]
def graph_bf_af(features, phase_name, plt_flag=False):
if plt_flag:
sns.set(rc={"figure.figsize":(16, 8)})
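The variance and correlation filters changed in this file follow a common pattern; a minimal sketch of that pattern, assuming a plain numeric feature DataFrame (thresholds are illustrative, and plain bool replaces the removed np.bool alias):

import numpy as np
import pandas as pd

def drop_zero_variance_and_correlated(features, corr_threshold=0.95, min_overlap=0.5):
    # Drop columns whose standard deviation is exactly zero.
    stds = features.std(numeric_only=True)
    features = features.drop(columns=stds[stds == 0].index)

    # Keep only columns with enough non-missing rows before computing correlations.
    numeric = features.select_dtypes(include="number")
    valid = numeric.loc[:, numeric.isna().sum() < min_overlap * numeric.shape[0]]

    corr = valid.corr().abs()
    # Upper-triangle mask (k=1 excludes the diagonal).
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    return features.drop(columns=to_drop)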


@@ -87,7 +87,6 @@ def straw_cleaning(sensor_data_files, provider, target):
if features.empty:
return pd.DataFrame(columns=excluded_columns)
# (3) CONTEXTUAL IMPUTATION
# Impute selected phone features with a high number
@@ -146,7 +145,7 @@ def straw_cleaning(sensor_data_files, provider, target):
# (5) REMOVE COLS WHERE VARIANCE IS 0
if provider["COLS_VAR_THRESHOLD"]:
-features.drop(features.std(numeric_only=True)[features.std(numeric_only=True) == 0].index.values, axis=1, inplace=True)
+features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
graph_bf_af(features, "6variance_drop")
@@ -170,12 +169,8 @@ def straw_cleaning(sensor_data_files, provider, target):
# Expected warning within this code block
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=RuntimeWarning)
-if provider["TARGET_STANDARDIZATION"]:
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols)] = \
features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols)].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
-else:
-features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols + ['phone_esm_straw_' + target])] = \
-features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols + ['phone_esm_straw_' + target])].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
graph_bf_af(features, "8standardization")
@@ -200,7 +195,7 @@ def straw_cleaning(sensor_data_files, provider, target):
valid_features = features[numerical_cols].loc[:, features[numerical_cols].isna().sum() < drop_corr_features['MIN_OVERLAP_FOR_CORR_THRESHOLD'] * features[numerical_cols].shape[0]]
corr_matrix = valid_features.corr().abs()
-upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
+upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
to_drop = [column for column in upper.columns if any(upper[column] > drop_corr_features["CORR_THRESHOLD"])]
# sns.heatmap(corr_matrix, cmap="YlGnBu")
@@ -233,26 +228,18 @@ def straw_cleaning(sensor_data_files, provider, target):
if cat2: # Transform columns to category dtype (homelabel)
features[cat2] = features[cat2].astype(int).astype('category')
-# (10) DROP ALL WINDOW RELATED COLUMNS
-win_count_cols = [col for col in features if "SO_windowsCount" in col]
-if win_count_cols:
-features.drop(columns=win_count_cols, inplace=True)
-# (11) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
+# (10) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
if features.isna().any().any():
raise ValueError("There are still some NaNs present in the dataframe. Please check for implementation errors.")
return features
-def impute(df, method='zero'):
def k_nearest(df):
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
+def impute(df, method='zero'):
return {
'zero': df.fillna(0),
'high_number': df.fillna(1500),
@@ -261,7 +248,6 @@ def impute(df, method='zero'):
'knn': k_nearest(df)
}[method]
def graph_bf_af(features, phase_name, plt_flag=False):
if plt_flag:
sns.set(rc={"figure.figsize":(16, 8)})
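The standardization block toggled by TARGET_STANDARDIZATION z-scores features within each participant; a hedged sketch of that idea, with illustrative column handling rather than the script's exact exclusion lists:

import numpy as np
from sklearn.preprocessing import StandardScaler

def standardize_within_pid(features, skip_cols, standardize_target=True, target_col=None):
    # Columns that are never standardized: identifiers, nominal features and,
    # when requested, the target itself.
    excluded = list(skip_cols)
    if not standardize_target and target_col is not None:
        excluded.append(target_col)
    cols = [c for c in features.columns if c not in excluded + ["pid"]]
    # z-score every remaining feature within each participant ("pid") group.
    features[cols] = features.groupby("pid")[cols].transform(
        lambda x: StandardScaler().fit_transform(x.values[:, np.newaxis]).ravel()
    )
    return features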


@@ -15,13 +15,13 @@ def extract_second_order_features(intraday_features, so_features_names, prefix="
so_features = pd.DataFrame()
#print(intraday_features.drop("level_1", axis=1).groupby(["local_segment"]).nsmallest())
if "mean" in so_features_names:
-so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).mean(numeric_only=True).add_suffix("_SO_mean")], axis=1)
+so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).mean().add_suffix("_SO_mean")], axis=1)
if "median" in so_features_names:
-so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).median(numeric_only=True).add_suffix("_SO_median")], axis=1)
+so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).median().add_suffix("_SO_median")], axis=1)
if "sd" in so_features_names:
-so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).std(numeric_only=True).fillna(0).add_suffix("_SO_sd")], axis=1)
+so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).std().fillna(0).add_suffix("_SO_sd")], axis=1)
if "nlargest" in so_features_names: # largest 5 -- maybe there is a faster groupby solution?
for column in intraday_features.loc[:, ~intraday_features.columns.isin(groupby_cols+[prefix+"level_1"])]:
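The numeric_only=True arguments added here guard the grouped aggregations against non-numeric helper columns; a toy example with made-up values showing the mean aggregation:

import pandas as pd

df = pd.DataFrame({
    "local_segment": ["a", "a", "b"],
    "empatica_temperature_cr_mean": [36.5, 36.7, 36.4],
    "level_1": ["w0", "w1", "w0"],          # non-numeric helper column
})

# With recent pandas, aggregating a frame that still contains string columns
# raises (or warns) unless numeric_only=True is passed explicitly.
so_mean = (df.drop("level_1", axis=1)
             .groupby("local_segment")
             .mean(numeric_only=True)
             .add_suffix("_SO_mean"))
print(so_mean)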


@@ -26,7 +26,7 @@ def calculate_empatica_data_yield(features): # TODO
# Assigns 1 to values that are over 1 (in case of windows not being filled fully)
features[empatica_data_yield_cols] = features[empatica_data_yield_cols].apply(lambda x: [y if y <= 1 or np.isnan(y) else 1 for y in x])
-features["empatica_data_yield"] = features[empatica_data_yield_cols].mean(axis=1, numeric_only=True).fillna(0)
+features["empatica_data_yield"] = features[empatica_data_yield_cols].mean(axis=1).fillna(0)
features.drop(empatica_data_yield_cols, axis=1, inplace=True) # In case of if the advanced operations will later not be needed (e.g., weighted average)
return features
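A compact sketch of the clamping and averaging this function performs, assuming illustrative column names; clip is used here in place of the list comprehension and likewise leaves NaNs untouched:

import pandas as pd

def empatica_data_yield(features, yield_cols):
    # Ratios can exceed 1 when a window is not completely filled; clamp to 1
    # while leaving NaNs untouched, then average the per-sensor yields.
    features[yield_cols] = features[yield_cols].clip(upper=1)
    features["empatica_data_yield"] = features[yield_cols].mean(axis=1, numeric_only=True).fillna(0)
    return features.drop(columns=yield_cols)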


@@ -54,7 +54,8 @@ def cr_features(sensor_data_files, time_segment, provider, filter_data_by_segmen
data_types = {'local_timezone': 'str', 'device_id': 'str', 'timestamp': 'int64', 'inter_beat_interval': 'float64', 'timings': 'float64', 'local_date_time': 'str',
'local_date': "str", 'local_time': "str", 'local_hour': "str", 'local_minute': "str", 'assigned_segments': "str"}
-ibi_intraday_data = pd.read_csv(sensor_data_files["sensor_data"], dtype=data_types)
+temperature_intraday_data = pd.read_csv(sensor_data_files["sensor_data"], dtype=data_types)
+ibi_intraday_data = pd.read_csv(sensor_data_files["sensor_data"])
requested_intraday_features = provider["FEATURES"]


@@ -49,14 +49,13 @@ def extract_ers(esm_df):
extracted_ers (DataFrame): dataframe with all necessary information to write event-related segments file
in the correct format.
"""
-pd.set_option("display.max_rows", 20)
+pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", None)
with open('config.yaml', 'r') as stream:
config = yaml.load(stream, Loader=yaml.FullLoader)
-pd.DataFrame(columns=["label"]).to_csv(snakemake.output[1]) # Create an empty stress_events_targets file
+pd.DataFrame(columns=["label", "intensity"]).to_csv(snakemake.output[1]) # Create an empty stress_events_targets file
esm_preprocessed = clean_up_esm(preprocess_esm(esm_df))
@@ -106,9 +105,7 @@ def extract_ers(esm_df):
extracted_ers["shift"] = extracted_ers["diffs"].apply(lambda x: format_timestamp(x))
elif segmenting_method == "stress_event":
-"""
-TODO: update documentation for this condition
-This is a special case of the method as it consists of two important parts:
+"""This is a special case of the method as it consists of two important parts:
(1) Generating of the ERS file (same as the methods above) and
(2) Generating targets file alongside with the correct time segment labels.
@@ -117,95 +114,58 @@ def extract_ers(esm_df):
possiblity of the participant not remembering the start time percisely => this parameter can be manipulated with the variable
"time_before_event" which is defined below.
-In case if the participant marked that no stressful event happened, the default of 30 minutes before the event is choosen.
-In this case, se_threat and se_challenge are NaN.
By default, this method also excludes all events that are longer then 2.5 hours so that the segments are easily comparable.
"""
-ioi = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["INTERVAL_OF_INTEREST"] * 60 # interval of interest in seconds
-ioi_error_tolerance = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["IOI_ERROR_TOLERANCE"] * 60 # interval of interest error tolerance in seconds
# Get and join required data
-extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index().rename(columns={'timestamp': 'session_length'}) # questionnaire length
+extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index().rename(columns={'timestamp': 'session_length'}) # questionnaire end timestamp
-extracted_ers = extracted_ers[extracted_ers["session_length"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire answering is 15 min
+extracted_ers = extracted_ers[extracted_ers["session_length"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire anwsering is 15 min
-session_start_timestamp = esm_df.groupby(['device_id', 'esm_session'])['timestamp'].min().to_frame().rename(columns={'timestamp': 'session_start_timestamp'}) # questionnaire start timestamp
session_end_timestamp = esm_df.groupby(['device_id', 'esm_session'])['timestamp'].max().to_frame().rename(columns={'timestamp': 'session_end_timestamp'}) # questionnaire end timestamp
-# Users' answers for the stressfulness event (se) start times and durations
se_time = esm_df[esm_df.questionnaire_id == 90.].set_index(['device_id', 'esm_session'])['esm_user_answer'].to_frame().rename(columns={'esm_user_answer': 'se_time'})
se_duration = esm_df[esm_df.questionnaire_id == 91.].set_index(['device_id', 'esm_session'])['esm_user_answer'].to_frame().rename(columns={'esm_user_answer': 'se_duration'})
-# Make se_durations to the appropriate lengths
-# Extracted 3 targets that will be transfered in the csv file to the cleaning script.
+# Extracted 3 targets that will be transfered with the csv file to the cleaning script.
se_stressfulness_event_tg = esm_df[esm_df.questionnaire_id == 87.].set_index(['device_id', 'esm_session'])['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_stressfulness_event'})
-se_threat_tg = esm_df[esm_df.questionnaire_id == 88.].groupby(["device_id", "esm_session"]).mean(numeric_only=True)['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_threat'})
+se_threat_tg = esm_df[esm_df.questionnaire_id == 88.].groupby(["device_id", "esm_session"]).mean()['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_threat'})
-se_challenge_tg = esm_df[esm_df.questionnaire_id == 89.].groupby(["device_id", "esm_session"]).mean(numeric_only=True)['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_challenge'})
+se_challenge_tg = esm_df[esm_df.questionnaire_id == 89.].groupby(["device_id", "esm_session"]).mean()['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_challenge'})
# All relevant features are joined by inner join to remove standalone columns (e.g., stressfulness event target has larger count)
-extracted_ers = extracted_ers.join(session_start_timestamp, on=['device_id', 'esm_session'], how='inner') \
-.join(session_end_timestamp, on=['device_id', 'esm_session'], how='inner') \
+extracted_ers = extracted_ers.join(session_end_timestamp, on=['device_id', 'esm_session'], how='inner') \
+.join(se_time, on=['device_id', 'esm_session'], how='inner') \
+.join(se_duration, on=['device_id', 'esm_session'], how='inner') \
.join(se_stressfulness_event_tg, on=['device_id', 'esm_session'], how='inner') \
-.join(se_time, on=['device_id', 'esm_session'], how='left') \
-.join(se_duration, on=['device_id', 'esm_session'], how='left') \
-.join(se_threat_tg, on=['device_id', 'esm_session'], how='left') \
-.join(se_challenge_tg, on=['device_id', 'esm_session'], how='left')
+.join(se_threat_tg, on=['device_id', 'esm_session'], how='inner') \
+.join(se_challenge_tg, on=['device_id', 'esm_session'], how='inner')
-# Filter-out the sessions that are not useful. Because of the ambiguity this excludes:
+# Filter sessions that are not useful. Because of the ambiguity this excludes:
# (1) straw event times that are marked as "0 - I don't remember"
-extracted_ers = extracted_ers[~extracted_ers.se_time.astype(str).str.startswith("0 - ")]
+# (2) straw event durations that are marked as "0 - I don't remember"
+extracted_ers = extracted_ers[(~extracted_ers.se_time.str.startswith("0 - ")) & (~extracted_ers.se_duration.str.startswith("0 - "))]
-# Transform data into its final form, ready for the extraction
extracted_ers.reset_index(drop=True, inplace=True)
-extracted_ers.loc[extracted_ers.se_duration.astype(str).str.startswith("0 - "), 'se_duration'] = 0
-# Add default duration in case if participant answered that no stressful event occured
-extracted_ers["se_duration"] = extracted_ers["se_duration"].fillna(int((ioi + 2*ioi_error_tolerance) * 1000))
-# Prepare data to fit the data structure in the CSV file ...
-# Add the event time as the end of the questionnaire if no stress event occured
-extracted_ers['se_time'] = extracted_ers['se_time'].fillna(extracted_ers['session_start_timestamp'])
-# Type could be an int (timestamp [ms]) which stays the same, and datetime str which is converted to timestamp in miliseconds
-extracted_ers['event_timestamp'] = extracted_ers['se_time'].apply(lambda x: x if isinstance(x, int) else pd.to_datetime(x).timestamp() * 1000).astype('int64')
+time_before_event = 5 * 60 # in seconds (5 minutes)
+extracted_ers['event_timestamp'] = pd.to_datetime(extracted_ers['se_time']).apply(lambda x: x.timestamp() * 1000).astype('int64')
extracted_ers['shift_direction'] = -1
-""">>>>> begin section (could be optimized) <<<<<"""
# Checks whether the duration is marked with "1 - It's still ongoing" which means that the end of the current questionnaire
# is taken as end time of the segment. Else the user input duration is taken.
extracted_ers['se_duration'] = \
np.where(
-extracted_ers['se_duration'].astype(str).str.startswith("1 - "),
+extracted_ers['se_duration'].str.startswith("1 - "),
extracted_ers['session_end_timestamp'] - extracted_ers['event_timestamp'],
extracted_ers['se_duration']
)
-# This converts the rows of timestamps in miliseconds and the rows with datetime... to timestamp in seconds.
+# This converts the rows of timestamps in miliseconds and the row with datetime to timestamp in seconds.
extracted_ers['se_duration'] = \
-extracted_ers['se_duration'].apply(lambda x: math.ceil(x / 1000) if isinstance(x, int) else (pd.to_datetime(x).hour * 60 + pd.to_datetime(x).minute) * 60)
+extracted_ers['se_duration'].apply(lambda x: math.ceil(x / 1000) if isinstance(x, int) else (pd.to_datetime(x).hour * 60 + pd.to_datetime(x).minute) * 60) + time_before_event
-# Check explicitley whether min duration is at least 0. This will eliminate rows that would be investigated after the end of the questionnaire.
-extracted_ers = extracted_ers[extracted_ers['session_end_timestamp'] - extracted_ers['event_timestamp'] >= 0]
-# Double check whether min se_duration is at least 0. Filter-out the rest. Negative values are considered invalid.
-extracted_ers = extracted_ers[extracted_ers["se_duration"] >= 0].reset_index(drop=True)
-""">>>>> end section <<<<<"""
-# Simply override all durations to be of an equal amount
-extracted_ers['se_duration'] = ioi + 2*ioi_error_tolerance
-# If target is 0 then shift by the total stress event duration, otherwise shift it by ioi_tolerance
-extracted_ers['shift'] = \
-np.where(
-extracted_ers['appraisal_stressfulness_event'] == 0,
-extracted_ers['se_duration'],
-ioi_error_tolerance
-)
-extracted_ers['shift'] = extracted_ers['shift'].apply(lambda x: format_timestamp(int(x)))
-extracted_ers['length'] = extracted_ers['se_duration'].apply(lambda x: format_timestamp(int(x)))
-# Drop event_timestamp duplicates in case in the user is referencing the same event over multiple questionnaires
+extracted_ers['shift'] = format_timestamp(time_before_event)
+extracted_ers['length'] = extracted_ers['se_duration'].apply(lambda x: format_timestamp(x))
+# Drop event_timestamp duplicates in case of user referencing the same event over multiple questionnaires
extracted_ers.drop_duplicates(subset=["event_timestamp"], keep='first', inplace=True)
extracted_ers.reset_index(drop=True, inplace=True)
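For orientation, the stress_event branch ultimately emits one row per reported event in the event-related segments format (label, event_timestamp, length, shift, shift_direction, device_id). A hedged sketch with made-up values, assuming a 45-minute event shifted back by the 5-minute tolerance used in one of the two variants above; the device_id and label are hypothetical:

import pandas as pd

def format_timestamp(seconds):
    # HH:MM:SS string from a number of seconds (sketch of the helper used above).
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

time_before_event = 5 * 60
row = {
    "label": "stress_event_1",
    "event_timestamp": 1667988000000,                  # milliseconds
    "length": format_timestamp(45 * 60 + time_before_event),
    "shift": format_timestamp(time_before_event),
    "shift_direction": -1,
    "device_id": "a1b2c3d4",
}
print(pd.DataFrame([row]))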


@ -115,7 +115,7 @@ cluster_on = provider["CLUSTER_ON"]
strategy = provider["INFER_HOME_LOCATION_STRATEGY"] strategy = provider["INFER_HOME_LOCATION_STRATEGY"]
days_threshold = provider["MINIMUM_DAYS_TO_DETECT_HOME_CHANGES"] days_threshold = provider["MINIMUM_DAYS_TO_DETECT_HOME_CHANGES"]
if not location_data.timestamp.is_monotonic_increasing: if not location_data.timestamp.is_monotonic:
location_data.sort_values(by=["timestamp"], inplace=True) location_data.sort_values(by=["timestamp"], inplace=True)
location_data["duration_in_seconds"] = -1 * location_data.timestamp.diff(-1) / 1000 location_data["duration_in_seconds"] = -1 * location_data.timestamp.diff(-1) / 1000
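is_monotonic versus is_monotonic_increasing tracks a pandas API change (the former was deprecated and later removed); a minimal reproduction with a toy timestamp column in milliseconds:

import pandas as pd

location_data = pd.DataFrame({"timestamp": [3_000, 1_000, 2_000]})

# is_monotonic_increasing expresses the same ordering check explicitly.
if not location_data.timestamp.is_monotonic_increasing:
    location_data.sort_values(by=["timestamp"], inplace=True)

location_data["duration_in_seconds"] = -1 * location_data.timestamp.diff(-1) / 1000
print(location_data)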


@@ -1,30 +0,0 @@
-import pandas as pd
-def straw_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
-    speech_data = pd.read_csv(sensor_data_files["sensor_data"])
-    requested_features = provider["FEATURES"]
-    # name of the features this function can compute+
-    base_features_names = ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
-    features_to_compute = list(set(requested_features) & set(base_features_names))
-    speech_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
-    if not speech_data.empty:
-        speech_data = filter_data_by_segment(speech_data, time_segment)
-        if not speech_data.empty:
-            speech_features = pd.DataFrame()
-            if "meanspeech" in features_to_compute:
-                speech_features["meanspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].mean()
-            if "stdspeech" in features_to_compute:
-                speech_features["stdspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].std()
-            if "nlargest" in features_to_compute:
-                speech_features["nlargest"] = speech_data.groupby(["local_segment"])['speech_proportion'].apply(lambda x: x.nlargest(5).mean())
-            if "nsmallest" in features_to_compute:
-                speech_features["nsmallest"] = speech_data.groupby(["local_segment"])['speech_proportion'].apply(lambda x: x.nsmallest(5).mean())
-            if "medianspeech" in features_to_compute:
-                speech_features["medianspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].median()
-            speech_features = speech_features.reset_index()
-    return speech_features
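As a usage illustration of the removed provider, a toy run with made-up speech_proportion values showing what the nlargest/nsmallest summaries reduce to:

import pandas as pd

speech_data = pd.DataFrame({
    "local_segment": ["s1"] * 6,
    "speech_proportion": [0.10, 0.80, 0.35, 0.60, 0.05, 0.90],
})

grouped = speech_data.groupby("local_segment")["speech_proportion"]
speech_features = pd.DataFrame({
    "meanspeech": grouped.mean(),
    "stdspeech": grouped.std(),
    # Mean of the five largest / smallest proportions within each segment.
    "nlargest": grouped.apply(lambda x: x.nlargest(5).mean()),
    "nsmallest": grouped.apply(lambda x: x.nsmallest(5).mean()),
    "medianspeech": grouped.median(),
}).reset_index()
print(speech_features)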