Squashed commit of the following:

commit 31a47a5ee4569264e39d7c445525a6e64bb7700a
Author: Primoz <sisko.primoz@gmail.com>
Date:   Wed Jul 20 13:49:22 2022 +0000

    Environment version change.

commit 5b274ed8993f58e783bda6d82fce936764209c28
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 16:10:07 2022 +0000

    Enabled cleaning for all participants + standardization files.

commit 203fdb31e0f3c647ef8c8a60cb9531831b7ab924
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 14:14:51 2022 +0000

    Features cleaning fixes after testing. Visualization script for phone features values.

commit 176178d73b154c30b9eb9eb4a67514f00d6a924e
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 19 09:05:14 2022 +0000

    Revert "Necessary config changes."

    This reverts commit 6ec1ef50430d2e1f5ce4670d505d5e84ac47f0a0.

commit 26ea6512c9d512f95837e7b047fe510c1d196403
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 13:19:47 2022 +0000

    Adding cleaning function condition and cleaning functionality.

commit 575c29eef9c21e6f2d7832871e73bc0941643734
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 12:51:56 2022 +0000

    Translation of the cleaning individual RAPIDS function from R to py.

commit 6ec1ef50430d2e1f5ce4670d505d5e84ac47f0a0
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 18 12:02:18 2022 +0000

    Necessary config changes.

commit b5669f51612fbd8378848615d639677851ab032f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 15:26:00 2022 +0000

    Modified snakemake rule to dynamically choose script extention.

commit 66636be1e8ae4828228b37c59b9df1faf3fc3d3d
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 14:43:08 2022 +0000

    Trying to modify the snakefile rule to execute scripts in two languages depended on the provider.

commit 574778b00f3cbb368ef4bc74de15cf5070c65ea9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 09:49:41 2022 +0000

    gitignore: adding required files so that RAPIDS can be run successfully.

commit 71018ab178256970535e78961602ab8c7f0ebb14
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 15 08:34:19 2022 +0000

    Standardization bug fixes

commit 6253c470a624e6bfbb02e0c453b652452eb2dbbc
Author: Primoz <sisko.primoz@gmail.com>
Date:   Thu Jul 14 15:28:02 2022 +0000

    Seperate rules for empatica vs. nonempatica standardization. Parameter in config that controls the creation of standardized merged files for individual and all participants..

commit 90f902778565e0896d3bae22ae8551be8b487e67
Author: Primoz <sisko.primoz@gmail.com>
Date:   Tue Jul 12 14:23:03 2022 +0000

    Preparing for final csvs' standardization.

commit d25dde3998786a9a582f5cda544ee104386778f9
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 12:08:47 2022 +0000

    Revert "Changes in config to be reverted."

    This reverts commit bea7608e7095021fb7c53a9afa07074448fe4313.

commit 6b23e70857e63deda98eb98d190af9090626c84b
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 12:08:26 2022 +0000

    Enabled standardization for rest (previously active) phone features. Testing still needed.

commit 8ec58a6f34ba3d42e5cc71d26e6d91837472ca5f
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 09:07:55 2022 +0000

    Enabled standardization for phone calls. All steps completed and tested.

commit bea7608e7095021fb7c53a9afa07074448fe4313
Author: Primoz <sisko.primoz@gmail.com>
Date:   Mon Jul 11 07:47:51 2022 +0000

    Changes in config to be reverted.

commit 4e84ca0e51bf709bff56fd09437b95310ec6bedd
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 8 14:11:24 2022 +0000

    Standardization for the rest of the features.

commit cc581aa788e3d5c17131af8f3d5dd6b0c3b5aff7
Author: Primoz <sisko.primoz@gmail.com>
Date:   Fri Jul 8 14:11:08 2022 +0000

    README update again

sociality-task
parent 788ac31190
commit 6ba4a66deb
|
@@ -98,6 +98,9 @@ data/external/*
|
|||
!/data/external/stachl_application_genre_catalogue.csv
|
||||
!/data/external/timesegments*.csv
|
||||
!/data/external/wiki_tz.csv
|
||||
!/data/external/main_study_usernames.csv
|
||||
!/data/external/timezone.csv
|
||||
|
||||
data/raw/*
|
||||
!/data/raw/.gitkeep
|
||||
data/interim/*
|
||||
|
|
19
README.md
|
@@ -46,34 +46,37 @@ Type R to go to the interactive R session and then:
|
|||
```
|
||||
|
||||
6. Install cr-features module
|
||||
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch calculations_for_rapids.
|
||||
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch modifications_for_rapids.
|
||||
Then follow the "cr-features module" section below.
|
||||
|
||||
7. Install all required packages from environment.yml, prune also deletes conda packages not present in environment file.
|
||||
```
|
||||
conda env update --file environment.yml –prune
|
||||
```
|
||||
|
||||
8. If you wish to update your R or Python venvs.
|
||||
```
|
||||
R in interactive session:
|
||||
renv::snapshot()
|
||||
Python:
|
||||
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.ym
|
||||
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
|
||||
```
|
||||
|
||||
## cr-features module
|
||||
|
||||
This RAPIDS extension uses CalculatingFeatures library accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
|
||||
This RAPIDS extension uses cr-features library accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
|
||||
|
||||
To use CalculatingFeatures library:
|
||||
To use cr-features library:
|
||||
- For now, use the "modifications_for_rapids" branch to get the newest version of cr-features that is functional for RAPIDS-STRAW analysis.
|
||||
|
||||
- Follow the installation instructions in the [README.md](https://repo.ijs.si/matjazbostic/calculatingfeatures/-/blob/master/README.md).
|
||||
|
||||
- Copy built calculatingfeatures folder into the RAPIDS workspace.
|
||||
|
||||
- Install the CalculatingFeatures package by:
|
||||
- Install the cr-features package by:
|
||||
```
|
||||
pip install "path/to/the/calculatingfeatures/folder"
|
||||
e.g. "./calculatingfeatures" if the folder is copied to main parent directory
|
||||
CalculatingFeatures package has to be built and installed everytime to get the newest version.
|
||||
pip install path/to/the/calculatingfeatures/folder
|
||||
e.g. pip install ./calculatingfeatures if the folder is copied to main parent directory
|
||||
cr-features package has to be built and installed everytime to get the newest version.
|
||||
Or an the newest version of the docker image must be used.
|
||||
```
|
128
Snakefile
|
@@ -33,6 +33,12 @@ for provider in config["PHONE_DATA_YIELD"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_data_yield.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
|
||||
if config["PHONE_MESSAGES"]["PROVIDERS"][provider]["COMPUTE"]:
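The six lines added in this hunk recur for every phone sensor below, with only the sensor name changed. A minimal sketch of the same condition as a helper follows. The names `standardized_targets` and the stub `expand` are hypothetical and not part of this commit, and the `config` layout is only inferred from the keys the hunk reads.

```python
def expand(pattern, pid):
    """Simplified stand-in for snakemake's expand(): one path per participant id."""
    return [pattern.format(pid=p) for p in pid]


def standardized_targets(config, sensor, provider):
    """Return the z_* files requested for one phone sensor/provider pair."""
    other = config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]
    enabled = (
        provider in other["LIST"]
        and other["COMPUTE"]
        and config[sensor]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]
    )
    if not enabled:
        return []
    pids = config["PIDS"]
    targets = expand(f"data/processed/features/{{pid}}/z_{sensor.lower()}.csv", pid=pids)
    if config["STANDARDIZATION"]["MERGE_ALL"]:
        # Merged standardized files: one per participant plus one for everyone.
        targets += expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=pids)
        targets.append("data/processed/features/all_participants/z_all_sensor_features.csv")
    return targets


# Hypothetical config holding only the keys the helper reads.
config = {
    "PIDS": ["p01", "p02"],
    "STANDARDIZATION": {
        "MERGE_ALL": True,
        "PROVIDERS": {"OTHER": {"COMPUTE": True, "LIST": ["RAPIDS"]}},
    },
    "PHONE_CALLS": {"PROVIDERS": {"RAPIDS": {"COMPUTE": True, "STANDARDIZE_FEATURES": True}}},
}

targets = standardized_targets(config, "PHONE_CALLS", "RAPIDS")
```

With such a helper, each sensor loop would reduce to a single `files_to_compute.extend(...)` call. Whether that fits the rest of the Snakefile is a judgment for the authors.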
|
||||
|
@@ -42,6 +48,12 @@ for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_messages.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_MESSAGES"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_messages.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_CALLS"]["PROVIDERS"].keys():
|
||||
if config["PHONE_CALLS"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -56,6 +68,12 @@ for provider in config["PHONE_CALLS"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_calls.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_CALLS"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_calls.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_BLUETOOTH"]["PROVIDERS"].keys():
|
||||
if config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -65,6 +83,12 @@ for provider in config["PHONE_BLUETOOTH"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_bluetooth.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_bluetooth.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"].keys():
|
||||
if config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -77,6 +101,12 @@ for provider in config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_activity_recognition.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_activity_recognition.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_BATTERY"]["PROVIDERS"].keys():
|
||||
if config["PHONE_BATTERY"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -88,6 +118,12 @@ for provider in config["PHONE_BATTERY"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_battery.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_BATTERY"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_battery.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_SCREEN"]["PROVIDERS"].keys():
|
||||
if config["PHONE_SCREEN"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -104,6 +140,12 @@ for provider in config["PHONE_SCREEN"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_screen.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_SCREEN"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_screen.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_LIGHT"]["PROVIDERS"].keys():
|
||||
if config["PHONE_LIGHT"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -113,6 +155,12 @@ for provider in config["PHONE_LIGHT"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_light.csv", pid=config["PIDS"],))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_LIGHT"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_light.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_ACCELEROMETER"]["PROVIDERS"].keys():
|
||||
if config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -136,6 +184,12 @@ for provider in config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_foreground.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_applications_foreground.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_WIFI_VISIBLE"]["PROVIDERS"].keys():
|
||||
if config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -145,6 +199,12 @@ for provider in config["PHONE_WIFI_VISIBLE"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_visible.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_wifi_visible.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["PHONE_WIFI_CONNECTED"]["PROVIDERS"].keys():
|
||||
if config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -173,6 +233,12 @@ for provider in config["PHONE_ESM"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_esm.csv", pid=config["PIDS"]))
|
||||
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"]))
|
||||
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_ESM"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_esm.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
# We can delete these if's as soon as we add feature PROVIDERS to any of these sensors
|
||||
if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict):
|
||||
|
@@ -238,6 +304,12 @@ for provider in config["PHONE_LOCATIONS"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/phone_locations.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["LIST"] and config["STANDARDIZATION"]["PROVIDERS"]["OTHER"]["COMPUTE"] \
|
||||
and config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_phone_locations.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"].keys():
|
||||
if config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -328,8 +400,13 @@ for provider in config["EMPATICA_ACCELEROMETER"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_accelerometer.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"] \
|
||||
and config["EMPATICA_ACCELEROMETER"]["PROVIDERS"][provider]["WINDOWS"]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/interim/{pid}/empatica_accelerometer_features/z_empatica_accelerometer_{language}_{provider_key}_windows.csv", pid=config["PIDS"], language=get_script_language(config["STANDARDIZATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_empatica_accelerometer.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["EMPATICA_HEARTRATE"]["PROVIDERS"].keys():
|
||||
if config["EMPATICA_HEARTRATE"]["PROVIDERS"][provider]["COMPUTE"]:
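The Empatica hunks differ from the phone hunks in two ways: the opt-in flag sits under `WINDOWS`, and the interim file name embeds the language of the standardization script. The sketch below shows how such a path could be assembled. `get_script_language` is called by the Snakefile but defined elsewhere, so the version here is a hypothetical stand-in that simply maps the file extension. The helper `empatica_windows_target`, the provider key `CR`, and the script path are likewise illustrative assumptions.

```python
from pathlib import Path


def get_script_language(script_path):
    """Hypothetical stand-in: map a script's extension to a language tag."""
    languages = {".py": "python", ".r": "r"}
    suffix = Path(script_path).suffix.lower()
    if suffix not in languages:
        raise ValueError(f"Unsupported script extension: {script_path}")
    return languages[suffix]


def empatica_windows_target(pid, sensor, provider, src_script):
    """Build the interim z_*_windows.csv path for one participant."""
    name = sensor.lower()
    language = get_script_language(src_script)
    return (
        f"data/interim/{pid}/{name}_features/"
        f"z_{name}_{language}_{provider.lower()}_windows.csv"
    )


# Illustrative values only; the real provider key and script path may differ.
path = empatica_windows_target(
    "p01", "EMPATICA_TEMPERATURE", "CR", "src/features/standardization/main.py"
)
```

If the real `get_script_language` returns different tags, the `{language}` part of the file name changes accordingly.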
|
||||
|
@@ -349,8 +426,13 @@ for provider in config["EMPATICA_TEMPERATURE"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_temperature.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"] \
|
||||
and config["EMPATICA_TEMPERATURE"]["PROVIDERS"][provider]["WINDOWS"]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/interim/{pid}/empatica_temperature_features/z_empatica_temperature_{language}_{provider_key}_windows.csv", pid=config["PIDS"], language=get_script_language(config["STANDARDIZATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_empatica_temperature.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"].keys():
|
||||
if config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -360,8 +442,13 @@ for provider in config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_electrodermal_activity.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"] \
|
||||
and config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"][provider]["WINDOWS"]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/interim/{pid}/empatica_electrodermal_activity_features/z_empatica_electrodermal_activity_{language}_{provider_key}_windows.csv", pid=config["PIDS"], language=get_script_language(config["STANDARDIZATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_empatica_electrodermal_activity.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"].keys():
|
||||
if config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -371,9 +458,13 @@ for provider in config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_blood_volume_pulse.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"] \
|
||||
and config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"][provider]["WINDOWS"]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/interim/{pid}/empatica_blood_volume_pulse_features/z_empatica_blood_volume_pulse_{language}_{provider_key}_windows.csv", pid=config["PIDS"], language=get_script_language(config["STANDARDIZATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
|
||||
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_empatica_blood_volume_pulse.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
for provider in config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"].keys():
|
||||
if config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
|
@@ -383,8 +474,13 @@ for provider in config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"].keys():
|
|||
files_to_compute.extend(expand("data/processed/features/{pid}/empatica_inter_beat_interval.csv", pid=config["PIDS"]))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
if provider in config["STANDARDIZATION"]["PROVIDERS"] and config["STANDARDIZATION"]["PROVIDERS"][provider]["COMPUTE"] \
|
||||
and config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"][provider]["WINDOWS"]["STANDARDIZE_FEATURES"]:
|
||||
files_to_compute.extend(expand("data/interim/{pid}/empatica_inter_beat_interval_features/z_empatica_inter_beat_interval_{language}_{provider_key}_windows.csv", pid=config["PIDS"], language=get_script_language(config["STANDARDIZATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_empatica_inter_beat_interval.csv", pid=config["PIDS"]))
|
||||
if config["STANDARDIZATION"]["MERGE_ALL"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"]))
|
||||
files_to_compute.append("data/processed/features/all_participants/z_all_sensor_features.csv")
|
||||
|
||||
if isinstance(config["EMPATICA_TAGS"]["PROVIDERS"], dict):
|
||||
for provider in config["EMPATICA_TAGS"]["PROVIDERS"].keys():
|
||||
|
@@ -419,10 +515,26 @@ if config["HEATMAP_FEATURE_CORRELATION_MATRIX"]["PLOT"]:
|
|||
# Data Cleaning
|
||||
for provider in config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"].keys():
|
||||
if config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() +".csv", pid=config["PIDS"]))
|
||||
if provider == "STRAW":
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() + "_py.csv", pid=config["PIDS"]))
|
||||
if config["ALL_CLEANING_INDIVIDUAL"]["CLEAN_STANDARDIZED"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features_cleaned_" + provider.lower() + "_py.csv", pid=config["PIDS"]))
|
||||
else:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() + "_R.csv", pid=config["PIDS"]))
|
||||
if config["ALL_CLEANING_INDIVIDUAL"]["CLEAN_STANDARDIZED"]:
|
||||
files_to_compute.extend(expand("data/processed/features/{pid}/z_all_sensor_features_cleaned_" + provider.lower() + "_R.csv", pid=config["PIDS"]))
|
||||
|
||||
for provider in config["ALL_CLEANING_OVERALL"]["PROVIDERS"].keys():
|
||||
if config["ALL_CLEANING_OVERALL"]["PROVIDERS"][provider]["COMPUTE"]:
|
||||
files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +".csv"))
|
||||
if provider == "STRAW":
|
||||
files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +"_py.csv"))
|
||||
if config["ALL_CLEANING_OVERALL"]["CLEAN_STANDARDIZED"]:
|
||||
files_to_compute.extend(expand("data/processed/features/all_participants/z_all_sensor_features_cleaned_" + provider.lower() +"_py.csv"))
|
||||
else:
|
||||
files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +"_R.csv"))
|
||||
if config["ALL_CLEANING_OVERALL"]["CLEAN_STANDARDIZED"]:
|
||||
files_to_compute.extend(expand("data/processed/features/all_participants/z_all_sensor_features_cleaned_" + provider.lower() +"_R.csv"))
|
||||
|
||||
|
||||
# Baseline features
|
||||
if config["PARAMS_FOR_ANALYSIS"]["BASELINE"]["COMPUTE"]:
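The per-participant cleaning branch above picks a `_py` suffix for the `STRAW` provider and `_R` for every other provider, then repeats the target with a `z_` prefix when `CLEAN_STANDARDIZED` is set. A compact sketch of that naming rule follows. `cleaning_targets` is a hypothetical helper, not code from this commit.

```python
def cleaning_targets(provider, pids, clean_standardized):
    """List cleaned-feature files for one provider, per participant."""
    # STRAW cleaning was translated to Python; other providers keep the R script.
    suffix = "py" if provider == "STRAW" else "R"
    prefixes = ["", "z_"] if clean_standardized else [""]
    return [
        f"data/processed/features/{pid}/"
        f"{prefix}all_sensor_features_cleaned_{provider.lower()}_{suffix}.csv"
        for prefix in prefixes
        for pid in pids
    ]


straw = cleaning_targets("STRAW", ["p01"], clean_standardized=True)
rapids = cleaning_targets("RAPIDS", ["p01"], clean_standardized=False)
```

The `ALL_CLEANING_OVERALL` branch follows the same rule with `all_participants` in place of the participant id.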
|
||||
|
|
59
config.yaml
|
@@ -93,6 +93,7 @@ PHONE_ACTIVITY_RECOGNITION:
|
|||
STATIONARY: ["still", "tilting"]
|
||||
MOBILE: ["on_foot", "walking", "running", "on_bicycle"]
|
||||
VEHICLE: ["in_vehicle"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_activity_recognition/rapids/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-applications-crashes/
|
||||
|
@@ -133,6 +134,7 @@ PHONE_APPLICATIONS_FOREGROUND:
|
|||
APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
|
||||
IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
|
||||
IGNORE_EPISODES_LONGER_THAN: 300 # in minutes, set to 0 to disable
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_applications_foreground/rapids/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-applications-notifications/
|
||||
|
@ -153,6 +155,7 @@ PHONE_BATTERY:
|
|||
RAPIDS:
|
||||
COMPUTE: True
|
||||
FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_battery/rapids/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-bluetooth/
|
||||
|
@ -162,6 +165,7 @@ PHONE_BLUETOOTH:
|
|||
RAPIDS:
|
||||
COMPUTE: True
|
||||
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_bluetooth/rapids/main.R
|
||||
|
||||
DORYAB:
|
||||
|
@ -179,6 +183,7 @@ PHONE_BLUETOOTH:
|
|||
DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
|
||||
SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
|
||||
SCANS_LEAST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_bluetooth/doryab/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-calls/
|
||||
|
@ -193,6 +198,7 @@ PHONE_CALLS:
|
|||
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
|
||||
incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
|
||||
outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_calls/rapids/main.R
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-conversation/
|
||||
|
@ -232,6 +238,7 @@ PHONE_DATA_YIELD:
|
|||
COMPUTE: True
|
||||
FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
|
||||
MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1, minimum percentage of valid minutes in an hour to be considered valid.
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_data_yield/rapids/main.R
|
||||
|
||||
PHONE_ESM:
|
||||
|
@ -241,6 +248,7 @@ PHONE_ESM:
|
|||
COMPUTE: True
|
||||
SCALES: ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support"]
|
||||
FEATURES: [mean]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_esm/straw/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-keyboard/
|
||||
|
@ -259,6 +267,7 @@ PHONE_LIGHT:
|
|||
RAPIDS:
|
||||
COMPUTE: True
|
||||
FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_light/rapids/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-locations/
|
||||
|
@ -283,6 +292,7 @@ PHONE_LOCATIONS:
|
|||
MINIMUM_DAYS_TO_DETECT_HOME_CHANGES: 3
|
||||
CLUSTERING_ALGORITHM: DBSCAN # DBSCAN, OPTICS
|
||||
RADIUS_FOR_HOME: 100
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_locations/doryab/main.py
|
||||
|
||||
BARNETT:
|
||||
|
@ -290,6 +300,7 @@ PHONE_LOCATIONS:
|
|||
FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
|
||||
IF_MULTIPLE_TIMEZONES: USE_MOST_COMMON
|
||||
MINUTES_DATA_USED: False # Use this for quality control purposes, how many minutes of data (location coordinates grouped by minute) were used to compute features
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_locations/barnett/main.R
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-log/
|
||||
|
@ -309,6 +320,7 @@ PHONE_MESSAGES:
|
|||
FEATURES:
|
||||
received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
|
||||
sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_messages/rapids/main.R
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-screen/
|
||||
|
@ -322,6 +334,7 @@ PHONE_SCREEN:
|
|||
IGNORE_EPISODES_LONGER_THAN: 360 # in minutes, set to 0 to disable
|
||||
FEATURES: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"] # "episodepersensedminutes" needs to be added later
|
||||
EPISODE_TYPES: ["unlock"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_screen/rapids/main.py
|
||||
|
||||
# See https://www.rapids.science/latest/features/phone-wifi-connected/
|
||||
|
@ -340,6 +353,7 @@ PHONE_WIFI_VISIBLE:
|
|||
RAPIDS:
|
||||
COMPUTE: True
|
||||
FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
|
||||
STANDARDIZE_FEATURES: True
|
||||
SRC_SCRIPT: src/features/phone_wifi_visible/rapids/main.R
|
||||
|
||||
|
||||
|
@ -653,6 +667,7 @@ HEATMAP_FEATURE_CORRELATION_MATRIX:
|
|||
########################################################################################################################
|
||||
|
||||
ALL_CLEANING_INDIVIDUAL:
|
||||
CLEAN_STANDARDIZED: True
|
||||
PROVIDERS:
|
||||
RAPIDS:
|
||||
COMPUTE: True
|
||||
|
@ -669,11 +684,28 @@ ALL_CLEANING_INDIVIDUAL:
|
|||
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
|
||||
CORR_THRESHOLD: 0.95
|
||||
SRC_SCRIPT: src/features/all_cleaning_individual/rapids/main.R
|
||||
STRAW:
|
||||
COMPUTE: True
|
||||
IMPUTE_PHONE_SELECTED_EVENT_FEATURES:
|
||||
COMPUTE: True
|
||||
TYPE: median # options: zero, mean, median or k-nearest
|
||||
MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
|
||||
COLS_NAN_THRESHOLD: 0.3 # set to 1 to disable
|
||||
COLS_VAR_THRESHOLD: True
|
||||
ROWS_NAN_THRESHOLD: 0 # set to 1 to disable
|
||||
DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
|
||||
DATA_YIELD_RATIO_THRESHOLD: 0.3 # set to 0 to disable
|
||||
DROP_HIGHLY_CORRELATED_FEATURES:
|
||||
COMPUTE: True
|
||||
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
|
||||
CORR_THRESHOLD: 0.95
|
||||
SRC_SCRIPT: src/features/all_cleaning_individual/straw/main.py
|
||||
|
||||
ALL_CLEANING_OVERALL:
|
||||
CLEAN_STANDARDIZED: True
|
||||
PROVIDERS:
|
||||
RAPIDS:
|
||||
COMPUTE: True
|
||||
COMPUTE: False
|
||||
IMPUTE_SELECTED_EVENT_FEATURES:
|
||||
COMPUTE: True
|
||||
MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
|
||||
|
@ -687,6 +719,22 @@ ALL_CLEANING_OVERALL:
|
|||
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
|
||||
CORR_THRESHOLD: 0.95
|
||||
SRC_SCRIPT: src/features/all_cleaning_overall/rapids/main.R
|
||||
STRAW:
|
||||
COMPUTE: True
|
||||
IMPUTE_PHONE_SELECTED_EVENT_FEATURES:
|
||||
COMPUTE: True
|
||||
TYPE: median # options: zero, mean, median or k-nearest
|
||||
MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
|
||||
COLS_NAN_THRESHOLD: 0.3 # set to 1 to disable
|
||||
COLS_VAR_THRESHOLD: True
|
||||
ROWS_NAN_THRESHOLD: 0 # set to 1 to disable
|
||||
DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
|
||||
DATA_YIELD_RATIO_THRESHOLD: 0.3 # set to 0 to disable
|
||||
DROP_HIGHLY_CORRELATED_FEATURES:
|
||||
COMPUTE: True
|
||||
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
|
||||
CORR_THRESHOLD: 0.95
|
||||
SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py
|
||||
|
||||
|
||||
########################################################################################################################
|
||||
|
@ -694,10 +742,15 @@ ALL_CLEANING_OVERALL:
|
|||
########################################################################################################################
|
||||
|
||||
STANDARDIZATION:
|
||||
MERGE_ALL: True # Creates the joint standardized file for each participant and all participants. Similar to merge_sensor_features_for_all_participants rule
|
||||
PROVIDERS:
|
||||
CR:
|
||||
COMPUTE: True
|
||||
SRC_SCRIPT: src/features/standardization/main.py
|
||||
OTHER:
|
||||
COMPUTE: True
|
||||
LIST: [RAPIDS, DORYAB, BARNETT, STRAW]
|
||||
SRC_SCRIPT: src/features/standardization/main.py
|
||||
|
||||
|
||||
########################################################################################################################
|
||||
|
@ -706,7 +759,7 @@ STANDARDIZATION:
|
|||
|
||||
PARAMS_FOR_ANALYSIS:
|
||||
BASELINE:
|
||||
COMPUTE: True
|
||||
COMPUTE: False
|
||||
FOLDER: data/external/baseline
|
||||
CONTAINER: [results-survey637813_final.csv, # Slovenia
|
||||
results-survey358134_final.csv, # Belgium 1
|
||||
|
@ -717,5 +770,5 @@ PARAMS_FOR_ANALYSIS:
|
|||
CATEGORICAL_FEATURES: [gender]
|
||||
|
||||
TARGET:
|
||||
COMPUTE: True
|
||||
COMPUTE: False
|
||||
LABEL: PANAS_negative_affect_mean
|
||||
|
|
|
@ -0,0 +1,57 @@
|
|||
label,empatica_id
|
||||
uploader_79170,A0245B
|
||||
uploader_89788,A02731
|
||||
uploader_68294,A02705
|
||||
uploader_92856,A024AF
|
||||
uploader_23726,A0231C
|
||||
uploader_66620,A02305
|
||||
uploader_58435,A026B5
|
||||
uploader_87801,A022A8
|
||||
uploader_96055,A027BA
|
||||
uploader_69549,A0226C
|
||||
uploader_26363,A0263D
|
||||
uploader_72010,A023FA
|
||||
uploader_13997,A024AF
|
||||
uploader_31156,A02305
|
||||
uploader_63187,A027BA
|
||||
uploader_94821,A022A8
|
||||
uploader_65413,A023F1;A023FA
|
||||
uploader_36488,A02713
|
||||
uploader_91087,A0231C
|
||||
uploader_35174,A025D1
|
||||
uploader_73880,A02705
|
||||
uploader_78650,A02731
|
||||
uploader_70578,A0245B
|
||||
uploader_88313,A02736
|
||||
uploader_58482,A0261A
|
||||
uploader_80601,A027BA
|
||||
uploader_93729,A0226C
|
||||
uploader_61663,A0245B
|
||||
uploader_80848,A025D1
|
||||
uploader_57312,A023F9;A02361;A027A0
|
||||
uploader_52087,A02666
|
||||
uploader_98770,A02953
|
||||
uploader_51327,A0245F
|
||||
uploader_11737,A02732
|
||||
uploader_77440,A0264E
|
||||
uploader_57277,A02422
|
||||
uploader_13098,A026E5
|
||||
uploader_80719,A023C8
|
||||
uploader_54698,A02953
|
||||
uploader_95571,A02853
|
||||
uploader_21880,A024DC
|
||||
uploader_92905,A02920
|
||||
uploader_12108,A023F4
|
||||
uploader_17436,A026E5
|
||||
uploader_58440,A0273F
|
||||
uploader_22172,A0245F
|
||||
uploader_39250,A02422
|
||||
uploader_15311,A023F9
|
||||
uploader_45766,A02920
|
||||
uploader_23096,A02361
|
||||
uploader_78243,A02422
|
||||
uploader_58777,A0245F
|
||||
uploader_82941,A02666
|
||||
uploader_89606,A023F4
|
||||
uploader_82969,A023C8
|
||||
uploader_53573,A024DC;A02361
|
|
File diff suppressed because it is too large
|
@ -111,7 +111,7 @@ dependencies:
|
|||
- biosppy==0.8.0
|
||||
- cached-property==1.5.2
|
||||
- configargparse==0.15.1
|
||||
- cr-features==0.1.20
|
||||
- cr-features==0.2.1
|
||||
- cycler==0.11.0
|
||||
- decorator==4.4.2
|
||||
- fonttools==4.33.2
|
||||
|
|
|
@ -40,6 +40,26 @@ def find_features_files(wildcards):
|
|||
feature_files.extend(expand("data/interim/{{pid}}/{sensor_key}_features/{sensor_key}_{language}_{provider_key}.csv", sensor_key=wildcards.sensor_key.lower(), language=get_script_language(provider["SRC_SCRIPT"]), provider_key=provider_key.lower()))
|
||||
return(feature_files)
|
||||
|
||||
def find_empaticas_standardized_features_files(wildcards):
|
||||
feature_files = []
|
||||
if "empatica" in wildcards.sensor_key:
|
||||
for provider_key, provider in config[(wildcards.sensor_key).upper()]["PROVIDERS"].items():
|
||||
if provider["COMPUTE"] and provider.get("WINDOWS", False) and provider["WINDOWS"]["COMPUTE"]:
|
||||
if "empatica" in wildcards.sensor_key:
|
||||
feature_files.extend(expand("data/interim/{{pid}}/{sensor_key}_features/z_{sensor_key}_{language}_{provider_key}.csv", sensor_key=wildcards.sensor_key.lower(), language=get_script_language(provider["SRC_SCRIPT"]), provider_key=provider_key.lower()))
|
||||
return(feature_files)
|
||||
|
||||
def find_joint_non_empatica_sensor_files(wildcards):
|
||||
joined_files = []
|
||||
for config_key in config.keys():
|
||||
if config_key.startswith(("PHONE", "FITBIT")) and "PROVIDERS" in config[config_key] and isinstance(config[config_key]["PROVIDERS"], dict):
|
||||
for provider_key, provider in config[config_key]["PROVIDERS"].items():
|
||||
if "COMPUTE" in provider.keys() and provider["COMPUTE"]:
|
||||
joined_files.append("data/processed/features/{pid}/" + config_key.lower() + ".csv")
|
||||
break
|
||||
return joined_files
|
||||
|
||||
|
||||
def optional_steps_sleep_input(wildcards):
|
||||
if config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["FITBIT_BASED"]["EXCLUDE"]:
|
||||
return "data/raw/{pid}/fitbit_sleep_summary_raw.csv"
|
||||
|
@ -62,6 +82,18 @@ def input_merge_sensor_features_for_individual_participants(wildcards):
|
|||
break
|
||||
return feature_files
|
||||
|
||||
def input_merge_standardized_sensor_features_for_individual_participants(wildcards):
|
||||
feature_files = []
|
||||
for config_key in config.keys():
|
||||
if config_key.startswith(("PHONE", "FITBIT", "EMPATICA")) and "PROVIDERS" in config[config_key] and isinstance(config[config_key]["PROVIDERS"], dict):
|
||||
for provider_key, provider in config[config_key]["PROVIDERS"].items():
|
||||
if "COMPUTE" in provider.keys() and provider["COMPUTE"] and ("STANDARDIZE_FEATURES" in provider.keys() and provider["STANDARDIZE_FEATURES"] or
|
||||
"WINDOWS" in provider.keys() and "STANDARDIZE_FEATURES" in provider["WINDOWS"].keys() and provider["WINDOWS"]["STANDARDIZE_FEATURES"]):
|
||||
feature_files.append("data/processed/features/{pid}/z_" + config_key.lower() + ".csv")
|
||||
break
|
||||
|
||||
return feature_files
|
||||
|
||||
def get_phone_sensor_names():
|
||||
phone_sensor_names = []
|
||||
for config_key in config.keys():
|
||||
|
|
|
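The condition in `input_merge_standardized_sensor_features_for_individual_participants` above mixes `and` and `or` on one long line. A sketch of the same check split into named steps may be easier to follow; `wants_standardization` and `standardized_inputs` are hypothetical names used only for illustration, and the config below is a made-up fragment.

```python
def wants_standardization(provider):
    """True if a provider is computed and asks for standardized features,
    either at the top level or inside its WINDOWS block."""
    if not provider.get("COMPUTE", False):
        return False
    top_level = provider.get("STANDARDIZE_FEATURES", False)
    windows = provider.get("WINDOWS", {})
    windowed = isinstance(windows, dict) and windows.get("STANDARDIZE_FEATURES", False)
    return bool(top_level or windowed)


def standardized_inputs(config):
    """Collect one z_<sensor>.csv path per sensor with at least one such provider."""
    files = []
    for key, section in config.items():
        if not key.startswith(("PHONE", "FITBIT", "EMPATICA")):
            continue
        providers = section.get("PROVIDERS") if isinstance(section, dict) else None
        if not isinstance(providers, dict):
            continue
        if any(wants_standardization(p) for p in providers.values()):
            # {pid} stays a literal placeholder for Snakemake to fill in.
            files.append("data/processed/features/{pid}/z_" + key.lower() + ".csv")
    return files


example_config = {
    "PIDS": ["p01"],
    "PHONE_CALLS": {"PROVIDERS": {"RAPIDS": {"COMPUTE": True, "STANDARDIZE_FEATURES": True}}},
    "PHONE_LIGHT": {"PROVIDERS": {"RAPIDS": {"COMPUTE": False, "STANDARDIZE_FEATURES": True}}},
    "EMPATICA_TEMPERATURE": {
        "PROVIDERS": {"CR": {"COMPUTE": True, "WINDOWS": {"STANDARDIZE_FEATURES": True}}}
    },
}
example_files = standardized_inputs(example_config)
```

The `any(...)` call plays the role of the `break` in the original loop: one qualifying provider is enough to request the sensor's standardized file.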
@ -1048,6 +1048,38 @@ rule merge_sensor_features_for_individual_participants:
|
|||
script:
|
||||
"../src/features/utils/merge_sensor_features_for_individual_participants.R"
|
||||
|
||||
rule join_standardized_features_from_empatica:
|
||||
input:
|
||||
sensor_features = find_empaticas_standardized_features_files
|
||||
wildcard_constraints:
|
||||
sensor_key = '(empatica).*'
|
||||
output:
|
||||
"data/processed/features/{pid}/z_{sensor_key}.csv"
|
||||
script:
|
||||
"../src/features/utils/join_features_from_providers.R"
|
||||
|
||||
rule standardize_features_from_providers_no_empatica:
|
||||
input:
|
||||
sensor_features = find_joint_non_empatica_sensor_files
|
||||
wildcard_constraints:
|
||||
sensor_key = '(phone|fitbit).*'
|
||||
params:
|
||||
provider = config["STANDARDIZATION"]["PROVIDERS"]["OTHER"],
|
||||
provider_key = "OTHER",
|
||||
sensor_key = "{sensor_key}"
|
||||
output:
|
||||
"data/processed/features/{pid}/z_{sensor_key}.csv"
|
||||
script:
|
||||
"../src/features/standardization/main.py"
|
||||
|
||||
rule merge_standardized_sensor_features_for_individual_participants:
|
||||
input:
|
||||
feature_files = input_merge_standardized_sensor_features_for_individual_participants
|
||||
output:
|
||||
"data/processed/features/{pid}/z_all_sensor_features.csv"
|
||||
script:
|
||||
"../src/features/utils/merge_sensor_features_for_individual_participants.R"
|
||||
|
||||
rule merge_sensor_features_for_all_participants:
|
||||
input:
|
||||
feature_files = expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"])
|
||||
|
@ -1056,6 +1088,14 @@ rule merge_sensor_features_for_all_participants:
|
|||
script:
|
||||
"../src/features/utils/merge_sensor_features_for_all_participants.R"
|
||||
|
||||
rule merge_standardized_sensor_features_for_all_participants:
|
||||
input:
|
||||
feature_files = expand("data/processed/features/{pid}/z_all_sensor_features.csv", pid=config["PIDS"])
|
||||
output:
|
||||
"data/processed/features/all_participants/z_all_sensor_features.csv"
|
||||
script:
|
||||
"../src/features/utils/merge_standardized_sensor_features_for_all_participants.R"
|
||||
|
||||
rule clean_sensor_features_for_individual_participants:
|
||||
input:
|
||||
sensor_data = rules.merge_sensor_features_for_individual_participants.output
|
||||
|
@ -1064,11 +1104,12 @@ rule clean_sensor_features_for_individual_participants:
|
|||
params:
|
||||
provider = lambda wildcards: config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"][wildcards.provider_key.upper()],
|
||||
provider_key = "{provider_key}",
|
||||
script_extension = "{script_extension}",
|
||||
sensor_key = "all_cleaning_individual"
|
||||
output:
|
||||
"data/processed/features/{pid}/all_sensor_features_cleaned_{provider_key}.csv"
|
||||
"data/processed/features/{pid}/all_sensor_features_cleaned_{provider_key}_{script_extension}.csv" # will cause problems later on (how to find the files + standardization etc.)
|
||||
script:
|
||||
"../src/features/entry.R"
|
||||
"../src/features/entry.{params.script_extension}"
|
||||
|
||||
rule clean_sensor_features_for_all_participants:
|
||||
input:
|
||||
|
@ -1076,9 +1117,38 @@ rule clean_sensor_features_for_all_participants:
|
|||
params:
|
||||
provider = lambda wildcards: config["ALL_CLEANING_OVERALL"]["PROVIDERS"][wildcards.provider_key.upper()],
|
||||
provider_key = "{provider_key}",
|
||||
script_extension = "{script_extension}",
|
||||
sensor_key = "all_cleaning_overall"
|
||||
output:
|
||||
"data/processed/features/all_participants/all_sensor_features_cleaned_{provider_key}.csv"
|
||||
"data/processed/features/all_participants/all_sensor_features_cleaned_{provider_key}_{script_extension}.csv"
|
||||
script:
|
||||
"../src/features/entry.R"
|
||||
"../src/features/entry.{params.script_extension}"
|
||||
|
||||
rule clean_standardized_sensor_features_for_individual_participants:
|
||||
input:
|
||||
sensor_data = rules.merge_standardized_sensor_features_for_individual_participants.output
|
||||
wildcard_constraints:
|
||||
pid = "("+"|".join(config["PIDS"])+")"
|
||||
params:
|
||||
provider = lambda wildcards: config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"][wildcards.provider_key.upper()],
|
||||
provider_key = "{provider_key}",
|
||||
script_extension = "{script_extension}",
|
||||
sensor_key = "all_cleaning_individual"
|
||||
output:
|
||||
"data/processed/features/{pid}/z_all_sensor_features_cleaned_{provider_key}_{script_extension}.csv" # will cause problems later on (how to find the files + standardization etc.)
|
||||
script:
|
||||
"../src/features/entry.{params.script_extension}"
|
||||
|
||||
rule clean_standardized_sensor_features_for_all_participants:
|
||||
input:
|
||||
sensor_data = rules.merge_standardized_sensor_features_for_all_participants.output
|
||||
params:
|
||||
provider = lambda wildcards: config["ALL_CLEANING_OVERALL"]["PROVIDERS"][wildcards.provider_key.upper()],
|
||||
provider_key = "{provider_key}",
|
||||
script_extension = "{script_extension}",
|
||||
sensor_key = "all_cleaning_overall"
|
||||
output:
|
||||
"data/processed/features/all_participants/z_all_sensor_features_cleaned_{provider_key}_{script_extension}.csv"
|
||||
script:
|
||||
"../src/features/entry.{params.script_extension}"
|
||||
|
||||
|
|
|
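The cleaning rules above pick `entry.R` or `entry.py` through the `script_extension` wildcard. One way such an extension could be derived from a provider's `SRC_SCRIPT` is sketched below. This is an assumption for illustration only: the repository has its own `get_script_language` helper whose body is not shown in this diff, and `script_extension` and `entry_script` here are hypothetical stand-ins.

```python
from pathlib import Path


def script_extension(src_script):
    """Map a provider's SRC_SCRIPT path to the suffix used in output names."""
    suffix = Path(src_script).suffix.lstrip(".").lower()
    if suffix == "py":
        return "py"
    if suffix == "r":
        # The file names in the diff use an upper-case R suffix.
        return "R"
    raise ValueError(f"unsupported script type: {src_script!r}")


def entry_script(src_script):
    """Build the entry script path in the form used by the cleaning rules."""
    return f"../src/features/entry.{script_extension(src_script)}"


straw_ext = script_extension("src/features/all_cleaning_individual/straw/main.py")
rapids_entry = entry_script("src/features/all_cleaning_individual/rapids/main.R")
```

Raising on an unknown suffix makes a misconfigured `SRC_SCRIPT` fail early instead of producing a target that no rule can build.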
@ -0,0 +1,72 @@
|
|||
import pandas as pd
|
||||
import numpy as np
|
||||
import math, sys
|
||||
|
||||
def straw_cleaning(sensor_data_files, provider):
|
||||
|
||||
features = pd.read_csv(sensor_data_files["sensor_data"][0])
|
||||
|
||||
# Impute selected event features
|
||||
impute_phone_features = provider["IMPUTE_PHONE_SELECTED_EVENT_FEATURES"]
|
||||
if impute_phone_features["COMPUTE"]:
|
||||
if not 'phone_data_yield_rapids_ratiovalidyieldedminutes' in features.columns:
|
||||
raise KeyError("RAPIDS provider needs to impute the selected event features based on phone_data_yield_rapids_ratiovalidyieldedminutes column, please set config[PHONE_DATA_YIELD][PROVIDERS][RAPIDS][COMPUTE] to True and include 'ratiovalidyieldedminutes' in [FEATURES].")
|
||||
|
||||
phone_cols = [col for col in features if \
|
||||
col.startswith('phone_applications_foreground_rapids_') or
|
||||
col.startswith('phone_battery_rapids_') or
|
||||
col.startswith('phone_calls_rapids_') or
|
||||
col.startswith('phone_keyboard_rapids_') or
|
||||
col.startswith('phone_messages_rapids_') or
|
||||
col.startswith('phone_screen_rapids_') or
|
||||
col.startswith('phone_wifi_')]
|
||||
|
||||
mask = features['phone_data_yield_rapids_ratiovalidyieldedminutes'] > impute_phone_features['MIN_DATA_YIELDED_MINUTES_TO_IMPUTE']
|
||||
features.loc[mask, phone_cols] = impute(features[mask][phone_cols], method=impute_phone_features["TYPE"])
|
||||
|
||||
# Drop rows with the value of data_yield_column less than data_yield_ratio_threshold
|
||||
data_yield_unit = provider["DATA_YIELD_FEATURE"].split("_")[3].lower()
|
||||
data_yield_column = "phone_data_yield_rapids_ratiovalidyielded" + data_yield_unit
|
||||
|
||||
if not data_yield_column in features.columns:
|
||||
raise KeyError(f"RAPIDS provider needs to impute the selected event features based on {data_yield_column} column, please set config[PHONE_DATA_YIELD][PROVIDERS][RAPIDS][COMPUTE] to True and include 'ratiovalidyielded{data_yield_unit}' in [FEATURES].")
|
||||
|
||||
features = features[features[data_yield_column] >= provider["DATA_YIELD_RATIO_THRESHOLD"]]
|
||||
|
||||
# Remove cols if threshold of NaN values is passed
|
||||
features = features.loc[:, features.isna().sum() < provider["COLS_NAN_THRESHOLD"] * features.shape[0]]
|
||||
|
||||
# Remove cols where variance is 0
|
||||
if provider["COLS_VAR_THRESHOLD"]:
|
||||
features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
|
||||
|
||||
# Drop highly correlated features - To-Do: one more threshold variable from the config + how are NaNs treated?
|
||||
drop_corr_features = provider["DROP_HIGHLY_CORRELATED_FEATURES"]
|
||||
if drop_corr_features["COMPUTE"]:
|
||||
numerical_cols = features.select_dtypes(include=np.number).columns.tolist()
|
||||
|
||||
cor_matrix = features[numerical_cols].corr(method='spearman').abs()
|
||||
|
||||
upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape), k=1).astype(bool))  # np.bool was removed in NumPy 1.24; the builtin bool works everywhere
|
||||
|
||||
to_drop = [column for column in upper_tri.columns if any(upper_tri[column] > drop_corr_features["CORR_THRESHOLD"])]
|
||||
|
||||
# There is some further validation with a threshold here, but I do not understand the R code "valid_pairs"
|
||||
features.drop(to_drop, axis=1, inplace=True)
|
||||
|
||||
# Remove rows if threshold of NaN values is passed
|
||||
min_count = math.ceil((1 - provider["ROWS_NAN_THRESHOLD"]) * features.shape[1]) # min not nan values in row
|
||||
features.dropna(axis=0, thresh=min_count, inplace=True)
|
||||
|
||||
return features
|
||||
|
||||
def impute(df, method='zero'):
|
||||
df.loc[:, df.isna().all()] = df.loc[:, df.isna().all()].fillna(0) # if column contains only NaN values impute it with 0
|
||||
return { # rest of the columns should be imputed with the selected method
|
||||
'zero': df.fillna(0),
|
||||
'mean': df.fillna(df.mean()),
|
||||
'median': df.fillna(df.median()),
|
||||
'k-nearest': None # To-Do
|
||||
}[method]
|
||||
|
||||
|
|
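The correlation step of `straw_cleaning` above can be isolated into a small function. The sketch below is a simplified illustration, not the committed code: `drop_highly_correlated` is a hypothetical name, it applies only `CORR_THRESHOLD`, and it leaves out `MIN_OVERLAP_FOR_CORR_THRESHOLD`, which the original marks as a to-do.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(features, corr_threshold=0.95):
    """Drop the later column of every pair whose |Spearman rho| exceeds the threshold."""
    numeric = features.select_dtypes(include=np.number)
    corr = numeric.corr(method="spearman").abs()
    # Keep only the strict upper triangle so each pair is inspected once
    # and a column is never compared with itself.
    upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(upper_mask)
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return features.drop(columns=to_drop)


example = pd.DataFrame(
    {
        "local_segment": ["s1", "s2", "s3", "s4", "s5"],
        "a": [1, 2, 3, 4, 5],
        "b": [2, 4, 6, 8, 10],  # same ranking as "a", so rho = 1
        "c": [5, 1, 4, 2, 3],   # weakly related to "a" and "b"
    }
)
reduced = drop_highly_correlated(example, corr_threshold=0.95)
```

In this example `b` is removed because it duplicates the ranking of `a`, while the non-numeric `local_segment` column is never part of the correlation matrix and is kept. Returning a new frame instead of using `inplace=True` also avoids modifying a slice of the caller's data.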
@ -0,0 +1,72 @@
|
|||
import pandas as pd
|
||||
import numpy as np
|
||||
import math, sys
|
||||
|
||||
def straw_cleaning(sensor_data_files, provider):
|
||||
|
||||
features = pd.read_csv(sensor_data_files["sensor_data"][0])
|
||||
|
||||
# Impute selected event features
|
||||
impute_phone_features = provider["IMPUTE_PHONE_SELECTED_EVENT_FEATURES"]
|
||||
if impute_phone_features["COMPUTE"]:
|
||||
if not 'phone_data_yield_rapids_ratiovalidyieldedminutes' in features.columns:
|
||||
raise KeyError("RAPIDS provider needs to impute the selected event features based on phone_data_yield_rapids_ratiovalidyieldedminutes column, please set config[PHONE_DATA_YIELD][PROVIDERS][RAPIDS][COMPUTE] to True and include 'ratiovalidyieldedminutes' in [FEATURES].")
|
||||
|
||||
phone_cols = [col for col in features if \
|
||||
col.startswith('phone_applications_foreground_rapids_') or
|
||||
col.startswith('phone_battery_rapids_') or
|
||||
col.startswith('phone_calls_rapids_') or
|
||||
col.startswith('phone_keyboard_rapids_') or
|
||||
col.startswith('phone_messages_rapids_') or
|
||||
col.startswith('phone_screen_rapids_') or
|
||||
col.startswith('phone_wifi_')]
|
||||
|
||||
mask = features['phone_data_yield_rapids_ratiovalidyieldedminutes'] > impute_phone_features['MIN_DATA_YIELDED_MINUTES_TO_IMPUTE']
|
||||
features.loc[mask, phone_cols] = impute(features[mask][phone_cols], method=impute_phone_features["TYPE"])
|
||||
|
||||
# Drop rows with the value of data_yield_column less than data_yield_ratio_threshold
|
||||
data_yield_unit = provider["DATA_YIELD_FEATURE"].split("_")[3].lower()
|
||||
data_yield_column = "phone_data_yield_rapids_ratiovalidyielded" + data_yield_unit
|
||||
|
||||
if not data_yield_column in features.columns:
|
||||
raise KeyError(f"RAPIDS provider needs to impute the selected event features based on {data_yield_column} column, please set config[PHONE_DATA_YIELD][PROVIDERS][RAPIDS][COMPUTE] to True and include 'ratiovalidyielded{data_yield_unit}' in [FEATURES].")
|
||||
|
||||
features = features[features[data_yield_column] >= provider["DATA_YIELD_RATIO_THRESHOLD"]]
|
||||
|
||||
# Remove cols if threshold of NaN values is passed
|
||||
features = features.loc[:, features.isna().sum() < provider["COLS_NAN_THRESHOLD"] * features.shape[0]]
|
||||
|
||||
# Remove cols where variance is 0
|
||||
if provider["COLS_VAR_THRESHOLD"]:
|
||||
features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
|
||||
|
||||
# Drop highly correlated features - To-Do: one more threshold variable from the config + how are NaNs treated?
|
||||
drop_corr_features = provider["DROP_HIGHLY_CORRELATED_FEATURES"]
|
||||
if drop_corr_features["COMPUTE"]:
|
||||
numerical_cols = features.select_dtypes(include=np.number).columns.tolist()
|
||||
|
||||
cor_matrix = features[numerical_cols].corr(method='spearman').abs()
|
||||
|
||||
upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape), k=1).astype(bool))  # np.bool was removed in NumPy 1.24; the builtin bool works everywhere
|
||||
|
||||
to_drop = [column for column in upper_tri.columns if any(upper_tri[column] > drop_corr_features["CORR_THRESHOLD"])]
|
||||
|
||||
# There is some further validation with a threshold here, but I do not understand the R code "valid_pairs"
|
||||
features.drop(to_drop, axis=1, inplace=True)
|
||||
|
||||
# Remove rows if threshold of NaN values is passed
|
||||
min_count = math.ceil((1 - provider["ROWS_NAN_THRESHOLD"]) * features.shape[1]) # min not nan values in row
|
||||
features.dropna(axis=0, thresh=min_count, inplace=True)
|
||||
|
||||
return features
|
||||
|
||||
def impute(df, method='zero'):
|
||||
df.loc[:, df.isna().all()] = df.loc[:, df.isna().all()].fillna(0) # if column contains only NaN values impute it with 0
|
||||
return { # rest of the columns should be imputed with the selected method
|
||||
'zero': df.fillna(0),
|
||||
'mean': df.fillna(df.mean()),
|
||||
'median': df.fillna(df.median()),
|
||||
'k-nearest': None # To-Do
|
||||
}[method]
|
||||
|
||||
|
|
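The `impute` helper above builds a dictionary in which every method is computed before one is selected, and `'k-nearest'` quietly yields `None`. A possible alternative, offered only as a sketch with the same method names, computes just the requested fill values and fails loudly for the method that is still a to-do. Unlike the original, it does not modify the input frame.

```python
import pandas as pd


def impute(df, method="zero"):
    """Fill NaNs with zero, the column mean, or the column median.

    Columns that contain only NaN have no mean or median, so they fall
    back to 0, mirroring the first line of the original helper.
    """
    if method == "zero":
        return df.fillna(0)
    if method == "mean":
        fill_values = df.mean()
    elif method == "median":
        fill_values = df.median()
    else:
        raise NotImplementedError(f"imputation method {method!r} is not implemented")
    # The statistic of an all-NaN column is NaN; replace it with 0 before filling.
    return df.fillna(fill_values.fillna(0))


example = pd.DataFrame(
    {
        "a": [1.0, float("nan"), 3.0, 8.0],
        "b": [float("nan")] * 4,
    }
)
filled = impute(example, method="median")
```

Here the missing value in `a` becomes the median of the observed values and the all-NaN column `b` becomes zeros, while `example` itself still holds its NaNs.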
@ -3,9 +3,11 @@ library(tidyr)
|
|||
library(readr)
|
||||
|
||||
compute_data_yield_features <- function(data, feature_name, time_segment, provider){
|
||||
|
||||
data <- data %>% filter_data_by_segment(time_segment)
|
||||
if(nrow(data) == 0)
|
||||
if(nrow(data) == 0){
|
||||
return(tibble(local_segment = character(), ratiovalidyieldedminutes = numeric(), ratiovalidyieldedhours = numeric()))
|
||||
}
|
||||
features <- data %>%
|
||||
separate(timestamps_segment, into = c("start_timestamp", "end_timestamp"), convert = T, sep = ",") %>%
|
||||
mutate(duration_minutes = (end_timestamp - start_timestamp) / 60000,
|
||||
|
|
|
@ -26,8 +26,7 @@ if provider_key == "cr":
|
|||
windows_features_data.to_csv(snakemake.output[1], index=False)
|
||||
windows_features_data.to_csv(snakemake.output[0], index=False)
|
||||
else:
|
||||
windows_features_data.loc[:, ~windows_features_data.columns.isin(excluded_columns)] = \
|
||||
StandardScaler().fit_transform(windows_features_data.loc[:, ~windows_features_data.columns.isin(excluded_columns)])
|
||||
windows_features_data.loc[:, ~windows_features_data.columns.isin(excluded_columns)] = StandardScaler().fit_transform(windows_features_data.loc[:, ~windows_features_data.columns.isin(excluded_columns)])
|
||||
|
||||
windows_features_data.to_csv(snakemake.output[1], index=False)
|
||||
|
||||
|
@ -35,6 +34,17 @@ if provider_key == "cr":
|
|||
so_features_names = provider_main["WINDOWS"]["SECOND_ORDER_FEATURES"]
|
||||
windows_so_features_data = extract_second_order_features(windows_features_data, so_features_names, prefix)
|
||||
windows_so_features_data.to_csv(snakemake.output[0], index=False)
|
||||
else:
|
||||
pd.DataFrame().to_csv(snakemake.output[0], index=False)
|
||||
|
||||
else:
|
||||
pass #To-Do for the rest of the sensors.
|
||||
for sensor_features in sensor_data_files["sensor_features"]:
|
||||
if "/" + sensor_key + ".csv" in sensor_features:
|
||||
sensor_data = pd.read_csv(sensor_features)
|
||||
excluded_columns = ['local_segment', 'local_segment_label', 'local_segment_start_datetime', 'local_segment_end_datetime']
|
||||
|
||||
if not sensor_data.empty:
|
||||
sensor_data.loc[:, ~sensor_data.columns.isin(excluded_columns)] = StandardScaler().fit_transform(sensor_data.loc[:, ~sensor_data.columns.isin(excluded_columns)])
|
||||
|
||||
sensor_data.to_csv(snakemake.output[0], index=False)
|
||||
break
|
|
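The standardization script above delegates the arithmetic to `StandardScaler`. As a rough illustration of what that amounts to for each feature column, here is a dependency-free sketch: subtract the column mean and divide by the population standard deviation. The names `standardize` and `standardize_columns` and the dict-of-lists table layout are assumptions for this example only, and missing values are not handled here.

```python
from statistics import fmean, pstdev

EXCLUDED_COLUMNS = {
    "local_segment",
    "local_segment_label",
    "local_segment_start_datetime",
    "local_segment_end_datetime",
}


def standardize(values):
    """Return z-scores of a list of numbers using the population standard deviation."""
    mean = fmean(values)
    scale = pstdev(values, mu=mean)
    if scale == 0:
        # A constant column has no spread; dividing by 1 maps it to all zeros.
        scale = 1.0
    return [(v - mean) / scale for v in values]


def standardize_columns(table, excluded=EXCLUDED_COLUMNS):
    """Standardize every column of a {name: list} table except the excluded ones."""
    return {
        name: list(values) if name in excluded else standardize(values)
        for name, values in table.items()
    }


example = {
    "local_segment": ["s1", "s2", "s3"],
    "phone_calls_rapids_incoming_count": [1.0, 2.0, 3.0],
    "phone_light_rapids_count": [5.0, 5.0, 5.0],
}
z_example = standardize_columns(example)
```

The segment columns pass through untouched, which matches the role of `excluded_columns` in the script.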
@ -0,0 +1,17 @@
|
|||
source("renv/activate.R")
|
||||
|
||||
library(tidyr)
|
||||
library(purrr)
|
||||
library("dplyr", warn.conflicts = F)
|
||||
library(stringr)
|
||||
|
||||
feature_files <- snakemake@input[["feature_files"]]
|
||||
|
||||
|
||||
features_of_all_participants <- tibble(filename = feature_files) %>% # create a data frame
|
||||
mutate(file_contents = map(filename, ~ read.csv(., stringsAsFactors = F, colClasses = c(local_segment = "character", local_segment_label = "character", local_segment_start_datetime="character", local_segment_end_datetime="character"))),
|
||||
pid = str_match(filename, ".*/(.*)/z_all_sensor_features.csv")[,2]) %>%
|
||||
unnest(cols = c(file_contents)) %>%
|
||||
select(-filename)
|
||||
|
||||
write.csv(features_of_all_participants, snakemake@output[[1]], row.names = FALSE)
|
|
@ -0,0 +1,23 @@
|
|||
import pandas as pd
|
||||
import numpy as np
|
||||
import seaborn as sns
|
||||
import matplotlib.pyplot as plt
|
||||
import os, sys
|
||||
|
||||
participant = "p032"
|
||||
|
||||
folder = f"/rapids/data/processed/features/{participant}/"
|
||||
for filename in os.listdir(folder):
|
||||
if filename.startswith("phone_"):
|
||||
df = pd.read_csv(f"{folder}{filename}")
|
||||
plt.figure()
|
||||
sns.heatmap(df[[col for col in df if col.startswith('phone_')]], cbar=True)
|
||||
plt.savefig(f'{participant}_{filename}.png', bbox_inches='tight')
|
||||
plt.close()
|
||||
|
||||
plt.figure()
|
||||
sns.heatmap(df[[col for col in df if col.startswith('phone_')]].isna(), cbar=True)
|
||||
plt.savefig(f'is_na_{participant}_{filename}.png', bbox_inches='tight')
|
||||
plt.close()
|
||||
|
||||
|