Compare commits


100 Commits

Author SHA1 Message Date
Primoz 8a6b52a97c Switch to 30_before ERS with corresponding targets. 2022-11-29 11:35:49 +00:00
Primoz 244a053730 Change output files settings to nonstandardized. 2022-11-29 11:19:43 +00:00
Primoz be0324fd01 Fix some bugs and set categorical columns as categories dtypes. 2022-11-28 12:44:25 +00:00
Primoz 99c2fab8f9 Fix a bug in the making of the individual model (when there is no target in the participants columns). 2022-11-16 09:50:18 +00:00
Primoz 286de93bfd Fix some bugs and extend ERS and cleaning scripts with multiple stress event targets logic. 2022-11-15 11:21:51 +00:00
Primoz ab803ee49c Add additional appraisal targets. 2022-11-15 10:14:07 +00:00
Primoz 621f11b2d9 Fix a bug related to wrong user input (duplicated events). 2022-11-15 09:53:31 +00:00
Primoz bd41f42a5d Rename target_ to segmenting_ method. 2022-11-14 15:07:36 +00:00
Primoz a543ce372f Add comments for event_related_script understanding. 2022-11-14 15:04:16 +00:00
Primoz 74b454b07b Apply changes to string answers to make them language-generic. 2022-11-11 09:15:12 +00:00
Primoz 6ebe83e47e Improve the ERS extract method with a couple of validations. 2022-11-10 12:42:52 +00:00
Primoz 00350ef8ca Change config for stressfulness event target method. 2022-11-10 10:32:58 +00:00
Primoz e4985c9121 Override stressfulness event target with extracted values from csv. 2022-11-10 10:29:11 +00:00
Primoz a668b6e8da Extract ERS and stress event targets to csv files (completed). 2022-11-10 09:37:27 +00:00
Primoz 9199b53ded Get, join and start processing required ERS stress event data. 2022-11-09 15:11:51 +00:00
Primoz f3c6a66da9 Begin with stress events in the ERS script. 2022-11-08 15:53:43 +00:00
Primoz 0b3e9226b3 Make small corrections in ERS file. 2022-11-08 14:44:24 +00:00
Primoz 2d83f7ddec Begin the ERS logic for 90-minutes events. 2022-11-08 11:32:05 +00:00
Primoz 1da72a7cbe Rename targets method in config. 2022-11-08 09:45:37 +00:00
Primoz 9f441afc16 Begin ERS logic for 90-minutes events. 2022-11-04 15:09:04 +00:00
Primoz c1c9f4d05a Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-11-04 09:11:58 +00:00
Primoz 62f46ea376 Prepare method-based logic for ERS generating. 2022-11-04 09:11:53 +00:00
Primoz 7ab0280d7e Correctly rename stressful event target variable. 2022-11-04 08:58:08 +00:00
Primoz eefa9f3f4d Add new target: stressfulness_event. 2022-11-03 14:49:54 +00:00
Primoz 5e8174dd41 Add new target: stressfulness_period. 2022-11-03 13:52:45 +00:00
Primoz 35c1a762e7 Improve filtering by esm_session and device_id. 2022-11-03 13:51:18 +00:00
Primoz 02264b21fd Add logic for target selection in ERS processing. 2022-11-03 09:30:12 +00:00
Primoz 0ce8723bdb Extend imputation logic within the cleaning script. 2022-11-02 14:01:21 +00:00
Primoz 30b38bfc02 Fix the generating procedure of ERS file for participants with multiple devices. 2022-10-28 09:00:13 +00:00
Primoz cd137af15a Config for 30 minute EMA segments. 2022-10-27 14:20:15 +00:00
Primoz 3c0585a566 Remove obsolete comments. 2022-10-27 14:12:56 +00:00
Primoz 6b487fcf7b Set E4 data yield to 1 if it is over 1. Optimize E4 data_yield script. 2022-10-27 14:11:42 +00:00
Primoz 5d17c92e54 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-10-26 14:18:20 +00:00
Primoz a31fdd1479 Start to test empatica_data_yield perceived error. 2022-10-26 14:18:08 +00:00
Primoz 936324d234 Switch config for 30 minutes event related segments. 2022-10-26 14:17:27 +00:00
Primoz da0a4596f8 Add additional ESM processing logic for ERS csv extraction. 2022-10-26 14:16:25 +00:00
Primoz d4d74818e6 Fix a bug - missing time_segment column when df is empty 2022-10-26 14:14:32 +00:00
Primoz 14ff59914b Fix to correct dtypes. 2022-10-26 09:59:46 +00:00
Primoz 6ab0ac5329 Optimize memory consumption with dtype definition while reading csv file. 2022-10-26 09:57:26 +00:00
Primoz 0d143e6aad Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-10-25 15:28:27 +00:00
Primoz 8acac50125 Add safety net when features dataframe is empty. 2022-10-25 15:26:43 +00:00
Primoz b92a3aa37a Remove unwanted output or other error producing code. 2022-10-25 15:25:22 +00:00
Primoz bfd637eb9c Improve strings formatting in straw_events file. 2022-10-25 08:53:44 +00:00
Primoz 0d81ad5756 Debug assignment of segments to rows 2022-10-19 13:35:04 +00:00
Primoz cea451d344 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-10-18 09:15:06 +00:00
Primoz e88bbd548f Add new daily segment and filter by segment in the cleaning script. 2022-10-18 09:15:00 +00:00
Primoz cf38d9f175 Implement ERS generating logic. 2022-10-17 15:07:33 +00:00
Primoz f3ca56cdbf Start with ERS logic integration within Snakemake. 2022-10-14 14:46:28 +00:00
Primoz 797aa98f4f Config for ERS testing. 2022-10-12 15:51:50 +00:00
Primoz 9baff159cd Changes needed for testing and starting of the Event-Related Segments. 2022-10-12 15:51:23 +00:00
Primoz 0f21273508 Bug fixes 2022-10-12 12:32:51 +00:00
Primoz 55517eb737 Necessary commit before proceeding. 2022-10-12 12:23:11 +00:00
Primoz de15a52dba Bug fix 2022-10-11 08:36:23 +00:00
Primoz 1ad25bb572 Few modifications of some imputation values in cleaning script and feature extraction. 2022-10-11 08:26:17 +00:00
Primoz 9884b383cf Testing new data with AutoML. 2022-10-10 16:45:38 +00:00
Primoz 2dc89c083c Small changes in cleaning overall 2022-10-07 08:52:12 +00:00
Primoz 001d400729 Clean features and create input files based on all possible targets. 2022-10-06 14:28:12 +00:00
Primoz 1e38d9bf1e Standardization and correlation visualization in overall cleaning script. 2022-10-06 13:27:38 +00:00
Primoz a34412a18d E4 data yield corrections. Changes in overal cs - standardization. 2022-10-05 14:16:55 +00:00
Primoz 437459648f Errors fix: individual script - treat participants missing data. 2022-10-05 13:35:05 +00:00
Primoz 53f6cc60d5 Config and cleaning script necessary changes ... 2022-10-03 13:06:39 +00:00
Primoz bbeabeee6f Last changes before processing on the server. 2022-10-03 12:53:31 +00:00
Primoz 44531c6d94 Code cleaning, reworking cleaning individual based on changes in overall script. Changes in thresholds. 2022-09-30 10:04:07 +00:00
Primoz 7ac7cd5a37 Preparation of the overall cleaning script. 2022-09-29 14:33:21 +00:00
Primoz 68fd69dada Cleaning script for individuals: corrections and comments. 2022-09-29 11:55:25 +00:00
Primoz a4f0d056a0 Fillna for app foreground and activity recognition 2022-09-29 11:44:27 +00:00
Primoz 6286e7a44c firstuseafter column removed from contextual imputation 2022-09-28 12:47:08 +00:00
Primoz 9b3447febd Contextual imputation correction 2022-09-28 12:40:05 +00:00
Primoz d6adda30cf Contextual imputation on time(first/last) features. 2022-09-28 12:37:51 +00:00
Primoz 8af4ef11dc Contextual imputation by feature type. 2022-09-28 10:02:47 +00:00
Primoz 536b9494cd Cleaning script corrections 2022-09-27 14:12:08 +00:00
Primoz f0b87c9dd0 Debugging of the empatica data yield integration. 2022-09-27 09:54:15 +00:00
Primoz 7fcdb873fe Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-09-27 07:50:29 +00:00
Primoz 5c7bb0f4c1 Config changes 2022-09-27 07:48:32 +00:00
Primoz bd53dc1684 Empatica data yield usage in the cleaning script. 2022-09-26 15:54:00 +00:00
Primoz d9a574c550 Changes in the cleaning script and preparation of empatica data yield method. 2022-09-23 13:24:50 +00:00
Primoz 19aa8707c0 Redefined cleaning steps after revision 2022-09-22 13:45:51 +00:00
Primoz 247d758cb7 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-09-21 07:18:01 +00:00
Primoz 90ee99e4b9 Remove TODO comments 2022-09-21 07:16:00 +00:00
Primoz 7493aaa643 Small changes in cleaning script and missing vals testing. 2022-09-20 12:57:55 +00:00
Primoz eaf4340afd Small imputation and cleaning corrections. 2022-09-20 08:03:48 +00:00
Primoz a96ea508c6 Fill NaN of Empatica's SD second order feature (must be tested). 2022-09-19 07:34:02 +00:00
Primoz 52e11cdcab Configurations for new standardization path. 2022-09-19 07:25:54 +00:00
Primoz 92aff93e65 Remove standardization script. 2022-09-19 07:25:16 +00:00
Primoz 18b63127de Removed all standardization rules and configurations. 2022-09-19 06:16:26 +00:00
Primoz 62982866cd Phone wifi visible inspection (WIP) 2022-09-16 13:24:21 +00:00
Primoz 0ce6da5444 kNN imputation relocation and execution only on specific columns. 2022-09-16 11:30:08 +00:00
Primoz e3b78c8a85 Impute selected phone features (Wifi visible, screen, and light) with 0. 2022-09-16 10:58:57 +00:00
Primoz 7d85f75d21 Changes in phone features NaN values script. 2022-09-16 09:03:30 +00:00
Primoz 385e21409d Changes in NaN values testing script. 2022-09-15 14:16:58 +00:00
Primoz 18002f59e1 Doryab bluetooth and locations features fill in NaN values. 2022-09-15 10:48:59 +00:00
Primoz 3cf7ca41aa Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning 2022-09-14 15:38:32 +00:00
Primoz d5ab5a0394 Writing testing scripts to determine the point of manual imputation. 2022-09-14 14:13:03 +00:00
Primoz dfbb758902 Changes in AutoML params and environment.yml 2022-09-13 13:54:06 +00:00
Primoz 4ec371ed96 Testing auto-sklearn 2022-09-13 09:51:03 +00:00
Primoz d27a4a71c8 Reorganisation and reordering of the cleaning script. 2022-09-12 13:44:17 +00:00
Primoz 15d792089d Changes in cleaning script:
- target extracted from config to remove rows where target is nan
- prepared sns.heatmap for further missing values analysis
- necessary changes in config and participant p01
- picture of heatmap which shows the values state after cleaning
2022-09-01 10:33:36 +00:00
Primoz cb351e0ff6 Unnecessary line (rows with no target value will be removed in cleaning script). 2022-09-01 10:06:57 +00:00
Primoz 86299d346b Impute phone and sms NAs with 0 2022-09-01 09:57:21 +00:00
Primoz 3f7ec80c18 Preparation a) phone_calls 0 imputation b) remove rows with NaN target 2022-08-31 10:18:50 +00:00
23 changed files with 361 additions and 1703 deletions

.gitignore

@@ -100,9 +100,6 @@ data/external/*
!/data/external/wiki_tz.csv
!/data/external/main_study_usernames.csv
!/data/external/timezone.csv
!/data/external/play_store_application_genre_catalogue.csv
!/data/external/play_store_categories_count.csv
data/raw/*
!/data/raw/.gitkeep

README.md

@@ -16,7 +16,7 @@ By [MoSHI](https://www.moshi.pitt.edu/), [University of Pittsburgh](https://www.
For RAPIDS installation refer to the [documentation](https://www.rapids.science/1.8/setup/installation/)
### For the installation of the Docker version
## For the installation of the Docker version
1. Follow the [instructions](https://www.rapids.science/1.8/setup/installation/) to setup RAPIDS via Docker (from scratch).
@@ -46,7 +46,7 @@ Type R to go to the interactive R session and then:
```
6. Install cr-features module
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch master.
From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch modifications_for_rapids.
Then follow the "cr-features module" section below.
7. Install all required packages from environment.yml; prune also deletes conda packages not present in the environment file.
@@ -62,7 +62,7 @@ Then follow the "cr-features module" section below.
conda env export --no-builds | sed 's/^.*libgfortran.*$/ - libgfortran/' | sed 's/^.*mkl=.*$/ - mkl/' > environment.yml
```
### cr-features module
## cr-features module
This RAPIDS extension uses cr-features library accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
@@ -78,124 +78,4 @@ To use cr-features library:
e.g. pip install ./calculatingfeatures if the folder is copied to the main parent directory
The cr-features package has to be built and installed every time to get the newest version.
Alternatively, the newest version of the Docker image must be used.
```
## Updating RAPIDS
To update RAPIDS, first pull and merge [origin]( https://github.com/carissalow/rapids), such as with:
```commandline
git fetch --progress "origin" refs/heads/master
git merge --no-ff origin/master
```
Next, update the conda and R virtual environment.
```bash
R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
```
## Custom configuration
### Credentials
As mentioned under [Database in RAPIDS documentation](https://www.rapids.science/1.6/snippets/database/), a `credentials.yaml` file is needed to connect to a database.
It should contain:
```yaml
PSQL_STRAW:
database: staw
host: 212.235.208.113
password: password
port: 5432
user: staw_db
```
where `password` needs to be specified as well.
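For illustration, the fields above can be combined into a standard libpq connection URI. The sketch below is not RAPIDS code; the `psql_uri` helper and the hard-coded values are just this example's.

```python
# Illustrative sketch: assemble a libpq-style connection URI from the
# credentials.yaml fields shown above. The dict mirrors the PSQL_STRAW keys;
# psql_uri is a hypothetical helper, not part of RAPIDS.
creds = {
    "database": "staw",
    "host": "212.235.208.113",
    "password": "password",  # replace with the real password
    "port": 5432,
    "user": "staw_db",
}

def psql_uri(c: dict) -> str:
    """Build a postgresql:// URI from a credentials mapping."""
    return (
        f"postgresql://{c['user']}:{c['password']}"
        f"@{c['host']}:{c['port']}/{c['database']}"
    )

print(psql_uri(creds))
```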
## Possible installation issues
### Missing dependencies for RPostgres
To install `RPostgres` R package (used to connect to the PostgreSQL database), an error might occur:
```text
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because libpq was not found. Try installing:
* deb: libpq-dev (Debian, Ubuntu, etc)
* rpm: postgresql-devel (Fedora, EPEL)
* rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
* csw: postgresql_dev (Solaris)
* brew: libpq (OSX)
If libpq is already installed, check that either:
(i) 'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
(ii) 'pg_config' is in your PATH.
If neither can detect <libpq>, you can set INCLUDE_DIR
and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------[ ERROR MESSAGE ]----------------------------
<stdin>:1:10: fatal error: libpq-fe.h: No such file or directory
compilation terminated.
```
The library requires `libpq` for compiling from source, so install accordingly.
### Timezone environment variable for tidyverse (relevant for WSL2)
One of the R packages, `tidyverse`, might need access to the `TZ` environment variable during the installation.
On Ubuntu 20.04 on WSL2 this triggers the following error:
```text
> install.packages('tidyverse')
ERROR: configuration failed for package xml2
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
Warning in system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace xml2 1.3.1 is already loaded, but >= 1.3.2 is required
Calls: <Anonymous> ... namespaceImportFrom -> asNamespace -> loadNamespace
Execution halted
ERROR: lazy loading failed for package tidyverse
```
This happens because WSL2 does not use the `timedatectl` service, which provides this variable.
```bash
~$ timedatectl
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to create bus connection: Host is down
```
and later
```bash
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
Execution halted
```
This can be amended by setting the environment variable manually before attempting to install `tidyverse`:
```bash
export TZ='Europe/Ljubljana'
```
Note: if this is needed to avoid runtime issues, you need to either define this environment variable in each new terminal window or (better) define it in your `~/.bashrc` or `~/.bash_profile`.
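The same workaround can be sketched from Python, for instance in a wrapper script that launches the installation (the fallback value here is just this example's): `os.environ.setdefault` applies the default only when `TZ` is not already defined, so a value inherited from `~/.bashrc` still wins.

```python
import os

# Sketch: make sure TZ is defined for the current process before
# launching the R installation, mirroring `export TZ=...` above.
# setdefault leaves an already-set TZ untouched.
os.environ.setdefault("TZ", "Europe/Ljubljana")
print(os.environ["TZ"])
```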
## Possible runtime issues
### Unix end of line characters
Upon running rapids, an error might occur:
```bash
/usr/bin/env: python3\r: No such file or directory
```
This is due to Windows-style end-of-line characters.
To amend this, I added a `.gitattributes` file to force `git` to check out `rapids` using Unix EOL characters.
If this still fails, `dos2unix` can be used to convert them.
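A quick way to spot the offending line ending before reaching for `dos2unix`: a CRLF-terminated shebang makes the kernel look for an interpreter literally named `python3\r`. The `has_crlf_shebang` helper below is purely illustrative.

```python
# Sketch: detect the Windows line ending behind the
# "python3\r: No such file or directory" error.
def has_crlf_shebang(first_line: bytes) -> bool:
    """Return True if a script's first line is a shebang ending with CRLF."""
    return first_line.startswith(b"#!") and first_line.endswith(b"\r\n")

print(has_crlf_shebang(b"#!/usr/bin/env python3\r\n"))  # CRLF: needs conversion
print(has_crlf_shebang(b"#!/usr/bin/env python3\n"))    # already Unix EOL
```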
### System has not been booted with systemd as init system (PID 1)
See [the installation issue above](#Timezone-environment-variable-for-tidyverse-(relevant-for-WSL2)).
```


@@ -174,15 +174,6 @@ for provider in config["PHONE_ESM"]["PROVIDERS"].keys():
# files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"]))
# files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
for provider in config["PHONE_SPEECH"]["PROVIDERS"].keys():
if config["PHONE_SPEECH"]["PROVIDERS"][provider]["COMPUTE"]:
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_raw.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/raw/{pid}/phone_speech_with_datetime.csv",pid=config["PIDS"]))
files_to_compute.extend(expand("data/interim/{pid}/phone_speech_features/phone_speech_{language}_{provider_key}.csv",pid=config["PIDS"],language=get_script_language(config["PHONE_SPEECH"]["PROVIDERS"][provider]["SRC_SCRIPT"]),provider_key=provider.lower()))
files_to_compute.extend(expand("data/processed/features/{pid}/phone_speech.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
# We can delete these if's as soon as we add feature PROVIDERS to any of these sensors
if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict):
for provider in config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"].keys():


@@ -27,8 +27,6 @@ TIME_SEGMENTS: &time_segments
TAILORED_EVENTS: # Only relevant if TYPE=EVENT
COMPUTE: True
SEGMENTING_METHOD: "30_before" # 30_before, 90_before, stress_event
INTERVAL_OF_INTEREST: 10 # duration of event of interest [minutes]
IOI_ERROR_TOLERANCE: 5 # interval of interest error tolerance (before and after IOI) [minutes]
# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
TIMEZONE:
@@ -104,9 +102,9 @@ PHONE_APPLICATIONS_CRASHES:
CONTAINER: applications_crashes
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
# See https://www.rapids.science/latest/features/phone-applications-foreground/
@@ -114,32 +112,24 @@ PHONE_APPLICATIONS_FOREGROUND:
CONTAINER: applications
APPLICATION_CATEGORIES:
CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
# Refer to data/external/play_store_categories_count.csv for a list of categories (genres) and their frequency.
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
PACKAGE_NAMES_HASHED: True
UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
PROVIDERS:
RAPIDS:
COMPUTE: True
INCLUDE_EPISODE_FEATURES: True
SINGLE_CATEGORIES: ["Productivity", "Tools", "Communication", "Education", "Social"]
SINGLE_CATEGORIES: ["all", "email"]
MULTIPLE_CATEGORIES:
games: ["Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing"]
social: ["Communication", "Social", "Dating"]
productivity: ["Tools", "Productivity", "Finance", "Education", "News & Magazines", "Business", "Books & Reference"]
health: ["Health & Fitness", "Lifestyle", "Food & Drink", "Sports", "Medical", "Parenting"]
entertainment: ["Shopping", "Music & Audio", "Entertainment", "Travel & Local", "Photography", "Video Players & Editors", "Personalization", "House & Home", "Art & Design", "Auto & Vehicles", "Entertainment,Music & Video",
"Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing" # Add all games.
]
maps_weather: ["Maps & Navigation", "Weather"]
social: ["socialnetworks", "socialmediatools"]
entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
CUSTOM_CATEGORIES:
SINGLE_APPS: []
EXCLUDED_CATEGORIES: ["System", "STRAW"]
# Note: A special option here is "is_system_app".
# This excludes applications that have is_system_app = TRUE, which is a separate column in the table.
# However, all of these applications have been assigned System category.
# I will therefore filter by that category, which is a superset and is more complete. JL
EXCLUDED_APPS: []
social_media: ["com.google.android.youtube", "com.snapchat.android", "com.instagram.android", "com.zhiliaoapp.musically", "com.facebook.katana"]
dating: ["com.tinder", "com.relance.happycouple", "com.kiwi.joyride"]
SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
EXCLUDED_CATEGORIES: []
EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"] # TODO list system apps?
FEATURES:
APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
@@ -337,15 +327,6 @@ PHONE_SCREEN:
EPISODE_TYPES: ["unlock"]
SRC_SCRIPT: src/features/phone_screen/rapids/main.py
# Custom added sensor
PHONE_SPEECH:
CONTAINER: speech
PROVIDERS:
STRAW:
COMPUTE: True
FEATURES: ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
SRC_SCRIPT: src/features/phone_speech/straw/main.py
# See https://www.rapids.science/latest/features/phone-wifi-connected/
PHONE_WIFI_CONNECTED:
CONTAINER: sensor_wifi
@@ -729,8 +710,7 @@ ALL_CLEANING_OVERALL:
COMPUTE: True
MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
CORR_THRESHOLD: 0.95
STANDARDIZATION: True
TARGET_STANDARDIZATION: False
STANDARDIZATION: False
SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py
@@ -753,6 +733,7 @@ PARAMS_FOR_ANALYSIS:
TARGET:
COMPUTE: True
LABEL: appraisal_stressfulness_event_mean
ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, JCQ_coworker_support_mean, appraisal_stressfulness_period_mean]
ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean,
JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean]
# PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean,
# JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean

File diff suppressed because it is too large


@@ -1,45 +0,0 @@
genre,n
System,261
Tools,96
Productivity,71
Health & Fitness,60
Finance,54
Communication,39
Music & Audio,39
Shopping,38
Lifestyle,33
Education,28
News & Magazines,24
Maps & Navigation,23
Entertainment,21
Business,18
Travel & Local,18
Books & Reference,16
Social,16
Weather,16
Food & Drink,14
Sports,14
Other,13
Photography,13
Puzzle,13
Video Players & Editors,12
Card,9
Casual,9
Personalization,8
Medical,7
Board,5
Strategy,4
House & Home,3
Trivia,3
Word,3
Adventure,2
Art & Design,2
Auto & Vehicles,2
Dating,2
Role Playing,2
STRAW,2
Simulation,2
"Board,Brain Games",1
"Entertainment,Music & Video",1
Parenting,1
Racing,1


@@ -1,39 +0,0 @@
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !
# !
"""
Please do not make any changes, as RAPIDS is running on tmux server ...
"""
# !


@@ -1,30 +1,165 @@
name: rapids
channels:
- conda-forge
- defaults
dependencies:
- auto-sklearn
- hmmlearn
- imbalanced-learn
- jsonschema
- lightgbm
- matplotlib
- numpy
- pandas
- peakutils
- pip
- plotly
- python-dateutil
- pytz
- pywavelets
- pyyaml
- scikit-learn
- scipy
- seaborn
- setuptools
- bioconda::snakemake
- bioconda::snakemake-minimal
- tqdm
- xgboost
- pip:
- biosppy
- cr_features>=0.2
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- _py-xgboost-mutex=2.0
- appdirs=1.4.4
- arrow=0.16.0
- asn1crypto=1.4.0
- astropy=4.2.1
- attrs=20.3.0
- binaryornot=0.4.4
- blas=1.0
- brotlipy=0.7.0
- bzip2=1.0.8
- ca-certificates=2021.7.5
- certifi=2021.5.30
- cffi=1.14.4
- chardet=3.0.4
- click=7.1.2
- colorama=0.4.4
- cookiecutter=1.6.0
- cryptography=3.3.1
- datrie=0.8.2
- docutils=0.16
- future=0.18.2
- gitdb=4.0.5
- gitdb2=4.0.2
- gitpython=3.1.11
- idna=2.10
- imbalanced-learn=0.6.2
- importlib-metadata=2.0.0
- importlib_metadata=2.0.0
- intel-openmp=2019.4
- jinja2=2.11.2
- jinja2-time=0.2.0
- joblib=1.0.0
- jsonschema=3.2.0
- ld_impl_linux-64=2.36.1
- libblas=3.8.0
- libcblas=3.8.0
- libcxx=10.0.0
- libcxxabi=10.0.0
- libedit=3.1.20191231
- libffi=3.3
- libgcc-ng=11.2.0
- libgfortran
- libgfortran
- libgfortran
- liblapack=3.8.0
- libopenblas=0.3.10
- libstdcxx-ng=11.2.0
- libxgboost=0.90
- libzlib=1.2.11
- lightgbm=3.1.1
- llvm-openmp=10.0.0
- markupsafe=1.1.1
- mkl
- mkl-service=2.3.0
- mkl_fft=1.2.0
- mkl_random=1.1.1
- more-itertools=8.6.0
- ncurses=6.2
- numpy=1.19.2
- numpy-base=1.19.2
- openblas=0.3.4
- openssl=1.1.1k
- pandas=1.1.5
- pbr=5.5.1
- pip=20.3.3
- plotly=4.14.1
- poyo=0.5.0
- psutil=5.7.2
- py-xgboost=0.90
- pycparser=2.20
- pyerfa=1.7.1.1
- pyopenssl=20.0.1
- pysocks=1.7.1
- python=3.7.9
- python-dateutil=2.8.1
- python_abi=3.7
- pytz=2020.4
- pyyaml=5.3.1
- readline=8.0
- requests=2.25.0
- retrying=1.3.3
- setuptools=51.0.0
- six=1.15.0
- smmap=3.0.4
- smmap2=3.0.1
- sqlite=3.33.0
- threadpoolctl=2.1.0
- tk=8.6.10
- tqdm=4.62.0
- urllib3=1.25.11
- wheel=0.36.2
- whichcraft=0.6.1
- wrapt=1.12.1
- xgboost=0.90
- xz=5.2.5
- yaml=0.2.5
- zipp=3.4.0
- zlib=1.2.11
- pip:
- amply==0.1.4
- auto-sklearn==0.14.7
- bidict==0.22.0
- biosppy==0.8.0
- build==0.8.0
- cached-property==1.5.2
- cloudpickle==2.2.0
- configargparse==0.15.1
- configspace==0.4.21
- cr-features==0.2.1
- cycler==0.11.0
- cython==0.29.32
- dask==2022.2.0
- decorator==4.4.2
- distributed==2022.2.0
- distro==1.7.0
- emcee==3.1.2
- fonttools==4.33.2
- fsspec==2022.8.2
- h5py==3.6.0
- heapdict==1.0.1
- hmmlearn==0.2.7
- ipython-genutils==0.2.0
- jupyter-core==4.6.3
- kiwisolver==1.4.2
- liac-arff==2.5.0
- locket==1.0.0
- matplotlib==3.5.1
- msgpack==1.0.4
- nbformat==5.0.7
- opencv-python==4.5.5.64
- packaging==21.3
- partd==1.3.0
- peakutils==1.3.3
- pep517==0.13.0
- pillow==9.1.0
- pulp==2.4
- pynisher==0.6.4
- pyparsing==2.4.7
- pyrfr==0.8.3
- pyrsistent==0.15.5
- pywavelets==1.3.0
- ratelimiter==1.2.0.post0
- scikit-learn==0.24.2
- scipy==1.7.3
- seaborn==0.11.2
- shortuuid==1.0.8
- smac==1.2
- snakemake==5.30.2
- sortedcontainers==2.4.0
- tblib==1.7.0
- tomli==2.0.1
- toolz==0.12.0
- toposort==1.5
- tornado==6.2
- traitlets==4.3.3
- typing-extensions==4.2.0
- zict==2.2.0
prefix: /opt/conda/envs/rapids

renv.lock

@@ -1,6 +1,6 @@
{
"R": {
"Version": "4.2.3",
"Version": "4.1.2",
"Repositories": [
{
"Name": "CRAN",
@@ -46,10 +46,10 @@
},
"Hmisc": {
"Package": "Hmisc",
"Version": "5.0-1",
"Version": "4.4-2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bf9fe82c010a468fb32f913ff56d65e1"
"Repository": "RSPM",
"Hash": "66458e906b2112a8b1639964efd77d7c"
},
"KernSmooth": {
"Package": "KernSmooth",
@@ -104,7 +104,7 @@
"Package": "RPostgres",
"Version": "1.4.4",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "c593ecb8dbca9faf3906431be610ca28"
},
"Rcpp": {
@@ -181,7 +181,7 @@
"Package": "base64enc",
"Version": "0.1-3",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "543776ae6848fde2f48ff3816d0628bc"
},
"bit": {
@@ -221,24 +221,17 @@
},
"broom": {
"Package": "broom",
"Version": "1.0.4",
"Version": "0.7.3",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "f62b2504021369a2449c54bbda362d30"
},
"cachem": {
"Package": "cachem",
"Version": "1.0.7",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "cda74447c42f529de601fe4d4050daef"
"Repository": "RSPM",
"Hash": "5581a5ddc8fe2ac5e0d092ec2de4c4ae"
},
"callr": {
"Package": "callr",
"Version": "3.7.3",
"Version": "3.5.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "9b2191ede20fa29828139b9900922e51"
"Repository": "RSPM",
"Hash": "b7d7f1e926dfcd57c74ce93f5c048e80"
},
"caret": {
"Package": "caret",
@@ -270,10 +263,10 @@
},
"cli": {
"Package": "cli",
"Version": "3.6.1",
"Version": "2.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "89e6d8219950eac806ae0c489052048a"
"Repository": "RSPM",
"Hash": "3ef298932294b775fa0a3eeaa3a645b0"
},
"clipr": {
"Package": "clipr",
@@ -293,7 +286,7 @@
"Package": "codetools",
"Version": "0.2-18",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "019388fc48e48b3da0d3a76ff94608a8"
},
"colorspace": {
@@ -310,13 +303,6 @@
"Repository": "RSPM",
"Hash": "0f22be39ec1d141fd03683c06f3a6e67"
},
"conflicted": {
"Package": "conflicted",
"Version": "1.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bb097fccb22d156624fd07cd2894ddb6"
},
"corpcor": {
"Package": "corpcor",
"Version": "1.6.9",
@ -333,10 +319,10 @@
},
"cpp11": {
"Package": "cpp11",
"Version": "0.4.3",
"Version": "0.2.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "ed588261931ee3be2c700d22e94a29ab"
"Repository": "RSPM",
"Hash": "ba66e5a750d39067d888aa7af797fed2"
},
"crayon": {
"Package": "crayon",
@ -368,10 +354,10 @@
},
"dbplyr": {
"Package": "dbplyr",
"Version": "2.3.2",
"Version": "2.1.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "d24305b92db333726aed162a2c23a147"
"Hash": "1f37fa4ab2f5f7eded42f78b9a887182"
},
"desc": {
"Package": "desc",
@ -396,17 +382,17 @@
},
"dplyr": {
"Package": "dplyr",
"Version": "1.1.1",
"Version": "1.0.5",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "eb5742d256a0d9306d85ea68756d8187"
"Repository": "RSPM",
"Hash": "d0d76c11ec807eb3f000eba4e3eb0f68"
},
"dtplyr": {
"Package": "dtplyr",
"Version": "1.3.1",
"Version": "1.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "54ed3ea01b11e81a86544faaecfef8e2"
"Repository": "RSPM",
"Hash": "1e14e4c5b2814de5225312394bc316da"
},
"e1071": {
"Package": "e1071",
@ -433,7 +419,7 @@
"Package": "evaluate",
"Version": "0.14",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "ec8ca05cffcc70569eaaad8469d2a3a7"
},
"fansi": {
@ -466,10 +452,10 @@
},
"forcats": {
"Package": "forcats",
"Version": "1.0.0",
"Version": "0.5.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "1a0a9a3d5083d0d573c4214576f1e690"
"Repository": "RSPM",
"Hash": "1cb4279e697650f0bd78cd3601ee7576"
},
"foreach": {
"Package": "foreach",
@ -506,13 +492,6 @@
"Repository": "RSPM",
"Hash": "f568ce73d3d59582b0f7babd0eb33d07"
},
"gargle": {
"Package": "gargle",
"Version": "1.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "bb3208dcdfeb2e68bf33c87601b3cbe3"
},
"gclus": {
"Package": "gclus",
"Version": "1.3.2",
@ -536,10 +515,10 @@
},
"ggplot2": {
"Package": "ggplot2",
"Version": "3.4.1",
"Version": "3.3.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "d494daf77c4aa7f084dbbe6ca5dcaca7"
"Repository": "RSPM",
"Hash": "4ded8b439797f7b1693bd3d238d0106b"
},
"ggraph": {
"Package": "ggraph",
@ -578,30 +557,16 @@
},
"glue": {
"Package": "glue",
"Version": "1.6.2",
"Version": "1.4.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "4f2596dfb05dac67b9dc558e5c6fba2e"
},
"googledrive": {
"Package": "googledrive",
"Version": "2.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e88ba642951bc8d1898ba0d12581850b"
},
"googlesheets4": {
"Package": "googlesheets4",
"Version": "1.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "fd7b97bd862a14297b0bb7ed28a3dada"
"Repository": "RSPM",
"Hash": "6efd734b14c6471cfe443345f3e35e29"
},
"gower": {
"Package": "gower",
"Version": "0.2.2",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "be6a2b3529928bd803d1c437d1d43152"
},
"graphlayouts": {
@ -634,10 +599,10 @@
},
"haven": {
"Package": "haven",
"Version": "2.5.2",
"Version": "2.3.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "8b331e659e67d757db0fcc28e689c501"
"Repository": "RSPM",
"Hash": "221d0ad75dfa03ebf17b1a4cc5c31dfc"
},
"highr": {
"Package": "highr",
@ -648,10 +613,10 @@
},
"hms": {
"Package": "hms",
"Version": "1.1.3",
"Version": "1.1.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "b59377caa7ed00fa41808342002138f9"
"Hash": "5b8a2dd0fdbe2ab4f6081e6c7be6dfca"
},
"htmlTable": {
"Package": "htmlTable",
@ -683,10 +648,10 @@
},
"httr": {
"Package": "httr",
"Version": "1.4.5",
"Version": "1.4.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "f6844033201269bec3ca0097bc6c97b3"
"Repository": "RSPM",
"Hash": "a525aba14184fec243f9eaec62fbed43"
},
"huge": {
"Package": "huge",
@ -695,13 +660,6 @@
"Repository": "RSPM",
"Hash": "a4cde4dd1d2551edb99a3273a4ad34ea"
},
"ids": {
"Package": "ids",
"Version": "1.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "99df65cfef20e525ed38c3d2577f7190"
},
"igraph": {
"Package": "igraph",
"Version": "1.2.6",
@ -746,10 +704,10 @@
},
"jsonlite": {
"Package": "jsonlite",
"Version": "1.8.4",
"Version": "1.7.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "a4269a09a9b865579b2635c77e572374"
"Repository": "RSPM",
"Hash": "98138e0994d41508c7a6b84a0600cfcb"
},
"knitr": {
"Package": "knitr",
@ -802,10 +760,10 @@
},
"lifecycle": {
"Package": "lifecycle",
"Version": "1.0.3",
"Version": "1.0.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "001cecbeac1cff9301bdc3775ee46a86"
"Repository": "RSPM",
"Hash": "3471fb65971f1a7b2d4ae7848cf2db8d"
},
"listenv": {
"Package": "listenv",
@ -816,17 +774,17 @@
},
"lubridate": {
"Package": "lubridate",
"Version": "1.9.2",
"Version": "1.7.9.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e25f18436e3efd42c7c590a1c4c15390"
"Repository": "RSPM",
"Hash": "5b5b02f621d39a499def7923a5aee746"
},
"magrittr": {
"Package": "magrittr",
"Version": "2.0.3",
"Version": "2.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "7ce2733a9826b3aeb1775d56fd305472"
"Repository": "RSPM",
"Hash": "41287f1ac7d28a92f0a286ed507928d3"
},
"markdown": {
"Package": "markdown",
@ -842,13 +800,6 @@
"Repository": "RSPM",
"Hash": "67101e7448dfd9add4ac418623060262"
},
"memoise": {
"Package": "memoise",
"Version": "2.0.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e2817ccf4a065c5d9d7f2cfbe7c1d78c"
},
"mgcv": {
"Package": "mgcv",
"Version": "1.8-33",
@ -879,10 +830,10 @@
},
"modelr": {
"Package": "modelr",
"Version": "0.1.11",
"Version": "0.1.8",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "4f50122dc256b1b6996a4703fecea821"
"Repository": "RSPM",
"Hash": "9fd59716311ee82cba83dc2826fc5577"
},
"munsell": {
"Package": "munsell",
@ -937,7 +888,7 @@
"Package": "parallelly",
"Version": "1.29.0",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "b5f399c9ce96977e22ef32c20b6cfe87"
},
"pbapply": {
@ -956,10 +907,10 @@
},
"pillar": {
"Package": "pillar",
"Version": "1.9.0",
"Version": "1.4.7",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "15da5a8412f317beeee6175fbc76f4bb"
"Repository": "RSPM",
"Hash": "3b3dd89b2ee115a8b54e93a34cd546b4"
},
"pkgbuild": {
"Package": "pkgbuild",
@ -1026,10 +977,10 @@
},
"processx": {
"Package": "processx",
"Version": "3.8.0",
"Version": "3.4.5",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "a33ee2d9bf07564efb888ad98410da84"
"Repository": "RSPM",
"Hash": "22aab6098cb14edd0a5973a8438b569b"
},
"prodlim": {
"Package": "prodlim",
@ -1049,7 +1000,7 @@
"Package": "progressr",
"Version": "0.9.0",
"Source": "Repository",
"Repository": "RSPM",
"Repository": "CRAN",
"Hash": "ca0d80ecc29903f7579edbabd91f4199"
},
"promises": {
@ -1082,10 +1033,10 @@
},
"purrr": {
"Package": "purrr",
"Version": "1.0.1",
"Version": "0.3.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "d71c815267c640f17ddbf7f16144b4bb"
"Repository": "RSPM",
"Hash": "97def703420c8ab10d8f0e6c72101e02"
},
"qap": {
"Package": "qap",
@ -1101,13 +1052,6 @@
"Repository": "RSPM",
"Hash": "d35964686307333a7121eb41c7dcd4e0"
},
"ragg": {
"Package": "ragg",
"Version": "1.2.5",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "690bc058ea2b1b8a407d3cfe3dce3ef9"
},
"rappdirs": {
"Package": "rappdirs",
"Version": "0.3.3",
@ -1117,17 +1061,17 @@
},
"readr": {
"Package": "readr",
"Version": "2.1.4",
"Version": "1.4.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "b5047343b3825f37ad9d3b5d89aa1078"
"Repository": "RSPM",
"Hash": "2639976851f71f330264a9c9c3d43a61"
},
"readxl": {
"Package": "readxl",
"Version": "1.4.2",
"Version": "1.3.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "2e6020b1399d95f947ed867045e9ca17"
"Repository": "RSPM",
"Hash": "63537c483c2dbec8d9e3183b3735254a"
},
"recipes": {
"Package": "recipes",
@ -1166,10 +1110,10 @@
},
"reprex": {
"Package": "reprex",
"Version": "2.0.2",
"Version": "0.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "d66fe009d4c20b7ab1927eb405db9ee2"
"Repository": "RSPM",
"Hash": "b06bfb3504cc8a4579fd5567646f745b"
},
"reshape2": {
"Package": "reshape2",
@ -1194,10 +1138,10 @@
},
"rlang": {
"Package": "rlang",
"Version": "1.1.0",
"Version": "0.4.10",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "dc079ccd156cde8647360f473c1fa718"
"Repository": "RSPM",
"Hash": "599df23c40a4fce9c7b4764f28c37857"
},
"rmarkdown": {
"Package": "rmarkdown",
@ -1229,24 +1173,24 @@
},
"rstudioapi": {
"Package": "rstudioapi",
"Version": "0.14",
"Version": "0.13",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "690bd2acc42a9166ce34845884459320"
"Repository": "RSPM",
"Hash": "06c85365a03fdaf699966cc1d3cf53ea"
},
"rvest": {
"Package": "rvest",
"Version": "1.0.3",
"Version": "0.3.6",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "a4a5ac819a467808c60e36e92ddf195e"
"Repository": "RSPM",
"Hash": "a9795ccb2d608330e841998b67156764"
},
"scales": {
"Package": "scales",
"Version": "1.2.1",
"Version": "1.1.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "906cb23d2f1c5680b8ce439b44c6fa63"
"Repository": "RSPM",
"Hash": "6f76f71042411426ec8df6c54f34e6dd"
},
"selectr": {
"Package": "selectr",
@ -1292,17 +1236,17 @@
},
"stringi": {
"Package": "stringi",
"Version": "1.7.12",
"Version": "1.5.3",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "ca8bd84263c77310739d2cf64d84d7c9"
"Repository": "RSPM",
"Hash": "a063ebea753c92910a4cca7b18bc1f05"
},
"stringr": {
"Package": "stringr",
"Version": "1.5.0",
"Version": "1.4.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "671a4d384ae9d32fc47a14e98bfa3dc8"
"Hash": "0759e6b6c0957edb1311028a49a35e76"
},
"survival": {
"Package": "survival",
@ -1318,13 +1262,6 @@
"Repository": "RSPM",
"Hash": "b227d13e29222b4574486cfcbde077fa"
},
"systemfonts": {
"Package": "systemfonts",
"Version": "1.0.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "90b28393209827327de889f49935140a"
},
"testthat": {
"Package": "testthat",
"Version": "3.0.1",
@ -1332,19 +1269,12 @@
"Repository": "RSPM",
"Hash": "17826764cb92d8b5aae6619896e5a161"
},
"textshaping": {
"Package": "textshaping",
"Version": "0.3.6",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "1ab6223d3670fac7143202cb6a2d43d5"
},
"tibble": {
"Package": "tibble",
"Version": "3.2.1",
"Version": "3.0.4",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "a84e2cc86d07289b3b6f5069df7a004c"
"Repository": "RSPM",
"Hash": "71dffd8544691c520dd8e41ed2d7e070"
},
"tidygraph": {
"Package": "tidygraph",
@ -1355,24 +1285,24 @@
},
"tidyr": {
"Package": "tidyr",
"Version": "1.3.0",
"Version": "1.1.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "e47debdc7ce599b070c8e78e8ac0cfcf"
"Repository": "RSPM",
"Hash": "c40b2d5824d829190f4b825f4496dfae"
},
"tidyselect": {
"Package": "tidyselect",
"Version": "1.2.0",
"Version": "1.1.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "79540e5fcd9e0435af547d885f184fd5"
"Repository": "RSPM",
"Hash": "6ea435c354e8448819627cf686f66e0a"
},
"tidyverse": {
"Package": "tidyverse",
"Version": "2.0.0",
"Version": "1.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "c328568cd14ea89a83bd4ca7f54ae07e"
"Repository": "RSPM",
"Hash": "bd51be662f359fa99021f3d51e911490"
},
"timeDate": {
"Package": "timeDate",
@ -1381,13 +1311,6 @@
"Repository": "RSPM",
"Hash": "fde4fc571f5f61978652c229d4713845"
},
"timechange": {
"Package": "timechange",
"Version": "0.2.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "8548b44f79a35ba1791308b61e6012d7"
},
"tinytex": {
"Package": "tinytex",
"Version": "0.28",
@ -1409,13 +1332,6 @@
"Repository": "RSPM",
"Hash": "fc77eb5297507cccfa3349a606061030"
},
"tzdb": {
"Package": "tzdb",
"Version": "0.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "b2e1cbce7c903eaf23ec05c58e59fb5e"
},
"utf8": {
"Package": "utf8",
"Version": "1.1.4",
@ -1423,19 +1339,12 @@
"Repository": "RSPM",
"Hash": "4a5081acfb7b81a572e4384a7aaf2af1"
},
"uuid": {
"Package": "uuid",
"Version": "1.1-0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "f1cb46c157d080b729159d407be83496"
},
"vctrs": {
"Package": "vctrs",
"Version": "0.6.1",
"Version": "0.3.8",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "06eceb3a5d716fd0654cc23ca3d71a99"
"Hash": "ecf749a1b39ea72bd9b51b76292261f1"
},
"viridis": {
"Package": "viridis",
@ -1451,13 +1360,6 @@
"Repository": "RSPM",
"Hash": "ce4f6271baa94776db692f1cb2055bee"
},
"vroom": {
"Package": "vroom",
"Version": "1.6.1",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "7015a74373b83ffaef64023f4a0f5033"
},
"waldo": {
"Package": "waldo",
"Version": "0.2.3",
@ -1474,10 +1376,10 @@
},
"withr": {
"Package": "withr",
"Version": "2.5.0",
"Version": "2.3.0",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "c0e49a9760983e81e55cdd9be92e7182"
"Repository": "RSPM",
"Hash": "7307d79f58d1885b38c4f4f1a8cb19dd"
},
"xfun": {
"Package": "xfun",
@ -1488,10 +1390,10 @@
},
"xml2": {
"Package": "xml2",
"Version": "1.3.3",
"Version": "1.3.2",
"Source": "Repository",
"Repository": "CRAN",
"Hash": "40682ed6a969ea5abfd351eb67833adc"
"Repository": "RSPM",
"Hash": "d4d71a75dd3ea9eb5fa28cc21f9585e2"
},
"xtable": {
"Package": "xtable",

View File

@ -345,19 +345,6 @@ rule esm_features:
script:
"../src/features/entry.py"
rule phone_speech_python_features:
input:
sensor_data = "data/raw/{pid}/phone_speech_with_datetime.csv",
time_segments_labels = "data/interim/time_segments/{pid}_time_segments_labels.csv"
params:
provider = lambda wildcards: config["PHONE_SPEECH"]["PROVIDERS"][wildcards.provider_key.upper()],
provider_key = "{provider_key}",
sensor_key = "phone_speech"
output:
"data/interim/{pid}/phone_speech_features/phone_speech_python_{provider_key}.csv"
script:
"../src/features/entry.py"
rule phone_keyboard_python_features:
input:
sensor_data = "data/raw/{pid}/phone_keyboard_with_datetime.csv",

View File

@ -247,8 +247,6 @@ rule empatica_readable_datetime:
include_past_periodic_segments = config["TIME_SEGMENTS"]["INCLUDE_PAST_PERIODIC_SEGMENTS"]
output:
"data/raw/{pid}/empatica_{sensor}_with_datetime.csv"
resources:
mem_mb=50000
script:
"../src/data/datetime/readable_datetime.R"

View File

@ -29,16 +29,23 @@ get_genre <- function(apps){
apps <- read.csv(snakemake@input[[1]], stringsAsFactors = F)
genre_catalogue <- data.frame()
catalogue_source <- snakemake@params[["catalogue_source"]]
package_names_hashed <- snakemake@params[["package_names_hashed"]]
update_catalogue_file <- snakemake@params[["update_catalogue_file"]]
scrape_missing_genres <- snakemake@params[["scrape_missing_genres"]]
apps_with_genre <- data.frame(matrix(ncol=length(colnames(apps)) + 1,nrow=0, dimnames=list(NULL, c(colnames(apps), "genre"))))
if (length(package_names_hashed) == 0) {package_names_hashed <- FALSE}
if(nrow(apps) > 0){
if(catalogue_source == "GOOGLE"){
apps_with_genre <- apps %>% mutate(genre = NA_character_)
} else if(catalogue_source == "FILE"){
genre_catalogue <- read.csv(snakemake@params[["catalogue_file"]], colClasses = c("character", "character"))
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_name")
if (package_names_hashed) {
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_hash")
} else {
apps_with_genre <- left_join(apps, genre_catalogue, by = "package_name")
}
}
if(catalogue_source == "GOOGLE" || (catalogue_source == "FILE" && scrape_missing_genres)){
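The hunk above makes the R script join the genre catalogue on `package_hash` when package names are hashed, and on `package_name` otherwise. A minimal pandas analogue of that key switch (toy data; the column names mirror the script but the rows are hypothetical):

```python
import pandas as pd

# Hypothetical app rows and a genre catalogue keyed by hashed package name
apps = pd.DataFrame({"package_hash": ["h1", "h2"], "usage": [10, 20]})
catalogue = pd.DataFrame({"package_hash": ["h1", "h2"],
                          "genre": ["SOCIAL", "TOOLS"]})

# Same branching as the R script: pick the join key based on the hashing flag
package_names_hashed = True
join_key = "package_hash" if package_names_hashed else "package_name"
apps_with_genre = apps.merge(catalogue, on=join_key, how="left")
```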

View File

@ -349,24 +349,3 @@ PHONE_WIFI_VISIBLE:
COLUMN_MAPPINGS:
SCRIPTS: # List any python or r scripts that mutate your raw data
PHONE_SPEECH:
ANDROID:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: timestamp
DEVICE_ID: device_id
SPEECH_PROPORTION: speech_proportion
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS: # List any python or r scripts that mutate your raw data
IOS:
RAPIDS_COLUMN_MAPPINGS:
TIMESTAMP: timestamp
DEVICE_ID: device_id
SPEECH_PROPORTION: speech_proportion
MUTATION:
COLUMN_MAPPINGS:
SCRIPTS: # List any python or r scripts that mutate your raw data

View File

@ -136,9 +136,8 @@ def patch_ibi_with_bvp(ibi_data, bvp_data):
# Begin with the cr-features part
try:
ibi_data, ibi_start_timestamp = empatica2d_to_array(ibi_data_file)
except (IndexError, KeyError) as e:
except IndexError as e:
# Checks whether IBI.csv is empty
# It may raise a KeyError if df is empty here: startTimeStamp = df.time[0]
df_test = pd.read_csv(ibi_data_file, names=['timings', 'inter_beat_interval'], header=None)
if df_test.empty:
df_test['timestamp'] = df_test['timings']

View File

@ -118,11 +118,6 @@ PHONE_SCREEN:
- DEVICE_ID
- SCREEN_STATUS
PHONE_SPEECH:
- TIMESTAMP
- DEVICE_ID
- SPEECH_PROPORTION
PHONE_WIFI_CONNECTED:
- TIMESTAMP
- DEVICE_ID

View File

@ -36,9 +36,6 @@ def straw_cleaning(sensor_data_files, provider):
phone_data_yield_unit = provider["PHONE_DATA_YIELD_FEATURE"].split("_")[3].lower()
phone_data_yield_column = "phone_data_yield_rapids_ratiovalidyielded" + phone_data_yield_unit
if features.empty:
return features
features = edy.calculate_empatica_data_yield(features)
if not phone_data_yield_column in features.columns and not "empatica_data_yield" in features.columns:
@ -120,7 +117,7 @@ def straw_cleaning(sensor_data_files, provider):
esm_cols = features.loc[:, features.columns.str.startswith('phone_esm_straw')]
if provider["COLS_VAR_THRESHOLD"]:
features.drop(features.std(numeric_only=True)[features.std(numeric_only=True) == 0].index.values, axis=1, inplace=True)
features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
fe5 = features.copy()
@ -134,7 +131,7 @@ def straw_cleaning(sensor_data_files, provider):
valid_features = features[numerical_cols].loc[:, features[numerical_cols].isna().sum() < drop_corr_features['MIN_OVERLAP_FOR_CORR_THRESHOLD'] * features[numerical_cols].shape[0]]
corr_matrix = valid_features.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
to_drop = [column for column in upper.columns if any(upper[column] > drop_corr_features["CORR_THRESHOLD"])]
features.drop(to_drop, axis=1, inplace=True)
@ -150,14 +147,12 @@ def straw_cleaning(sensor_data_files, provider):
return features
def k_nearest(df):
pd.set_option('display.max_columns', None)
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
def impute(df, method='zero'):
def k_nearest(df):
pd.set_option('display.max_columns', None)
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
return {
'zero': df.fillna(0),
@ -167,7 +162,6 @@ def impute(df, method='zero'):
'knn': k_nearest(df)
}[method]
def graph_bf_af(features, phase_name, plt_flag=False):
if plt_flag:
sns.set(rc={"figure.figsize":(16, 8)})

View File

@ -87,7 +87,6 @@ def straw_cleaning(sensor_data_files, provider, target):
if features.empty:
return pd.DataFrame(columns=excluded_columns)
# (3) CONTEXTUAL IMPUTATION
# Impute selected phone features with a high number
@ -146,7 +145,7 @@ def straw_cleaning(sensor_data_files, provider, target):
# (5) REMOVE COLS WHERE VARIANCE IS 0
if provider["COLS_VAR_THRESHOLD"]:
features.drop(features.std(numeric_only=True)[features.std(numeric_only=True) == 0].index.values, axis=1, inplace=True)
features.drop(features.std()[features.std() == 0].index.values, axis=1, inplace=True)
graph_bf_af(features, "6variance_drop")
@ -170,12 +169,8 @@ def straw_cleaning(sensor_data_files, provider, target):
# Expected warning within this code block
with warnings.catch_warnings():
warnings.simplefilter("ignore", category=RuntimeWarning)
if provider["TARGET_STANDARDIZATION"]:
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols)] = \
features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols)].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
else:
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols + ['phone_esm_straw_' + target])] = \
features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols + ['phone_esm_straw_' + target])].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
features.loc[:, ~features.columns.isin(excluded_columns + ["pid"] + nominal_cols)] = \
features.loc[:, ~features.columns.isin(excluded_columns + nominal_cols)].groupby('pid').transform(lambda x: StandardScaler().fit_transform(x.values[:,np.newaxis]).ravel())
graph_bf_af(features, "8standardization")
@ -200,7 +195,7 @@ def straw_cleaning(sensor_data_files, provider, target):
valid_features = features[numerical_cols].loc[:, features[numerical_cols].isna().sum() < drop_corr_features['MIN_OVERLAP_FOR_CORR_THRESHOLD'] * features[numerical_cols].shape[0]]
corr_matrix = valid_features.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))
to_drop = [column for column in upper.columns if any(upper[column] > drop_corr_features["CORR_THRESHOLD"])]
# sns.heatmap(corr_matrix, cmap="YlGnBu")
@ -233,25 +228,17 @@ def straw_cleaning(sensor_data_files, provider, target):
if cat2: # Transform columns to category dtype (homelabel)
features[cat2] = features[cat2].astype(int).astype('category')
# (10) DROP ALL WINDOW RELATED COLUMNS
win_count_cols = [col for col in features if "SO_windowsCount" in col]
if win_count_cols:
features.drop(columns=win_count_cols, inplace=True)
# (11) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
# (10) VERIFY IF THERE ARE ANY NANS LEFT IN THE DATAFRAME
if features.isna().any().any():
raise ValueError("There are still some NaNs present in the dataframe. Please check for implementation errors.")
return features
def k_nearest(df):
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
def impute(df, method='zero'):
def k_nearest(df):
imputer = KNNImputer(n_neighbors=3)
return pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
return {
'zero': df.fillna(0),
@ -261,7 +248,6 @@ def impute(df, method='zero'):
'knn': k_nearest(df)
}[method]
def graph_bf_af(features, phase_name, plt_flag=False):
if plt_flag:
sns.set(rc={"figure.figsize":(16, 8)})
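The standardization hunk above groups features by participant (`pid`) and z-scores each feature within its group via `StandardScaler`. A minimal pandas-only sketch of the same per-group transform on toy data (`ddof=0` matches `StandardScaler`'s population standard deviation):

```python
import pandas as pd

# Toy frame: one feature standardized within each participant ("pid")
df = pd.DataFrame({
    "pid": ["p1", "p1", "p1", "p2", "p2", "p2"],
    "feat": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Per-group z-score; ddof=0 mirrors sklearn's StandardScaler behaviour
df["feat_std"] = df.groupby("pid")["feat"].transform(
    lambda x: (x - x.mean()) / x.std(ddof=0)
)
```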

View File

@ -15,13 +15,13 @@ def extract_second_order_features(intraday_features, so_features_names, prefix="
so_features = pd.DataFrame()
#print(intraday_features.drop("level_1", axis=1).groupby(["local_segment"]).nsmallest())
if "mean" in so_features_names:
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).mean(numeric_only=True).add_suffix("_SO_mean")], axis=1)
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).mean().add_suffix("_SO_mean")], axis=1)
if "median" in so_features_names:
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).median(numeric_only=True).add_suffix("_SO_median")], axis=1)
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).median().add_suffix("_SO_median")], axis=1)
if "sd" in so_features_names:
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).std(numeric_only=True).fillna(0).add_suffix("_SO_sd")], axis=1)
so_features = pd.concat([so_features, intraday_features.drop(prefix+"level_1", axis=1).groupby(groupby_cols).std().fillna(0).add_suffix("_SO_sd")], axis=1)
if "nlargest" in so_features_names: # largest 5 -- maybe there is a faster groupby solution?
for column in intraday_features.loc[:, ~intraday_features.columns.isin(groupby_cols+[prefix+"level_1"])]:
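Adding `numeric_only=True` keeps the second-order groupby aggregations above from failing (or warning) on non-numeric columns in recent pandas. A small sketch with a string column in the mix:

```python
import pandas as pd

# Window-level features plus a non-numeric column, as in the hunk above
intraday = pd.DataFrame({
    "local_segment": ["s1", "s1", "s2", "s2"],
    "label": ["a", "b", "a", "b"],  # non-numeric; skipped by the aggregations
    "hr_mean": [60.0, 70.0, 80.0, 90.0],
})

so_features = pd.concat([
    intraday.groupby("local_segment").mean(numeric_only=True).add_suffix("_SO_mean"),
    intraday.groupby("local_segment").std(numeric_only=True).fillna(0).add_suffix("_SO_sd"),
], axis=1)
```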

View File

@ -26,7 +26,7 @@ def calculate_empatica_data_yield(features): # TODO
# Assigns 1 to values that are over 1 (in case of windows not being filled fully)
features[empatica_data_yield_cols] = features[empatica_data_yield_cols].apply(lambda x: [y if y <= 1 or np.isnan(y) else 1 for y in x])
features["empatica_data_yield"] = features[empatica_data_yield_cols].mean(axis=1, numeric_only=True).fillna(0)
features["empatica_data_yield"] = features[empatica_data_yield_cols].mean(axis=1).fillna(0)
features.drop(empatica_data_yield_cols, axis=1, inplace=True) # In case the advanced operations (e.g., weighted average) are not needed later
return features
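The yield computation above clips per-sensor values above 1 (windows not fully filled) and then averages them per row. A toy sketch of those two steps (hypothetical column names):

```python
import numpy as np
import pandas as pd

# Hypothetical per-sensor data-yield columns
features = pd.DataFrame({
    "acc_data_yield": [0.5, 1.2, np.nan],
    "temp_data_yield": [0.7, 0.9, np.nan],
})
yield_cols = ["acc_data_yield", "temp_data_yield"]

# Cap values above 1 while leaving NaNs untouched, then average per row;
# rows with no yield information at all fall back to 0
features[yield_cols] = features[yield_cols].apply(
    lambda col: [v if v <= 1 or np.isnan(v) else 1 for v in col]
)
features["empatica_data_yield"] = features[yield_cols].mean(axis=1).fillna(0)
```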

View File

@ -54,7 +54,8 @@ def cr_features(sensor_data_files, time_segment, provider, filter_data_by_segmen
data_types = {'local_timezone': 'str', 'device_id': 'str', 'timestamp': 'int64', 'inter_beat_interval': 'float64', 'timings': 'float64', 'local_date_time': 'str',
'local_date': "str", 'local_time': "str", 'local_hour': "str", 'local_minute': "str", 'assigned_segments': "str"}
ibi_intraday_data = pd.read_csv(sensor_data_files["sensor_data"], dtype=data_types)
temperature_intraday_data = pd.read_csv(sensor_data_files["sensor_data"], dtype=data_types)
ibi_intraday_data = pd.read_csv(sensor_data_files["sensor_data"])
requested_intraday_features = provider["FEATURES"]
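The added `dtype=data_types` pins column types at read time instead of letting pandas infer them row by row. A minimal sketch of the same pattern:

```python
import io
import pandas as pd

# Explicit dtypes, as in the hunk above (subset of columns for brevity)
data_types = {"device_id": "str", "timestamp": "int64",
              "inter_beat_interval": "float64"}
csv = io.StringIO("device_id,timestamp,inter_beat_interval\nd1,1000,0.8\n")
df = pd.read_csv(csv, dtype=data_types)
```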

View File

@ -49,14 +49,13 @@ def extract_ers(esm_df):
extracted_ers (DataFrame): dataframe with all necessary information to write event-related segments file
in the correct format.
"""
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", None)
with open('config.yaml', 'r') as stream:
config = yaml.load(stream, Loader=yaml.FullLoader)
pd.DataFrame(columns=["label"]).to_csv(snakemake.output[1]) # Create an empty stress_events_targets file
pd.DataFrame(columns=["label", "intensity"]).to_csv(snakemake.output[1]) # Create an empty stress_events_targets file
esm_preprocessed = clean_up_esm(preprocess_esm(esm_df))
@ -106,9 +105,7 @@ def extract_ers(esm_df):
extracted_ers["shift"] = extracted_ers["diffs"].apply(lambda x: format_timestamp(x))
elif segmenting_method == "stress_event":
"""
TODO: update documentation for this condition
This is a special case of the method as it consists of two important parts:
"""This is a special case of the method as it consists of two important parts:
(1) Generating of the ERS file (same as the methods above) and
(2) Generating targets file alongside with the correct time segment labels.
@ -117,95 +114,58 @@ def extract_ers(esm_df):
possibility of the participant not remembering the start time precisely => this parameter can be manipulated with the variable
"time_before_event" which is defined below.
If the participant marked that no stressful event happened, the default of 30 minutes before the event is chosen.
In this case, se_threat and se_challenge are NaN.
By default, this method also excludes all events that are longer then 2.5 hours so that the segments are easily comparable.
"""
ioi = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["INTERVAL_OF_INTEREST"] * 60 # interval of interest in seconds
ioi_error_tolerance = config["TIME_SEGMENTS"]["TAILORED_EVENTS"]["IOI_ERROR_TOLERANCE"] * 60 # interval of interest error tolerance in seconds
# Get and join required data
extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index().rename(columns={'timestamp': 'session_length'}) # questionnaire length
extracted_ers = extracted_ers[extracted_ers["session_length"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire answering is 15 min
session_start_timestamp = esm_df.groupby(['device_id', 'esm_session'])['timestamp'].min().to_frame().rename(columns={'timestamp': 'session_start_timestamp'}) # questionnaire start timestamp
extracted_ers = esm_df.groupby(["device_id", "esm_session"])['timestamp'].apply(lambda x: math.ceil((x.max() - x.min()) / 1000)).reset_index().rename(columns={'timestamp': 'session_length'}) # questionnaire end timestamp
extracted_ers = extracted_ers[extracted_ers["session_length"] <= 15 * 60].reset_index(drop=True) # ensure that the longest duration of the questionnaire answering is 15 min
session_end_timestamp = esm_df.groupby(['device_id', 'esm_session'])['timestamp'].max().to_frame().rename(columns={'timestamp': 'session_end_timestamp'}) # questionnaire end timestamp
# Users' answers for the stressfulness event (se) start times and durations
se_time = esm_df[esm_df.questionnaire_id == 90.].set_index(['device_id', 'esm_session'])['esm_user_answer'].to_frame().rename(columns={'esm_user_answer': 'se_time'})
se_duration = esm_df[esm_df.questionnaire_id == 91.].set_index(['device_id', 'esm_session'])['esm_user_answer'].to_frame().rename(columns={'esm_user_answer': 'se_duration'})
# Make se_durations to the appropriate lengths
# Extracted 3 targets that will be transferred in the csv file to the cleaning script.
# Extracted 3 targets that will be transferred with the csv file to the cleaning script.
se_stressfulness_event_tg = esm_df[esm_df.questionnaire_id == 87.].set_index(['device_id', 'esm_session'])['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_stressfulness_event'})
se_threat_tg = esm_df[esm_df.questionnaire_id == 88.].groupby(["device_id", "esm_session"]).mean(numeric_only=True)['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_threat'})
se_challenge_tg = esm_df[esm_df.questionnaire_id == 89.].groupby(["device_id", "esm_session"]).mean(numeric_only=True)['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_challenge'})
se_threat_tg = esm_df[esm_df.questionnaire_id == 88.].groupby(["device_id", "esm_session"]).mean()['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_threat'})
se_challenge_tg = esm_df[esm_df.questionnaire_id == 89.].groupby(["device_id", "esm_session"]).mean()['esm_user_answer_numeric'].to_frame().rename(columns={'esm_user_answer_numeric': 'appraisal_challenge'})
# All relevant features are joined by inner join to remove standalone columns (e.g., stressfulness event target has larger count)
extracted_ers = extracted_ers.join(session_start_timestamp, on=['device_id', 'esm_session'], how='inner') \
.join(session_end_timestamp, on=['device_id', 'esm_session'], how='inner') \
extracted_ers = extracted_ers.join(session_end_timestamp, on=['device_id', 'esm_session'], how='inner') \
.join(se_time, on=['device_id', 'esm_session'], how='inner') \
.join(se_duration, on=['device_id', 'esm_session'], how='inner') \
.join(se_stressfulness_event_tg, on=['device_id', 'esm_session'], how='inner') \
.join(se_time, on=['device_id', 'esm_session'], how='left') \
.join(se_duration, on=['device_id', 'esm_session'], how='left') \
.join(se_threat_tg, on=['device_id', 'esm_session'], how='left') \
.join(se_challenge_tg, on=['device_id', 'esm_session'], how='left')
.join(se_threat_tg, on=['device_id', 'esm_session'], how='inner') \
.join(se_challenge_tg, on=['device_id', 'esm_session'], how='inner')
# Filter-out the sessions that are not useful. Because of the ambiguity this excludes:
# Filter sessions that are not useful. Because of the ambiguity this excludes:
# (1) straw event times that are marked as "0 - I don't remember"
extracted_ers = extracted_ers[~extracted_ers.se_time.astype(str).str.startswith("0 - ")]
# (2) straw event durations that are marked as "0 - I don't remember"
extracted_ers = extracted_ers[(~extracted_ers.se_time.str.startswith("0 - ")) & (~extracted_ers.se_duration.str.startswith("0 - "))]
# Transform data into its final form, ready for the extraction
extracted_ers.reset_index(drop=True, inplace=True)
extracted_ers.loc[extracted_ers.se_duration.astype(str).str.startswith("0 - "), 'se_duration'] = 0
# Add a default duration in case the participant answered that no stressful event occurred
extracted_ers["se_duration"] = extracted_ers["se_duration"].fillna(int((ioi + 2*ioi_error_tolerance) * 1000))
# Prepare data to fit the data structure in the CSV file ...
# Use the session start timestamp as the event time if no stress event occurred
extracted_ers['se_time'] = extracted_ers['se_time'].fillna(extracted_ers['session_start_timestamp'])
# The type can be an int (timestamp in ms), which stays the same, or a datetime string, which is converted to a timestamp in milliseconds
extracted_ers['event_timestamp'] = extracted_ers['se_time'].apply(lambda x: x if isinstance(x, int) else pd.to_datetime(x).timestamp() * 1000).astype('int64')
time_before_event = 5 * 60  # in seconds (5 minutes)
extracted_ers['shift_direction'] = -1
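A sanity-check sketch of the mixed-type conversion above, with hypothetical values (note that pandas treats a naive `Timestamp` as UTC in `Timestamp.timestamp()`):

```python
import pandas as pd

# Hypothetical se_time values: an already-extracted timestamp (ms) stays
# as-is; a datetime string is parsed and converted to milliseconds.
se_time = pd.Series([1667901600000, "2022-11-08 10:00:00"], dtype="object")
event_ts = se_time.apply(
    lambda x: x if isinstance(x, int) else pd.to_datetime(x).timestamp() * 1000
).astype("int64")
print(event_ts.tolist())  # [1667901600000, 1667901600000]
```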
""">>>>> begin section (could be optimized) <<<<<"""
# Check whether the duration is marked with "1 - It's still ongoing", in which case the end of the current
# questionnaire is taken as the end time of the segment. Otherwise the user-input duration is taken.
extracted_ers['se_duration'] = \
    np.where(
        extracted_ers['se_duration'].astype(str).str.startswith("1 - "),
        extracted_ers['session_end_timestamp'] - extracted_ers['event_timestamp'],
        extracted_ers['se_duration']
    )
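The conditional overwrite can be sketched in isolation like this (hypothetical rows; the ongoing event falls back to session end minus event start):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the first event is ongoing, so its duration falls back
# to (session_end_timestamp - event_timestamp); the second keeps the answer.
df = pd.DataFrame({
    "se_duration": ["1 - It's still ongoing", "0:30"],
    "session_end_timestamp": [1000000, 2000000],
    "event_timestamp": [400000, 1900000],
})
df["se_duration"] = np.where(
    df["se_duration"].astype(str).str.startswith("1 - "),
    df["session_end_timestamp"] - df["event_timestamp"],
    df["se_duration"],
)
print(df["se_duration"].tolist())  # [600000, '0:30']
```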
# This converts the rows with timestamps in milliseconds and the rows with datetime strings to timestamps in seconds.
extracted_ers['se_duration'] = \
    extracted_ers['se_duration'].apply(lambda x: math.ceil(x / 1000) if isinstance(x, int) else (pd.to_datetime(x).hour * 60 + pd.to_datetime(x).minute) * 60) + time_before_event
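A small sketch of the two conversion branches (hypothetical values: an integer duration in milliseconds and an "H:MM" questionnaire answer):

```python
import math
import pandas as pd

# Hypothetical mixed durations: an int is treated as milliseconds,
# a string like "0:30" as an hour:minute questionnaire answer.
durations = pd.Series([600000, "0:30"], dtype="object")
seconds = durations.apply(
    lambda x: math.ceil(x / 1000) if isinstance(x, int)
    else (pd.to_datetime(x).hour * 60 + pd.to_datetime(x).minute) * 60
)
print(seconds.tolist())  # [600, 1800]
```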
# Double check that the minimal se_duration is at least 0 and filter out the rest; negative values are considered
# invalid, since they would place the investigated interval after the end of the questionnaire.
extracted_ers = extracted_ers[extracted_ers["se_duration"] >= 0].reset_index(drop=True)
""">>>>> end section <<<<<"""
# Simply override all durations to be of an equal amount
extracted_ers['se_duration'] = ioi + 2*ioi_error_tolerance
# If the target is 0, shift by the total stress event duration; otherwise shift it by ioi_error_tolerance
extracted_ers['shift'] = \
np.where(
extracted_ers['appraisal_stressfulness_event'] == 0,
extracted_ers['se_duration'],
ioi_error_tolerance
)
extracted_ers['shift'] = extracted_ers['shift'].apply(lambda x: format_timestamp(int(x)))
extracted_ers['length'] = extracted_ers['se_duration'].apply(lambda x: format_timestamp(int(x)))
# Drop event_timestamp duplicates in case the user referenced the same event over multiple questionnaires
extracted_ers.drop_duplicates(subset=["event_timestamp"], keep='first', inplace=True)
extracted_ers.reset_index(drop=True, inplace=True)


@@ -115,7 +115,7 @@ cluster_on = provider["CLUSTER_ON"]
strategy = provider["INFER_HOME_LOCATION_STRATEGY"]
days_threshold = provider["MINIMUM_DAYS_TO_DETECT_HOME_CHANGES"]
if not location_data.timestamp.is_monotonic_increasing:
location_data.sort_values(by=["timestamp"], inplace=True)
location_data["duration_in_seconds"] = -1 * location_data.timestamp.diff(-1) / 1000
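For context, `Series.is_monotonic` was deprecated in pandas 1.5 in favor of `is_monotonic_increasing`, which this change adopts; a minimal sketch of the check-then-sort pattern:

```python
import pandas as pd

# Sketch of the check-then-sort pattern using is_monotonic_increasing,
# the replacement for the deprecated Series.is_monotonic.
ts = pd.Series([3, 1, 2])
if not ts.is_monotonic_increasing:
    ts = ts.sort_values(ignore_index=True)
print(ts.tolist())  # [1, 2, 3]
```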


@@ -1,30 +0,0 @@
import pandas as pd
def straw_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
speech_data = pd.read_csv(sensor_data_files["sensor_data"])
requested_features = provider["FEATURES"]
# names of the features this function can compute
base_features_names = ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
features_to_compute = list(set(requested_features) & set(base_features_names))
speech_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
if not speech_data.empty:
speech_data = filter_data_by_segment(speech_data, time_segment)
if not speech_data.empty:
speech_features = pd.DataFrame()
if "meanspeech" in features_to_compute:
speech_features["meanspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].mean()
if "stdspeech" in features_to_compute:
speech_features["stdspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].std()
if "nlargest" in features_to_compute:
speech_features["nlargest"] = speech_data.groupby(["local_segment"])['speech_proportion'].apply(lambda x: x.nlargest(5).mean())
if "nsmallest" in features_to_compute:
speech_features["nsmallest"] = speech_data.groupby(["local_segment"])['speech_proportion'].apply(lambda x: x.nsmallest(5).mean())
if "medianspeech" in features_to_compute:
speech_features["medianspeech"] = speech_data.groupby(["local_segment"])['speech_proportion'].median()
speech_features = speech_features.reset_index()
return speech_features
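The `nlargest`/`nsmallest` features above average the five most extreme proportions per segment (or all of them when a group is smaller); a toy sketch with made-up values:

```python
import pandas as pd

# Made-up speech proportions for two segments; nlargest(5).mean() averages
# the five largest values per segment, or all of them in smaller groups.
speech = pd.DataFrame({
    "local_segment": ["a", "a", "a", "b", "b"],
    "speech_proportion": [0.2, 0.4, 0.6, 0.1, 0.3],
})
nlargest = speech.groupby("local_segment")["speech_proportion"].apply(
    lambda x: x.nlargest(5).mean()
)
print(nlargest.round(2).to_dict())  # {'a': 0.4, 'b': 0.2}
```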