Bring back requested fields in config.yaml.

Update coding files based on 7e565c34db98265afcda922a337493781fdd8ed5 in supermodule.
Completely remove PACKAGE_NAMES_HASHED and instead provide a differently structured file.
2023-04-19 11:07:58 +02:00 · 2023-04-18 22:58:42 +02:00 · 2023-04-18 22:45:12 +02:00 · 2023-04-18 22:40:11 +02:00 · 2023-04-18 21:34:59 +02:00 · 2023-04-18 21:23:26 +02:00
1565 changed files with 60787 additions and 26496 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,7 @@
+# We'll let Git's auto-detection algorithm infer if a file is text. If it is,
+# enforce LF line endings regardless of OS or git configurations.
+* text=auto eol=lf
+
+# Isolate binary files in case the auto-detection algorithm fails and
+# marks them as text files (which could brick them).
+*.{png,jpg,jpeg,gif,webp,woff,woff2} binary
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -7,27 +7,16 @@ assignees: ''

 ---

-**Describe the bug**
-A clear and concise description of what the bug is.
+This form is only for bug reports. For questions, feature requests, or feedback, use our [GitHub Discussions](https://github.com/carissalow/rapids/discussions).

-**To Reproduce**
-Steps to reproduce the behavior:
-1. Go to '...'
-2. Click on '....'
-3. Scroll down to '....'
-4. See error
+Please make sure to:

-**Expected behavior**
-A clear and concise description of what you expected to happen.
+* [ ] Debug and simplify the problem to create a minimal example. For example, reduce the problem to a single participant, sensor, and a few rows of data.
+* [ ] Provide a clear and succinct description of the problem (expected behavior vs actual behavior).
+* [ ] Attach your `config.yaml`, time segments file, and time zones file if appropriate.
+* [ ] Attach test data if possible, and any screenshots or extra resources that will help us debug the problem.
+* [ ] Share the commit you are running: `git rev-parse --short HEAD`
+* [ ] Share your OS version (e.g. Windows 10)
+* [ ] Share the device/sensor you are processing (e.g. phone accelerometer)

-**Screenshots**
-If applicable, add screenshots to help explain your problem.
-
-**Please complete the following information:**
- - OS: [e.g. MacOS]
- - Version [e.g. 22]
- - Type of mobile data you are dealing with (Android/iOS)
-
-
-**Additional context**
-Add any other context about the problem here.
+<!-- You can erase any parts of this template not applicable to your Issue. -->
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -1,20 +0,0 @@
---
-name: Feature request
-about: Suggest an idea for this project
-title: ''
-labels: ''
-assignees: ''
-
---
-
-**Is your feature request related to a problem? Please describe.**
-A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
-
-**Describe the solution you'd like**
-A clear and concise description of what you want to happen.
-
-**Describe alternatives you've considered**
-A clear and concise description of any alternative solutions or features you've considered.
-
-**Additional context**
-Add any other context or screenshots about the feature request here.
--- a/.github/workflows/docker.yaml
+++ b/.github/workflows/docker.yaml
@@ -0,0 +1,30 @@
+name: docker
+on:
+  release:
+    types: [edited, released]
+jobs:
+  main:
+    runs-on: ubuntu-20.04
+    steps:
+      -
+        name: Set up QEMU
+        uses: docker/setup-qemu-action@v1
+      -
+        name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
+      -
+        name: Login to DockerHub
+        uses: docker/login-action@v1 
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+      -
+        name: Build and push
+        id: docker_build
+        uses: docker/build-push-action@v2
+        with:
+          push: true
+          tags: moshiresearch/rapids:latest
+      -
+        name: Image digest
+        run: echo ${{ steps.docker_build.outputs.digest }}
--- a/.github/workflows/docs.yaml
+++ b/.github/workflows/docs.yaml
@@ -0,0 +1,35 @@
+name: docs
+on:
+  push:
+    branches:
+      - develop
+    tags:
+      - "v[0-9]+.[0-9]+.[0-9]+"
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - if: ${{ github.ref == 'refs/heads/develop' }} # we delay develop because when we release a hotfix (tag + develop push), one of these pushes will be out of sync
+        uses: jakejarvis/wait-action@master
+        with:
+          time: '60s'
+      - uses: actions/setup-python@v2
+        with:
+          python-version: 3.x
+      - run: pip install git+https://${GH_TOKEN}@github.com/carissalow/mkdocs-material-insiders.git
+      - run: pip install mike
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - run: |
+          git config user.name github-actions
+          git config user.email github-actions@github.com
+      - run: echo "RELEASE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
+      - run: echo "DOCS_TAG=$(echo $RELEASE_VERSION | sed -n "s/v\([0-9]\+\.[0-9]\+\).*$/\1/p")" >> $GITHUB_ENV
+      - if: startsWith(github.ref, 'refs/tags')
+        run: mike deploy --push --update-aliases $DOCS_TAG latest
+      - if: ${{ github.ref == 'refs/heads/develop' }}
+        run: mike deploy --push --update-aliases dev
+
+env:
+  GH_TOKEN: ${{ secrets.GH_TOKEN }}
--- a/.github/workflows/tests.yaml
+++ b/.github/workflows/tests.yaml
@@ -0,0 +1,83 @@
+name: tests
+
+on:
+  push:
+    branches-ignore:
+      - "master"
+    tags:
+      - "v[0-9]+.[0-9]+.[0-9]+"
+  pull_request:
+    branches:
+      - "develop"
+env:
+  RENV_PATHS_ROOT: ~/.local/share/renv
+
+jobs:
+  test-on-latest-ubuntu:
+    runs-on: ubuntu-20.04
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - run: "sed -i 's/name:.*/name: rapidstests/g' environment.yml"
+      - run: echo "RELEASE_VERSION=${GITHUB_REF#refs/*/}" >> $GITHUB_ENV
+      - run: echo "RELEASE_VERSION_URL=$(echo $RELEASE_VERSION | sed -e 's/\.//g')" >> $GITHUB_ENV
+      - run : |
+          sudo apt update
+          sudo apt install libglpk40
+          # sudo apt install libcurl4-openssl-dev
+          # sudo apt install libssl-dev
+          # sudo apt install libxml2-dev
+          sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
+          sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/' 
+          sudo apt install r-base
+      - name: Cache R packages
+        uses: actions/cache@v2
+        id: cacherenv
+        with:
+          path: ${{ env.RENV_PATHS_ROOT }}
+          key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
+          restore-keys: |
+            ${{ runner.os }}-renv-
+      - name: Install R dependencies
+        if: steps.cacherenv.outputs.cache-hit != 'true'
+        run: sudo apt install libcurl4-openssl-dev
+      - name: Restore R packages
+        shell: Rscript {0}
+        run: |
+          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
+          renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))
+      - name: Cache conda packages
+        uses: actions/cache@v1
+        env:
+          # Increase this value to reset cache if environment.yml has not changed
+          CACHE_NUMBER: 0
+        with:
+          path: ~/conda_pkgs_dir
+          key:
+            ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{
+            hashFiles('**/environment.yml') }}
+      - name: Restore conda packages
+        uses: conda-incubator/setup-miniconda@v2
+        with:
+          activate-environment: rapidstests
+          environment-file: environment.yml
+          use-only-tar-bz2: true # IMPORTANT: This needs to be set for caching to work properly!
+      - name: Run tests
+        shell: bash -l {0}
+        run : |
+            conda activate rapidstests
+            bash tests/scripts/run_tests.sh -t all
+      - name: Release tag
+        if: success() && startsWith(github.ref, 'refs/tags')
+        id: create_release
+        uses: actions/create-release@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.RAPIDS_RELEASES_TOKEN }} # This token is provided by Actions, you do not need to create your own token
+        with:
+          tag_name: ${{ github.ref }}
+          release_name: ${{ github.ref }}
+          body: |
+            See [change log](http://www.rapids.science/latest/change-log/#${{ env.RELEASE_VERSION_URL }})
+          draft: false
+          prerelease: false
--- a/.gitignore
+++ b/.gitignore
@@ -93,8 +93,17 @@ packrat/*

 # exclude data from source control by default
 data/external/*
+!/data/external/empatica/empatica1/E4 Data.zip
 !/data/external/.gitkeep
 !/data/external/stachl_application_genre_catalogue.csv
+!/data/external/timesegments*.csv
+!/data/external/wiki_tz.csv
+!/data/external/main_study_usernames.csv
+!/data/external/timezone.csv
+!/data/external/play_store_application_genre_catalogue.csv
+!/data/external/play_store_categories_count.csv
+
+
 data/raw/*
 !/data/raw/.gitkeep
 data/interim/*
@@ -107,5 +116,17 @@ reports/
 .RData
 .Rhistory
 sn_profile_*/
+!sn_profile_rapids
 settings.dcf
-tests/fakedata_generation/
+tests/fakedata_generation/
+site/
+credentials.yaml
+
+# Docker container and other files
+.devcontainer
+
+# Calculating features module
+calculatingfeatures/
+
+# Temp folder for rapids data/external
+rapids_temp_data/
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,104 +0,0 @@
-services:
- mysql
- docker
-sudo: required
-language: python
-jobs:
-  include:
-  - stage: Tests
-    name: Python 3.7 on Xenial Linux
-    os: linux
-    language: python
-    python: 3.7
-    before_install:
-    - /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
-    - export PATH=/home/linuxbrew/.linuxbrew/bin:$PATH
-    - source ~/.bashrc
-    - sudo apt-get install linuxbrew-wrapper
-    - brew tap --shallow linuxbrew/xorg
-    - brew install r
-    - R --version
-    - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O
-      miniconda.sh;
-    - bash miniconda.sh -b -p $HOME/miniconda
-    - source "$HOME/miniconda/etc/profile.d/conda.sh"
-    - hash -r
-    - conda config --set always_yes yes --set changeps1 no
-    install:
-    - conda init bash
-    - conda update -q --all --yes conda
-    - conda env create -q -n test-environment python=$TRAVIS_PYTHON_VERSION --file
-      environment.yml
-    - conda activate test-environment
-    - snakemake -j1 renv_install
-    - R -e 'renv::settings$use.cache(FALSE)'
-    - snakemake -j1 renv_restore
-    cache:
-      directories:
-      - "/home/travis/.linuxbrew"
-      - "$HOME/.local/share/renv"
-      - "$TRAVIS_BUILD_DIR/renv/library"
-    script:
-    - bash tests/scripts/run_tests.sh
-  - name: Python 3.7 on macOS
-    os: osx
-    osx_image: xcode11.3
-    language: generic
-    before_install:
-    - HOMEBREW_NO_AUTO_UPDATE=1 brew install gcc@9
-    - HOMEBREW_NO_AUTO_UPDATE=1 brew install https://github.com/Homebrew/homebrew-core/raw/218998d/Formula/r.rb
-    - R --version
-    - HOMEBREW_NO_AUTO_UPDATE=1 brew install mysql
-    - HOMEBREW_NO_AUTO_UPDATE=1 brew services start mysql
-    - HOMEBREW_NO_AUTO_UPDATE=1 brew cask install miniconda
-    - eval "$(/opt/miniconda3/condabin/conda shell.bash hook)"
-    - eval "$(conda shell.bash hook)"
-    install:
-    - conda init bash
-    - conda update -q --all --yes conda
-    - conda env create -q -n test-environment python=$TRAVIS_PYTHON_VERSION --file
-      environment.yml
-    - conda activate test-environment
-    - snakemake -j1 renv_install
-    - R -e 'renv::settings$use.cache(FALSE)'
-    - snakemake -j1 renv_restore
-    env:
-    - RENV_PATHS_ROOT="$HOME/renv/cache"
-    cache:
-      directories:
-      - "/usr/local/lib/R"
-      - "$RENV_PATHS_ROOT"
-      - "$TRAVIS_BUILD_DIR/renv/library"
-    script:
-    - bash tests/scripts/run_tests.sh
-  - stage: deploy
-    name: Python 3.7 on Xenial Linux Docker
-    os: linux
-    language: python
-    script:
-    - docker build -t rapids .
-    - docker login -u "agamk" -p $DOCKERPWD
-    - docker tag rapids agamk/rapids:travislatest
-    - docker push agamk/rapids:travislatest
-branches:
-  only:
-  - master
-
-stages:
-  - name: deploy
-    if: branch = master AND \
-        type = push
-
-notifications:
-  email: false
-  slack:
-    secure: cJIpmIjb3zA5AMDBo9axF1v6fYNIgMm6s6UdMNOlHiT511xHGsaLUFej3lACwQLig4Gr94ySI61YdrP+RX1lFcYxusH+kUU/c8LX0PmSKNeKnycM3w/pCM+yTp/6oQG6ZrJD7pNm6zhB0xPL61uSmYhcr+JJ1sh4iLiON+J8/C+IfnAHm1ORkxJ0IxASkiP/LvaiAQDw8lNyYIZNWjSDNZbx68o1VNakyk6Vik3x8omiE3w33rzI2/JAx//QTxOq2J0dtV1AqYYSOWS4iXblV09NLBqgGrhAhrQ6+TbPHSPIyL/4EdhvS+YXO+SBWS7ODD7j/MuL6XiA4SujW72od2rgXNmOjFnlQvIrULO5bzv39BKKDkldvz9+XCyXLcjoLIwA/rmUnwMndNoC7NoD/CkQEevUxswXXB9811BmIFx/7GOHouVxwB2gaMAzkCroZJVwgbrc6ESSOVE5SMcb3wPMbpd8cXOgVZXJcmk5wK206zxXPigCvFfknqOnwDqRgyIWSFoTd/2wHppA7ND3R5U42nQTbEQ7MiONsOo61GlJTTxJELz32sLKl388AuAgOY7+0sqPibxMaHJkF1V4nYVTH0/H5bO/edK4VHMloJ6s0kuyko7LT5EMQf3pBJij5TnYmD2E60t+bSBAxHuH7WA5dvL+igjGEwROnxDc9pc=
-    on_success: always
-    template:
-    - Repo `%{repository_slug}` *%{result}* build (<%{build_url}|#%{build_number}>)
-      for commit (<%{compare_url}|%{commit}>) on branch `%{branch}`.
-    - 'Execution time: *%{duration}*'
-    - 'Message: %{message}'
-env:
-  global:
-    secure: FD2aOa8L3lWf1xClZ24uS59SOBjMH16sdSLPGkb6bQLwrKAQw6BVna5wOw3iRscZtx2iqEQw3LdLmNb6ftI4fgHhf7qoAZlKVlc2Q8wU4L623Ad8S//2Ny1AxXRyzwmRw4emmIUqRXiGaeZYkzcptf38+2d9PjHazVsL3A6T2vFK+VAQmZBq3Iblx0i3g25qevQxFUACH1FIpZsmn08cesblZp0MiQ7GOq4YhBAqmbraT4/w7yFe1rwm/yPSWeBQKu8tZeZnEW6/FPYidxxuBgl/BxTdVuIKHcVzL95Mu4q6Y7uVaYeGYgyxai8eyntpY2dPu0wN1ng4JxulwqKBdxkWFPdbBJSGnYQq5EmrqULjro7wk9GVLSN9Lx0QjcmZRbNbDH0rpgxcXS9mtvzmgFbmatdsMa3VrObqKL2yYMsPZ6e5N4ve3gTU5+sm6oz/zYNWK2CDN2f08BJuaoKv9hETTfvWaZitKT7lFZ2LpsDdHSPUtRiAviDcLZcCZsTQjyCi6JeKSF2aMQ0+4rCsZgFkqpmjEVJB5N6DMkdZaUn+4HrbGsivAHWQsDcvPTD4n2CUcboV407NFsckr3PlDy0+fNNHr2h45VjO7DxAwDIJAdiwlhbj9l9gn8i3aZOtMCT6p4xIC2CgqOcY4yOTHmyOswJwnkz3uoSOq3eNLR4=
--- a/4
+++ b/4
@@ -6,6 +6,7 @@ RUN apt update && apt install -y \
    libssl-dev \
    libxml2-dev \
    libmysqlclient-dev \
+    libglpk40 \
    mysql-server
 RUN apt-get update && apt-get install -y gnupg
 RUN apt-get update && apt-get install -y software-properties-common
@@ -15,6 +16,7 @@ RUN apt update && apt install -y r-base
 RUN apt install -y pandoc
 RUN apt install -y git
 RUN apt-get update && apt-get install -y vim
+RUN apt-get update && apt-get install -y nano
 RUN apt update && apt install -y unzip
 ENV LANG=C.UTF-8 LC_ALL=C.UTF-8
 ENV PATH /opt/conda/bin:$PATH
@@ -42,7 +44,7 @@ RUN conda update -n base -c defaults conda
 WORKDIR /rapids
 RUN conda env create -f environment.yml -n rapids
 RUN Rscript --vanilla -e 'install.packages("rmarkdown", repos="http://cran.us.r-project.org")'
-RUN R -e 'renv::restore()'
+RUN R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
 ADD https://osf.io/587wc/download data/external
 RUN mv data/external/download data/external/rapids_example.sql.zip
 RUN unzip data/external/rapids_example.sql.zip
--- a/README.md
+++ b/README.md
@@ -1,11 +1,201 @@
+![GitHub release (latest SemVer)](https://img.shields.io/github/v/release/carissalow/rapids?style=plastic)
 [![Snakemake](https://img.shields.io/badge/snakemake-≥5.7.1-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io)
-[![Documentation Status](https://readthedocs.org/projects/rapidspitt/badge/?version=latest)](https://rapidspitt.readthedocs.io/en/latest/?badge=latest)
-[![Build Status](https://travis-ci.com/carissalow/rapids.svg?branch=master)](https://travis-ci.com/carissalow/rapids)
+[![Documentation Status](https://github.com/carissalow/rapids/workflows/docs/badge.svg)](https://www.rapids.science/)
+![tests](https://github.com/carissalow/rapids/workflows/tests/badge.svg)
+[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg)](code_of_conduct.md) 

 # RAPIDS

 **R**eproducible **A**nalysis **Pi**peline for **D**ata **S**treams

-For more information refer to our [documentation](https://rapidspitt.readthedocs.io/en/latest/)
+For more information refer to our [documentation](http://www.rapids.science)

 By [MoSHI](https://www.moshi.pitt.edu/), [University of Pittsburgh](https://www.pitt.edu/)
+
+## Installation 
+
+For RAPIDS installation refer to the [documentation](https://www.rapids.science/1.8/setup/installation/).
+
+### For the installation of the Docker version
+
+1. Follow the [instructions](https://www.rapids.science/1.8/setup/installation/) to set up RAPIDS via Docker (from scratch).
+
+2. Delete the current contents of the /rapids/ folder while in a container session.
+    ```
+    cd ..
+    rm -rf rapids/{*,.*}
+    cd rapids
+    ```
+
+3. Clone the RAPIDS workspace from Git and check out a specific branch.
+    ```
+    git clone "https://repo.ijs.si/junoslukan/rapids.git" .
+    git checkout <branch_name>
+    ```
+
+4. Install the missing `libpq-dev` dependency with bash.
+    ```
+    apt-get update -y
+    apt-get install -y libpq-dev
+    ```
+
+5. Restore the R virtual environment.
+Type `R` to start an interactive R session and then run:
+    ```
+    renv::restore()
+    ```
+
+6. Install cr-features module 
+From: https://repo.ijs.si/matjazbostic/calculatingfeatures.git -> branch master. 
+Then follow the "cr-features module" section below.  
+
+7. Install all required packages from environment.yml. The `--prune` option also removes conda packages that are not listed in the environment file.
+    ```
+    conda env update --file environment.yml --prune
+    ```
+
+8. If you wish to update your R or Python virtual environments:
+    ```
+    # R, in an interactive session:
+    renv::snapshot()
+    # Python:
+    conda env export --no-builds | sed 's/^.*libgfortran.*$/  - libgfortran/' | sed 's/^.*mkl=.*$/  - mkl/' >  environment.yml
+    ```
+
+### cr-features module 
+
+This RAPIDS extension uses the cr-features library, accessible [here](https://repo.ijs.si/matjazbostic/calculatingfeatures).
+
+To use the cr-features library:
+
+- Follow the installation instructions in the [README.md](https://repo.ijs.si/matjazbostic/calculatingfeatures/-/blob/master/README.md).
+
+- Copy the built calculatingfeatures folder into the RAPIDS workspace.
+
+- Install the cr-features package with:
+    ```
+    pip install path/to/the/calculatingfeatures/folder
+    # e.g. pip install ./calculatingfeatures if the folder is copied to the main parent directory
+    # The cr-features package has to be built and installed every time to get the newest version.
+    # Alternatively, the newest version of the Docker image must be used.
+    ```
+
+## Updating RAPIDS
+
+To update RAPIDS, first pull and merge [origin](https://github.com/carissalow/rapids), such as with:
+
+```commandline
+git fetch --progress "origin" refs/heads/master
+git merge --no-ff origin/master
+```
+
+Next, update the conda and R virtual environment.
+
+```bash
+R -e 'renv::restore(repos = c(CRAN = "https://packagemanager.rstudio.com/all/__linux__/focal/latest"))'
+```
+
+## Custom configuration
+### Credentials
+
+As mentioned under [Database in RAPIDS documentation](https://www.rapids.science/1.6/snippets/database/), a `credentials.yaml` file is needed to connect to a database.
+It should contain:
+
+```yaml
+PSQL_STRAW:
+  database: staw
+  host: 212.235.208.113
+  password: password
+  port: 5432
+  user: staw_db
+```
+
+where `password` needs to be specified as well.
+
+## Possible installation issues
+### Missing dependencies for RPostgres
+
+When installing the `RPostgres` R package (used to connect to the PostgreSQL database), the following error might occur:
+
+```text
+------------------------- ANTICONF ERROR ---------------------------
+Configuration failed because libpq was not found. Try installing:
+   * deb: libpq-dev (Debian, Ubuntu, etc)
+   * rpm: postgresql-devel (Fedora, EPEL)
+   * rpm: postgreql8-devel, psstgresql92-devel, postgresql93-devel, or postgresql94-devel (Amazon Linux)
+   * csw: postgresql_dev (Solaris)
+   * brew: libpq (OSX)
+If libpq is already installed, check that either:
+  (i)  'pkg-config' is in your PATH AND PKG_CONFIG_PATH contains a libpq.pc file; or
+  (ii) 'pg_config' is in your PATH.
+If neither can detect , you can set INCLUDE_DIR
+and LIB_DIR manually via:
+  R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
+--------------------------[ ERROR MESSAGE ]----------------------------
+  <stdin>:1:10: fatal error: libpq-fe.h: No such file or directory
+compilation terminated.
+```
+
+The package requires `libpq` to compile from source, so install it as suggested for your system (e.g. `libpq-dev` on Debian/Ubuntu).
+
+### Timezone environment variable for tidyverse (relevant for WSL2)
+
+One of the R packages, `tidyverse`, might need access to the `TZ` environment variable during installation.
+On Ubuntu 20.04 on WSL2 this triggers the following error:
+
+```text
+> install.packages('tidyverse')
+
+ERROR: configuration failed for package ‘xml2’
+System has not been booted with systemd as init system (PID 1). Can't operate.
+Failed to create bus connection: Host is down
+Warning in system("timedatectl", intern = TRUE) :
+  running command 'timedatectl' had status 1
+Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
+  namespace ‘xml2’ 1.3.1 is already loaded, but >= 1.3.2 is required
+Calls: <Anonymous> ... namespaceImportFrom -> asNamespace -> loadNamespace
+Execution halted
+ERROR: lazy loading failed for package ‘tidyverse’
+```
+
+This happens because WSL2 does not use the `timedatectl` service, which provides this variable.
+
+```bash
+~$ timedatectl
+System has not been booted with systemd as init system (PID 1). Can't operate.
+Failed to create bus connection: Host is down
+```
+
+and later 
+
+```bash 
+Warning message:
+In system("timedatectl", intern = TRUE) :
+  running command 'timedatectl' had status 1
+Execution halted
+```
+
+This can be amended by setting the environment variable manually before attempting to install `tidyverse`:
+
+```bash
+export TZ='Europe/Ljubljana'
+```
+
+Note: if this is needed to avoid runtime issues, you need to either define this environment variable in each new terminal window or (better) define it in your `~/.bashrc` or `~/.bash_profile`.
+
+## Possible runtime issues
+### Unix end of line characters
+
+Upon running RAPIDS, the following error might occur:
+
+```bash
+/usr/bin/env: ‘python3\r’: No such file or directory
+```
+
+This is due to Windows-style end of line characters.
+To amend this, I added a `.gitattributes` file to force `git` to check out `rapids` using Unix EOL characters.
+If this still fails, `dos2unix` can be used to convert them.
+
+### System has not been booted with systemd as init system (PID 1)
+
+See [the installation issue above](#timezone-environment-variable-for-tidyverse-relevant-for-wsl2).
--- a/545
+++ b/545
@@ -1,8 +1,11 @@
+from snakemake.utils import validate
 configfile: "config.yaml"
+validate(config, "tools/config.schema.yaml")
 include: "rules/common.smk"
 include: "rules/renv.smk"
 include: "rules/preprocessing.smk"
 include: "rules/features.smk"
+include: "rules/models.smk"
 include: "rules/reports.smk"

 import itertools
@@ -12,165 +15,433 @@ files_to_compute = []
 if len(config["PIDS"]) == 0:
    raise ValueError("Add participants IDs to PIDS in config.yaml. Remember to create their participant files in data/external")

-if config["PHONE_VALID_SENSED_BINS"]["COMPUTE"] or config["PHONE_VALID_SENSED_DAYS"]["COMPUTE"]: # valid sensed bins is necessary for sensed days, so we add these files anyways if sensed days are requested
-    if len(config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]) == 0:
-            raise ValueError("If you want to compute PHONE_VALID_SENSED_BINS or PHONE_VALID_SENSED_DAYS, you need to add at least one table to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml")
+for provider in config["PHONE_DATA_YIELD"]["PROVIDERS"].keys():
+    if config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["COMPUTE"]:
+        
+        allowed_phone_sensors = get_phone_sensor_names()
+        if not (set(config["PHONE_DATA_YIELD"]["SENSORS"]) <= set(allowed_phone_sensors)):
+            raise ValueError('\nInvalid sensor(s) for PHONE_DATA_YIELD. config["PHONE_DATA_YIELD"]["SENSORS"] can have '
+                            'one or more of the following phone sensors: {}.\nInstead you provided "{}".\n'
+                            'Keep in mind that the sensors\' CONTAINER attribute must point to a valid database table or file'\
+                            .format(', '.join(allowed_phone_sensors),
+                                    ', '.join(set(config["PHONE_DATA_YIELD"]["SENSORS"]) - set(allowed_phone_sensors))))
+        
+        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=map(str.lower, config["PHONE_DATA_YIELD"]["SENSORS"])))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_DATA_YIELD"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-    pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
-    pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
-    tables_android = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["IOS"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]] # for android, discard any ios tables that may exist
-    tables_ios = [table for table in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"] if table not in [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"]]] # for ios, discard any android tables that may exist
+for provider in config["PHONE_MESSAGES"]["PROVIDERS"].keys():
+    if config["PHONE_MESSAGES"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_messages_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_messages_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_MESSAGES"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_messages.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-    for pids,table in zip([pids_android, pids_ios], [tables_android, tables_ios]):
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
-    files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
-
-if config["PHONE_VALID_SENSED_DAYS"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/interim/{pid}/phone_valid_sensed_days_{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins.csv",
-                                pid=config["PIDS"],
-                                min_valid_hours_per_day=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_HOURS_PER_DAY"],
-                                min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
-
-if config["MESSAGES"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/messages_{messages_type}_{day_segment}.csv", pid=config["PIDS"], messages_type = config["MESSAGES"]["TYPES"], day_segment = config["MESSAGES"]["DAY_SEGMENTS"]))
-
-if config["CALLS"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["CALLS"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/calls_{call_type}_{day_segment}.csv", pid=config["PIDS"], call_type=config["CALLS"]["TYPES"], day_segment = config["CALLS"]["DAY_SEGMENTS"]))
-
-if config["BARNETT_LOCATION"]["COMPUTE"]:
-    if config["BARNETT_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
-        if config["BARNETT_LOCATION"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
-            files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
-            files_to_compute.extend(expand("data/raw/{pid}/{sensor}_resampled.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
+for provider in config["PHONE_CALLS"]["PROVIDERS"].keys():
+    if config["PHONE_CALLS"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_calls_raw.csv", pid=config["PIDS"]))
+        if (provider == "RAPIDS") and (config["PHONE_CALLS"]["PROVIDERS"][provider]["FEATURES_TYPE"] == "EPISODES"):
+            files_to_compute.extend(expand("data/interim/{pid}/phone_calls_episodes.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_calls_episodes_resampled.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_calls_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
        else:
-            raise ValueError("Error: Add your locations table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")            
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BARNETT_LOCATION"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/location_barnett_{day_segment}.csv", pid=config["PIDS"], day_segment = config["BARNETT_LOCATION"]["DAY_SEGMENTS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_calls_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_CALLS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_calls.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["BLUETOOTH"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BLUETOOTH"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BLUETOOTH"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/bluetooth_{day_segment}.csv", pid=config["PIDS"], day_segment = config["BLUETOOTH"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_BLUETOOTH"]["PROVIDERS"].keys():
+    if config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_bluetooth_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_BLUETOOTH"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_bluetooth.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["ACTIVITY_RECOGNITION"]["COMPUTE"]:
-    pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
-    pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
-    
-    for pids,table in zip([pids_android, pids_ios], [config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"], config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]]):
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=pids, sensor=table))
-        files_to_compute.extend(expand("data/processed/{pid}/{sensor}_deltas.csv", pid=pids, sensor=table))
-    files_to_compute.extend(expand("data/processed/{pid}/activity_recognition_{day_segment}.csv",pid=config["PIDS"], day_segment = config["ACTIVITY_RECOGNITION"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"].keys():
+    if config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_activity_recognition_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_ACTIVITY_RECOGNITION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_activity_recognition.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["BATTERY"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["BATTERY"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/battery_deltas.csv", pid=config["PIDS"]))
-    files_to_compute.extend(expand("data/processed/{pid}/battery_{day_segment}.csv", pid = config["PIDS"], day_segment = config["BATTERY"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_BATTERY"]["PROVIDERS"].keys():
+    if config["PHONE_BATTERY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_battery_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_BATTERY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_battery.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["SCREEN"]["COMPUTE"]:
-    if config["SCREEN"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
-        files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
-    else:
-        raise ValueError("Error: Add your screen table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data)")
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/screen_deltas.csv", pid=config["PIDS"]))
-    files_to_compute.extend(expand("data/processed/{pid}/screen_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SCREEN"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_SCREEN"]["PROVIDERS"].keys():
+    if config["PHONE_SCREEN"]["PROVIDERS"][provider]["COMPUTE"]:
+        # if "PHONE_SCREEN" in config["PHONE_DATA_YIELD"]["SENSORS"]:  # Not used for now because episodepersensedminutes was removed from the list of supported features
+        #     files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
+        # else:
+        #     raise ValueError("Error: Add PHONE_SCREEN (and as many PHONE_SENSORS as you have in your database) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data)")
+        files_to_compute.extend(expand("data/raw/{pid}/phone_screen_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_screen_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_screen_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_screen_features/phone_screen_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_SCREEN"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_screen.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["LIGHT"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["LIGHT"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["LIGHT"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/light_{day_segment}.csv", pid = config["PIDS"], day_segment = config["LIGHT"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_LIGHT"]["PROVIDERS"].keys():
+    if config["PHONE_LIGHT"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_light_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_light_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_LIGHT"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_light.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["ACCELEROMETER"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["ACCELEROMETER"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["ACCELEROMETER"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/accelerometer_{day_segment}.csv", pid = config["PIDS"], day_segment = config["ACCELEROMETER"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_ACCELEROMETER"]["PROVIDERS"].keys():
+    if config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_accelerometer_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_accelerometer.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["APPLICATIONS_FOREGROUND"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/interim/{pid}/{sensor}_with_datetime_with_genre.csv", pid=config["PIDS"], sensor=config["APPLICATIONS_FOREGROUND"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/applications_foreground_{day_segment}.csv", pid = config["PIDS"], day_segment = config["APPLICATIONS_FOREGROUND"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"].keys():
+    if config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv", pid=config["PIDS"]))
+        if config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["INCLUDE_EPISODE_FEATURES"]:
+            files_to_compute.extend(expand("data/interim/{pid}/phone_app_episodes.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_app_episodes_resampled.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_app_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_APPLICATIONS_FOREGROUND"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_foreground.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["WIFI"]["COMPUTE"]:
-    if len(config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]) > 0:
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["VISIBLE_ACCESS_POINTS"]))
-        files_to_compute.extend(expand("data/processed/{pid}/wifi_{day_segment}.csv", pid = config["PIDS"], day_segment = config["WIFI"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_WIFI_VISIBLE"]["PROVIDERS"].keys():
+    if config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_visible_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_visible_features/phone_wifi_visible_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_WIFI_VISIBLE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_visible.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-    if len(config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]) > 0:
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["WIFI"]["DB_TABLE"]["CONNECTED_ACCESS_POINTS"]))
-        files_to_compute.extend(expand("data/processed/{pid}/wifi_{day_segment}.csv", pid = config["PIDS"], day_segment = config["WIFI"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_WIFI_CONNECTED"]["PROVIDERS"].keys():
+    if config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_wifi_connected_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_wifi_connected_features/phone_wifi_connected_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_WIFI_CONNECTED"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_wifi_connected.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["HEARTRATE"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["HEARTRATE"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary", "intraday"]))
-    files_to_compute.extend(expand("data/processed/{pid}/fitbit_heartrate_{day_segment}.csv", pid = config["PIDS"], day_segment = config["HEARTRATE"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_CONVERSATION"]["PROVIDERS"].keys():
+    if config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_conversation_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_CONVERSATION"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_conversation.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["STEP"]["COMPUTE"]:
-    if config["STEP"]["EXCLUDE_SLEEP"]["EXCLUDE"] == True and config["STEP"]["EXCLUDE_SLEEP"]["TYPE"] == "FITBIT_BASED":
-        files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["summary"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["STEP"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/fitbit_step_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["intraday"]))
-    files_to_compute.extend(expand("data/processed/{pid}/fitbit_step_{day_segment}.csv", pid = config["PIDS"], day_segment = config["STEP"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_ESM"]["PROVIDERS"].keys():
+    if config["PHONE_ESM"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_esm_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_esm_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_esm_clean.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_esm_features/phone_esm_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_ESM"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_esm.csv", pid=config["PIDS"]))
+        # files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv",pid=config["PIDS"]))
+        # files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["SLEEP"]["COMPUTE"]:
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SLEEP"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_{fitbit_data_type}_with_datetime.csv", pid=config["PIDS"], fitbit_data_type=["intraday", "summary"]))
-    files_to_compute.extend(expand("data/processed/{pid}/fitbit_sleep_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SLEEP"]["DAY_SEGMENTS"]))
+for provider in config["PHONE_SPEECH"]["PROVIDERS"].keys():
+    if config["PHONE_SPEECH"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/phone_speech_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/phone_speech_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_speech_features/phone_speech_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_SPEECH"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_speech.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["CONVERSATION"]["COMPUTE"]:
-    pids_android = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "android", config["PIDS"]))
-    pids_ios = list(filter(lambda pid: infer_participant_platform("data/external/" + pid) == "ios", config["PIDS"]))
+# The isinstance checks below can be removed for each of these sensors once it has feature PROVIDERS
+if isinstance(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"], dict):
+    for provider in config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"].keys():
+        if config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"][provider]["COMPUTE"]:
+            files_to_compute.extend(expand("data/raw/{pid}/phone_applications_crashes_raw.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_applications_crashes_with_datetime.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_applications_crashes_with_datetime_with_categories.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_applications_crashes_features/phone_applications_crashes_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_APPLICATIONS_CRASHES"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+            files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_crashes.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+            files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-    for pids,table in zip([pids_android, pids_ios], [config["CONVERSATION"]["DB_TABLE"]["ANDROID"], config["CONVERSATION"]["DB_TABLE"]["IOS"]]):
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=pids, sensor=table))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=pids, sensor=table))
-        files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=pids, sensor=table))
-    files_to_compute.extend(expand("data/processed/{pid}/conversation_{day_segment}.csv",pid=config["PIDS"], day_segment = config["CONVERSATION"]["DAY_SEGMENTS"]))
+if isinstance(config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"], dict):
+    for provider in config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"].keys():
+        if config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"][provider]["COMPUTE"]:
+            files_to_compute.extend(expand("data/raw/{pid}/phone_applications_notifications_raw.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_applications_notifications_with_datetime.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_applications_notifications_with_datetime_with_categories.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_applications_notifications_features/phone_applications_notifications_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_APPLICATIONS_NOTIFICATIONS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+            files_to_compute.extend(expand("data/processed/features/{pid}/phone_applications_notifications.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+            files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")

-if config["DORYAB_LOCATION"]["COMPUTE"]:
-    if config["DORYAB_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
-        if config["DORYAB_LOCATION"]["DB_TABLE"] in config["PHONE_VALID_SENSED_BINS"]["DB_TABLES"]:
-            files_to_compute.extend(expand("data/interim/{pid}/phone_sensed_bins.csv", pid=config["PIDS"]))
-            files_to_compute.extend(expand("data/raw/{pid}/{sensor}_resampled.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
+if isinstance(config["PHONE_KEYBOARD"]["PROVIDERS"], dict):
+    for provider in config["PHONE_KEYBOARD"]["PROVIDERS"].keys():
+        if config["PHONE_KEYBOARD"]["PROVIDERS"][provider]["COMPUTE"]:
+            files_to_compute.extend(expand("data/raw/{pid}/phone_keyboard_raw.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_keyboard_with_datetime.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_keyboard_features/phone_keyboard_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_KEYBOARD"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+            files_to_compute.extend(expand("data/processed/features/{pid}/phone_keyboard.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+            files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+if isinstance(config["PHONE_LOG"]["PROVIDERS"], dict):
+    for provider in config["PHONE_LOG"]["PROVIDERS"].keys():
+        if config["PHONE_LOG"]["PROVIDERS"][provider]["COMPUTE"]:
+            files_to_compute.extend(expand("data/raw/{pid}/phone_log_raw.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/phone_log_with_datetime.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_log_features/phone_log_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_LOG"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+            files_to_compute.extend(expand("data/processed/features/{pid}/phone_log.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+            files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["PHONE_LOCATIONS"]["PROVIDERS"].keys():
+    if config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["COMPUTE"]:
+        if config["PHONE_LOCATIONS"]["LOCATIONS_TO_USE"] in ["FUSED_RESAMPLED","ALL_RESAMPLED"]:
+            if "PHONE_LOCATIONS" in config["PHONE_DATA_YIELD"]["SENSORS"]:
+                files_to_compute.extend(expand("data/interim/{pid}/phone_yielded_timestamps.csv", pid=config["PIDS"]))
+            else:
+                raise ValueError("Error: Add PHONE_LOCATIONS (and as many PHONE_SENSORS as you have) to [PHONE_DATA_YIELD][SENSORS] in config.yaml. This is necessary to compute phone_yielded_timestamps (time when the smartphone was sensing data), which is used to resample location data (FUSED_RESAMPLED and ALL_RESAMPLED)")
+
+        if provider == "BARNETT":
+            files_to_compute.extend(expand("data/interim/{pid}/phone_locations_barnett_daily.csv", pid=config["PIDS"]))
+        if provider == "DORYAB":
+            files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
+
+        files_to_compute.extend(expand("data/raw/{pid}/phone_locations_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_locations_processed_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["PHONE_LOCATIONS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/phone_locations.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"].keys():
+    if config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_intraday_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_calories_intraday_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_calories_intraday_features/fitbit_calories_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_CALORIES_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_calories_intraday.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_DATA_YIELD"]["PROVIDERS"].keys():
+    if config["FITBIT_DATA_YIELD"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_data_yield.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"].keys():
+    if config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_summary_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_HEARTRATE_SUMMARY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_summary.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"].keys():
+    if config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_HEARTRATE_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_heartrate_intraday.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"].keys():
+    if config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_SLEEP_SUMMARY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_sleep_summary.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"].keys():
+    if config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_intraday_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_episodes.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_SLEEP_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_sleep_intraday.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"].keys():
+    if config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_summary_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_STEPS_SUMMARY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_summary.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"].keys():
+    if config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["COMPUTE"]:
+        
+        if config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["TIME_BASED"]["EXCLUDE"] or config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["FITBIT_BASED"]["EXCLUDE"]:
+            if config["FITBIT_STEPS_INTRADAY"]["EXCLUDE_SLEEP"]["FITBIT_BASED"]["EXCLUDE"]:
+                files_to_compute.extend(expand("data/raw/{pid}/fitbit_sleep_summary_raw.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_with_datetime_exclude_sleep.csv", pid=config["PIDS"]))
+
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/fitbit_steps_intraday_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["FITBIT_STEPS_INTRADAY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/fitbit_steps_intraday.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+
+for provider in config["EMPATICA_ACCELEROMETER"]["PROVIDERS"].keys():
+    if config["EMPATICA_ACCELEROMETER"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_accelerometer_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_accelerometer_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/empatica_accelerometer_features/empatica_accelerometer_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_ACCELEROMETER"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/empatica_accelerometer.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+     
+for provider in config["EMPATICA_HEARTRATE"]["PROVIDERS"].keys():
+    if config["EMPATICA_HEARTRATE"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_heartrate_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_heartrate_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/empatica_heartrate_features/empatica_heartrate_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_HEARTRATE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/empatica_heartrate.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+
+for provider in config["EMPATICA_TEMPERATURE"]["PROVIDERS"].keys():
+    if config["EMPATICA_TEMPERATURE"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_temperature_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_temperature_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/empatica_temperature_features/empatica_temperature_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_TEMPERATURE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/empatica_temperature.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"].keys():
+    if config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_electrodermal_activity_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_electrodermal_activity_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/empatica_electrodermal_activity_features/empatica_electrodermal_activity_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_ELECTRODERMAL_ACTIVITY"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/empatica_electrodermal_activity.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"].keys():
+    if config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_blood_volume_pulse_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_blood_volume_pulse_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/empatica_blood_volume_pulse_features/empatica_blood_volume_pulse_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_BLOOD_VOLUME_PULSE"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/empatica_blood_volume_pulse.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+for provider in config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"].keys():
+    if config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"][provider]["COMPUTE"]:
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_inter_beat_interval_raw.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/raw/{pid}/empatica_inter_beat_interval_with_datetime.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/interim/{pid}/empatica_inter_beat_interval_features/empatica_inter_beat_interval_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_INTER_BEAT_INTERVAL"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+        files_to_compute.extend(expand("data/processed/features/{pid}/empatica_inter_beat_interval.csv", pid=config["PIDS"]))
+        files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+        files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+     
+if isinstance(config["EMPATICA_TAGS"]["PROVIDERS"], dict):
+    for provider in config["EMPATICA_TAGS"]["PROVIDERS"].keys():
+        if config["EMPATICA_TAGS"]["PROVIDERS"][provider]["COMPUTE"]:
+            files_to_compute.extend(expand("data/raw/{pid}/empatica_tags_raw.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/raw/{pid}/empatica_tags_with_datetime.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/interim/{pid}/empatica_tags_features/empatica_tags_{language}_{provider_key}.csv", pid=config["PIDS"], language=get_script_language(config["EMPATICA_TAGS"]["PROVIDERS"][provider]["SRC_SCRIPT"]), provider_key=provider.lower()))
+            files_to_compute.extend(expand("data/processed/features/{pid}/empatica_tags.csv", pid=config["PIDS"]))
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features.csv", pid=config["PIDS"]))
+            files_to_compute.append("data/processed/features/all_participants/all_sensor_features.csv")
+
+# Visualization for Data Exploration
+if config["HISTOGRAM_PHONE_DATA_YIELD"]["PLOT"]:
+    files_to_compute.append("reports/data_exploration/histogram_phone_data_yield.html")
+
+if config["HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT"]["PLOT"]:
+    files_to_compute.extend(expand("reports/interim/{pid}/heatmap_sensors_per_minute_per_time_segment.html", pid=config["PIDS"]))
+    files_to_compute.append("reports/data_exploration/heatmap_sensors_per_minute_per_time_segment.html")
+
+if config["HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT"]["PLOT"]:
+    files_to_compute.extend(expand("reports/interim/{pid}/heatmap_sensor_row_count_per_time_segment.html", pid=config["PIDS"]))
+    files_to_compute.append("reports/data_exploration/heatmap_sensor_row_count_per_time_segment.html")
+
+if config["HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT"]["PLOT"]:
+    if not config["PHONE_DATA_YIELD"]["PROVIDERS"]["RAPIDS"]["COMPUTE"]:
+        raise ValueError("Error: [PHONE_DATA_YIELD][PROVIDERS][RAPIDS][COMPUTE] must be True in config.yaml to get heatmaps of overall data yield.")
+    files_to_compute.append("reports/data_exploration/heatmap_phone_data_yield_per_participant_per_time_segment.html")
+
+if config["HEATMAP_FEATURE_CORRELATION_MATRIX"]["PLOT"]:
+    files_to_compute.append("reports/data_exploration/heatmap_feature_correlation_matrix.html")
+
+# Data Cleaning
+for provider in config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"].keys():
+    if config["ALL_CLEANING_INDIVIDUAL"]["PROVIDERS"][provider]["COMPUTE"]:
+        if provider == "STRAW":
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() + "_py.csv", pid=config["PIDS"]))
        else:
-            raise ValueError("Error: Add your locations table (and as many sensor tables as you have) to [PHONE_VALID_SENSED_BINS][DB_TABLES] in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data) which is used to resample fused location data (RESAMPLED_FUSED)")      
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["DORYAB_LOCATION"]["DB_TABLE"]))
-    files_to_compute.extend(expand("data/processed/{pid}/location_doryab_{segment}.csv", pid=config["PIDS"], segment = config["DORYAB_LOCATION"]["DAY_SEGMENTS"]))
+            files_to_compute.extend(expand("data/processed/features/{pid}/all_sensor_features_cleaned_" + provider.lower() + "_R.csv", pid=config["PIDS"]))

-# visualization for data exploration
-if config["HEATMAP_FEATURES_CORRELATIONS"]["PLOT"]:
-    files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/heatmap_features_correlations.html", min_valid_hours_per_day=config["HEATMAP_FEATURES_CORRELATIONS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
-    
-if config["HISTOGRAM_VALID_SENSED_HOURS"]["PLOT"]:
-    files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/histogram_valid_sensed_hours.html", min_valid_hours_per_day=config["HISTOGRAM_VALID_SENSED_HOURS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
+for provider in config["ALL_CLEANING_OVERALL"]["PROVIDERS"].keys():
+    if config["ALL_CLEANING_OVERALL"]["PROVIDERS"][provider]["COMPUTE"]:
+        if provider == "STRAW":
+            for target in config["PARAMS_FOR_ANALYSIS"]["TARGET"]["ALL_LABELS"]:
+                files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +"_py_(" + target + ").csv"))
+        else:
+            files_to_compute.extend(expand("data/processed/features/all_participants/all_sensor_features_cleaned_" + provider.lower() +"_R.csv"))     

-if config["HEATMAP_DAYS_BY_SENSORS"]["PLOT"]:
-    files_to_compute.extend(expand("reports/interim/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/{pid}/heatmap_days_by_sensors.html", pid=config["PIDS"], min_valid_hours_per_day=config["HEATMAP_DAYS_BY_SENSORS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
-    files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/heatmap_days_by_sensors_all_participants.html", min_valid_hours_per_day=config["HEATMAP_DAYS_BY_SENSORS"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
-
-if config["HEATMAP_SENSED_BINS"]["PLOT"]:
-    files_to_compute.extend(expand("reports/interim/heatmap_sensed_bins/{pid}/heatmap_sensed_bins.html", pid=config["PIDS"]))
-    files_to_compute.extend(["reports/data_exploration/heatmap_sensed_bins_all_participants.html"])
-
-if config["OVERALL_COMPLIANCE_HEATMAP"]["PLOT"]:
-    files_to_compute.extend(expand("reports/data_exploration/{min_valid_hours_per_day}hours_{min_valid_bins_per_hour}bins/overall_compliance_heatmap.html", min_valid_hours_per_day=config["OVERALL_COMPLIANCE_HEATMAP"]["MIN_VALID_HOURS_PER_DAY"], min_valid_bins_per_hour=config["PHONE_VALID_SENSED_DAYS"]["MIN_VALID_BINS_PER_HOUR"]))
+# Baseline features
+if config["PARAMS_FOR_ANALYSIS"]["BASELINE"]["COMPUTE"]:
+    files_to_compute.extend(expand("data/raw/baseline_merged.csv"))
+    files_to_compute.extend(expand("data/raw/{pid}/participant_baseline_raw.csv", pid=config["PIDS"]))
+    files_to_compute.extend(expand("data/interim/{pid}/baseline_questionnaires.csv", pid=config["PIDS"]))
+    files_to_compute.extend(expand("data/processed/features/{pid}/baseline_features.csv", pid=config["PIDS"]))

+# Targets (labels)
+if config["PARAMS_FOR_ANALYSIS"]["TARGET"]["COMPUTE"]:
+    files_to_compute.extend(expand("data/processed/models/individual_model/{pid}/input.csv", pid=config["PIDS"]))
+    for target in config["PARAMS_FOR_ANALYSIS"]["TARGET"]["ALL_LABELS"]:
+        files_to_compute.extend(expand("data/processed/models/population_model/input_" + target + ".csv"))

 rule all:
    input:
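
Reviewer note: most of the per-sensor loops added to the Snakefile above repeat the same five-target pattern (raw, with_datetime, interim features, processed features, all_sensor_features). The sketch below shows one way that pattern could be generated by a helper. It is not code from this commit: `sensor_targets` is a hypothetical name, `get_script_language` here is a stand-in that assumes the Snakefile's helper maps a script extension to "python" or "r", and the example provider name and `SRC_SCRIPT` path are illustrative only. Sensors with extra intermediate files (for example FITBIT_SLEEP_INTRADAY or FITBIT_STEPS_INTRADAY) would still need their own handling.

```python
# Hypothetical helper, not part of the repository: builds the target list that
# most per-sensor loops in the Snakefile spell out by hand.

def get_script_language(script_path):
    # Stand-in for the Snakefile's helper of the same name; assumed to map
    # a script's extension to "python" or "r".
    return "python" if script_path.lower().endswith(".py") else "r"


def sensor_targets(config, sensor_key):
    """Return output paths for every provider of sensor_key with COMPUTE enabled."""
    sensor = sensor_key.lower()
    targets = []
    for provider, settings in config[sensor_key]["PROVIDERS"].items():
        if not settings["COMPUTE"]:
            continue
        language = get_script_language(settings["SRC_SCRIPT"])
        for pid in config["PIDS"]:
            targets += [
                f"data/raw/{pid}/{sensor}_raw.csv",
                f"data/raw/{pid}/{sensor}_with_datetime.csv",
                f"data/interim/{pid}/{sensor}_features/{sensor}_{language}_{provider.lower()}.csv",
                f"data/processed/features/{pid}/{sensor}.csv",
                f"data/processed/features/{pid}/all_sensor_features.csv",
            ]
        targets.append("data/processed/features/all_participants/all_sensor_features.csv")
    return targets


# Illustrative configuration; the provider name and script path are assumptions.
example_config = {
    "PIDS": ["p031", "p032"],
    "EMPATICA_HEARTRATE": {
        "PROVIDERS": {
            "DBDP": {"COMPUTE": True, "SRC_SCRIPT": "src/features/empatica_heartrate/dbdp/main.py"},
        }
    },
}
targets = sensor_targets(example_config, "EMPATICA_HEARTRATE")
```

The paths come out grouped per participant rather than per pattern as `expand` would produce them, and duplicates across sensors remain, as in the current loops. In the Snakefile the result would be passed to `files_to_compute.extend(...)`.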
--- a/src/data/init.py
+++ b/src/data/init.py
--- a/automl_test.py
+++ b/automl_test.py
@ -0,0 +1,57 @@
+from pprint import pprint
+import sklearn.metrics
+import autosklearn.regression
+
+import datetime
+import importlib
+import os
+import sys
+
+import numpy as np
+import matplotlib.pyplot as plt
+import pandas as pd
+import seaborn as sns
+import yaml
+
+from sklearn import linear_model, svm, kernel_ridge, gaussian_process
+from sklearn.model_selection import LeaveOneGroupOut, cross_val_score, train_test_split
+from sklearn.metrics import mean_squared_error, r2_score
+from sklearn.impute import SimpleImputer
+
+model_input = pd.read_csv("data/processed/models/population_model/input_PANAS_negative_affect_mean.csv") # Standardized data
+
+model_input.dropna(axis=1, how="all", inplace=True)
+model_input.dropna(axis=0, how="any", subset=["target"], inplace=True)
+
+categorical_feature_colnames = ["gender", "startlanguage"]
+categorical_feature_colnames += [col for col in model_input.columns if "mostcommonactivity" in col or "homelabel" in col]
+categorical_features = model_input[categorical_feature_colnames].copy()
+mode_categorical_features = categorical_features.mode().iloc[0]
+categorical_features = categorical_features.fillna(mode_categorical_features)
+categorical_features = categorical_features.apply(lambda col: col.astype("category"))
+if not categorical_features.empty:
+    categorical_features = pd.get_dummies(categorical_features)
+numerical_features = model_input.drop(categorical_feature_colnames, axis=1)
+model_in = pd.concat([numerical_features, categorical_features], axis=1)
+
+index_columns = ["local_segment", "local_segment_label", "local_segment_start_datetime", "local_segment_end_datetime"]
+model_in.set_index(index_columns, inplace=True)
+
+X_train, X_test, y_train, y_test = train_test_split(model_in.drop(["target", "pid"], axis=1), model_in["target"], test_size=0.30)
+
+automl = autosklearn.regression.AutoSklearnRegressor(
+    time_left_for_this_task=7200,
+    per_run_time_limit=120
+)
+automl.fit(X_train, y_train, dataset_name='straw')
+
+print(automl.leaderboard())
+pprint(automl.show_models(), indent=4)
+
+train_predictions = automl.predict(X_train)
+print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
+test_predictions = automl.predict(X_test)
+print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))
+
+# Stop here; sys is already imported at the top of the script.
+sys.exit()
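
Reviewer note: the categorical preprocessing in automl_test.py (fill missing values with the column mode, cast to category, then one-hot encode) can be checked in isolation on a toy frame. The sketch below assumes pandas is installed; the data values are invented for illustration and only the column names `gender`, `startlanguage` and `target` come from the script.

```python
import pandas as pd

# Toy data invented for illustration; the real script reads the population model input CSV.
toy = pd.DataFrame({
    "gender": ["F", None, "F", "M"],
    "startlanguage": ["sl", "sl", None, "nl"],
    "target": [1.0, 2.0, 3.0, 4.0],
})

categorical_cols = ["gender", "startlanguage"]
categorical = toy[categorical_cols].copy()

# mode() returns one row per tied mode; iloc[0] picks the first value for each column.
categorical = categorical.fillna(categorical.mode().iloc[0])
categorical = categorical.apply(lambda col: col.astype("category"))

# One-hot encode the categorical columns, as the script does with pd.get_dummies.
encoded = pd.get_dummies(categorical)

numerical = toy.drop(categorical_cols, axis=1)
model_in = pd.concat([numerical, encoded], axis=1)
print(model_in.columns.tolist())
```

Because the mode is computed over the whole frame before `train_test_split`, the imputed value also depends on rows that later land in the test set; whether that matters for this experiment is a judgment call for the authors.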
--- a/code_of_conduct.md
+++ b/code_of_conduct.md
@ -0,0 +1,134 @@
+
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+moshi@pitt.edu.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
+
+Community Impact Guidelines were inspired by 
+[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+
+For answers to common questions about this code of conduct, see the FAQ at
+[https://www.contributor-covenant.org/faq][FAQ]. Translations are available 
+at [https://www.contributor-covenant.org/translations][translations].
+
+[homepage]: https://www.contributor-covenant.org
+[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
+[Mozilla CoC]: https://github.com/mozilla/diversity
+[FAQ]: https://www.contributor-covenant.org/faq
+[translations]: https://www.contributor-covenant.org/translations
+
--- a/config.yaml
+++ b/config.yaml
@ -1,244 +1,758 @@
-# Participants to include in the analysis
-# You must create a file for each participant named pXXX containing their device_id. This can be done manually or automatically
-PIDS: [test01]
+########################################################################################################################
+#                                              GLOBAL CONFIGURATION                                                    #
+########################################################################################################################

-# Global var with common day segments
-DAY_SEGMENTS: &day_segments
-  [daily, morning, afternoon, evening, night]
+# See https://www.rapids.science/latest/setup/configuration/#participant-files
+PIDS: ['p031', 'p032', 'p033', 'p034', 'p035', 'p036', 'p037', 'p038', 'p039', 'p040', 'p042', 'p043', 'p044', 'p045', 'p046', 'p049', 'p050', 'p052', 'p053', 'p054', 'p055', 'p057', 'p058', 'p059', 'p060', 'p061', 'p062', 'p064', 'p067', 'p068', 'p069', 'p070', 'p071', 'p072', 'p073', 'p074', 'p075', 'p076', 'p077', 'p078', 'p079', 'p080', 'p081', 'p082', 'p083', 'p084', 'p085', 'p086', 'p088', 'p089', 'p090', 'p091', 'p092', 'p093', 'p106', 'p107']

-# Global timezone
-# Use codes from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
-# Double check your code, for example EST is not US Eastern Time.
-TIMEZONE: &timezone
-  America/New_York
+# See https://www.rapids.science/latest/setup/configuration/#automatic-creation-of-participant-files
+CREATE_PARTICIPANT_FILES:
+  USERNAMES_CSV: "data/external/main_study_usernames.csv"
+  CSV_FILE_PATH: "data/external/main_study_participants.csv" # see docs for required format
+  PHONE_SECTION:
+    ADD: True
+    IGNORED_DEVICE_IDS: []
+  FITBIT_SECTION:
+    ADD: False
+    IGNORED_DEVICE_IDS: []
+  EMPATICA_SECTION:
+    ADD: True
+    IGNORED_DEVICE_IDS: []

-DATABASE_GROUP: &database_group
-  MY_GROUP
+# See https://www.rapids.science/latest/setup/configuration/#time-segments
+TIME_SEGMENTS: &time_segments
+  TYPE: EVENT # FREQUENCY, PERIODIC, EVENT
+  FILE: "data/external/straw_events.csv"
+  INCLUDE_PAST_PERIODIC_SEGMENTS: TRUE # Only relevant if TYPE=PERIODIC, see docs
+  TAILORED_EVENTS: # Only relevant if TYPE=EVENT
+    COMPUTE: True
+    SEGMENTING_METHOD: "30_before" # 30_before, 90_before, stress_event
+    INTERVAL_OF_INTEREST: 10 # duration of event of interest [minutes]
+    IOI_ERROR_TOLERANCE: 5 # interval of interest error tolerance (before and after IOI) [minutes]

-DOWNLOAD_PARTICIPANTS:
-  IGNORED_DEVICE_IDS: [] # for example "5a1dd68c-6cd1-48fe-ae1e-14344ac5215f"
-  GROUP: *database_group
+# See https://www.rapids.science/latest/setup/configuration/#timezone-of-your-study
+TIMEZONE: 
+    TYPE: MULTIPLE
+    SINGLE:
+      TZCODE: Europe/Ljubljana
+    MULTIPLE:
+      TZ_FILE: data/external/timezone.csv
+      TZCODES_FILE: data/external/multiple_timezones.csv
+      IF_MISSING_TZCODE: USE_DEFAULT
+      DEFAULT_TZCODE: Europe/Ljubljana
+      FITBIT: 
+        ALLOW_MULTIPLE_TZ_PER_DEVICE: False
+        INFER_FROM_SMARTPHONE_TZ: False

-# Download data config
-DOWNLOAD_DATASET:
-  GROUP: *database_group
+########################################################################################################################
+#                                                 PHONE                                                                #
+########################################################################################################################

-# Readable datetime config
-READABLE_DATETIME:
-  FIXED_TIMEZONE: *timezone
+# See https://www.rapids.science/latest/setup/configuration/#data-stream-configuration
+PHONE_DATA_STREAMS:
+  USE: aware_postgresql
+  
+  # AVAILABLE:
+  aware_mysql: 
+    DATABASE_GROUP: MY_GROUP

-PHONE_VALID_SENSED_BINS:
-  COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
-  BIN_SIZE: &bin_size 5 # (in minutes)
-  # Add as many sensor tables as you have, they all improve the computation of PHONE_VALID_SENSED_BINS and PHONE_VALID_SENSED_DAYS. 
-  # If you are extracting screen or Barnett's location features, screen and locations tables are mandatory.
-  DB_TABLES: []
+  aware_postgresql:
+    DATABASE_GROUP: PSQL_STRAW
+  
+  aware_csv:
+    FOLDER: data/external/aware_csv
+  
+  aware_influxdb: 
+    DATABASE_GROUP: MY_GROUP

-PHONE_VALID_SENSED_DAYS:
-  COMPUTE: False
-  MIN_VALID_HOURS_PER_DAY: &min_valid_hours_per_day [16] # (out of 24) MIN_HOURS_PER_DAY
-  MIN_VALID_BINS_PER_HOUR: &min_valid_bins_per_hour [6] # (out of 60min/BIN_SIZE bins)
+# Sensors ------

-# Communication SMS features config, TYPES and FEATURES keys need to match
-MESSAGES:
-  COMPUTE: False
-  DB_TABLE: messages
-  TYPES : [received, sent]
-  FEATURES: 
-    received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
-    sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
-  DAY_SEGMENTS: *day_segments  
+# See https://www.rapids.science/latest/features/phone-accelerometer/
+PHONE_ACCELEROMETER:
+  CONTAINER: accelerometer
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
+      SRC_SCRIPT: src/features/phone_accelerometer/rapids/main.py
+    PANDA:
+      COMPUTE: False
+      VALID_SENSED_MINUTES: False
+      FEATURES:
+        exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
+        nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
+      SRC_SCRIPT: src/features/phone_accelerometer/panda/main.py

-# Communication call features config, TYPES and FEATURES keys need to match
-CALLS:
-  COMPUTE: False
-  DB_TABLE: calls
-  TYPES: [missed, incoming, outgoing]
-  FEATURES:
-    missed:  [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
-    incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
-    outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
-  DAY_SEGMENTS: *day_segments
-
-APPLICATION_GENRES:
-  CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scrapped from the Play Store)
-  CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
-  UPDATE_CATALOGUE_FILE: false # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
-  SCRAPE_MISSING_GENRES: false # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
-
-RESAMPLE_FUSED_LOCATION:
-  CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
-  TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
-  TIMEZONE: *timezone
-
-BARNETT_LOCATION:
-  COMPUTE: False
-  DB_TABLE: locations
-  DAY_SEGMENTS: [daily] # These features are only available on a daily basis
-  FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
-  LOCATIONS_TO_USE: ALL # ALL, ALL_EXCEPT_FUSED OR RESAMPLE_FUSED
-  ACCURACY_LIMIT: 51 # meters, drops location coordinates with an accuracy higher than this. This number means there's a 68% probability the true location is within this radius
-  TIMEZONE: *timezone
-  MINUTES_DATA_USED: False # Use this for quality control purposes, how many minutes of data (location coordinates gruped by minute) were used to compute features
-
-DORYAB_LOCATION:
-  COMPUTE: False
-  DB_TABLE: locations
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["locationvariance","loglocationvariance","totaldistance","averagespeed","varspeed","circadianmovement","numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","meanlengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy"]
-  LOCATIONS_TO_USE: ALL # ALL, ALL_EXCEPT_FUSED OR RESAMPLE_FUSED  
-  DBSCAN_EPS: 10 # meters
-  DBSCAN_MINSAMPLES: 5
-  THRESHOLD_STATIC : 1 # km/h
-  MAXIMUM_GAP_ALLOWED: 300
-  MINUTES_DATA_USED: False
-  SAMPLING_FREQUENCY: 0
-
-BLUETOOTH:
-  COMPUTE: False
-  DB_TABLE: bluetooth
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
-
-ACTIVITY_RECOGNITION:
-  COMPUTE: False
-  DB_TABLE: 
-    ANDROID: plugin_google_activity_recognition
+# See https://www.rapids.science/latest/features/phone-activity-recognition/
+PHONE_ACTIVITY_RECOGNITION:
+  CONTAINER: 
+    ANDROID: google_ar
    IOS: plugin_ios_activity_recognition
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["count","mostcommonactivity","countuniqueactivities","activitychangecount","sumstationary","summobile","sumvehicle"]
+  EPISODE_THRESHOLD_BETWEEN_ROWS: 5 # minutes. Max time difference for two consecutive rows to be considered within the same AR episode.
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      FEATURES: ["count", "mostcommonactivity", "countuniqueactivities", "durationstationary", "durationmobile", "durationvehicle"]
+      ACTIVITY_CLASSES:
+        STATIONARY: ["still", "tilting"]
+        MOBILE: ["on_foot", "walking", "running", "on_bicycle"]
+        VEHICLE: ["in_vehicle"]
+      SRC_SCRIPT: src/features/phone_activity_recognition/rapids/main.py

-BATTERY:
-  COMPUTE: False
-  DB_TABLE: battery
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
+# See https://www.rapids.science/latest/features/phone-applications-crashes/
+PHONE_APPLICATIONS_CRASHES:
+  CONTAINER: applications_crashes
+  APPLICATION_CATEGORIES:
+    CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
+    CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
+    UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
+    SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
+  PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD

-SCREEN:
-  COMPUTE: False
-  DB_TABLE: screen
-  DAY_SEGMENTS: *day_segments
-  REFERENCE_HOUR_FIRST_USE: 0
-  IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
-  IGNORE_EPISODES_LONGER_THAN: 0 # in minutes, set to 0 to disable
-  FEATURES_DELTAS: ["countepisode", "episodepersensedminutes", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"]
-  EPISODE_TYPES: ["unlock"]
+# See https://www.rapids.science/latest/features/phone-applications-foreground/
+PHONE_APPLICATIONS_FOREGROUND:
+  CONTAINER: applications
+  APPLICATION_CATEGORIES:
+    CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
+    CATALOGUE_FILE: "data/external/play_store_application_genre_catalogue.csv"
+    # Refer to data/external/play_store_categories_count.csv for a list of categories (genres) and their frequency.
+    UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
+    SCRAPE_MISSING_CATEGORIES: False # whether to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      INCLUDE_EPISODE_FEATURES: True
+      SINGLE_CATEGORIES: ["Productivity", "Tools", "Communication", "Education", "Social"]
+      MULTIPLE_CATEGORIES:
+        games: ["Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing"]
+        social: ["Communication", "Social", "Dating"]
+        productivity: ["Tools", "Productivity", "Finance", "Education", "News & Magazines", "Business", "Books & Reference"]
+        health: ["Health & Fitness", "Lifestyle", "Food & Drink", "Sports", "Medical", "Parenting"]
+        entertainment: ["Shopping", "Music & Audio", "Entertainment", "Travel & Local", "Photography", "Video Players & Editors", "Personalization", "House & Home", "Art & Design", "Auto & Vehicles", "Entertainment,Music & Video",
+                        "Puzzle", "Card", "Casual", "Board", "Strategy", "Trivia", "Word", "Adventure", "Role Playing", "Simulation", "Board, Brain Games", "Racing" # Add all games.
+        ]
+        maps_weather: ["Maps & Navigation", "Weather"]
+      CUSTOM_CATEGORIES:
+      SINGLE_APPS: []
+      EXCLUDED_CATEGORIES: ["System", "STRAW"]
+      # Note: A special option here is "is_system_app".
+      # This excludes applications that have is_system_app = TRUE, which is a separate column in the table.
+      # However, all of these applications have been assigned System category.
+      # I will therefore filter by that category, which is a superset and is more complete. JL
+      EXCLUDED_APPS: []
+      FEATURES: 
+        APP_EVENTS: ["countevent", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
+        APP_EPISODES: ["countepisode", "minduration", "maxduration", "meanduration", "sumduration"]
+      IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
+      IGNORE_EPISODES_LONGER_THAN: 300 # in minutes, set to 0 to disable
+      SRC_SCRIPT: src/features/phone_applications_foreground/rapids/main.py

-LIGHT:
-  COMPUTE: False
-  DB_TABLE: light
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
+# See https://www.rapids.science/latest/features/phone-applications-notifications/
+PHONE_APPLICATIONS_NOTIFICATIONS:
+  CONTAINER: notifications
+  APPLICATION_CATEGORIES:
+    CATALOGUE_SOURCE: FILE # FILE (genres are read from CATALOGUE_FILE) or GOOGLE (genres are scraped from the Play Store)
+    CATALOGUE_FILE: "data/external/stachl_application_genre_catalogue.csv"
+    UPDATE_CATALOGUE_FILE: False # if CATALOGUE_SOURCE is equal to FILE, whether or not to update CATALOGUE_FILE, if CATALOGUE_SOURCE is equal to GOOGLE all scraped genres will be saved to CATALOGUE_FILE
+    SCRAPE_MISSING_CATEGORIES: False # whether or not to scrape missing genres, only effective if CATALOGUE_SOURCE is equal to FILE. If CATALOGUE_SOURCE is equal to GOOGLE, all genres are scraped anyway
+  PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD

-ACCELEROMETER:
-  COMPUTE: False
-  DB_TABLE: accelerometer
-  DAY_SEGMENTS: *day_segments
-  FEATURES:
-    MAGNITUDE: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
-    EXERTIONAL_ACTIVITY_EPISODE: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
-    NONEXERTIONAL_ACTIVITY_EPISODE: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
-    VALID_SENSED_MINUTES: False
+# See https://www.rapids.science/latest/features/phone-battery/
+PHONE_BATTERY:
+  CONTAINER: battery
+  EPISODE_THRESHOLD_BETWEEN_ROWS: 30 # minutes. Max time difference for two consecutive rows to be considered within the same battery episode.
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      FEATURES: ["countdischarge", "sumdurationdischarge", "countcharge", "sumdurationcharge", "avgconsumptionrate", "maxconsumptionrate"]
+      SRC_SCRIPT: src/features/phone_battery/rapids/main.py

-APPLICATIONS_FOREGROUND:
-  COMPUTE: False
-  DB_TABLE: applications_foreground
-  DAY_SEGMENTS: *day_segments
-  SINGLE_CATEGORIES: ["all", "email"]
-  MULTIPLE_CATEGORIES:
-    social: ["socialnetworks", "socialmediatools"]
-    entertainment: ["entertainment", "gamingknowledge", "gamingcasual", "gamingadventure", "gamingstrategy", "gamingtoolscommunity", "gamingroleplaying", "gamingaction", "gaminglogic", "gamingsports", "gamingsimulation"]
-  SINGLE_APPS: ["top1global", "com.facebook.moments", "com.google.android.youtube", "com.twitter.android"] # There's no entropy for single apps
-  EXCLUDED_CATEGORIES: ["system_apps"]
-  EXCLUDED_APPS: ["com.fitbit.FitbitMobile", "com.aware.plugin.upmc.cancer"]
-  FEATURES: ["count", "timeoffirstuse", "timeoflastuse", "frequencyentropy"]
+# See https://www.rapids.science/latest/features/phone-bluetooth/
+PHONE_BLUETOOTH:
+  CONTAINER: bluetooth
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
+      SRC_SCRIPT: src/features/phone_bluetooth/rapids/main.R

-HEARTRATE:
-  COMPUTE: False
-  DB_TABLE: fitbit_data
-  DAY_SEGMENTS: *day_segments
-  SUMMARY_FEATURES: ["restinghr"] # calories features' accuracy depend on the accuracy of the participants fitbit profile (e.g. heigh, weight) use with care: ["caloriesoutofrange", "caloriesfatburn", "caloriescardio", "caloriespeak"] 
-  INTRADAY_FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
+    DORYAB:
+      COMPUTE: True
+      FEATURES: 
+        ALL: 
+            DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
+            SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+            SCANS_LEAST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+        OWN: 
+            DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
+            SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+            SCANS_LEAST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+        OTHERS:
+            DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
+            SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+            SCANS_LEAST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+      SRC_SCRIPT: src/features/phone_bluetooth/doryab/main.py

-STEP:
-  COMPUTE: False
-  DB_TABLE: fitbit_data
-  DAY_SEGMENTS: *day_segments
-  EXCLUDE_SLEEP:
-    EXCLUDE: False
-    TYPE: FIXED # FIXED OR FITBIT_BASED (CONFIGURE FITBIT's SLEEP DB_TABLE)
-    FIXED:
-      START: "23:00"
-      END: "07:00"
-  FEATURES:
-    ALL_STEPS: ["sumallsteps", "maxallsteps", "minallsteps", "avgallsteps", "stdallsteps"]
-    SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
-    ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
-  THRESHOLD_ACTIVE_BOUT: 10 # steps
-  INCLUDE_ZERO_STEP_ROWS: False
+# See https://www.rapids.science/latest/features/phone-calls/
+PHONE_CALLS:
+  CONTAINER: call
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      FEATURES_TYPE: EPISODES # EVENTS or EPISODES
+      CALL_TYPES: [missed, incoming, outgoing]
+      FEATURES:
+        missed:  [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
+        incoming: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
+        outgoing: [count, distinctcontacts, meanduration, sumduration, minduration, maxduration, stdduration, modeduration, entropyduration, timefirstcall, timelastcall, countmostfrequentcontact]
+      SRC_SCRIPT: src/features/phone_calls/rapids/main.R

-SLEEP:
-  COMPUTE: False
-  DB_TABLE: fitbit_data
-  DAY_SEGMENTS: *day_segments
-  SLEEP_TYPES: ["main", "nap", "all"]
-  SUMMARY_FEATURES: ["sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgefficiency", "countepisode"]
-
-WIFI:
-  COMPUTE: False
-  DB_TABLE: 
-    VISIBLE_ACCESS_POINTS: "wifi" # if you only have a CONNECTED_ACCESS_POINTS table, set this value to ""
-    CONNECTED_ACCESS_POINTS: "sensor_wifi" # if you only have a VISIBLE_ACCESS_POINTS table, set this value to ""
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
-
-CONVERSATION:
-  COMPUTE: False
-  DB_TABLE: 
+# See https://www.rapids.science/latest/features/phone-conversation/
+PHONE_CONVERSATION: # TODO Adapt for speech
+  CONTAINER: 
    ANDROID: plugin_studentlife_audio_android
    IOS: plugin_studentlife_audio
-  DAY_SEGMENTS: *day_segments
-  FEATURES: ["minutessilence", "minutesnoise", "minutesvoice", "minutesunknown","sumconversationduration","avgconversationduration",
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["minutessilence", "minutesnoise", "minutesvoice", "minutesunknown","sumconversationduration","avgconversationduration",
    "sdconversationduration","minconversationduration","maxconversationduration","timefirstconversation","timelastconversation","noisesumenergy",
    "noiseavgenergy","noisesdenergy","noiseminenergy","noisemaxenergy","voicesumenergy",
    "voiceavgenergy","voicesdenergy","voiceminenergy","voicemaxenergy","silencesensedfraction","noisesensedfraction",
    "voicesensedfraction","unknownsensedfraction","silenceexpectedfraction","noiseexpectedfraction","voiceexpectedfraction",
    "unknownexpectedfraction","countconversation"]
-  RECORDINGMINUTES: 1
-  PAUSEDMINUTES : 3
+      RECORDING_MINUTES: 1
+      PAUSED_MINUTES : 3
+      SRC_SCRIPT: src/features/phone_conversation/rapids/main.py

-### Visualizations ################################################################
-HEATMAP_FEATURES_CORRELATIONS:
+# See https://www.rapids.science/latest/features/phone-data-yield/
+PHONE_DATA_YIELD:
+  SENSORS: [ #PHONE_ACCELEROMETER,
+            PHONE_ACTIVITY_RECOGNITION,
+            PHONE_APPLICATIONS_FOREGROUND,
+            PHONE_APPLICATIONS_NOTIFICATIONS,
+            PHONE_BATTERY,
+            PHONE_BLUETOOTH,
+            PHONE_CALLS,
+            PHONE_LIGHT,
+            PHONE_LOCATIONS,
+            PHONE_MESSAGES,
+            PHONE_SCREEN,
+            PHONE_WIFI_VISIBLE]
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
+      MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1, minimum fraction of valid minutes an hour must have to be considered valid.
+      SRC_SCRIPT: src/features/phone_data_yield/rapids/main.R
+
+PHONE_ESM:
+  CONTAINER: esm
+  PROVIDERS:
+    STRAW:
+      COMPUTE: True
+      SCALES: ["PANAS_positive_affect", "PANAS_negative_affect", "JCQ_job_demand", "JCQ_job_control", "JCQ_supervisor_support", "JCQ_coworker_support", 
+              "appraisal_stressfulness_period", "appraisal_stressfulness_event", "appraisal_threat", "appraisal_challenge"]
+      FEATURES: [mean]
+      SRC_SCRIPT: src/features/phone_esm/straw/main.py
+
+# See https://www.rapids.science/latest/features/phone-keyboard/
+PHONE_KEYBOARD:
+  CONTAINER: keyboard
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["sessioncount","averageinterkeydelay","averagesessionlength","changeintextlengthlessthanminusone","changeintextlengthequaltominusone","changeintextlengthequaltoone","changeintextlengthmorethanone","maxtextlength","lastmessagelength","totalkeyboardtouches"]
+      SRC_SCRIPT: src/features/phone_keyboard/rapids/main.py
+
+# See https://www.rapids.science/latest/features/phone-light/
+PHONE_LIGHT:
+  CONTAINER: light_sensor
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      FEATURES: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
+      SRC_SCRIPT: src/features/phone_light/rapids/main.py
+
+# See https://www.rapids.science/latest/features/phone-locations/
+PHONE_LOCATIONS:
+  CONTAINER: locations
+  LOCATIONS_TO_USE: ALL_RESAMPLED # ALL, GPS, ALL_RESAMPLED, OR FUSED_RESAMPLED
+  FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD: 30 # minutes, only replicate location samples to the next sensed bin if the phone did not stop collecting data for more than this threshold
+  FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION: 720 # minutes, only replicate location samples to consecutive sensed bins if they were logged within this threshold after a valid location row
+  ACCURACY_LIMIT: 100 # meters, drops location coordinates with an accuracy equal to or higher than this. This number means there's a 68% probability the true location is within this radius
+
+  PROVIDERS:
+    DORYAB:
+      COMPUTE: True
+      FEATURES: ["locationvariance","loglocationvariance","totaldistance","avgspeed","varspeed", "numberofsignificantplaces","numberlocationtransitions","radiusgyration","timeattop1location","timeattop2location","timeattop3location","movingtostaticratio","outlierstimepercent","maxlengthstayatclusters","minlengthstayatclusters","avglengthstayatclusters","stdlengthstayatclusters","locationentropy","normalizedlocationentropy","timeathome", "homelabel"]
+      DBSCAN_EPS: 100 # meters
+      DBSCAN_MINSAMPLES: 5
+      THRESHOLD_STATIC : 1 # km/h
+      MAXIMUM_ROW_GAP: 300 # seconds
+      MINUTES_DATA_USED: False
+      CLUSTER_ON: PARTICIPANT_DATASET # PARTICIPANT_DATASET, TIME_SEGMENT, TIME_SEGMENT_INSTANCE
+      INFER_HOME_LOCATION_STRATEGY: DORYAB_STRATEGY # DORYAB_STRATEGY, SUN_LI_VEGA_STRATEGY
+      MINIMUM_DAYS_TO_DETECT_HOME_CHANGES: 3
+      CLUSTERING_ALGORITHM: DBSCAN # DBSCAN, OPTICS
+      RADIUS_FOR_HOME: 100
+      SRC_SCRIPT: src/features/phone_locations/doryab/main.py
+
+    BARNETT:
+      COMPUTE: True
+      FEATURES: ["hometime","disttravelled","rog","maxdiam","maxhomedist","siglocsvisited","avgflightlen","stdflightlen","avgflightdur","stdflightdur","probpause","siglocentropy","circdnrtn","wkenddayrtn"]
+      IF_MULTIPLE_TIMEZONES: USE_MOST_COMMON
+      MINUTES_DATA_USED: False # Use this for quality control: reports how many minutes of data (location coordinates grouped by minute) were used to compute features
+      SRC_SCRIPT: src/features/phone_locations/barnett/main.R
+
+# See https://www.rapids.science/latest/features/phone-log/
+PHONE_LOG:
+  CONTAINER: 
+    ANDROID: aware_log
+    IOS: ios_aware_log
+  PROVIDERS: # None implemented yet but this sensor can be used in PHONE_DATA_YIELD
+
+# See https://www.rapids.science/latest/features/phone-messages/
+PHONE_MESSAGES:
+  CONTAINER: sms
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      MESSAGES_TYPES : [received, sent]
+      FEATURES: 
+        received: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
+        sent: [count, distinctcontacts, timefirstmessage, timelastmessage, countmostfrequentcontact]
+      SRC_SCRIPT: src/features/phone_messages/rapids/main.R
+
+# See https://www.rapids.science/latest/features/phone-screen/
+PHONE_SCREEN:
+  CONTAINER: screen
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      REFERENCE_HOUR_FIRST_USE: 0
+      IGNORE_EPISODES_SHORTER_THAN: 0 # in minutes, set to 0 to disable
+      IGNORE_EPISODES_LONGER_THAN: 360 # in minutes, set to 0 to disable
+      FEATURES: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration", "firstuseafter"] # "episodepersensedminutes" needs to be added later
+      EPISODE_TYPES: ["unlock"]
+      SRC_SCRIPT: src/features/phone_screen/rapids/main.py
+
+# Custom added sensor
+PHONE_SPEECH:
+  CONTAINER: speech
+  PROVIDERS:
+    STRAW:
+      COMPUTE: True
+      FEATURES: ["meanspeech", "stdspeech", "nlargest", "nsmallest", "medianspeech"]
+      SRC_SCRIPT: src/features/phone_speech/straw/main.py
+
+# See https://www.rapids.science/latest/features/phone-wifi-connected/
+PHONE_WIFI_CONNECTED:
+  CONTAINER: sensor_wifi
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
+      SRC_SCRIPT: src/features/phone_wifi_connected/rapids/main.R
+
+# See https://www.rapids.science/latest/features/phone-wifi-visible/
+PHONE_WIFI_VISIBLE:
+  CONTAINER: wifi
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: True
+      FEATURES: ["countscans", "uniquedevices", "countscansmostuniquedevice"]
+      SRC_SCRIPT: src/features/phone_wifi_visible/rapids/main.R
+
+
+
+########################################################################################################################
+#                                                 FITBIT                                                               #
+########################################################################################################################
+
+# See https://www.rapids.science/latest/setup/configuration/#data-stream-configuration
+FITBIT_DATA_STREAMS:
+  USE: fitbitjson_mysql
+  
+  # AVAILABLE:
+  fitbitjson_mysql: 
+    DATABASE_GROUP: MY_GROUP
+    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting the number of minutes after midnight. Defaults to 660 (11:00).
+  
+  fitbitparsed_mysql: 
+    DATABASE_GROUP: MY_GROUP
+    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting the number of minutes after midnight. Defaults to 660 (11:00).
+  
+  fitbitjson_csv: 
+    FOLDER: data/external/fitbit_csv
+    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting the number of minutes after midnight. Defaults to 660 (11:00).
+
+  fitbitparsed_csv: 
+    FOLDER: data/external/fitbit_csv
+    SLEEP_SUMMARY_LAST_NIGHT_END: 660 # a number from 0 (midnight) to 1439 (23:59) denoting the number of minutes after midnight. Defaults to 660 (11:00).
+
+# Sensors ------
+
+# See https://www.rapids.science/latest/features/fitbit-calories-intraday/
+FITBIT_CALORIES_INTRADAY:
+  CONTAINER: fitbit_data
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      EPISODE_TYPE: [sedentary, lightlyactive, fairlyactive, veryactive, mvpa, lowmet, highmet]
+      EPISODE_TIME_THRESHOLD: 5 # minutes
+      EPISODE_MET_THRESHOLD: 3
+      EPISODE_MVPA_CATEGORIES: [fairlyactive, veryactive]
+      EPISODE_REFERENCE_TIME: MIDNIGHT # or START_OF_THE_SEGMENT
+      FEATURES: [count, sumduration, avgduration, minduration, maxduration, stdduration, starttimefirst, endtimefirst, starttimelast, endtimelast, starttimelongest, endtimelongest, summet, avgmet, maxmet, minmet, stdmet, sumcalories, avgcalories, maxcalories, mincalories, stdcalories]
+      SRC_SCRIPT: src/features/fitbit_calories_intraday/rapids/main.R
+
+# See https://www.rapids.science/latest/features/fitbit-data-yield/
+FITBIT_DATA_YIELD:
+  SENSOR: FITBIT_HEARTRATE_INTRADAY
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: [ratiovalidyieldedminutes, ratiovalidyieldedhours]
+      MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5 # 0 to 1, minimum fraction of valid minutes an hour must have to be considered valid.
+      SRC_SCRIPT: src/features/fitbit_data_yield/rapids/main.R
+
+
+# See https://www.rapids.science/latest/features/fitbit-heartrate-summary/
+FITBIT_HEARTRATE_SUMMARY:
+  CONTAINER: heartrate_summary
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["maxrestinghr", "minrestinghr", "avgrestinghr", "medianrestinghr", "moderestinghr", "stdrestinghr", "diffmaxmoderestinghr", "diffminmoderestinghr", "entropyrestinghr"] # the accuracy of calories features depends on the accuracy of the participant's Fitbit profile (e.g. height, weight); use these with care: ["sumcaloriesoutofrange", "maxcaloriesoutofrange", "mincaloriesoutofrange", "avgcaloriesoutofrange", "mediancaloriesoutofrange", "stdcaloriesoutofrange", "entropycaloriesoutofrange", "sumcaloriesfatburn", "maxcaloriesfatburn", "mincaloriesfatburn", "avgcaloriesfatburn", "mediancaloriesfatburn", "stdcaloriesfatburn", "entropycaloriesfatburn", "sumcaloriescardio", "maxcaloriescardio", "mincaloriescardio", "avgcaloriescardio", "mediancaloriescardio", "stdcaloriescardio", "entropycaloriescardio", "sumcaloriespeak", "maxcaloriespeak", "mincaloriespeak", "avgcaloriespeak", "mediancaloriespeak", "stdcaloriespeak", "entropycaloriespeak"]
+      SRC_SCRIPT: src/features/fitbit_heartrate_summary/rapids/main.py
+
+# See https://www.rapids.science/latest/features/fitbit-heartrate-intraday/
+FITBIT_HEARTRATE_INTRADAY:
+  CONTAINER: heartrate_intraday
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr", "minutesonoutofrangezone", "minutesonfatburnzone", "minutesoncardiozone", "minutesonpeakzone"]
+      SRC_SCRIPT: src/features/fitbit_heartrate_intraday/rapids/main.py
+
+# See https://www.rapids.science/latest/features/fitbit-sleep-summary/
+FITBIT_SLEEP_SUMMARY:
+  CONTAINER: sleep_summary
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["firstwaketime", "lastwaketime", "firstbedtime", "lastbedtime", "countepisode", "avgefficiency", "sumdurationafterwakeup", "sumdurationasleep", "sumdurationawake", "sumdurationtofallasleep", "sumdurationinbed", "avgdurationafterwakeup", "avgdurationasleep", "avgdurationawake", "avgdurationtofallasleep", "avgdurationinbed"]
+      SLEEP_TYPES: ["main", "nap", "all"]
+      SRC_SCRIPT: src/features/fitbit_sleep_summary/rapids/main.py
+
+# See https://www.rapids.science/latest/features/fitbit-sleep-intraday/
+FITBIT_SLEEP_INTRADAY:
+  CONTAINER: sleep_intraday
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES:
+        LEVELS_AND_TYPES: [countepisode, sumduration, maxduration, minduration, avgduration, medianduration, stdduration]
+        RATIOS_TYPE: [count, duration]
+        RATIOS_SCOPE: [ACROSS_LEVELS, ACROSS_TYPES, WITHIN_LEVELS, WITHIN_TYPES]
+      SLEEP_LEVELS:
+        INCLUDE_ALL_GROUPS: True
+        CLASSIC: [awake, restless, asleep]
+        STAGES: [wake, deep, light, rem]
+        UNIFIED: [awake, asleep]
+      SLEEP_TYPES: [main, nap, all]
+      SRC_SCRIPT: src/features/fitbit_sleep_intraday/rapids/main.py
+    PRICE:
+      COMPUTE: False
+      FEATURES: [avgduration, avgratioduration, avgstarttimeofepisodemain, avgendtimeofepisodemain, avgmidpointofepisodemain, stdstarttimeofepisodemain, stdendtimeofepisodemain, stdmidpointofepisodemain, socialjetlag, rmssdmeanstarttimeofepisodemain, rmssdmeanendtimeofepisodemain, rmssdmeanmidpointofepisodemain, rmssdmedianstarttimeofepisodemain, rmssdmedianendtimeofepisodemain, rmssdmedianmidpointofepisodemain]
+      SLEEP_LEVELS:
+        INCLUDE_ALL_GROUPS: True
+        CLASSIC: [awake, restless, asleep]
+        STAGES: [wake, deep, light, rem]
+        UNIFIED: [awake, asleep]
+      DAY_TYPES: [WEEKEND, WEEK, ALL]
+      LAST_NIGHT_END: 660 # number of minutes after midnight (11 * 60 = 660, i.e. 11:00)
+      SRC_SCRIPT: src/features/fitbit_sleep_intraday/price/main.py
+
+# See https://www.rapids.science/latest/features/fitbit-steps-summary/
+FITBIT_STEPS_SUMMARY:
+  CONTAINER: steps_summary
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES: ["maxsumsteps", "minsumsteps", "avgsumsteps", "mediansumsteps", "stdsumsteps"]
+      SRC_SCRIPT: src/features/fitbit_steps_summary/rapids/main.py
+
+# See https://www.rapids.science/latest/features/fitbit-steps-intraday/
+FITBIT_STEPS_INTRADAY:
+  CONTAINER: steps_intraday
+  EXCLUDE_SLEEP: # you can exclude step data that was logged during sleep periods
+    TIME_BASED:
+      EXCLUDE: False
+      START_TIME: "23:00"
+      END_TIME: "07:00"
+    FITBIT_BASED:
+      EXCLUDE: False
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      FEATURES:
+        STEPS: ["sum", "max", "min", "avg", "std", "firststeptime", "laststeptime"]
+        SEDENTARY_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
+        ACTIVE_BOUT: ["countepisode", "sumduration", "maxduration", "minduration", "avgduration", "stdduration"]
+      REFERENCE_HOUR: 0
+      THRESHOLD_ACTIVE_BOUT: 10 # steps
+      INCLUDE_ZERO_STEP_ROWS: False
+      SRC_SCRIPT: src/features/fitbit_steps_intraday/rapids/main.py
+
+
+########################################################################################################################
+#                                                 EMPATICA                                                             #
+########################################################################################################################
+
+EMPATICA_DATA_STREAMS:
+  USE: empatica_zip
+  
+  # AVAILABLE:
+  empatica_zip: 
+    FOLDER: data/external/empatica
+
+# Sensors ------
+
+# See https://www.rapids.science/latest/features/empatica-accelerometer/
+EMPATICA_ACCELEROMETER:
+  CONTAINER: ACC
+  PROVIDERS:
+    DBDP:
+      COMPUTE: False
+      FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
+      SRC_SCRIPT: src/features/empatica_accelerometer/dbdp/main.py
+    CR:
+      COMPUTE: True
+      FEATURES: ["totalMagnitudeBand", "absoluteMeanBand", "varianceBand"] # Acc features
+      WINDOWS:
+        COMPUTE: True
+        WINDOW_LENGTH: 15 # specify window length in seconds
+        SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows']
+      SRC_SCRIPT: src/features/empatica_accelerometer/cr/main.py
+
+
+# See https://www.rapids.science/latest/features/empatica-heartrate/
+EMPATICA_HEARTRATE:
+  CONTAINER: HR
+  PROVIDERS:
+    DBDP:
+      COMPUTE: False
+      FEATURES: ["maxhr", "minhr", "avghr", "medianhr", "modehr", "stdhr", "diffmaxmodehr", "diffminmodehr", "entropyhr"]
+      SRC_SCRIPT: src/features/empatica_heartrate/dbdp/main.py
+
+# See https://www.rapids.science/latest/features/empatica-temperature/
+EMPATICA_TEMPERATURE:
+  CONTAINER: TEMP
+  PROVIDERS:
+    DBDP:
+      COMPUTE: False
+      FEATURES: ["maxtemp", "mintemp", "avgtemp", "mediantemp", "modetemp", "stdtemp", "diffmaxmodetemp", "diffminmodetemp", "entropytemp"]
+      SRC_SCRIPT: src/features/empatica_temperature/dbdp/main.py
+    CR:
+      COMPUTE: True
+      FEATURES: ["maximum", "minimum", "meanAbsChange", "longestStrikeAboveMean", "longestStrikeBelowMean", 
+                  "stdDev", "median", "meanChange", "sumSquared", "squareSumOfComponent", "sumOfSquareComponents"]
+      WINDOWS:
+        COMPUTE: True
+        WINDOW_LENGTH: 300 # specify window length in seconds
+        SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows']
+      SRC_SCRIPT: src/features/empatica_temperature/cr/main.py
+
+# See https://www.rapids.science/latest/features/empatica-electrodermal-activity/
+EMPATICA_ELECTRODERMAL_ACTIVITY:
+  CONTAINER: EDA
+  PROVIDERS:
+    DBDP:
+      COMPUTE: False
+      FEATURES: ["maxeda", "mineda", "avgeda", "medianeda", "modeeda", "stdeda", "diffmaxmodeeda", "diffminmodeeda", "entropyeda"]
+      SRC_SCRIPT: src/features/empatica_electrodermal_activity/dbdp/main.py
+    CR:
+      COMPUTE: True
+      FEATURES: ['mean', 'std', 'q25', 'q75', 'qd', 'deriv', 'power', 'numPeaks', 'ratePeaks', 'powerPeaks', 'sumPosDeriv', 'propPosDeriv', 'derivTonic', 
+                  'sigTonicDifference', 'freqFeats','maxPeakAmplitudeChangeBefore', 'maxPeakAmplitudeChangeAfter', 'avgPeakAmplitudeChangeBefore', 
+                  'avgPeakAmplitudeChangeAfter', 'avgPeakChangeRatio', 'maxPeakIncreaseTime', 'maxPeakDecreaseTime', 'maxPeakDuration', 'maxPeakChangeRatio',
+                  'avgPeakIncreaseTime', 'avgPeakDecreaseTime', 'avgPeakDuration', 'signalOverallChange', 'changeDuration', 'changeRate', 'significantIncrease', 
+                  'significantDecrease']
+      WINDOWS:
+        COMPUTE: True
+        WINDOW_LENGTH: 60 # specify window length in seconds
+        SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows', 'eda_num_peaks_non_zero']
+        IMPUTE_NANS: True
+      SRC_SCRIPT: src/features/empatica_electrodermal_activity/cr/main.py
+
+# See https://www.rapids.science/latest/features/empatica-blood-volume-pulse/
+EMPATICA_BLOOD_VOLUME_PULSE:
+  CONTAINER: BVP
+  PROVIDERS:
+    DBDP:
+      COMPUTE: False
+      FEATURES: ["maxbvp", "minbvp", "avgbvp", "medianbvp", "modebvp", "stdbvp", "diffmaxmodebvp", "diffminmodebvp", "entropybvp"]
+      SRC_SCRIPT: src/features/empatica_blood_volume_pulse/dbdp/main.py
+    CR:
+      COMPUTE: False
+      FEATURES: ['meanHr', 'ibi', 'sdnn', 'sdsd', 'rmssd', 'pnn20', 'pnn50', 'sd', 'sd2', 'sd1/sd2', 'numRR', # Time features
+                  'VLF', 'LF', 'LFnorm', 'HF', 'HFnorm', 'LF/HF', 'fullIntegral'] # Freq features
+      WINDOWS:
+        COMPUTE: True
+        WINDOW_LENGTH: 300 # specify window length in seconds
+        SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows', 'hrv_num_windows_non_nan']
+      SRC_SCRIPT: src/features/empatica_blood_volume_pulse/cr/main.py
+
+# See https://www.rapids.science/latest/features/empatica-inter-beat-interval/
+EMPATICA_INTER_BEAT_INTERVAL:
+  CONTAINER: IBI
+  PROVIDERS:
+    DBDP:
+      COMPUTE: False
+      FEATURES: ["maxibi", "minibi", "avgibi", "medianibi", "modeibi", "stdibi", "diffmaxmodeibi", "diffminmodeibi", "entropyibi"]
+      SRC_SCRIPT: src/features/empatica_inter_beat_interval/dbdp/main.py
+    CR:
+      COMPUTE: True
+      FEATURES: ['meanHr', 'ibi', 'sdnn', 'sdsd', 'rmssd', 'pnn20', 'pnn50', 'sd', 'sd2', 'sd1/sd2', 'numRR', # Time features
+                  'VLF', 'LF', 'LFnorm', 'HF', 'HFnorm', 'LF/HF', 'fullIntegral'] # Freq features            
+      PATCH_WITH_BVP: True
+      WINDOWS:
+        COMPUTE: True
+        WINDOW_LENGTH: 300 # specify window length in seconds
+        SECOND_ORDER_FEATURES: ['mean', 'median', 'sd', 'nlargest', 'nsmallest', 'count_windows', 'hrv_num_windows_non_nan']
+      SRC_SCRIPT: src/features/empatica_inter_beat_interval/cr/main.py
+
+# See https://www.rapids.science/latest/features/empatica-tags/
+EMPATICA_TAGS:
+  CONTAINER: TAGS
+  PROVIDERS:  # None implemented yet
+
+
+########################################################################################################################
+#                                                 PLOTS                                                                #
+########################################################################################################################
+
+# Data quality ------
+
+# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#1-histograms-of-phone-data-yield
+HISTOGRAM_PHONE_DATA_YIELD:
+  PLOT: False
+
+# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield
+HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
+  PLOT: False
+  TIME: RELATIVE_TIME # ABSOLUTE_TIME or RELATIVE_TIME
+
+# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors
+HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
+  PLOT: False
+
+# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count
+HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
+  PLOT: False
+  SENSORS: []
+
+# Features ------
+
+# See https://www.rapids.science/latest/visualizations/feature-visualizations/#1-heatmap-correlation-matrix
+HEATMAP_FEATURE_CORRELATION_MATRIX:
  PLOT: False
  MIN_ROWS_RATIO: 0.5
-  MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
-  MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
-  PHONE_FEATURES: [accelerometer, activity_recognition, applications_foreground, battery, calls_incoming, calls_missed, calls_outgoing, conversation, light, location_doryab, messages_received, messages_sent, screen]
-  FITBIT_FEATURES: [fitbit_heartrate, fitbit_step, fitbit_sleep]
  CORR_THRESHOLD: 0.1
  CORR_METHOD: "pearson" # choose from {"pearson", "kendall", "spearman"}

-HISTOGRAM_VALID_SENSED_HOURS:
-  PLOT: False
-  MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
-  MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour

-HEATMAP_DAYS_BY_SENSORS:
-  PLOT: False
-  MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
-  MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
-  EXPECTED_NUM_OF_DAYS: -1
-  DB_TABLES: [accelerometer, applications_foreground, battery, bluetooth, calls, light, locations, messages, screen, wifi, sensor_wifi, plugin_google_activity_recognition, plugin_ios_activity_recognition, plugin_studentlife_audio_android, plugin_studentlife_audio]
+########################################################################################################################
+#                                                    Data Cleaning                                                     #
+########################################################################################################################

-HEATMAP_SENSED_BINS:
-  PLOT: False
-  BIN_SIZE: *bin_size
+ALL_CLEANING_INDIVIDUAL:
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      IMPUTE_SELECTED_EVENT_FEATURES:
+        COMPUTE: False
+        MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
+      COLS_NAN_THRESHOLD: 1 # set to 1 to disable
+      COLS_VAR_THRESHOLD: True
+      ROWS_NAN_THRESHOLD: 1 # set to 1 to disable
+      DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
+      DATA_YIELD_RATIO_THRESHOLD: 0 # set to 0 to disable
+      DROP_HIGHLY_CORRELATED_FEATURES:
+        COMPUTE: True
+        MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
+        CORR_THRESHOLD: 0.95
+      SRC_SCRIPT: src/features/all_cleaning_individual/rapids/main.R
+    STRAW:
+      COMPUTE: True
+      PHONE_DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_MINUTES # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
+      PHONE_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
+      EMPATICA_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
+      ROWS_NAN_THRESHOLD: 0.33 # set to 1 to disable
+      COLS_NAN_THRESHOLD: 0.9 # set to 1 to remove only columns that contain all (100% of) NaN
+      COLS_VAR_THRESHOLD: True
+      DROP_HIGHLY_CORRELATED_FEATURES:
+        COMPUTE: True
+        MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
+        CORR_THRESHOLD: 0.95
+      STANDARDIZATION: True
+      SRC_SCRIPT: src/features/all_cleaning_individual/straw/main.py

-OVERALL_COMPLIANCE_HEATMAP:
-  PLOT: False
-  ONLY_SHOW_VALID_DAYS: False
-  EXPECTED_NUM_OF_DAYS: -1
-  BIN_SIZE: *bin_size
-  MIN_VALID_HOURS_PER_DAY: *min_valid_hours_per_day
-  MIN_VALID_BINS_PER_HOUR: *min_valid_bins_per_hour
+ALL_CLEANING_OVERALL:
+  PROVIDERS:
+    RAPIDS:
+      COMPUTE: False
+      IMPUTE_SELECTED_EVENT_FEATURES:
+        COMPUTE: False
+        MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33
+      COLS_NAN_THRESHOLD: 1 # set to 1 to disable
+      COLS_VAR_THRESHOLD: True
+      ROWS_NAN_THRESHOLD: 1 # set to 1 to disable
+      DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
+      DATA_YIELD_RATIO_THRESHOLD: 0 # set to 0 to disable
+      DROP_HIGHLY_CORRELATED_FEATURES:
+        COMPUTE: True
+        MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
+        CORR_THRESHOLD: 0.95
+      SRC_SCRIPT: src/features/all_cleaning_overall/rapids/main.R
+    STRAW:
+      COMPUTE: True
+      PHONE_DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_MINUTES # RATIO_VALID_YIELDED_HOURS or RATIO_VALID_YIELDED_MINUTES
+      PHONE_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
+      EMPATICA_DATA_YIELD_RATIO_THRESHOLD: 0.5 # set to 0 to disable
+      ROWS_NAN_THRESHOLD: 0.33 # set to 1 to disable
+      COLS_NAN_THRESHOLD: 0.8 # set to 1 to remove only columns that contain all (100% of) NaN
+      COLS_VAR_THRESHOLD: True
+      DROP_HIGHLY_CORRELATED_FEATURES:
+        COMPUTE: True
+        MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5
+        CORR_THRESHOLD: 0.95
+      STANDARDIZATION: True
+      TARGET_STANDARDIZATION: False
+      SRC_SCRIPT: src/features/all_cleaning_overall/straw/main.py

+
+########################################################################################################################
+#                                                      Baseline                                                        #
+########################################################################################################################
+
+PARAMS_FOR_ANALYSIS:
+  BASELINE:
+    COMPUTE: True
+    FOLDER: data/external/baseline
+    CONTAINER: [results-survey637813_final.csv,  # Slovenia
+                results-survey358134_final.csv,  # Belgium 1
+                results-survey413767_final.csv  # Belgium 2
+    ]
+    QUESTION_LIST: survey637813+question_text.csv
+    FEATURES: [age, gender, startlanguage, limesurvey_demand, limesurvey_control, limesurvey_demand_control_ratio, limesurvey_demand_control_ratio_quartile]
+    CATEGORICAL_FEATURES: [gender]
+
+  TARGET:
+    COMPUTE: True
+    LABEL: appraisal_stressfulness_event_mean
+    ALL_LABELS: [PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, JCQ_coworker_support_mean, appraisal_stressfulness_period_mean]
+                # PANAS_positive_affect_mean, PANAS_negative_affect_mean, JCQ_job_demand_mean, JCQ_job_control_mean, JCQ_supervisor_support_mean, 
+                # JCQ_coworker_support_mean, appraisal_stressfulness_period_mean, appraisal_stressfulness_event_mean, appraisal_threat_mean, appraisal_challenge_mean
--- a/data/external/aware_csv/calls.csv
+++ b/data/external/aware_csv/calls.csv
@ -0,0 +1,9 @@
+"_id","timestamp","device_id","call_type","call_duration","trace"
+1,1587663260695,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,14,"d5e84f8af01b2728021d4f43f53a163c0c90000c"
+2,1587739118007,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"47c125dc7bd163b8612cdea13724a814917b6e93"
+5,1587746544891,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,95,"9cc793ffd6e88b1d850ce540b5d7e000ef5650d4"
+6,1587911379859,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,63,"51fb9344e988049a3fec774c7ca622358bf80264"
+7,1587992647361,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"2a862a7730cfdfaf103a9487afe3e02935fd6e02"
+8,1588020039448,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",1,11,"a2c53f6a086d98622c06107780980cf1bb4e37bd"
+11,1588176189024,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",2,65,"56589df8c830c70e330b644921ed38e08d8fd1f3"
+12,1588197745079,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524",3,0,"cab458018a8ed3b626515e794c70b6f415318adc"
--- a/data/external/empatica/empatica1/E4
+++ b/data/external/empatica/empatica1/E4
--- a/data/external/main_study_usernames.csv
+++ b/data/external/main_study_usernames.csv
@ -0,0 +1,57 @@
+label,empatica_id
+uploader_79170,A0245B
+uploader_89788,A02731
+uploader_68294,A02705
+uploader_92856,A024AF
+uploader_23726,A0231C
+uploader_66620,A02305
+uploader_58435,A026B5
+uploader_87801,A022A8
+uploader_96055,A027BA
+uploader_69549,A0226C
+uploader_26363,A0263D
+uploader_72010,A023FA
+uploader_13997,A024AF
+uploader_31156,A02305
+uploader_63187,A027BA
+uploader_94821,A022A8
+uploader_65413,A023F1;A023FA
+uploader_36488,A02713
+uploader_91087,A0231C
+uploader_35174,A025D1
+uploader_73880,A02705
+uploader_78650,A02731
+uploader_70578,A0245B
+uploader_88313,A02736
+uploader_58482,A0261A
+uploader_80601,A027BA
+uploader_93729,A0226C
+uploader_61663,A0245B
+uploader_80848,A025D1
+uploader_57312,A023F9;A02361;A027A0
+uploader_52087,A02666
+uploader_98770,A02953
+uploader_51327,A0245F
+uploader_11737,A02732
+uploader_77440,A0264E
+uploader_57277,A02422
+uploader_13098,A026E5
+uploader_80719,A023C8
+uploader_54698,A02953
+uploader_95571,A02853
+uploader_21880,A024DC
+uploader_92905,A02920
+uploader_12108,A023F4
+uploader_17436,A026E5
+uploader_58440,A0273F
+uploader_22172,A0245F
+uploader_39250,A02422
+uploader_15311,A023F9
+uploader_45766,A02920
+uploader_23096,A02361
+uploader_78243,A02422
+uploader_58777,A0245F
+uploader_82941,A02666
+uploader_89606,A023F4
+uploader_82969,A023C8
+uploader_53573,A024DC;A02361
--- a/data/external/participant_files/p01.yaml
+++ b/data/external/participant_files/p01.yaml
@ -0,0 +1,11 @@
+PHONE:
+  DEVICE_IDS: [4b62a655-cbf0-4ac0-a448-06726f45b56a]
+  PLATFORMS: [android]
+  LABEL: uploader_53573
+  START_DATE: 2021-05-21 09:21:24
+  END_DATE: 2021-07-12 17:32:07
+EMPATICA:
+  DEVICE_IDS: [uploader_53573]
+  LABEL: uploader_53573
+  START_DATE: 2021-05-21 09:21:24
+  END_DATE: 2021-07-12 17:32:07
--- a/data/external/play_store_application_genre_catalogue.csv
+++ b/data/external/play_store_application_genre_catalogue.csv
--- a/data/external/play_store_categories_count.csv
+++ b/data/external/play_store_categories_count.csv
@ -0,0 +1,45 @@
+genre,n
+System,261
+Tools,96
+Productivity,71
+Health & Fitness,60
+Finance,54
+Communication,39
+Music & Audio,39
+Shopping,38
+Lifestyle,33
+Education,28
+News & Magazines,24
+Maps & Navigation,23
+Entertainment,21
+Business,18
+Travel & Local,18
+Books & Reference,16
+Social,16
+Weather,16
+Food & Drink,14
+Sports,14
+Other,13
+Photography,13
+Puzzle,13
+Video Players & Editors,12
+Card,9
+Casual,9
+Personalization,8
+Medical,7
+Board,5
+Strategy,4
+House & Home,3
+Trivia,3
+Word,3
+Adventure,2
+Art & Design,2
+Auto & Vehicles,2
+Dating,2
+Role Playing,2
+STRAW,2
+Simulation,2
+"Board,Brain Games",1
+"Entertainment,Music & Video",1
+Parenting,1
+Racing,1
--- a/data/external/timesegments_daily.csv
+++ b/data/external/timesegments_daily.csv
@ -0,0 +1,3 @@
+label,start_time,length,repeats_on,repeats_value
+daily,04:00:00,23H 59M 59S,every_day,0
+working_day,04:00:00,18H 00M 00S,every_day,0
--- a/data/external/timesegments_default.csv
+++ b/data/external/timesegments_default.csv
@ -0,0 +1,2 @@
+label,start_time,length
+daily,00:00:00,"23H 59M 59S"
--- a/data/external/timesegments_event.csv
+++ b/data/external/timesegments_event.csv
@ -0,0 +1,14 @@
+label,event_timestamp,length,shift,shift_direction,device_id
+stress,1587661220000,1H,0M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress,1587906020000,3H,0M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+stress,1588003220000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
+stress,1588172420000,9H,0H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+mood,1587661220000,1H,0H,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+mood,1587747620000,1D,0H,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+mood,1587906020000,7D,0H,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+survey1,1587661220000,10H,10H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+survey2,1587661220000,10H,5H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+survey3,1587661220000,10H,0H,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
+
--- a/data/external/timesegments_frequency.csv
+++ b/data/external/timesegments_frequency.csv
@ -0,0 +1,2 @@
+label,length
+fiveminutes,5
--- a/data/external/timesegments_periodic.csv
+++ b/data/external/timesegments_periodic.csv
@ -0,0 +1,2 @@
+label,start_time,length,repeats_on,repeats_value
+daily,00:00:00,23H 59M 59S,every_day,0
--- a/data/external/timezone.csv
+++ b/data/external/timezone.csv
--- a/data/external/wiki_tz.csv
+++ b/data/external/wiki_tz.csv
@ -0,0 +1,595 @@
+Country code,"Latitude, longitude ±DDMM(SS)±DDDMM(SS)",TZ database name,Portion of country covered,Status,UTC offset ±hh:mm,UTC DST offset ±hh:mm,Notes
+CI,+0519−00402,Africa/Abidjan,,Canonical,+00:00,+00:00,
+GH,+0533−00013,Africa/Accra,,Canonical,+00:00,+00:00,
+ET,+0902+03842,Africa/Addis_Ababa,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+DZ,+3647+00303,Africa/Algiers,,Canonical,+01:00,+01:00,
+ER,+1520+03853,Africa/Asmara,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+ER,+1520+03853,Africa/Asmera,,Deprecated,+03:00,+03:00,Link to Africa/Nairobi
+ML,+1239−00800,Africa/Bamako,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+CF,+0422+01835,Africa/Bangui,,Alias,+01:00,+01:00,Link to Africa/Lagos
+GM,+1328−01639,Africa/Banjul,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+GW,+1151−01535,Africa/Bissau,,Canonical,+00:00,+00:00,
+MW,−1547+03500,Africa/Blantyre,,Alias,+02:00,+02:00,Link to Africa/Maputo
+CG,−0416+01517,Africa/Brazzaville,,Alias,+01:00,+01:00,Link to Africa/Lagos
+BI,−0323+02922,Africa/Bujumbura,,Alias,+02:00,+02:00,Link to Africa/Maputo
+EG,+3003+03115,Africa/Cairo,,Canonical,+02:00,+02:00,
+MA,+3339−00735,Africa/Casablanca,,Canonical,+01:00,+00:00,
+ES,+3553−00519,Africa/Ceuta,"Ceuta, Melilla",Canonical,+01:00,+02:00,
+GN,+0931−01343,Africa/Conakry,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+SN,+1440−01726,Africa/Dakar,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+TZ,−0648+03917,Africa/Dar_es_Salaam,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+DJ,+1136+04309,Africa/Djibouti,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+CM,+0403+00942,Africa/Douala,,Alias,+01:00,+01:00,Link to Africa/Lagos
+EH,+2709−01312,Africa/El_Aaiun,,Canonical,+01:00,+00:00,
+SL,+0830−01315,Africa/Freetown,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+BW,−2439+02555,Africa/Gaborone,,Alias,+02:00,+02:00,Link to Africa/Maputo
+ZW,−1750+03103,Africa/Harare,,Alias,+02:00,+02:00,Link to Africa/Maputo
+ZA,−2615+02800,Africa/Johannesburg,,Canonical,+02:00,+02:00,
+SS,+0451+03137,Africa/Juba,,Canonical,+02:00,+02:00,
+UG,+0019+03225,Africa/Kampala,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+SD,+1536+03232,Africa/Khartoum,,Canonical,+02:00,+02:00,
+RW,−0157+03004,Africa/Kigali,,Alias,+02:00,+02:00,Link to Africa/Maputo
+CD,−0418+01518,Africa/Kinshasa,Dem. Rep. of Congo (west),Alias,+01:00,+01:00,Link to Africa/Lagos
+NG,+0627+00324,Africa/Lagos,West Africa Time,Canonical,+01:00,+01:00,
+GA,+0023+00927,Africa/Libreville,,Alias,+01:00,+01:00,Link to Africa/Lagos
+TG,+0608+00113,Africa/Lome,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+AO,−0848+01314,Africa/Luanda,,Alias,+01:00,+01:00,Link to Africa/Lagos
+CD,−1140+02728,Africa/Lubumbashi,Dem. Rep. of Congo (east),Alias,+02:00,+02:00,Link to Africa/Maputo
+ZM,−1525+02817,Africa/Lusaka,,Alias,+02:00,+02:00,Link to Africa/Maputo
+GQ,+0345+00847,Africa/Malabo,,Alias,+01:00,+01:00,Link to Africa/Lagos
+MZ,−2558+03235,Africa/Maputo,Central Africa Time,Canonical,+02:00,+02:00,
+LS,−2928+02730,Africa/Maseru,,Alias,+02:00,+02:00,Link to Africa/Johannesburg
+SZ,−2618+03106,Africa/Mbabane,,Alias,+02:00,+02:00,Link to Africa/Johannesburg
+SO,+0204+04522,Africa/Mogadishu,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+LR,+0618−01047,Africa/Monrovia,,Canonical,+00:00,+00:00,
+KE,−0117+03649,Africa/Nairobi,,Canonical,+03:00,+03:00,
+TD,+1207+01503,Africa/Ndjamena,,Canonical,+01:00,+01:00,
+NE,+1331+00207,Africa/Niamey,,Alias,+01:00,+01:00,Link to Africa/Lagos
+MR,+1806−01557,Africa/Nouakchott,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+BF,+1222−00131,Africa/Ouagadougou,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+BJ,+0629+00237,Africa/Porto-Novo,,Alias,+01:00,+01:00,Link to Africa/Lagos
+ST,+0020+00644,Africa/Sao_Tome,,Canonical,+00:00,+00:00,
+ML,,Africa/Timbuktu,,Deprecated,+00:00,+00:00,Link to Africa/Abidjan
+LY,+3254+01311,Africa/Tripoli,,Canonical,+02:00,+02:00,
+TN,+3648+01011,Africa/Tunis,,Canonical,+01:00,+01:00,
+NA,−2234+01706,Africa/Windhoek,,Canonical,+02:00,+02:00,
+US,+515248−1763929,America/Adak,Aleutian Islands,Canonical,−10:00,−09:00,
+US,+611305−1495401,America/Anchorage,Alaska (most areas),Canonical,−09:00,−08:00,
+AI,+1812−06304,America/Anguilla,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+AG,+1703−06148,America/Antigua,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+BR,−0712−04812,America/Araguaina,Tocantins,Canonical,−03:00,−03:00,
+AR,−3436−05827,America/Argentina/Buenos_Aires,"Buenos Aires (BA, CF)",Canonical,−03:00,−03:00,
+AR,−2828−06547,America/Argentina/Catamarca,Catamarca (CT); Chubut (CH),Canonical,−03:00,−03:00,
+AR,,America/Argentina/ComodRivadavia,,Deprecated,−03:00,−03:00,Link to America/Argentina/Catamarca
+AR,−3124−06411,America/Argentina/Cordoba,"Argentina (most areas: CB, CC, CN, ER, FM, MN, SE, SF)",Canonical,−03:00,−03:00,
+AR,−2411−06518,America/Argentina/Jujuy,Jujuy (JY),Canonical,−03:00,−03:00,
+AR,−2926−06651,America/Argentina/La_Rioja,La Rioja (LR),Canonical,−03:00,−03:00,
+AR,−3253−06849,America/Argentina/Mendoza,Mendoza (MZ),Canonical,−03:00,−03:00,
+AR,−5138−06913,America/Argentina/Rio_Gallegos,Santa Cruz (SC),Canonical,−03:00,−03:00,
+AR,−2447−06525,America/Argentina/Salta,"Salta (SA, LP, NQ, RN)",Canonical,−03:00,−03:00,
+AR,−3132−06831,America/Argentina/San_Juan,San Juan (SJ),Canonical,−03:00,−03:00,
+AR,−3319−06621,America/Argentina/San_Luis,San Luis (SL),Canonical,−03:00,−03:00,
+AR,−2649−06513,America/Argentina/Tucuman,Tucumán (TM),Canonical,−03:00,−03:00,
+AR,−5448−06818,America/Argentina/Ushuaia,Tierra del Fuego (TF),Canonical,−03:00,−03:00,
+AW,+1230−06958,America/Aruba,,Alias,−04:00,−04:00,Link to America/Curacao
+PY,−2516−05740,America/Asuncion,,Canonical,−04:00,−03:00,
+CA,+484531−0913718,America/Atikokan,EST - ON (Atikokan); NU (Coral H),Canonical,−05:00,−05:00,
+US,,America/Atka,,Deprecated,−10:00,−09:00,Link to America/Adak
+BR,−1259−03831,America/Bahia,Bahia,Canonical,−03:00,−03:00,
+MX,+2048−10515,America/Bahia_Banderas,Central Time - Bahía de Banderas,Canonical,−06:00,−05:00,
+BB,+1306−05937,America/Barbados,,Canonical,−04:00,−04:00,
+BR,−0127−04829,America/Belem,Pará (east); Amapá,Canonical,−03:00,−03:00,
+BZ,+1730−08812,America/Belize,,Canonical,−06:00,−06:00,
+CA,+5125−05707,America/Blanc-Sablon,AST - QC (Lower North Shore),Canonical,−04:00,−04:00,
+BR,+0249−06040,America/Boa_Vista,Roraima,Canonical,−04:00,−04:00,
+CO,+0436−07405,America/Bogota,,Canonical,−05:00,−05:00,
+US,+433649−1161209,America/Boise,Mountain - ID (south); OR (east),Canonical,−07:00,−06:00,
+AR,−3436−05827,America/Buenos_Aires,,Deprecated,−03:00,−03:00,Link to America/Argentina/Buenos_Aires
+CA,+690650−1050310,America/Cambridge_Bay,Mountain - NU (west),Canonical,−07:00,−06:00,
+BR,−2027−05437,America/Campo_Grande,Mato Grosso do Sul,Canonical,−04:00,−04:00,
+MX,+2105−08646,America/Cancun,Eastern Standard Time - Quintana Roo,Canonical,−05:00,−05:00,
+VE,+1030−06656,America/Caracas,,Canonical,−04:00,−04:00,
+AR,−2828−06547,America/Catamarca,,Deprecated,−03:00,−03:00,Link to America/Argentina/Catamarca
+GF,+0456−05220,America/Cayenne,,Canonical,−03:00,−03:00,
+KY,+1918−08123,America/Cayman,,Alias,−05:00,−05:00,Link to America/Panama
+US,+415100−0873900,America/Chicago,Central (most areas),Canonical,−06:00,−05:00,
+MX,+2838−10605,America/Chihuahua,Mountain Time - Chihuahua (most areas),Canonical,−07:00,−06:00,
+CA,,America/Coral_Harbour,,Deprecated,−05:00,−05:00,Link to America/Atikokan
+AR,−3124−06411,America/Cordoba,,Deprecated,−03:00,−03:00,Link to America/Argentina/Cordoba
+CR,+0956−08405,America/Costa_Rica,,Canonical,−06:00,−06:00,
+CA,+4906−11631,America/Creston,MST - BC (Creston),Canonical,−07:00,−07:00,
+BR,−1535−05605,America/Cuiaba,Mato Grosso,Canonical,−04:00,−04:00,
+CW,+1211−06900,America/Curacao,,Canonical,−04:00,−04:00,
+GL,+7646−01840,America/Danmarkshavn,National Park (east coast),Canonical,+00:00,+00:00,
+CA,+6404−13925,America/Dawson,MST - Yukon (west),Canonical,−07:00,−07:00,
+CA,+5946−12014,America/Dawson_Creek,"MST - BC (Dawson Cr, Ft St John)",Canonical,−07:00,−07:00,
+US,+394421−1045903,America/Denver,Mountain (most areas),Canonical,−07:00,−06:00,
+US,+421953−0830245,America/Detroit,Eastern - MI (most areas),Canonical,−05:00,−04:00,
+DM,+1518−06124,America/Dominica,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+CA,+5333−11328,America/Edmonton,Mountain - AB; BC (E); SK (W),Canonical,−07:00,−06:00,
+BR,−0640−06952,America/Eirunepe,Amazonas (west),Canonical,−05:00,−05:00,
+SV,+1342−08912,America/El_Salvador,,Canonical,−06:00,−06:00,
+MX,,America/Ensenada,,Deprecated,−08:00,−07:00,Link to America/Tijuana
+CA,+5848−12242,America/Fort_Nelson,MST - BC (Ft Nelson),Canonical,−07:00,−07:00,
+US,,America/Fort_Wayne,,Deprecated,−05:00,−04:00,Link to America/Indiana/Indianapolis
+BR,−0343−03830,America/Fortaleza,"Brazil (northeast: MA, PI, CE, RN, PB)",Canonical,−03:00,−03:00,
+CA,+4612−05957,America/Glace_Bay,Atlantic - NS (Cape Breton),Canonical,−04:00,−03:00,
+GL,+6411−05144,America/Godthab,,Deprecated,−03:00,−02:00,Link to America/Nuuk
+CA,+5320−06025,America/Goose_Bay,Atlantic - Labrador (most areas),Canonical,−04:00,−03:00,
+TC,+2128−07108,America/Grand_Turk,,Canonical,−05:00,−04:00,
+GD,+1203−06145,America/Grenada,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+GP,+1614−06132,America/Guadeloupe,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+GT,+1438−09031,America/Guatemala,,Canonical,−06:00,−06:00,
+EC,−0210−07950,America/Guayaquil,Ecuador (mainland),Canonical,−05:00,−05:00,
+GY,+0648−05810,America/Guyana,,Canonical,−04:00,−04:00,
+CA,+4439−06336,America/Halifax,Atlantic - NS (most areas); PE,Canonical,−04:00,−03:00,
+CU,+2308−08222,America/Havana,,Canonical,−05:00,−04:00,
+MX,+2904−11058,America/Hermosillo,Mountain Standard Time - Sonora,Canonical,−07:00,−07:00,
+US,+394606−0860929,America/Indiana/Indianapolis,Eastern - IN (most areas),Canonical,−05:00,−04:00,
+US,+411745−0863730,America/Indiana/Knox,Central - IN (Starke),Canonical,−06:00,−05:00,
+US,+382232−0862041,America/Indiana/Marengo,Eastern - IN (Crawford),Canonical,−05:00,−04:00,
+US,+382931−0871643,America/Indiana/Petersburg,Eastern - IN (Pike),Canonical,−05:00,−04:00,
+US,+375711−0864541,America/Indiana/Tell_City,Central - IN (Perry),Canonical,−06:00,−05:00,
+US,+384452−0850402,America/Indiana/Vevay,Eastern - IN (Switzerland),Canonical,−05:00,−04:00,
+US,+384038−0873143,America/Indiana/Vincennes,"Eastern - IN (Da, Du, K, Mn)",Canonical,−05:00,−04:00,
+US,+410305−0863611,America/Indiana/Winamac,Eastern - IN (Pulaski),Canonical,−05:00,−04:00,
+US,+394606−0860929,America/Indianapolis,,Deprecated,−05:00,−04:00,Link to America/Indiana/Indianapolis
+CA,+682059−1334300,America/Inuvik,Mountain - NT (west),Canonical,−07:00,−06:00,
+CA,+6344−06828,America/Iqaluit,Eastern - NU (most east areas),Canonical,−05:00,−04:00,
+JM,+175805−0764736,America/Jamaica,,Canonical,−05:00,−05:00,
+AR,−2411−06518,America/Jujuy,,Deprecated,−03:00,−03:00,Link to America/Argentina/Jujuy
+US,+581807−1342511,America/Juneau,Alaska - Juneau area,Canonical,−09:00,−08:00,
+US,+381515−0854534,America/Kentucky/Louisville,Eastern - KY (Louisville area),Canonical,−05:00,−04:00,
+US,+364947−0845057,America/Kentucky/Monticello,Eastern - KY (Wayne),Canonical,−05:00,−04:00,
+US,+411745−0863730,America/Knox_IN,,Deprecated,−06:00,−05:00,Link to America/Indiana/Knox
+BQ,+120903−0681636,America/Kralendijk,,Alias,−04:00,−04:00,Link to America/Curacao
+BO,−1630−06809,America/La_Paz,,Canonical,−04:00,−04:00,
+PE,−1203−07703,America/Lima,,Canonical,−05:00,−05:00,
+US,+340308−1181434,America/Los_Angeles,Pacific,Canonical,−08:00,−07:00,
+US,+381515−0854534,America/Louisville,,Deprecated,−05:00,−04:00,Link to America/Kentucky/Louisville
+SX,+180305−0630250,America/Lower_Princes,,Alias,−04:00,−04:00,Link to America/Curacao
+BR,−0940−03543,America/Maceio,"Alagoas, Sergipe",Canonical,−03:00,−03:00,
+NI,+1209−08617,America/Managua,,Canonical,−06:00,−06:00,
+BR,−0308−06001,America/Manaus,Amazonas (east),Canonical,−04:00,−04:00,
+MF,+1804−06305,America/Marigot,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+MQ,+1436−06105,America/Martinique,,Canonical,−04:00,−04:00,
+MX,+2550−09730,America/Matamoros,"Central Time US - Coahuila, Nuevo León, Tamaulipas (US border)",Canonical,−06:00,−05:00,
+MX,+2313−10625,America/Mazatlan,"Mountain Time - Baja California Sur, Nayarit, Sinaloa",Canonical,−07:00,−06:00,
+AR,−3253−06849,America/Mendoza,,Deprecated,−03:00,−03:00,Link to America/Argentina/Mendoza
+US,+450628−0873651,America/Menominee,Central - MI (Wisconsin border),Canonical,−06:00,−05:00,
+MX,+2058−08937,America/Merida,"Central Time - Campeche, Yucatán",Canonical,−06:00,−05:00,
+US,+550737−1313435,America/Metlakatla,Alaska - Annette Island,Canonical,−09:00,−08:00,
+MX,+1924−09909,America/Mexico_City,Central Time,Canonical,−06:00,−05:00,
+PM,+4703−05620,America/Miquelon,,Canonical,−03:00,−02:00,
+CA,+4606−06447,America/Moncton,Atlantic - New Brunswick,Canonical,−04:00,−03:00,
+MX,+2540−10019,America/Monterrey,"Central Time - Durango; Coahuila, Nuevo León, Tamaulipas (most areas)",Canonical,−06:00,−05:00,
+UY,−345433−0561245,America/Montevideo,,Canonical,−03:00,−03:00,
+CA,,America/Montreal,,Deprecated,−05:00,−04:00,Link to America/Toronto
+MS,+1643−06213,America/Montserrat,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+BS,+2505−07721,America/Nassau,,Canonical,−05:00,−04:00,
+US,+404251−0740023,America/New_York,Eastern (most areas),Canonical,−05:00,−04:00,
+CA,+4901−08816,America/Nipigon,"Eastern - ON, QC (no DST 1967-73)",Canonical,−05:00,−04:00,
+US,+643004−1652423,America/Nome,Alaska (west),Canonical,−09:00,−08:00,
+BR,−0351−03225,America/Noronha,Atlantic islands,Canonical,−02:00,−02:00,
+US,+471551−1014640,America/North_Dakota/Beulah,Central - ND (Mercer),Canonical,−06:00,−05:00,
+US,+470659−1011757,America/North_Dakota/Center,Central - ND (Oliver),Canonical,−06:00,−05:00,
+US,+465042−1012439,America/North_Dakota/New_Salem,Central - ND (Morton rural),Canonical,−06:00,−05:00,
+GL,+6411−05144,America/Nuuk,Greenland (most areas),Canonical,−03:00,−02:00,
+MX,+2934−10425,America/Ojinaga,Mountain Time US - Chihuahua (US border),Canonical,−07:00,−06:00,
+PA,+0858−07932,America/Panama,,Canonical,−05:00,−05:00,
+CA,+6608−06544,America/Pangnirtung,Eastern - NU (Pangnirtung),Canonical,−05:00,−04:00,
+SR,+0550−05510,America/Paramaribo,,Canonical,−03:00,−03:00,
+US,+332654−1120424,America/Phoenix,MST - Arizona (except Navajo),Canonical,−07:00,−07:00,
+HT,+1832−07220,America/Port-au-Prince,,Canonical,−05:00,−04:00,
+TT,+1039−06131,America/Port_of_Spain,,Canonical,−04:00,−04:00,
+BR,,America/Porto_Acre,,Deprecated,−05:00,−05:00,Link to America/Rio_Branco
+BR,−0846−06354,America/Porto_Velho,Rondônia,Canonical,−04:00,−04:00,
+PR,+182806−0660622,America/Puerto_Rico,,Canonical,−04:00,−04:00,
+CL,−5309−07055,America/Punta_Arenas,Region of Magallanes,Canonical,−03:00,−03:00,Magallanes Region
+CA,+4843−09434,America/Rainy_River,"Central - ON (Rainy R, Ft Frances)",Canonical,−06:00,−05:00,
+CA,+624900−0920459,America/Rankin_Inlet,Central - NU (central),Canonical,−06:00,−05:00,
+BR,−0803−03454,America/Recife,Pernambuco,Canonical,−03:00,−03:00,
+CA,+5024−10439,America/Regina,CST - SK (most areas),Canonical,−06:00,−06:00,
+CA,+744144−0944945,America/Resolute,Central - NU (Resolute),Canonical,−06:00,−05:00,
+BR,−0958−06748,America/Rio_Branco,Acre,Canonical,−05:00,−05:00,
+AR,,America/Rosario,,Deprecated,−03:00,−03:00,Link to America/Argentina/Cordoba
+MX,,America/Santa_Isabel,,Deprecated,−08:00,−07:00,Link to America/Tijuana
+BR,−0226−05452,America/Santarem,Pará (west),Canonical,−03:00,−03:00,
+CL,−3327−07040,America/Santiago,Chile (most areas),Canonical,−04:00,−03:00,
+DO,+1828−06954,America/Santo_Domingo,,Canonical,−04:00,−04:00,
+BR,−2332−04637,America/Sao_Paulo,"Brazil (southeast: GO, DF, MG, ES, RJ, SP, PR, SC, RS)",Canonical,−03:00,−03:00,
+GL,+7029−02158,America/Scoresbysund,Scoresbysund/Ittoqqortoormiit,Canonical,−01:00,+00:00,
+US,,America/Shiprock,,Deprecated,−07:00,−06:00,Link to America/Denver
+US,+571035−1351807,America/Sitka,Alaska - Sitka area,Canonical,−09:00,−08:00,
+BL,+1753−06251,America/St_Barthelemy,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+CA,+4734−05243,America/St_Johns,Newfoundland; Labrador (southeast),Canonical,−03:30,−02:30,
+KN,+1718−06243,America/St_Kitts,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+LC,+1401−06100,America/St_Lucia,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+VI,+1821−06456,America/St_Thomas,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+VC,+1309−06114,America/St_Vincent,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+CA,+5017−10750,America/Swift_Current,CST - SK (midwest),Canonical,−06:00,−06:00,
+HN,+1406−08713,America/Tegucigalpa,,Canonical,−06:00,−06:00,
+GL,+7634−06847,America/Thule,Thule/Pituffik,Canonical,−04:00,−03:00,
+CA,+4823−08915,America/Thunder_Bay,Eastern - ON (Thunder Bay),Canonical,−05:00,−04:00,
+MX,+3232−11701,America/Tijuana,Pacific Time US - Baja California,Canonical,−08:00,−07:00,
+CA,+4339−07923,America/Toronto,"Eastern - ON, QC (most areas)",Canonical,−05:00,−04:00,
+VG,+1827−06437,America/Tortola,,Alias,−04:00,−04:00,Link to America/Port_of_Spain
+CA,+4916−12307,America/Vancouver,Pacific - BC (most areas),Canonical,−08:00,−07:00,
+VI,,America/Virgin,,Deprecated,−04:00,−04:00,Link to America/Port_of_Spain
+CA,+6043−13503,America/Whitehorse,MST - Yukon (east),Canonical,−07:00,−07:00,
+CA,+4953−09709,America/Winnipeg,Central - ON (west); Manitoba,Canonical,−06:00,−05:00,
+US,+593249−1394338,America/Yakutat,Alaska - Yakutat,Canonical,−09:00,−08:00,
+CA,+6227−11421,America/Yellowknife,Mountain - NT (central),Canonical,−07:00,−06:00,
+AQ,−6617+11031,Antarctica/Casey,Casey,Canonical,+11:00,+11:00,
+AQ,−6835+07758,Antarctica/Davis,Davis,Canonical,+07:00,+07:00,
+AQ,−6640+14001,Antarctica/DumontDUrville,Dumont-d'Urville,Canonical,+10:00,+10:00,
+AU,−5430+15857,Antarctica/Macquarie,Macquarie Island,Canonical,+10:00,+11:00,
+AQ,−6736+06253,Antarctica/Mawson,Mawson,Canonical,+05:00,+05:00,
+AQ,−7750+16636,Antarctica/McMurdo,"New Zealand time - McMurdo, South Pole",Alias,+12:00,+13:00,Link to Pacific/Auckland
+AQ,−6448−06406,Antarctica/Palmer,Palmer,Canonical,−03:00,−03:00,Chilean Antarctica Region
+AQ,−6734−06808,Antarctica/Rothera,Rothera,Canonical,−03:00,−03:00,
+AQ,,Antarctica/South_Pole,,Deprecated,+12:00,+13:00,Link to Pacific/Auckland
+AQ,−690022+0393524,Antarctica/Syowa,Syowa,Canonical,+03:00,+03:00,
+AQ,−720041+0023206,Antarctica/Troll,Troll,Canonical,+00:00,+02:00,Previously used +01:00 for a brief period between standard and daylight time.
+AQ,−7824+10654,Antarctica/Vostok,Vostok,Canonical,+06:00,+06:00,
+SJ,+7800+01600,Arctic/Longyearbyen,,Alias,+01:00,+02:00,Link to Europe/Oslo
+YE,+1245+04512,Asia/Aden,,Alias,+03:00,+03:00,Link to Asia/Riyadh
+KZ,+4315+07657,Asia/Almaty,Kazakhstan (most areas),Canonical,+06:00,+06:00,
+JO,+3157+03556,Asia/Amman,,Canonical,+02:00,+03:00,
+RU,+6445+17729,Asia/Anadyr,MSK+09 - Bering Sea,Canonical,+12:00,+12:00,
+KZ,+4431+05016,Asia/Aqtau,Mangghystaū/Mankistau,Canonical,+05:00,+05:00,
+KZ,+5017+05710,Asia/Aqtobe,Aqtöbe/Aktobe,Canonical,+05:00,+05:00,
+TM,+3757+05823,Asia/Ashgabat,,Canonical,+05:00,+05:00,
+TM,+3757+05823,Asia/Ashkhabad,,Deprecated,+05:00,+05:00,Link to Asia/Ashgabat
+KZ,+4707+05156,Asia/Atyrau,Atyraū/Atirau/Gur'yev,Canonical,+05:00,+05:00,
+IQ,+3321+04425,Asia/Baghdad,,Canonical,+03:00,+03:00,
+BH,+2623+05035,Asia/Bahrain,,Alias,+03:00,+03:00,Link to Asia/Qatar
+AZ,+4023+04951,Asia/Baku,,Canonical,+04:00,+04:00,
+TH,+1345+10031,Asia/Bangkok,Indochina (most areas),Canonical,+07:00,+07:00,
+RU,+5322+08345,Asia/Barnaul,MSK+04 - Altai,Canonical,+07:00,+07:00,
+LB,+3353+03530,Asia/Beirut,,Canonical,+02:00,+03:00,
+KG,+4254+07436,Asia/Bishkek,,Canonical,+06:00,+06:00,
+BN,+0456+11455,Asia/Brunei,,Canonical,+08:00,+08:00,
+IN,+2232+08822,Asia/Calcutta,,Deprecated,+05:30,+05:30,Link to Asia/Kolkata
+RU,+5203+11328,Asia/Chita,MSK+06 - Zabaykalsky,Canonical,+09:00,+09:00,
+MN,+4804+11430,Asia/Choibalsan,"Dornod, Sükhbaatar",Canonical,+08:00,+08:00,
+CN,,Asia/Chongqing,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
+CN,,Asia/Chungking,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
+LK,+0656+07951,Asia/Colombo,,Canonical,+05:30,+05:30,
+BD,+2343+09025,Asia/Dacca,,Deprecated,+06:00,+06:00,Link to Asia/Dhaka
+SY,+3330+03618,Asia/Damascus,,Canonical,+02:00,+03:00,
+BD,+2343+09025,Asia/Dhaka,,Canonical,+06:00,+06:00,
+TL,−0833+12535,Asia/Dili,,Canonical,+09:00,+09:00,
+AE,+2518+05518,Asia/Dubai,,Canonical,+04:00,+04:00,
+TJ,+3835+06848,Asia/Dushanbe,,Canonical,+05:00,+05:00,
+CY,+3507+03357,Asia/Famagusta,Northern Cyprus,Canonical,+02:00,+03:00,
+PS,+3130+03428,Asia/Gaza,Gaza Strip,Canonical,+02:00,+03:00,
+CN,,Asia/Harbin,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
+PS,+313200+0350542,Asia/Hebron,West Bank,Canonical,+02:00,+03:00,
+VN,+1045+10640,Asia/Ho_Chi_Minh,Vietnam (south),Canonical,+07:00,+07:00,
+HK,+2217+11409,Asia/Hong_Kong,,Canonical,+08:00,+08:00,
+MN,+4801+09139,Asia/Hovd,"Bayan-Ölgii, Govi-Altai, Hovd, Uvs, Zavkhan",Canonical,+07:00,+07:00,
+RU,+5216+10420,Asia/Irkutsk,"MSK+05 - Irkutsk, Buryatia",Canonical,+08:00,+08:00,
+TR,+4101+02858,Asia/Istanbul,,Alias,+03:00,+03:00,Link to Europe/Istanbul
+ID,−0610+10648,Asia/Jakarta,"Java, Sumatra",Canonical,+07:00,+07:00,
+ID,−0232+14042,Asia/Jayapura,New Guinea (West Papua / Irian Jaya); Malukus/Moluccas,Canonical,+09:00,+09:00,
+IL,+314650+0351326,Asia/Jerusalem,,Canonical,+02:00,+03:00,
+AF,+3431+06912,Asia/Kabul,,Canonical,+04:30,+04:30,
+RU,+5301+15839,Asia/Kamchatka,MSK+09 - Kamchatka,Canonical,+12:00,+12:00,
+PK,+2452+06703,Asia/Karachi,,Canonical,+05:00,+05:00,
+CN,,Asia/Kashgar,,Deprecated,+06:00,+06:00,Link to Asia/Urumqi
+NP,+2743+08519,Asia/Kathmandu,,Canonical,+05:45,+05:45,
+NP,+2743+08519,Asia/Katmandu,,Deprecated,+05:45,+05:45,Link to Asia/Kathmandu
+RU,+623923+1353314,Asia/Khandyga,"MSK+06 - Tomponsky, Ust-Maysky",Canonical,+09:00,+09:00,
+IN,+2232+08822,Asia/Kolkata,,Canonical,+05:30,+05:30,"Note: Different zones in history, see Time in India."
+RU,+5601+09250,Asia/Krasnoyarsk,MSK+04 - Krasnoyarsk area,Canonical,+07:00,+07:00,
+MY,+0310+10142,Asia/Kuala_Lumpur,Malaysia (peninsula),Canonical,+08:00,+08:00,
+MY,+0133+11020,Asia/Kuching,"Sabah, Sarawak",Canonical,+08:00,+08:00,
+KW,+2920+04759,Asia/Kuwait,,Alias,+03:00,+03:00,Link to Asia/Riyadh
+MO,+221150+1133230,Asia/Macao,,Deprecated,+08:00,+08:00,Link to Asia/Macau
+MO,+221150+1133230,Asia/Macau,,Canonical,+08:00,+08:00,
+RU,+5934+15048,Asia/Magadan,MSK+08 - Magadan,Canonical,+11:00,+11:00,
+ID,−0507+11924,Asia/Makassar,"Borneo (east, south); Sulawesi/Celebes, Bali, Nusa Tenggara; Timor (west)",Canonical,+08:00,+08:00,
+PH,+1435+12100,Asia/Manila,,Canonical,+08:00,+08:00,
+OM,+2336+05835,Asia/Muscat,,Alias,+04:00,+04:00,Link to Asia/Dubai
+CY,+3510+03322,Asia/Nicosia,Cyprus (most areas),Canonical,+02:00,+03:00,
+RU,+5345+08707,Asia/Novokuznetsk,MSK+04 - Kemerovo,Canonical,+07:00,+07:00,
+RU,+5502+08255,Asia/Novosibirsk,MSK+04 - Novosibirsk,Canonical,+07:00,+07:00,
+RU,+5500+07324,Asia/Omsk,MSK+03 - Omsk,Canonical,+06:00,+06:00,
+KZ,+5113+05121,Asia/Oral,West Kazakhstan,Canonical,+05:00,+05:00,
+KH,+1133+10455,Asia/Phnom_Penh,,Alias,+07:00,+07:00,Link to Asia/Bangkok
+ID,−0002+10920,Asia/Pontianak,"Borneo (west, central)",Canonical,+07:00,+07:00,
+KP,+3901+12545,Asia/Pyongyang,,Canonical,+09:00,+09:00,
+QA,+2517+05132,Asia/Qatar,,Canonical,+03:00,+03:00,
+KZ,+5312+06337,Asia/Qostanay,Qostanay/Kostanay/Kustanay,Canonical,+06:00,+06:00,
+KZ,+4448+06528,Asia/Qyzylorda,Qyzylorda/Kyzylorda/Kzyl-Orda,Canonical,+05:00,+05:00,
+MM,,Asia/Rangoon,,Deprecated,+06:30,+06:30,Link to Asia/Yangon
+SA,+2438+04643,Asia/Riyadh,,Canonical,+03:00,+03:00,
+VN,,Asia/Saigon,,Deprecated,+07:00,+07:00,Link to Asia/Ho_Chi_Minh
+RU,+4658+14242,Asia/Sakhalin,MSK+08 - Sakhalin Island,Canonical,+11:00,+11:00,
+UZ,+3940+06648,Asia/Samarkand,Uzbekistan (west),Canonical,+05:00,+05:00,
+KR,+3733+12658,Asia/Seoul,,Canonical,+09:00,+09:00,
+CN,+3114+12128,Asia/Shanghai,Beijing Time,Canonical,+08:00,+08:00,
+SG,+0117+10351,Asia/Singapore,,Canonical,+08:00,+08:00,
+RU,+6728+15343,Asia/Srednekolymsk,MSK+08 - Sakha (E); North Kuril Is,Canonical,+11:00,+11:00,
+TW,+2503+12130,Asia/Taipei,,Canonical,+08:00,+08:00,
+UZ,+4120+06918,Asia/Tashkent,Uzbekistan (east),Canonical,+05:00,+05:00,
+GE,+4143+04449,Asia/Tbilisi,,Canonical,+04:00,+04:00,
+IR,+3540+05126,Asia/Tehran,,Canonical,+03:30,+04:30,
+IL,,Asia/Tel_Aviv,,Deprecated,+02:00,+03:00,Link to Asia/Jerusalem
+BT,+2728+08939,Asia/Thimbu,,Deprecated,+06:00,+06:00,Link to Asia/Thimphu
+BT,+2728+08939,Asia/Thimphu,,Canonical,+06:00,+06:00,
+JP,+353916+1394441,Asia/Tokyo,,Canonical,+09:00,+09:00,
+RU,+5630+08458,Asia/Tomsk,MSK+04 - Tomsk,Canonical,+07:00,+07:00,
+ID,,Asia/Ujung_Pandang,,Deprecated,+08:00,+08:00,Link to Asia/Makassar
+MN,+4755+10653,Asia/Ulaanbaatar,Mongolia (most areas),Canonical,+08:00,+08:00,
+MN,,Asia/Ulan_Bator,,Deprecated,+08:00,+08:00,Link to Asia/Ulaanbaatar
+CN,+4348+08735,Asia/Urumqi,Xinjiang Time,Canonical,+06:00,+06:00,The Asia/Urumqi entry in the tz database reflected the use of Xinjiang Time by part of the local population. Consider using Asia/Shanghai for Beijing Time if that is preferred.
+RU,+643337+1431336,Asia/Ust-Nera,MSK+07 - Oymyakonsky,Canonical,+10:00,+10:00,
+LA,+1758+10236,Asia/Vientiane,,Alias,+07:00,+07:00,Link to Asia/Bangkok
+RU,+4310+13156,Asia/Vladivostok,MSK+07 - Amur River,Canonical,+10:00,+10:00,
+RU,+6200+12940,Asia/Yakutsk,MSK+06 - Lena River,Canonical,+09:00,+09:00,
+MM,+1647+09610,Asia/Yangon,,Canonical,+06:30,+06:30,
+RU,+5651+06036,Asia/Yekaterinburg,MSK+02 - Urals,Canonical,+05:00,+05:00,
+AM,+4011+04430,Asia/Yerevan,,Canonical,+04:00,+04:00,
+PT,+3744−02540,Atlantic/Azores,Azores,Canonical,−01:00,+00:00,
+BM,+3217−06446,Atlantic/Bermuda,,Canonical,−04:00,−03:00,
+ES,+2806−01524,Atlantic/Canary,Canary Islands,Canonical,+00:00,+01:00,
+CV,+1455−02331,Atlantic/Cape_Verde,,Canonical,−01:00,−01:00,
+FO,+6201−00646,Atlantic/Faeroe,,Deprecated,+00:00,+01:00,Link to Atlantic/Faroe
+FO,+6201−00646,Atlantic/Faroe,,Canonical,+00:00,+01:00,
+SJ,,Atlantic/Jan_Mayen,,Deprecated,+01:00,+02:00,Link to Europe/Oslo
+PT,+3238−01654,Atlantic/Madeira,Madeira Islands,Canonical,+00:00,+01:00,
+IS,+6409−02151,Atlantic/Reykjavik,,Canonical,+00:00,+00:00,
+GS,−5416−03632,Atlantic/South_Georgia,,Canonical,−02:00,−02:00,
+SH,−1555−00542,Atlantic/St_Helena,,Alias,+00:00,+00:00,Link to Africa/Abidjan
+FK,−5142−05751,Atlantic/Stanley,,Canonical,−03:00,−03:00,
+AU,,Australia/ACT,,Deprecated,+10:00,+11:00,Link to Australia/Sydney
+AU,−3455+13835,Australia/Adelaide,South Australia,Canonical,+09:30,+10:30,
+AU,−2728+15302,Australia/Brisbane,Queensland (most areas),Canonical,+10:00,+10:00,
+AU,−3157+14127,Australia/Broken_Hill,New South Wales (Yancowinna),Canonical,+09:30,+10:30,
+AU,,Australia/Canberra,,Deprecated,+10:00,+11:00,Link to Australia/Sydney
+AU,,Australia/Currie,,Deprecated,+10:00,+11:00,Link to Australia/Hobart
+AU,−1228+13050,Australia/Darwin,Northern Territory,Canonical,+09:30,+09:30,
+AU,−3143+12852,Australia/Eucla,Western Australia (Eucla),Canonical,+08:45,+08:45,
+AU,−4253+14719,Australia/Hobart,Tasmania,Canonical,+10:00,+11:00,
+AU,,Australia/LHI,,Deprecated,+10:30,+11:00,Link to Australia/Lord_Howe
+AU,−2016+14900,Australia/Lindeman,Queensland (Whitsunday Islands),Canonical,+10:00,+10:00,
+AU,−3133+15905,Australia/Lord_Howe,Lord Howe Island,Canonical,+10:30,+11:00,This is the only time zone in the world that uses 30-minute DST transitions.
+AU,−3749+14458,Australia/Melbourne,Victoria,Canonical,+10:00,+11:00,
+AU,,Australia/North,,Deprecated,+09:30,+09:30,Link to Australia/Darwin
+AU,,Australia/NSW,,Deprecated,+10:00,+11:00,Link to Australia/Sydney
+AU,−3157+11551,Australia/Perth,Western Australia (most areas),Canonical,+08:00,+08:00,
+AU,,Australia/Queensland,,Deprecated,+10:00,+10:00,Link to Australia/Brisbane
+AU,,Australia/South,,Deprecated,+09:30,+10:30,Link to Australia/Adelaide
+AU,−3352+15113,Australia/Sydney,New South Wales (most areas),Canonical,+10:00,+11:00,
+AU,,Australia/Tasmania,,Deprecated,+10:00,+11:00,Link to Australia/Hobart
+AU,,Australia/Victoria,,Deprecated,+10:00,+11:00,Link to Australia/Melbourne
+AU,,Australia/West,,Deprecated,+08:00,+08:00,Link to Australia/Perth
+AU,,Australia/Yancowinna,,Deprecated,+09:30,+10:30,Link to Australia/Broken_Hill
+BR,,Brazil/Acre,,Deprecated,−05:00,−05:00,Link to America/Rio_Branco
+BR,,Brazil/DeNoronha,,Deprecated,−02:00,−02:00,Link to America/Noronha
+BR,,Brazil/East,,Deprecated,−03:00,−03:00,Link to America/Sao_Paulo
+BR,,Brazil/West,,Deprecated,−04:00,−04:00,Link to America/Manaus
+CA,,Canada/Atlantic,,Deprecated,−04:00,−03:00,Link to America/Halifax
+CA,,Canada/Central,,Deprecated,−06:00,−05:00,Link to America/Winnipeg
+CA,,Canada/Eastern,,Deprecated,−05:00,−04:00,Link to America/Toronto
+CA,,Canada/Mountain,,Deprecated,−07:00,−06:00,Link to America/Edmonton
+CA,,Canada/Newfoundland,,Deprecated,−03:30,−02:30,Link to America/St_Johns
+CA,,Canada/Pacific,,Deprecated,−08:00,−07:00,Link to America/Vancouver
+CA,,Canada/Saskatchewan,,Deprecated,−06:00,−06:00,Link to America/Regina
+CA,,Canada/Yukon,,Deprecated,−07:00,−07:00,Link to America/Whitehorse
+,,CET,,Deprecated,+01:00,+02:00,"Choose a zone that observes CET, such as Europe/Paris."
+CL,,Chile/Continental,,Deprecated,−04:00,−03:00,Link to America/Santiago
+CL,,Chile/EasterIsland,,Deprecated,−06:00,−05:00,Link to Pacific/Easter
+,,CST6CDT,,Deprecated,−06:00,−05:00,"Choose a zone that observes CST with United States daylight saving time rules, such as America/Chicago."
+CU,,Cuba,,Deprecated,−05:00,−04:00,Link to America/Havana
+,,EET,,Deprecated,+02:00,+03:00,"Choose a zone that observes EET, such as Europe/Sofia."
+EG,,Egypt,,Deprecated,+02:00,+02:00,Link to Africa/Cairo
+IE,,Eire,,Deprecated,+01:00,+00:00,Link to Europe/Dublin
+,,EST,,Deprecated,−05:00,−05:00,"Choose a zone that currently observes EST without daylight saving time, such as America/Cancun."
+,,EST5EDT,,Deprecated,−05:00,−04:00,"Choose a zone that observes EST with United States daylight saving time rules, such as America/New_York."
+,,Etc/GMT,,Canonical,+00:00,+00:00,
+,,Etc/GMT+0,,Alias,+00:00,+00:00,Link to Etc/GMT
+,,Etc/GMT+1,,Canonical,−01:00,−01:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+10,,Canonical,−10:00,−10:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+11,,Canonical,−11:00,−11:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+12,,Canonical,−12:00,−12:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+2,,Canonical,−02:00,−02:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+3,,Canonical,−03:00,−03:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+4,,Canonical,−04:00,−04:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+5,,Canonical,−05:00,−05:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+6,,Canonical,−06:00,−06:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+7,,Canonical,−07:00,−07:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+8,,Canonical,−08:00,−08:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT+9,,Canonical,−09:00,−09:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-0,,Alias,+00:00,+00:00,Link to Etc/GMT
+,,Etc/GMT-1,,Canonical,+01:00,+01:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-10,,Canonical,+10:00,+10:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-11,,Canonical,+11:00,+11:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-12,,Canonical,+12:00,+12:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-13,,Canonical,+13:00,+13:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-14,,Canonical,+14:00,+14:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-2,,Canonical,+02:00,+02:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-3,,Canonical,+03:00,+03:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-4,,Canonical,+04:00,+04:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-5,,Canonical,+05:00,+05:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-6,,Canonical,+06:00,+06:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-7,,Canonical,+07:00,+07:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-8,,Canonical,+08:00,+08:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT-9,,Canonical,+09:00,+09:00,Sign is intentionally inverted. See the Etc area description.
+,,Etc/GMT0,,Alias,+00:00,+00:00,Link to Etc/GMT
+,,Etc/Greenwich,,Deprecated,+00:00,+00:00,Link to Etc/GMT
+,,Etc/UCT,,Deprecated,+00:00,+00:00,Link to Etc/UTC
+,,Etc/Universal,,Deprecated,+00:00,+00:00,Link to Etc/UTC
+,,Etc/UTC,,Canonical,+00:00,+00:00,
+,,Etc/Zulu,,Deprecated,+00:00,+00:00,Link to Etc/UTC
+NL,+5222+00454,Europe/Amsterdam,,Canonical,+01:00,+02:00,
+AD,+4230+00131,Europe/Andorra,,Canonical,+01:00,+02:00,
+RU,+4621+04803,Europe/Astrakhan,MSK+01 - Astrakhan,Canonical,+04:00,+04:00,
+GR,+3758+02343,Europe/Athens,,Canonical,+02:00,+03:00,
+GB,,Europe/Belfast,,Deprecated,+00:00,+01:00,Link to Europe/London
+RS,+4450+02030,Europe/Belgrade,,Canonical,+01:00,+02:00,
+DE,+5230+01322,Europe/Berlin,Germany (most areas),Canonical,+01:00,+02:00,"In 1945, the Trizone did not follow Berlin's switch to DST, see Time in Germany"
+SK,+4809+01707,Europe/Bratislava,,Alias,+01:00,+02:00,Link to Europe/Prague
+BE,+5050+00420,Europe/Brussels,,Canonical,+01:00,+02:00,
+RO,+4426+02606,Europe/Bucharest,,Canonical,+02:00,+03:00,
+HU,+4730+01905,Europe/Budapest,,Canonical,+01:00,+02:00,
+DE,+4742+00841,Europe/Busingen,Busingen,Alias,+01:00,+02:00,Link to Europe/Zurich
+MD,+4700+02850,Europe/Chisinau,,Canonical,+02:00,+03:00,
+DK,+5540+01235,Europe/Copenhagen,,Canonical,+01:00,+02:00,
+IE,+5320−00615,Europe/Dublin,,Canonical,+01:00,+00:00,
+GI,+3608−00521,Europe/Gibraltar,,Canonical,+01:00,+02:00,
+GG,+492717−0023210,Europe/Guernsey,,Alias,+00:00,+01:00,Link to Europe/London
+FI,+6010+02458,Europe/Helsinki,,Canonical,+02:00,+03:00,
+IM,+5409−00428,Europe/Isle_of_Man,,Alias,+00:00,+01:00,Link to Europe/London
+TR,+4101+02858,Europe/Istanbul,,Canonical,+03:00,+03:00,
+JE,+491101−0020624,Europe/Jersey,,Alias,+00:00,+01:00,Link to Europe/London
+RU,+5443+02030,Europe/Kaliningrad,MSK-01 - Kaliningrad,Canonical,+02:00,+02:00,
+UA,+5026+03031,Europe/Kiev,Ukraine (most areas),Canonical,+02:00,+03:00,
+RU,+5836+04939,Europe/Kirov,MSK+00 - Kirov,Canonical,+03:00,+03:00,
+PT,+3843−00908,Europe/Lisbon,Portugal (mainland),Canonical,+00:00,+01:00,
+SI,+4603+01431,Europe/Ljubljana,,Alias,+01:00,+02:00,Link to Europe/Belgrade
+GB,+513030−0000731,Europe/London,,Canonical,+00:00,+01:00,
+LU,+4936+00609,Europe/Luxembourg,,Canonical,+01:00,+02:00,
+ES,+4024−00341,Europe/Madrid,Spain (mainland),Canonical,+01:00,+02:00,
+MT,+3554+01431,Europe/Malta,,Canonical,+01:00,+02:00,
+AX,+6006+01957,Europe/Mariehamn,,Alias,+02:00,+03:00,Link to Europe/Helsinki
+BY,+5354+02734,Europe/Minsk,,Canonical,+03:00,+03:00,
+MC,+4342+00723,Europe/Monaco,,Canonical,+01:00,+02:00,
+RU,+554521+0373704,Europe/Moscow,MSK+00 - Moscow area,Canonical,+03:00,+03:00,
+CY,+3510+03322,Europe/Nicosia,,Alias,+02:00,+03:00,Link to Asia/Nicosia
+NO,+5955+01045,Europe/Oslo,,Canonical,+01:00,+02:00,
+FR,+4852+00220,Europe/Paris,,Canonical,+01:00,+02:00,
+ME,+4226+01916,Europe/Podgorica,,Alias,+01:00,+02:00,Link to Europe/Belgrade
+CZ,+5005+01426,Europe/Prague,,Canonical,+01:00,+02:00,
+LV,+5657+02406,Europe/Riga,,Canonical,+02:00,+03:00,
+IT,+4154+01229,Europe/Rome,,Canonical,+01:00,+02:00,
+RU,+5312+05009,Europe/Samara,"MSK+01 - Samara, Udmurtia",Canonical,+04:00,+04:00,
+SM,+4355+01228,Europe/San_Marino,,Alias,+01:00,+02:00,Link to Europe/Rome
+BA,+4352+01825,Europe/Sarajevo,,Alias,+01:00,+02:00,Link to Europe/Belgrade
+RU,+5134+04602,Europe/Saratov,MSK+01 - Saratov,Canonical,+04:00,+04:00,
+UA,+4457+03406,Europe/Simferopol,Crimea,Canonical,+03:00,+03:00,Disputed - Reflects data in the TZDB.
+MK,+4159+02126,Europe/Skopje,,Alias,+01:00,+02:00,Link to Europe/Belgrade
+BG,+4241+02319,Europe/Sofia,,Canonical,+02:00,+03:00,
+SE,+5920+01803,Europe/Stockholm,,Canonical,+01:00,+02:00,
+EE,+5925+02445,Europe/Tallinn,,Canonical,+02:00,+03:00,
+AL,+4120+01950,Europe/Tirane,,Canonical,+01:00,+02:00,
+MD,,Europe/Tiraspol,,Deprecated,+02:00,+03:00,Link to Europe/Chisinau
+RU,+5420+04824,Europe/Ulyanovsk,MSK+01 - Ulyanovsk,Canonical,+04:00,+04:00,
+UA,+4837+02218,Europe/Uzhgorod,Transcarpathia,Canonical,+02:00,+03:00,
+LI,+4709+00931,Europe/Vaduz,,Alias,+01:00,+02:00,Link to Europe/Zurich
+VA,+415408+0122711,Europe/Vatican,,Alias,+01:00,+02:00,Link to Europe/Rome
+AT,+4813+01620,Europe/Vienna,,Canonical,+01:00,+02:00,
+LT,+5441+02519,Europe/Vilnius,,Canonical,+02:00,+03:00,
+RU,+4844+04425,Europe/Volgograd,MSK+00 - Volgograd,Canonical,+03:00,+03:00,
+PL,+5215+02100,Europe/Warsaw,,Canonical,+01:00,+02:00,
+HR,+4548+01558,Europe/Zagreb,,Alias,+01:00,+02:00,Link to Europe/Belgrade
+UA,+4750+03510,Europe/Zaporozhye,Zaporozhye and east Lugansk,Canonical,+02:00,+03:00,
+CH,+4723+00832,Europe/Zurich,Swiss time,Canonical,+01:00,+02:00,
+,,Factory,,Canonical,+00:00,+00:00,
+GB,,GB,,Deprecated,+00:00,+01:00,Link to Europe/London
+GB,,GB-Eire,,Deprecated,+00:00,+01:00,Link to Europe/London
+,,GMT,,Alias,+00:00,+00:00,Link to Etc/GMT
+,,GMT+0,,Deprecated,+00:00,+00:00,Link to Etc/GMT
+,,GMT-0,,Deprecated,+00:00,+00:00,Link to Etc/GMT
+,,GMT0,,Deprecated,+00:00,+00:00,Link to Etc/GMT
+,,Greenwich,,Deprecated,+00:00,+00:00,Link to Etc/GMT
+HK,+2217+11409,Hongkong,,Deprecated,+08:00,+08:00,Link to Asia/Hong_Kong
+,,HST,,Deprecated,−10:00,−10:00,"Choose a zone that currently observes HST without daylight saving time, such as Pacific/Honolulu."
+IS,,Iceland,,Deprecated,+00:00,+00:00,Link to Atlantic/Reykjavik
+MG,−1855+04731,Indian/Antananarivo,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+IO,−0720+07225,Indian/Chagos,,Canonical,+06:00,+06:00,
+CX,−1025+10543,Indian/Christmas,,Canonical,+07:00,+07:00,
+CC,−1210+09655,Indian/Cocos,,Canonical,+06:30,+06:30,
+KM,−1141+04316,Indian/Comoro,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+TF,−492110+0701303,Indian/Kerguelen,"Kerguelen, St Paul Island, Amsterdam Island",Canonical,+05:00,+05:00,
+SC,−0440+05528,Indian/Mahe,,Canonical,+04:00,+04:00,
+MV,+0410+07330,Indian/Maldives,,Canonical,+05:00,+05:00,
+MU,−2010+05730,Indian/Mauritius,,Canonical,+04:00,+04:00,
+YT,−1247+04514,Indian/Mayotte,,Alias,+03:00,+03:00,Link to Africa/Nairobi
+RE,−2052+05528,Indian/Reunion,"Réunion, Crozet, Scattered Islands",Canonical,+04:00,+04:00,
+IR,,Iran,,Deprecated,+03:30,+04:30,Link to Asia/Tehran
+IL,,Israel,,Deprecated,+02:00,+03:00,Link to Asia/Jerusalem
+JM,+175805−0764736,Jamaica,,Deprecated,−05:00,−05:00,Link to America/Jamaica
+JP,,Japan,,Deprecated,+09:00,+09:00,Link to Asia/Tokyo
+MH,+0905+16720,Kwajalein,,Deprecated,+12:00,+12:00,Link to Pacific/Kwajalein
+LY,,Libya,,Deprecated,+02:00,+02:00,Link to Africa/Tripoli
+,,MET,,Deprecated,+01:00,+02:00,"Choose a zone that observes MET (same as CET), such as Europe/Paris."
+MX,,Mexico/BajaNorte,,Deprecated,−08:00,−07:00,Link to America/Tijuana
+MX,,Mexico/BajaSur,,Deprecated,−07:00,−06:00,Link to America/Mazatlan
+MX,,Mexico/General,,Deprecated,−06:00,−05:00,Link to America/Mexico_City
+,,MST,,Deprecated,−07:00,−07:00,"Choose a zone that currently observes MST without daylight saving time, such as America/Phoenix."
+,,MST7MDT,,Deprecated,−07:00,−06:00,"Choose a zone that observes MST with United States daylight saving time rules, such as America/Denver."
+US,,Navajo,,Deprecated,−07:00,−06:00,Link to America/Denver
+NZ,,NZ,,Deprecated,+12:00,+13:00,Link to Pacific/Auckland
+NZ,,NZ-CHAT,,Deprecated,+12:45,+13:45,Link to Pacific/Chatham
+WS,−1350−17144,Pacific/Apia,,Canonical,+13:00,+14:00,
+NZ,−3652+17446,Pacific/Auckland,New Zealand time,Canonical,+12:00,+13:00,
+PG,−0613+15534,Pacific/Bougainville,Bougainville,Canonical,+11:00,+11:00,
+NZ,−4357−17633,Pacific/Chatham,Chatham Islands,Canonical,+12:45,+13:45,
+FM,+0725+15147,Pacific/Chuuk,"Chuuk/Truk, Yap",Canonical,+10:00,+10:00,
+CL,−2709−10926,Pacific/Easter,Easter Island,Canonical,−06:00,−05:00,
+VU,−1740+16825,Pacific/Efate,,Canonical,+11:00,+11:00,
+KI,−0308−17105,Pacific/Enderbury,Phoenix Islands,Canonical,+13:00,+13:00,
+TK,−0922−17114,Pacific/Fakaofo,,Canonical,+13:00,+13:00,
+FJ,−1808+17825,Pacific/Fiji,,Canonical,+12:00,+13:00,
+TV,−0831+17913,Pacific/Funafuti,,Canonical,+12:00,+12:00,
+EC,−0054−08936,Pacific/Galapagos,Galápagos Islands,Canonical,−06:00,−06:00,
+PF,−2308−13457,Pacific/Gambier,Gambier Islands,Canonical,−09:00,−09:00,
+SB,−0932+16012,Pacific/Guadalcanal,,Canonical,+11:00,+11:00,
+GU,+1328+14445,Pacific/Guam,,Canonical,+10:00,+10:00,
+US,+211825−1575130,Pacific/Honolulu,Hawaii,Canonical,−10:00,−10:00,
+UM,,Pacific/Johnston,,Deprecated,−10:00,−10:00,Link to Pacific/Honolulu
+KI,+0152−15720,Pacific/Kiritimati,Line Islands,Canonical,+14:00,+14:00,
+FM,+0519+16259,Pacific/Kosrae,Kosrae,Canonical,+11:00,+11:00,
+MH,+0905+16720,Pacific/Kwajalein,Kwajalein,Canonical,+12:00,+12:00,
+MH,+0709+17112,Pacific/Majuro,Marshall Islands (most areas),Canonical,+12:00,+12:00,
+PF,−0900−13930,Pacific/Marquesas,Marquesas Islands,Canonical,−09:30,−09:30,
+UM,+2813−17722,Pacific/Midway,Midway Islands,Alias,−11:00,−11:00,Link to Pacific/Pago_Pago
+NR,−0031+16655,Pacific/Nauru,,Canonical,+12:00,+12:00,
+NU,−1901−16955,Pacific/Niue,,Canonical,−11:00,−11:00,
+NF,−2903+16758,Pacific/Norfolk,,Canonical,+11:00,+12:00,
+NC,−2216+16627,Pacific/Noumea,,Canonical,+11:00,+11:00,
+AS,−1416−17042,Pacific/Pago_Pago,"Samoa, Midway",Canonical,−11:00,−11:00,
+PW,+0720+13429,Pacific/Palau,,Canonical,+09:00,+09:00,
+PN,−2504−13005,Pacific/Pitcairn,,Canonical,−08:00,−08:00,
+FM,+0658+15813,Pacific/Pohnpei,Pohnpei/Ponape,Canonical,+11:00,+11:00,
+FM,,Pacific/Ponape,,Deprecated,+11:00,+11:00,Link to Pacific/Pohnpei
+PG,−0930+14710,Pacific/Port_Moresby,Papua New Guinea (most areas),Canonical,+10:00,+10:00,
+CK,−2114−15946,Pacific/Rarotonga,,Canonical,−10:00,−10:00,
+MP,+1512+14545,Pacific/Saipan,,Alias,+10:00,+10:00,Link to Pacific/Guam
+WS,,Pacific/Samoa,,Deprecated,−11:00,−11:00,Link to Pacific/Pago_Pago
+PF,−1732−14934,Pacific/Tahiti,Society Islands,Canonical,−10:00,−10:00,
+KI,+0125+17300,Pacific/Tarawa,Gilbert Islands,Canonical,+12:00,+12:00,
+TO,−2110−17510,Pacific/Tongatapu,,Canonical,+13:00,+13:00,
+FM,,Pacific/Truk,,Deprecated,+10:00,+10:00,Link to Pacific/Chuuk
+UM,+1917+16637,Pacific/Wake,Wake Island,Canonical,+12:00,+12:00,
+WF,−1318−17610,Pacific/Wallis,,Canonical,+12:00,+12:00,
+FM,,Pacific/Yap,,Deprecated,+10:00,+10:00,Link to Pacific/Chuuk
+PL,,Poland,,Deprecated,+01:00,+02:00,Link to Europe/Warsaw
+PT,,Portugal,,Deprecated,+00:00,+01:00,Link to Europe/Lisbon
+CN,,PRC,,Deprecated,+08:00,+08:00,Link to Asia/Shanghai
+,,PST8PDT,,Deprecated,−08:00,−07:00,"Choose a zone that observes PST with United States daylight saving time rules, such as America/Los_Angeles."
+TW,,ROC,,Deprecated,+08:00,+08:00,Link to Asia/Taipei
+KR,,ROK,,Deprecated,+09:00,+09:00,Link to Asia/Seoul
+SG,+0117+10351,Singapore,,Deprecated,+08:00,+08:00,Link to Asia/Singapore
+TR,,Turkey,,Deprecated,+03:00,+03:00,Link to Europe/Istanbul
+,,UCT,,Deprecated,+00:00,+00:00,Link to Etc/UTC
+,,Universal,,Deprecated,+00:00,+00:00,Link to Etc/UTC
+US,,US/Alaska,,Deprecated,−09:00,−08:00,Link to America/Anchorage
+US,,US/Aleutian,,Deprecated,−10:00,−09:00,Link to America/Adak
+US,,US/Arizona,,Deprecated,−07:00,−07:00,Link to America/Phoenix
+US,,US/Central,,Deprecated,−06:00,−05:00,Link to America/Chicago
+US,,US/East-Indiana,,Deprecated,−05:00,−04:00,Link to America/Indiana/Indianapolis
+US,,US/Eastern,,Deprecated,−05:00,−04:00,Link to America/New_York
+US,,US/Hawaii,,Deprecated,−10:00,−10:00,Link to Pacific/Honolulu
+US,,US/Indiana-Starke,,Deprecated,−06:00,−05:00,Link to America/Indiana/Knox
+US,,US/Michigan,,Deprecated,−05:00,−04:00,Link to America/Detroit
+US,,US/Mountain,,Deprecated,−07:00,−06:00,Link to America/Denver
+US,,US/Pacific,,Deprecated,−08:00,−07:00,Link to America/Los_Angeles
+WS,,US/Samoa,,Deprecated,−11:00,−11:00,Link to Pacific/Pago_Pago
+,,UTC,,Alias,+00:00,+00:00,Link to Etc/UTC
+RU,,W-SU,,Deprecated,+03:00,+03:00,Link to Europe/Moscow
+,,WET,,Deprecated,+00:00,+01:00,"Choose a zone that observes WET, such as Europe/Lisbon."
+,,Zulu,,Deprecated,+00:00,+00:00,Link to Etc/UTC
--- a/docs/CNAME
+++ b/docs/CNAME
@ -0,0 +1 @@
+www.rapids.science
--- a/docs/Makefile
+++ b/docs/Makefile
@ -1,153 +0,0 @@
-# Makefile for Sphinx documentation
-#
-
-# You can set these variables from the command line.
-SPHINXOPTS    =
-SPHINXBUILD   = sphinx-build
-PAPER         =
-BUILDDIR      = _build
-
-# Internal variables.
-PAPEROPT_a4     = -D latex_paper_size=a4
-PAPEROPT_letter = -D latex_paper_size=letter
-ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
-# the i18n builder cannot share the environment and doctrees with the others
-I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
-
-.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
-
-help:
-	@echo "Please use \`make <target>' where <target> is one of"
-	@echo "  html       to make standalone HTML files"
-	@echo "  dirhtml    to make HTML files named index.html in directories"
-	@echo "  singlehtml to make a single large HTML file"
-	@echo "  pickle     to make pickle files"
-	@echo "  json       to make JSON files"
-	@echo "  htmlhelp   to make HTML files and a HTML help project"
-	@echo "  qthelp     to make HTML files and a qthelp project"
-	@echo "  devhelp    to make HTML files and a Devhelp project"
-	@echo "  epub       to make an epub"
-	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
-	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
-	@echo "  text       to make text files"
-	@echo "  man        to make manual pages"
-	@echo "  texinfo    to make Texinfo files"
-	@echo "  info       to make Texinfo files and run them through makeinfo"
-	@echo "  gettext    to make PO message catalogs"
-	@echo "  changes    to make an overview of all changed/added/deprecated items"
-	@echo "  linkcheck  to check all external links for integrity"
-	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
-
-clean:
-	-rm -rf $(BUILDDIR)/*
-
-html:
-	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
-	@echo
-	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
-
-dirhtml:
-	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
-	@echo
-	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
-
-singlehtml:
-	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
-	@echo
-	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
-
-pickle:
-	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
-	@echo
-	@echo "Build finished; now you can process the pickle files."
-
-json:
-	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
-	@echo
-	@echo "Build finished; now you can process the JSON files."
-
-htmlhelp:
-	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
-	@echo
-	@echo "Build finished; now you can run HTML Help Workshop with the" \
-	      ".hhp project file in $(BUILDDIR)/htmlhelp."
-
-qthelp:
-	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
-	@echo
-	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
-	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
-	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/moshi-aware.qhcp"
-	@echo "To view the help file:"
-	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/moshi-aware.qhc"
-
-devhelp:
-	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
-	@echo
-	@echo "Build finished."
-	@echo "To view the help file:"
-	@echo "# mkdir -p $$HOME/.local/share/devhelp/moshi-aware"
-	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/moshi-aware"
-	@echo "# devhelp"
-
-epub:
-	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
-	@echo
-	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
-
-latex:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo
-	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
-	@echo "Run \`make' in that directory to run these through (pdf)latex" \
-	      "(use \`make latexpdf' here to do that automatically)."
-
-latexpdf:
-	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
-	@echo "Running LaTeX files through pdflatex..."
-	$(MAKE) -C $(BUILDDIR)/latex all-pdf
-	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
-
-text:
-	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
-	@echo
-	@echo "Build finished. The text files are in $(BUILDDIR)/text."
-
-man:
-	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
-	@echo
-	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
-
-texinfo:
-	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
-	@echo
-	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
-	@echo "Run \`make' in that directory to run these through makeinfo" \
-	      "(use \`make info' here to do that automatically)."
-
-info:
-	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
-	@echo "Running Texinfo files through makeinfo..."
-	make -C $(BUILDDIR)/texinfo info
-	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
-
-gettext:
-	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
-	@echo
-	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
-
-changes:
-	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
-	@echo
-	@echo "The overview file is in $(BUILDDIR)/changes."
-
-linkcheck:
-	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
-	@echo
-	@echo "Link check complete; look for any errors in the above output " \
-	      "or in $(BUILDDIR)/linkcheck/output.txt."
-
-doctest:
-	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
-	@echo "Testing of doctests in the sources finished, look at the " \
-	      "results in $(BUILDDIR)/doctest/output.txt."
--- a/docs/analysis/complete-workflow-example.md
+++ b/docs/analysis/complete-workflow-example.md
@ -0,0 +1,95 @@
+# Analysis Workflow Example
+
+!!! info "TL;DR"
+    - In addition to using RAPIDS to extract behavioral features, create plots, and clean sensor features, you can structure your data analysis within RAPIDS (i.e. creating ML/statistical models and evaluating your models)
+    - We include an analysis example in RAPIDS that covers raw data processing, feature extraction, cleaning, machine learning modeling, and evaluation
+    - Use this example as a guide to structure your own analysis within RAPIDS
+    - RAPIDS analysis workflows are compatible with your favorite data science tools and libraries
+    - RAPIDS analysis workflows are reproducible and we encourage you to publish them along with your research papers
+
+## Why should I integrate my analysis in RAPIDS?
+Even though the bulk of RAPIDS' current functionality is related to the computation of behavioral features, we recommend RAPIDS as a complementary tool to create a mobile data analysis workflow. This is because the cookiecutter data science file organization guidelines, the use of Snakemake, the provided behavioral features, and the reproducible R and Python development environments allow researchers to divide an analysis workflow into small parts that can be audited, shared in an online repository, reproduced on other computers, and understood by other people as they follow a familiar and consistent structure. We believe these advantages outweigh the time needed to learn how to create these workflows in RAPIDS.
+
+To create analysis workflows in RAPIDS, researchers can still use any data manipulation tools, editors, libraries, or languages they are already familiar with. RAPIDS is meant to be the final destination of analysis code that was developed in interactive notebooks or stand-alone scripts. For example, a user can compute call and location features using RAPIDS and then use Jupyter notebooks to explore feature cleaning approaches; once the cleaning code is final, it can be moved to RAPIDS as a new step in the pipeline. In turn, the output of this cleaning step can be used to explore machine learning models, and once a model is finished, it can also be transferred to RAPIDS as a step of its own. The idea is that when it is time to publish a piece of research, a RAPIDS workflow can be shared in a public repository as is.
+
+In the following sections we share an example of how we structured an analysis workflow in RAPIDS.
+
+## Analysis workflow structure
+To accurately reflect the complexity of a real-world modeling scenario, we decided not to oversimplify this example. Importantly, every step in this example follows a basic structure: an input file and parameters are manipulated by an R or Python script that saves the results to an output file. Input files, parameters, output files and scripts are grouped into Snakemake rules that are described in `smk` files in the rules folder (we point the reader to the relevant rule(s) of each step).
+
+Researchers can use these rules and scripts as a guide to create their own. Every modeling project will have different requirements, data, and goals, but most ultimately follow a similar chained pattern.
+
+!!! hint
+    The example's config file is `example_profile/example_config.yaml` and its Snakefile is in `example_profile/Snakefile`. The config file is already configured to process the sensor data as explained in [Modules of our analysis workflow example](#modules-of-our-analysis-workflow-example).
+
+## Description of the study modeled in our analysis workflow example
+Our example is based on a hypothetical study that recruited 2 participants who underwent surgery and collected mobile data for at least one week before and one week after the procedure. Participants wore a Fitbit device and installed the AWARE client on their personal Android and iOS smartphones to collect mobile data 24/7. In addition, participants completed daily severity ratings of 12 common symptoms on a scale from 0 to 10 that we summed up into a daily symptom burden score.
+
+The goal of this workflow is to find out if we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes, high and low symptom burden based on the scores above and below average of each participant. We also want to compare the performance of individual (personalized) models vs a population model. 
+
+In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share files with [test data](https://osf.io/wbg23/) in an Open Science Framework repository. 
+
+<figure>
+  <img src="../../img/analysis_workflow.png" max-width="100%" />
+  <figcaption>Modules of RAPIDS example workflow, from raw data to model evaluation</figcaption>
+</figure>
+
+
+## Configure and run the analysis workflow example
+1.	[Install](../../setup/installation) RAPIDS
+2.	Unzip the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/) in `data/external/example_workflow/*.csv`.
+3.	Create the participant files for this example by running:
+    ```bash
+    ./rapids -j1 create_example_participant_files
+    ```
+4.	Run the example pipeline with:
+    ```bash
+    ./rapids -j1 --profile example_profile
+    ```
+
+Note that you will see many warning messages. You can ignore them; they appear because we run ML algorithms on a small fake dataset.
+
+## Modules of our analysis workflow example
+
+??? info "1. Feature extraction"
+    We extract daily behavioral features for data yield, received and sent messages, missed, incoming and outgoing calls, resample fused location data using Doryab provider, activity recognition, battery, Bluetooth, screen, light, applications foreground, conversations, Wi-Fi connected, Wi-Fi visible, Fitbit heart rate summary and intraday data, Fitbit sleep summary data, and Fitbit step summary and intraday data without excluding sleep periods with an active bout threshold of 10 steps. In total, we obtained 245 daily sensor features over 12 days per participant. 
+
+??? info "2. Extract demographic data."
+    It is common to have demographic data in addition to mobile and target (ground truth) data. In this example we include participants’ age, gender and the number of days they spent in hospital after their surgery as features in our model. We extract these three columns from the `data/external/example_workflow/participant_info.csv` file. As these three features remain the same within participants, they are used only in the population model. Refer to the `demographic_features` rule in `rules/models.smk`.
+
+??? info "3. Create target labels."
+    The two classes for our machine learning binary classification problem are high and low symptom burden. Target values are already stored in the `data/external/example_workflow/participant_target.csv` file. A new rule/script can be created if further manipulation is necessary. Refer to the `parse_targets` rule in `rules/models.smk`.
+
+??? info "4. Feature merging."
+    These daily features are stored in a CSV file per sensor, a CSV file per participant, and a CSV file including all features from all participants (in every case each column represents a feature and each row represents a day). Refer to the `merge_sensor_features_for_individual_participants` and `merge_sensor_features_for_all_participants` rules in `rules/features.smk`.
+
+??? info "5. Data visualization."
+    At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to `rules/reports.smk` to find the rules that generate these plots.
+
+??? info "6. Feature cleaning."
+    In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio lower than 0.75, i.e. we include days with at least 18 of 24 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in `example_profile/example_config.yaml`.
+
+    After this step, we kept 173 features over 11 days for the individual model of p01, 101 features over 12 days for the individual model of p02 and 117 features over 22 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stop researchers from collecting data from as many sensors as on Android phones.
+    
+    Feature cleaning for the individual models is done in the `clean_sensor_features_for_individual_participants` rule and for the population model in the `clean_sensor_features_for_all_participants` rule in `rules/models.smk`.
+
+??? info "7. Merge features and targets."
+    In this step we merge the cleaned features and target labels for our individual models in the `merge_features_and_targets_for_individual_model` rule in `rules/features.smk`. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the `merge_features_and_targets_for_population_model` rule in `rules/features.smk`. These two merged files are the input for our individual and population models. 
+
+??? info "8. Modelling."
+    This stage has three phases: model building, training and evaluation. 
+
+    In the building phase we impute, normalize and oversample our dataset. Missing numeric values in each column are imputed with their mean and missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, or scikit-learn's robust scaler) and we one-hot encode each categorical feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a random over-sampler from the imbalanced-learn package. All these parameters are exposed in `example_profile/example_config.yaml`.
+
+    In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in `example_profile/example_config.yaml`.
+
+    Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle.
+    
+    Refer to the `modelling_for_individual_participants` rule for the individual modeling and to the `modelling_for_all_participants` rule for the population modeling, both in `rules/models.smk`.
+
+??? info "9. Compute model baselines."
+    We create three baselines to evaluate our classification models.
+    
+    First, a majority classifier that labels each test sample with the majority class of our training data. Second, a random weighted classifier that predicts each test observation by sampling at random from a binomial distribution based on the ratio of our target labels. Third, a decision tree classifier based solely on the demographic features of each participant. As we do not have demographic features for the individual models, this baseline is only available for the population model.
+    
+    Our baseline metrics (e.g. accuracy, precision, etc.) are saved into a CSV file, ready to be compared to our modeling results. Refer to the `baselines_for_individual_model` rule for the individual model baselines and to the `baselines_for_population_model` rule for the population model baselines, both in `rules/models.smk`.
--- a/docs/analysis/data-cleaning.md
+++ b/docs/analysis/data-cleaning.md
@ -0,0 +1,92 @@
+Data Cleaning
+=============
+
+The goal of this module is to perform basic cleaning tasks on the behavioral features that RAPIDS computes. You might need to do further processing depending on your analysis objectives. This module can clean features at the individual level and at the study level. If you are interested in creating individual models (using each participant's features independently of the others) use [`ALL_CLEANING_INDIVIDUAL`]. If you are interested in creating population models (using everyone's data in the same model) use [`ALL_CLEANING_OVERALL`].
+    
+## Clean sensor features for individual participants
+
+!!! info "File Sequence"
+    ```bash
+    - data/processed/features/{pid}/all_sensor_features.csv
+    - data/processed/features/{pid}/all_sensor_features_cleaned_{provider_key}.csv
+    ```
+
+### RAPIDS provider
+
+Parameters description for `[ALL_CLEANING_INDIVIDUAL][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------|
+|`[COMPUTE]` | Set to `True` to execute the cleaning tasks described below. You can use the parameters of each task to tweak them or deactivate them|
+|`[IMPUTE_SELECTED_EVENT_FEATURES]`     | Fill NAs with 0 only for event-based features, see table below|
+|`[COLS_NAN_THRESHOLD]`                 | Discard columns with missing value ratios higher than `[COLS_NAN_THRESHOLD]`. Set to 1 to disable|
+|`[COLS_VAR_THRESHOLD]`                 | Set to `True` to discard columns with zero variance|
+|`[ROWS_NAN_THRESHOLD]`                 | Discard rows with missing value ratios higher than `[ROWS_NAN_THRESHOLD]`. Set to 1 to disable|
+|`[DATA_YIELD_FEATURE]`                 | `RATIO_VALID_YIELDED_HOURS` or `RATIO_VALID_YIELDED_MINUTES`|
+|`[DATA_YIELD_RATIO_THRESHOLD]`         | Discard rows with `ratiovalidyieldedhours` or `ratiovalidyieldedminutes` feature less than `[DATA_YIELD_RATIO_THRESHOLD]`. The feature name is determined by `[DATA_YIELD_FEATURE]` parameter. Set to 0 to disable|
+|`[DROP_HIGHLY_CORRELATED_FEATURES]`    | Discard highly correlated features, see table below|
+
+Parameters description for `[ALL_CLEANING_INDIVIDUAL][PROVIDERS][RAPIDS][IMPUTE_SELECTED_EVENT_FEATURES]`:
+
+|Parameters                             | Description                                                    |
+|-------------------------------------- |----------------------------------------------------------------|
+|`[COMPUTE]`                            | Set to `True` to fill NAs with 0 for phone event-based features|
+|`[MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` | Any missing (NA) value of the selected event features in a time segment instance with phone data yield > `[MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` will be replaced with a zero. See below for an explanation. |
+
+Parameters description for `[ALL_CLEANING_INDIVIDUAL][PROVIDERS][RAPIDS][DROP_HIGHLY_CORRELATED_FEATURES]`:
+
+|Parameters                             | Description                                                    |
+|-------------------------------------- |----------------------------------------------------------------|
+|`[COMPUTE]`                            | Set to `True` to drop highly correlated features|
+|`[MIN_OVERLAP_FOR_CORR_THRESHOLD]`     | Minimum ratio of observations required per pair of columns (features) to be considered as a valid correlation.|
+|`[CORR_THRESHOLD]` | The absolute values of pair-wise correlations are calculated. If two variables have a valid correlation higher than `[CORR_THRESHOLD]`, we look at the mean absolute correlation of each variable and remove the variable with the largest mean absolute correlation.|
+
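+As a rough sketch of how the keys in the three tables above nest inside `config.yaml`, consider the fragment below. All values are illustrative assumptions, not RAPIDS defaults, so check your own `config.yaml` before reusing them.
+
+```yaml
+ALL_CLEANING_INDIVIDUAL:
+    PROVIDERS:
+        RAPIDS:
+            COMPUTE: True
+            IMPUTE_SELECTED_EVENT_FEATURES:
+                COMPUTE: True
+                MIN_DATA_YIELDED_MINUTES_TO_IMPUTE: 0.33 # assumed value
+            COLS_NAN_THRESHOLD: 0.3 # assumed value; set to 1 to disable
+            COLS_VAR_THRESHOLD: True
+            ROWS_NAN_THRESHOLD: 0.3 # assumed value; set to 1 to disable
+            DATA_YIELD_FEATURE: RATIO_VALID_YIELDED_HOURS # or RATIO_VALID_YIELDED_MINUTES
+            DATA_YIELD_RATIO_THRESHOLD: 0.75 # assumed value; set to 0 to disable
+            DROP_HIGHLY_CORRELATED_FEATURES:
+                COMPUTE: True
+                MIN_OVERLAP_FOR_CORR_THRESHOLD: 0.5 # assumed value
+                CORR_THRESHOLD: 0.95 # assumed value
+```
+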
+These are the steps to clean sensor features for individual participants. Currently, only **phone sensors** are considered.
+
+??? info "1. Fill NA with 0 for the selected event features."
+    Some event features should be zero instead of NA. In this step, we fill those missing features with 0 when the `phone_data_yield_rapids_ratiovalidyieldedminutes` column is higher than the `[IMPUTE_SELECTED_EVENT_FEATURES][MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]` parameter. Plugins such as the Activity Recognition sensor are not considered. You can skip this step by setting `[IMPUTE_SELECTED_EVENT_FEATURES][COMPUTE]` to `False`.
+    
+    Take the phone calls sensor as an example. If there are no call records during a time segment for a participant, then either (1) the calls sensor was not working during that time segment, or (2) the calls sensor was working and the participant did not have any calls during that time segment. To differentiate these two situations, we assume the selected sensors are working when `phone_data_yield_rapids_ratiovalidyieldedminutes > [MIN_DATA_YIELDED_MINUTES_TO_IMPUTE]`.
+
+    The following phone event-based features are considered currently:
+
+      - Application foreground: countevent, countepisode, minduration, maxduration, meanduration, sumduration.
+      - Battery: all features.
+      - Calls: count, distinctcontacts, sumduration, minduration, maxduration, meanduration, modeduration.
+      - Keyboard: sessioncount, averagesessionlength, changeintextlengthlessthanminusone, changeintextlengthequaltominusone, changeintextlengthequaltoone, changeintextlengthmorethanone, maxtextlength, totalkeyboardtouches.
+      - Messages: count, distinctcontacts.
+      - Screen: sumduration, maxduration, minduration, avgduration, countepisode.
+      - WiFi: all connected and visible features.
+
+??? info "2. Discard unreliable rows."
+    Extracted features might not be reliable if the sensor only works for a short period during a time segment. In this step, we discard rows when the `phone_data_yield_rapids_ratiovalidyieldedminutes` column or the `phone_data_yield_rapids_ratiovalidyieldedhours` column is less than the `[DATA_YIELD_RATIO_THRESHOLD]` parameter. We recommend using the `phone_data_yield_rapids_ratiovalidyieldedminutes` column (set `[DATA_YIELD_FEATURE]` to `RATIO_VALID_YIELDED_MINUTES`) on time segments that are shorter than two or three hours and `phone_data_yield_rapids_ratiovalidyieldedhours` (set `[DATA_YIELD_FEATURE]` to `RATIO_VALID_YIELDED_HOURS`) for longer segments. We do not recommend skipping this step, but you can do so by setting `[DATA_YIELD_RATIO_THRESHOLD]` to 0.
+
+??? info "3. Discard columns (features) with too many missing values."
+    In this step, we discard columns with missing value ratios higher than `[COLS_NAN_THRESHOLD]`. We do not recommend skipping this step, but you can do so by setting `[COLS_NAN_THRESHOLD]` to 1.
+
+??? info "4. Discard columns (features) with zero variance."
+    In this step, we discard columns with zero variance. We do not recommend skipping this step, but you can do so by setting `[COLS_VAR_THRESHOLD]` to `False`.
+
+??? info "5. Drop highly correlated features."
+    As highly correlated features might not bring additional information and will increase the complexity of a model, we drop them in this step. The absolute values of pair-wise correlations are calculated. Each pair-wise correlation is regarded as valid only if the ratio of valid value pairs (i.e. non-NA pairs) is greater than or equal to `[DROP_HIGHLY_CORRELATED_FEATURES][MIN_OVERLAP_FOR_CORR_THRESHOLD]`. If two variables have a valid correlation coefficient higher than `[DROP_HIGHLY_CORRELATED_FEATURES][CORR_THRESHOLD]`, we look at the mean absolute correlation of each variable and remove the variable with the largest mean absolute correlation. This step can be skipped by setting `[DROP_HIGHLY_CORRELATED_FEATURES][COMPUTE]` to `False`.
+
+??? info "6. Discard rows with too many missing values."
+    In this step, we discard rows with missing value ratios higher than `[ROWS_NAN_THRESHOLD]`. In other words, we discard time segments (e.g. days) that did not have enough data to be considered reliable. This step is similar to step 2 except the ratio is computed based on NA values instead of a phone data yield threshold. We do not recommend skipping this step, but you can do so by setting `[ROWS_NAN_THRESHOLD]` to 1.
+
+
+
+
+## Clean sensor features for all participants
+
+!!! info "File Sequence"
+    ```bash
+    - data/processed/features/all_participants/all_sensor_features.csv
+    - data/processed/features/all_participants/all_sensor_features_cleaned_{provider_key}.csv
+    ```
+
+
+### RAPIDS provider
+
+The parameter descriptions and steps are the same as in the [RAPIDS provider](#rapids-provider) section above for individual participants.
+
+
--- a/docs/analysis/minimal.md
+++ b/docs/analysis/minimal.md
@ -0,0 +1,153 @@
+Minimal Working Example
+=======================
+
+This is a quick guide for creating and running a simple pipeline to extract missed, outgoing, and incoming `call` features for `24 hr` (`00:00:00` to `23:59:59`) and `night` (`00:00:00` to `05:59:59`) time segments of every day of data of one participant who was monitored on the US East Coast with an Android smartphone.
+
+1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
+2. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
+3. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) step (we provide an example of what the relevant sections in your `config.yaml` will look like after you are done)
+    
+    ??? info "Required configuration changes (*click to expand*)"
+        1. **Supported [data streams](../../setup/configuration#supported-data-streams).** 
+            
+            Based on the docs, we decided to use the `aware_csv` data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.
+
+        2. **Create your [participants file](../../setup/configuration#participant-files).**
+        
+            Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml` in `data/external/participant_files`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration#automatic-creation-of-participant-files).
+            
+            1. Add `p01` to `[PIDS]` in `config.yaml`
+
+            1. Create a file in `data/external/participant_files/p01.yaml` with the following content:
+
+                ```yaml
+                PHONE:
+                    DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
+                    PLATFORMS: [android] # or ios
+                    LABEL: MyTestP01 # any string
+                    START_DATE: 2020-01-01 # this can also be empty
+                    END_DATE: 2021-01-01 # this can also be empty
+                ```
+        
+        3. **Select what [time segments](../../setup/configuration#time-segments) you want to extract features on.**
+        
+            1. Set `[TIME_SEGMENTS][FILE]` to `data/external/timesegments_periodic.csv` 
+
+            1. Create a file in `data/external/timesegments_periodic.csv` with the following content
+            
+                ```csv
+                label,start_time,length,repeats_on,repeats_value
+                daily,00:00:00,23H 59M 59S,every_day,0
+                night,00:00:00,5H 59M 59S,every_day,0
+                ```
+        
+        2. **Choose the [timezone of your study](../../setup/configuration#timezone-of-your-study).** 
+        
+            We will use the default time zone settings since this example is processing data collected on the US East Coast (`America/New_York`)
+
+            ```yaml
+            TIMEZONE: 
+                TYPE: SINGLE
+                SINGLE:
+                    TZCODE: America/New_York
+            ```
+
+         5. **Modify your [device data stream configuration](../../setup/configuration#data-stream-configuration)**
+            
+            1. Set `[PHONE_DATA_STREAMS][USE]` to `aware_csv`. 
+            
+            2. We will use the default value for `[PHONE_DATA_STREAMS][aware_csv][FOLDER]` since we already stored the test calls CSV file there.
+
+         6. **Select what [sensors and features](../../setup/configuration#sensor-and-features-to-process) you want to process.** 
+         
+            1. Set `[PHONE_CALLS][CONTAINER]` to `calls.csv` in the `config.yaml` file.
+
+            1. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.
+
+
+    !!! example "Example of the `config.yaml` sections after the changes outlined above"
+
+        This will be your `config.yaml` after following the instructions above. Click on the numbered markers to know more.
+
+        ``` { .yaml .annotate } 
+        PIDS: [p01] # (1)
+        
+        TIMEZONE:
+            TYPE: SINGLE # (2)
+            SINGLE:
+                TZCODE: America/New_York
+
+        # ... other irrelevant sections
+
+        TIME_SEGMENTS: &time_segments
+            TYPE: PERIODIC # (3)
+            FILE: "data/external/timesegments_periodic.csv" # (4)
+            INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
+
+        PHONE_DATA_STREAMS:
+            USE: aware_csv # (5)
+
+            aware_csv:
+                FOLDER: data/external/aware_csv # (6)
+
+        # ... other irrelevant sections
+
+        ############## PHONE ###########################################################
+        ################################################################################
+
+        # ... other irrelevant sections
+
+        # Communication call features config, TYPES and FEATURES keys need to match
+        PHONE_CALLS:
+            CONTAINER: calls.csv  # (7) 
+            PROVIDERS:
+                RAPIDS:
+                    COMPUTE: True # (8)
+                    CALL_TYPES: ...
+        ```
+
+        1. We added `p01` to PIDS after creating the participant file:
+            ```bash
+            data/external/participant_files/p01.yaml
+            ```
+
+            With the following content:
+            ```yaml
+            PHONE:
+                DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
+                PLATFORMS: [android] # or ios
+                LABEL: MyTestP01 # any string
+                START_DATE: 2020-01-01 # this can also be empty
+                END_DATE: 2021-01-01 # this can also be empty
+            ```
+
+        2. We use the default `SINGLE` time zone.
+
+        3. We use the default `PERIODIC` time segment `[TYPE]`
+
+        4. We created this time segments file with these lines:
+
+            ```csv
+            label,start_time,length,repeats_on,repeats_value
+            daily,00:00:00,23H 59M 59S,every_day,0
+            night,00:00:00,5H 59M 59S,every_day,0
+            ```
+
+        5. We set `[USE]` to `aware_csv` to tell RAPIDS to process sensor data collected with the AWARE Framework stored in CSV files.
+
+        6. We used the default `[FOLDER]` for `aware_csv` since we already stored our test `calls.csv` file there.
+
+        7. We changed `[CONTAINER]` to `calls.csv` to process our test call data.
+
+        8. We flipped `[COMPUTE]` to `True` to extract call behavioral features using the `RAPIDS` feature provider.
+
+3. Run RAPIDS
+    ```bash
+    ./rapids -j1
+    ```
+4. The call features for the daily and night time segments will be in
+   ```
+   data/processed/features/all_participants/all_sensor_features.csv
+   ```
+
+
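Review note on the minimal example above: the periodic time segments file is typed by hand, so a slip in `start_time` or `length` is easy to miss. Below is a minimal Python sketch that checks such a file before running RAPIDS. It is not part of RAPIDS; the function name `check_time_segments` and both regular expressions are assumptions inferred only from the two example rows (`HH:MM:SS` start times, lengths such as `23H 59M 59S`), so RAPIDS itself may accept other values.

```python
import csv
import io
import re

# Assumed formats, inferred only from the example rows in this diff.
EXPECTED_HEADER = ["label", "start_time", "length", "repeats_on", "repeats_value"]
START_TIME = re.compile(r"^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$")
LENGTH = re.compile(r"^(?=\S)(\d+D)?\s*(\d+H)?\s*(\d+M)?\s*(\d+S)?$")


def check_time_segments(text):
    """Return a list of human-readable problems found in a periodic segments CSV."""
    rows = list(csv.reader(io.StringIO(text.strip())))
    if not rows or rows[0] != EXPECTED_HEADER:
        return [f"unexpected header: {rows[0] if rows else None!r}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 5 columns, got {len(row)}")
            continue
        label, start_time, length, _repeats_on, _repeats_value = row
        if not START_TIME.match(start_time):
            problems.append(f"line {line_no} ({label}): bad start_time {start_time!r}")
        if not LENGTH.match(length.strip()):
            problems.append(f"line {line_no} ({label}): bad length {length!r}")
    return problems


if __name__ == "__main__":
    example = (
        "label,start_time,length,repeats_on,repeats_value\n"
        "daily,00:00:00,23H 59M 59S,every_day,0\n"
        "night,00:00:00,5H 59M 59S,every_day,0\n"
    )
    print(check_time_segments(example) or "no problems found")
```

To check a real file, pass its contents to the function, for example `check_time_segments(open("data/external/timesegments_periodic.csv").read())`. Returning a list of messages rather than raising on the first problem lets you see every malformed row in one pass.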
--- a/docs/change-log.md
+++ b/docs/change-log.md
@ -0,0 +1,165 @@
+# Change Log
+## v1.8.0
+- Add data stream for AWARE Micro server
+- Fix the NA bug in PHONE_LOCATIONS BARNETT provider
+- Fix the bug of data type for call_duration field
+- Fix the index bug of heatmap_sensors_per_minute_per_time_segment
+## v1.7.1
+- Update docs for Git Flow section
+- Update RAPIDS paper information
+## v1.7.0
+- Add firststeptime and laststeptime features to FITBIT_STEPS_INTRADAY RAPIDS provider
+- Update tests for Fitbit steps intraday features
+- Add tests for phone battery features
+- Add a data cleaning module to replace NAs with 0 in selected event-based features, discard unreliable rows and columns, discard columns with zero variance, and discard highly correlated columns
+## v1.6.0
+- Refactor PHONE_CALLS RAPIDS provider to compute features based on call episodes or events
+- Refactor PHONE_LOCATIONS DORYAB provider to compute features based on location episodes
+- Temporarily revert PHONE_LOCATIONS BARNETT provider to use R script
+- Update the default IGNORE_EPISODES_LONGER_THAN to be 6 hours for screen RAPIDS provider
+- Fix the bug of step intraday features when INCLUDE_ZERO_STEP_ROWS is False
+## v1.5.0
+- Update Barnett location features with faster Python implementation
+- Fix rounding bug in data yield features
+- Add tests for data yield, Fitbit and accelerometer features
+- Small fixes of documentation
+## v1.4.1
+- Update home page
+- Add PHONE_MESSAGES tests
+## v1.4.0
+- Add new Application Foreground episode features and tests
+- Update VSCode setup instructions for our Docker container
+- Add tests for phone calls features
+- Add tests for WiFi features and fix a bug that incorrectly counted the most scanned device within the current time segment instances instead of globally
+- Add tests for phone conversation features
+- Add tests for Bluetooth features and choose the most scanned device alphabetically when ties exist
+- Add tests for Activity Recognition features and fix iOS unknown activity parsing
+- Fix Fitbit bug that parsed date-times with the current time zone in rare cases
+- Update the visualizations to be more precise and robust with different time segments.
+- Fix regression crash of the example analysis workflow
+## v1.3.0
+- Refactor PHONE_LOCATIONS DORYAB provider. Fix bugs and speed up execution by up to 30x
+- New PHONE_KEYBOARD features
+- Add a new strategy to infer home location that can handle multiple homes for the same participant
+- Add module to exclude sleep episodes from steps intraday features
+- Fix PID matching when joining data from multiple participants. Now, we can handle PIDS with an arbitrary format.
+- Fix bug that did not correctly parse participants with more than 2 phones or more than 1 wearable
+- Fix crash when no phone data yield is needed to process location data (ALL & GPS location providers)
+- Remove location rows with the same timestamp based on their accuracy
+- Fix PHONE_CONVERSATION bug that produced inaccurate ratio features when time segments were not daily.
+- Other minor bug fixes
+## v1.2.0
+- Sleep summary and intraday features are more consistent.
+- Add wake and bedtime features for sleep summary data.
+- Fix bugs with sleep PRICE features.
+- Update home page
+- Add contributing guide
+## v1.1.1
+- Fix length of periodic segments on days with daylight saving time (DST) changes
+- Fix crash when scraping data for an app that does not exist
+- Add tests for phone screen data
+## v1.1.0
+- Add Fitbit calories intraday features
+## v1.0.1
+- Fix crash in `chunk_episodes` of `utils.py` for multi time zone data
+- Fix crash in BT Doryab provider when the number of clusters is 2
+- Fix Fitbit multi time zone inference from phone data (simplify)
+- Fix missing columns when the input for phone data yield is empty
+- Fix wrong date time labels for event segments for multi time zone data (all labels are computed based on a single tz)
+- Fix periodic segment crash when there are no segments to assign (only affects wday, mday, qday, or yday) 
+- Fix crash in Analysis Workflow with new suffix in segments' labels
+## v1.0.0
+- Add a new [Overview](../setup/overview/) page.
+- You can [extend](../datastreams/add-new-data-streams/) RAPIDS with your own [data streams](../datastreams/data-streams-introduction/). Data streams are data collected with other sensing apps besides AWARE (like Beiwe, mindLAMP), and stored in other data containers (databases, files) besides MySQL.
+- Support to analyze Empatica wearable data (thanks to Joe Kim and  Brinnae Bent from the [DBDP](https://dbdp.org/))
+- Support to analyze AWARE data stored in [CSV files](../datastreams/aware-csv/) and [InfluxDB](../datastreams/aware-influxdb/) databases
+- Support to analyze data collected over [multiple time zones](../setup/configuration/#multiple-timezones)
+- Support for [sleep intraday features](../features/fitbit-sleep-intraday/) from the core team and also from the community (thanks to Stephen Price)
+- Users can comment on the documentation (powered by utterances).
+- `SCR_SCRIPT` and `SRC_LANGUAGE` are replaced by `SRC_SCRIPT`.
+- Add RAPIDS new logo
+- Move Citation and Minimal Example page to the Setup section
+- Add `config.yaml` validation schema and documentation. Now it's more difficult to modify the `config.yaml` file with invalid values.
+- Add new `time at home` Doryab location feature
+- Add home coordinates to the location data file so location providers can build features based on them.
+- If you are migrating from RAPIDS 0.4.3 or older, check this [guide](../migrating-from-old-versions/#migrating-from-rapids-04x-or-older)
+## v0.4.3
+- Fix bug when any of the rows from any sensor do not belong to a time segment
+## v0.4.2
+- Update battery testing
+- Fix location processing bug when certain columns don't exist
+- Fix HR intraday bug when minutesonZONE features were 0 
+- Update FAQs
+- Fix HR summary bug when restinghr=0 (ignore those rows)
+- Fix ROG, location entropy and normalized entropy in Doryab location provider
+- Remove sampling frequency dependence in Doryab location provider
+- Update documentation of Doryab location provider
+- Add new `FITBIT_DATA_YIELD` `RAPIDS` provider
+- Deprecate Doryab circadian movement feature until it is fixed
+## v0.4.1
+- Fix bug when no error message was displayed for an empty `[PHONE_DATA_YIELD][SENSORS]` when resampling location data
+## v0.4.0
+- Add four new phone sensors that can be used for PHONE_DATA_YIELD
+- Add code so new feature providers can be added for the new four sensors
+- Add new clustering algorithm (OPTICS) for Doryab features
+- Update default EPS parameter for Doryab location clustering
+- Add clearer error message for invalid phone data yield sensors
+- Add ALL_RESAMPLED flag and accuracy limit for location features
+- Add FAQ about null characters in phone tables
+- Reactivate light and wifi tests and update testing docs
+- Fix bug when parsing Fitbit steps data
+- Fix bugs when merging features from empty time segments
+- Fix minor issues in the documentation
+## v0.3.2
+- Update Docker and Linux instructions to use RSPM binary repo for faster installation
+- Update CI to create a release on a tagged push that passes the tests
+- Clarify in DB credential configuration that we only support MySQL
+- Add Windows installation instructions
+- Fix bugs in the create_participants_file script
+- Fix bugs in Fitbit data parsing.
+- Fix Doryab location features context of clustering.
+- Fix the wrong shifting while calculating distance in Doryab location features.
+- Refactor the haversine function
+## v0.3.1
+- Update installation docs for RAPIDS' docker container
+- Fix example analysis use of accelerometer data in a plot
+- Update FAQ
+- Update minimal example documentation
+- Minor doc updates
+## v0.3.0
+- Update R and Python virtual environments
+- Add GH actions CI support for tests and docker
+- Add release and test badges to README
+## v0.2.6
+- Fix old versions banner on nested pages
+## v0.2.5
+- Fix docs deploy typo
+## v0.2.4
+- Fix broken links in landing page and docs deploy
+## v0.2.3
+- Fix participant IDS in the example analysis workflow
+## v0.2.2
+- Fix readme link to docs
+## v0.2.1
+- Fix link to the most recent version in the old version banner
+
+## v0.2.0
+- Add new `PHONE_BLUETOOTH` `DORYAB` provider
+- Deprecate `PHONE_BLUETOOTH` `RAPIDS` provider
+- Fix bug in `filter_data_by_segment` for Python when dataset was empty
+- Minor doc updates
+- New FAQ item
+
+## v0.1.0
+- New and more consistent docs (this website). The [previous docs](https://rapidspitt.readthedocs.io/en/latest/) are marked as beta 
+- Consolidate [configuration](../setup/configuration) instructions
+- Flexible [time segments](../setup/configuration#time-segments)
+- Simplify Fitbit behavioral feature extraction and [documentation](../features/fitbit-heartrate-summary)
+- Sensor's configuration and output is more consistent
+- Update [visualizations](../visualizations/data-quality-visualizations) to handle flexible day segments
+- Create a RAPIDS [execution](../setup/execution) script that allows re-computation of the pipeline after configuration changes
+- Add [citation](../citation) guide
+- Update [virtual environment](../developers/virtual-environments) guide
+- Update analysis workflow [example](../workflow-examples/analysis)
+- Add a [Code of Conduct](../code_of_conduct)
+- Update [Team](../team) page
--- a/docs/citation.md
+++ b/docs/citation.md
@ -0,0 +1,63 @@
+# Cite RAPIDS and providers
+
+!!! done "RAPIDS and the community"
+    RAPIDS is a community effort, and as such we want to continue recognizing the contributions from other researchers. Besides citing RAPIDS, we ask you to cite any of the authors listed below if you used those sensor providers in your analysis. Thank you!
+
+## RAPIDS
+
+If you used RAPIDS, please cite [this paper](https://www.frontiersin.org/article/10.3389/fdgth.2021.769823).
+
+!!! cite "RAPIDS et al. citation"
+    Vega, J., Li, M., Aguillera, K., Goel, N., Joshi, E., Khandekar, K., ... & Low, C. A. (2021). Reproducible Analysis Pipeline for Data Streams (RAPIDS): Open-Source Software to Process Data Collected with Mobile Devices. Frontiers in Digital Health, 168.
+
+## DBDP (all Empatica sensors)
+
+If you computed features using the provider  `[DBDP]` of any of the Empatica sensors (accelerometer, heart rate, temperature, EDA, BVP, IBI, tags) cite [this paper](https://www.cambridge.org/core/journals/journal-of-clinical-and-translational-science/article/digital-biomarker-discovery-pipeline-an-open-source-software-platform-for-the-development-of-digital-biomarkers-using-mhealth-and-wearables-data/A6696CEF138247077B470F4800090E63) in addition to RAPIDS.
+
+!!! cite "Bent et al. citation"
+    Bent, B., Wang, K., Grzesiak, E., Jiang, C., Qi, Y., Jiang, Y., Cho, P., Zingler, K., Ogbeide, F.I., Zhao, A., Runge, R., Sim, I., Dunn, J. (2020). The Digital Biomarker Discovery Pipeline: An open source software platform for the development of digital biomarkers using mHealth and wearables data. Journal of Clinical and Translational Science, 1-28. doi:10.1017/cts.2020.511
+
+
+## Panda (accelerometer)
+
+If you computed accelerometer features using the provider `[PHONE_ACCELEROMETER][PANDA]` cite [this paper](https://pubmed.ncbi.nlm.nih.gov/31657854/) in addition to RAPIDS.
+
+!!! cite "Panda et al. citation"
+    Panda N, Solsky I, Huang EJ, Lipsitz S, Pradarelli JC, Delisle M, Cusack JC, Gadd MA, Lubitz CC, Mullen JT, Qadan M, Smith BL, Specht M, Stephen AE, Tanabe KK, Gawande AA, Onnela JP, Haynes AB. Using Smartphones to Capture Novel Recovery Metrics After Cancer Surgery. JAMA Surg. 2020 Feb 1;155(2):123-129. doi: 10.1001/jamasurg.2019.4702. PMID: 31657854; PMCID: PMC6820047.
+
+## Stachl (applications foreground)
+
+If you computed applications foreground features using the app category (genre) catalogue in  `[PHONE_APPLICATIONS_FOREGROUND][RAPIDS]` cite [this paper](https://www.pnas.org/content/117/30/17680) in addition to RAPIDS.
+
+!!! cite "Stachl et al. citation"
+    Clemens Stachl, Quay Au, Ramona Schoedel, Samuel D. Gosling, Gabriella M. Harari, Daniel Buschek, Sarah Theres Völkel, Tobias Schuwerk, Michelle Oldemeier, Theresa Ullmann, Heinrich Hussmann, Bernd Bischl, Markus Bühner. Proceedings of the National Academy of Sciences Jul 2020, 117 (30) 17680-17687; DOI: 10.1073/pnas.1920484117 
+
+## Doryab (bluetooth)
+
+If you computed bluetooth features using the provider `[PHONE_BLUETOOTH][DORYAB]` cite [this paper](https://arxiv.org/abs/1812.10394) in addition to RAPIDS.
+
+!!! cite "Doryab et al. citation"
+    Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
+
+## Barnett (locations)
+
+If you computed locations features using the provider `[PHONE_LOCATIONS][BARNETT]` cite [this paper](https://doi.org/10.1093/biostatistics/kxy059) and [this paper](https://doi.org/10.1145/2750858.2805845) in addition to RAPIDS.
+
+!!! cite "Barnett et al. citation"
+    Ian Barnett, Jukka-Pekka Onnela, Inferring mobility measures from GPS traces with missing data, Biostatistics, Volume 21, Issue 2, April 2020, Pages e98–e112, https://doi.org/10.1093/biostatistics/kxy059
+
+!!! cite "Canzian et al. citation"
+    Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI:https://doi.org/10.1145/2750858.2805845
+
+## Doryab (locations)
+
+If you computed locations features using the provider `[PHONE_LOCATIONS][DORYAB]` cite [this paper](https://arxiv.org/abs/1812.10394) and [this paper](https://doi.org/10.1145/2750858.2805845) in addition to RAPIDS. In addition, if you used the `SUN_LI_VEGA_STRATEGY` strategy, cite [this paper](https://www.jmir.org/2020/9/e19992/) as well.
+
+!!! cite "Doryab et al. citation"
+    Doryab, A., Chikarsel, P., Liu, X., & Dey, A. K. (2019). Extraction of Behavioral Features from Smartphone and Wearable Data. ArXiv:1812.10394 [Cs, Stat]. http://arxiv.org/abs/1812.10394
+
+!!! cite "Canzian et al. citation"
+    Luca Canzian and Mirco Musolesi. 2015. Trajectories of depression: unobtrusive monitoring of depressive states by means of smartphone mobility traces analysis. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '15). Association for Computing Machinery, New York, NY, USA, 1293–1304. DOI:https://doi.org/10.1145/2750858.2805845
+
+!!! cite "Sun et al. citation"
+    Sun S, Folarin AA, Ranjan Y, Rashid Z, Conde P, Stewart C, Cummins N, Matcham F, Dalla Costa G, Simblett S, Leocani L, Lamers F, Sørensen PS, Buron M, Zabalza A, Guerrero Pérez AI, Penninx BW, Siddi S, Haro JM, Myin-Germeys I, Rintala A, Wykes T, Narayan VA, Comi G, Hotopf M, Dobson RJ, RADAR-CNS Consortium. Using Smartphones and Wearable Devices to Monitor Behavioral Changes During COVID-19. J Med Internet Res 2020;22(9):e19992
--- a/docs/code_of_conduct.md
+++ b/docs/code_of_conduct.md
@ -0,0 +1,134 @@
+
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity and expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, religion, or sexual identity
+and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the
+  overall community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or
+  advances of any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email
+  address, without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting via an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+moshi@pitt.edu.
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of the
+reporter of any incident.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series
+of actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or
+permanent ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior,  harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within
+the community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.0, available at
+[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
+
+Community Impact Guidelines were inspired by 
+[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+
+For answers to common questions about this code of conduct, see the FAQ at
+[https://www.contributor-covenant.org/faq][FAQ]. Translations are available 
+at [https://www.contributor-covenant.org/translations][translations].
+
+[homepage]: https://www.contributor-covenant.org
+[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
+[Mozilla CoC]: https://github.com/mozilla/diversity
+[FAQ]: https://www.contributor-covenant.org/faq
+[translations]: https://www.contributor-covenant.org/translations
+
--- a/docs/common-errors.md
+++ b/docs/common-errors.md
@ -0,0 +1,263 @@
+# Common Errors
+
+##  Cannot connect to your MySQL server
+
+???+ failure "Problem"
+    ```bash
+    Error in .local(drv, ...) : Failed to connect to database: Error:
+    Can't initialize character set unknown (path: compiled_in) :
+
+    Calls: dbConnect -> dbConnect -> .local -> .Call
+    Execution halted
+    [Tue Mar 10 19:40:15 2020]
+    Error in rule download_dataset:
+        jobid: 531
+        output: data/raw/p60/locations_raw.csv
+
+    RuleException:
+    CalledProcessError in line 20 of /home/ubuntu/rapids/rules/preprocessing.snakefile:
+    Command 'set -euo pipefail;  Rscript --vanilla /home/ubuntu/rapids/.snakemake/scripts/tmp_2jnvqs7.download_dataset.R' returned non-zero exit status 1.
+    File "/home/ubuntu/rapids/rules/preprocessing.snakefile", line 20, in __rule_download_dataset
+    File "/home/ubuntu/anaconda3/envs/moshi-env/lib/python3.7/concurrent/futures/thread.py", line 57, in run
+    Shutting down, this might take some time.
+    Exiting because a job execution failed. Look above for error message
+    ```
+
+???+ done "Solution"
+    Please make sure the `DATABASE_GROUP` in `config.yaml` matches your DB credentials group in `.env`.
+
+---
+
+## Cannot start MySQL in Linux via `brew services start mysql`
+
+???+ failure "Problem"
+    Cannot start MySQL in Linux via `brew services start mysql`
+
+???+ done "Solution"
+    Use `mysql.server start`
+
+---
+
+## Every time I force the download_dataset rule all rules are executed
+
+???+ failure "Problem"
+    When running `snakemake -j1 -R pull_phone_data` or `./rapids -j1 -R pull_phone_data` all the rules and files are re-computed
+
+???+ done "Solution"
+    This is expected behavior. The advantage of using `snakemake` under the hood is that every time a file containing data is modified, every rule that depends on that file is re-executed to update its results. In this case, since `download_dataset` updates all the raw data and you are forcing the rule with the `-R` flag, every rule that depends on those raw files will be executed.
+---
+
+## Error `Table XXX doesn't exist` while running the `download_phone_data` or `download_fitbit_data` rule.
+
+???+ failure "Problem"
+    ```bash
+    Error in .local(conn, statement, ...) : 
+      could not run statement: Table 'db_name.table_name' doesn't exist
+    Calls: colnames ... .local -> dbSendQuery -> dbSendQuery -> .local -> .Call
+    Execution halted
+    ```
+
+???+ done "Solution"
+    Please make sure the sensors listed in `[PHONE_VALID_SENSED_BINS][PHONE_SENSORS]` and the `[CONTAINER]` of each sensor you activated in `config.yaml`  match your database tables or files.
+
+---
+## How do I install RAPIDS on Ubuntu 16.04
+
+???+ done "Solution"
+    1.  Install dependencies (Homebrew - if not installed):
+        -   `sudo apt-get install libmariadb-client-lgpl-dev libxml2-dev libssl-dev`
+        -   Install [brew](https://docs.brew.sh/Homebrew-on-Linux) for linux and add the following line to `~/.bashrc`: `export PATH=$HOME/.linuxbrew/bin:$PATH`
+        -   `source ~/.bashrc`
+
+    1.  Install MySQL
+        -   `brew install mysql`
+        -   `brew services start mysql`
+
+    2.  Install R, pandoc and rmarkdown:
+        -   `brew install r`
+        -   `brew install gcc@6` (needed due to this [bug](https://github.com/Homebrew/linuxbrew-core/issues/17812))
+        -   `HOMEBREW_CC=gcc-6 brew install pandoc`
+
+    3.  Install miniconda using these [instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
+
+    4.  Clone our repo:
+        -   `git clone https://github.com/carissalow/rapids`
+
+    5.  Create a python virtual environment:
+        -   `cd rapids`
+        -   `conda env create -f environment.yml -n MY_ENV_NAME`
+        -   `conda activate MY_ENV_NAME`
+
+    6.  Install R packages and virtual environment:
+        -   `snakemake renv_install`
+        -   `snakemake renv_init`
+        -   `snakemake renv_restore`
+
+        This step could take several minutes to complete. Please be patient and let it run until completion.
+---
+
+## `mysql.h` cannot be found
+
+???+ failure "Problem"
+    ```bash
+    --------------------------[ ERROR MESSAGE ]----------------------------
+    <stdin>:1:10: fatal error: mysql.h: No such file or directory
+    compilation terminated.
+    -----------------------------------------------------------------------
+    ERROR: configuration failed for package 'RMySQL'
+    ```
+
+???+ done "Solution"
+    ```bash
+    sudo apt install libmariadbclient-dev
+    ```
+
+---
+## No package `libcurl` found
+
+???+ failure "Problem"
+    `libcurl` cannot be found
+
+???+ done "Solution"
+    Install `libcurl`
+    ```bash
+    sudo apt install libcurl4-openssl-dev
+    ```
+
+---
+## Configuration failed because `openssl` was not found.
+
+???+ failure "Problem"
+    `openssl` cannot be found
+
+???+ done "Solution"
+    Install `openssl`
+    ```bash
+    sudo apt install libssl-dev
+    ```
+---
+## Configuration failed because `libxml-2.0` was not found
+
+???+ failure "Problem"
+    `libxml-2.0` cannot be found
+
+???+ done "Solution"
+    Install `libxml-2.0`
+    ```bash
+    sudo apt install libxml2-dev
+    ```
+
+---
+## SSL connection error when running RAPIDS
+
+???+ failure "Problem"
+    You are getting the following error message when running RAPIDS:
+    ```bash
+    Error: Failed to connect: SSL connection error: error:1425F102:SSL routines:ssl_choose_client_version:unsupported protocol.
+    ```
+
+???+ done "Solution"
+    This is a bug in Ubuntu 20.04 when trying to connect to an old MySQL server with MySQL client 8.0. You should get the same error message if you try to connect from the command line. There you can add the option `--ssl-mode=DISABLED` but we can't do this from the R connector.
+
+    If you can't update your server, the quickest solution would be to import your database to another server or to a local environment. Alternatively, you could replace `mysql-client` and `libmysqlclient-dev` with `mariadb-client` and `libmariadbclient-dev` and reinstall renv. More info about this issue [here](https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1872541).
+
+---
+## `DB_TABLES` key not found
+
+???+ failure "Problem"
+    If you get the following error `KeyError in line 43 of preprocessing.smk: 'PHONE_SENSORS'`, it means that the indentation of the key `[PHONE_SENSORS]` is not matching the other child elements of `PHONE_VALID_SENSED_BINS`
+    
+???+ done "Solution"
+    Add or remove leading whitespace on that line until its indentation matches the other child elements.
+
+    ```yaml
+    PHONE_VALID_SENSED_BINS:
+        COMPUTE: False # This flag is automatically ignored (set to True) if you are extracting PHONE_VALID_SENSED_DAYS or screen or Barnett's location features
+        BIN_SIZE: &bin_size 5 # (in minutes)
+        PHONE_SENSORS: []
+    ```
+
+---
+## Error while updating your conda environment in Ubuntu
+
+???+ failure "Problem"
+    You get the following error:
+    ```bash
+    CondaMultiError: CondaVerificationError: The package for tk located at /home/ubuntu/miniconda2/pkgs/tk-8.6.9-hed695b0_1003
+        appears to be corrupted. The path 'include/mysqlStubs.h'
+        specified in the package manifest cannot be found.
+    ClobberError: This transaction has incompatible packages due to a shared path.
+        packages: conda-forge/linux-64::llvm-openmp-10.0.0-hc9558a2_0, anaconda/linux-64::intel-openmp-2019.4-243
+        path: 'lib/libiomp5.so'
+    ```
+
+???+ done "Solution"
+    Reinstall conda
+
+## Embedded nul in string
+
+???+ failure "Problem"
+    You get the following error when downloading sensor data:
+    ```bash
+    Error in result_fetch(res@ptr, n = n) : 
+      embedded nul in string:
+    ```
+
+???+ done "Solution"
+    This problem is due to the way `RMariaDB` handles a mismatch between data types in R and MySQL (see [this issue](https://github.com/r-dbi/RMariaDB/issues/121)). Since it seems this problem won't be handled by `RMariaDB`, you have two options:
+    
+    1. Remove the null character from the conflicting table cell(s). You can adapt the following query on a MySQL server 8.0 or newer (`regexp_replace` is not available in earlier versions):
+        ```sql
+        update YOUR_TABLE set YOUR_COLUMN = regexp_replace(YOUR_COLUMN, '\0', '');
+        ```
+    2. If it's not feasible to modify your data, you can try swapping `RMariaDB` with `RMySQL`. Keep in mind that you might have problems connecting to modern MySQL servers running on Linux:
+        - Add `RMySQL` to the renv environment by running the following command in a terminal opened in the RAPIDS root folder
+        ```bash
+        R -e 'renv::install("RMySQL")'
+        ```
+        - Go to `src/data/streams/pull_phone_data.R` or `src/data/streams/pull_fitbit_data.R` and replace `library(RMariaDB)` with `library(RMySQL)`
+        - In the same file(s) replace `dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)` with `dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)`
+## There is no package called `RMariaDB`
+
+???+ failure "Problem"
+    You get the following error when executing RAPIDS:
+    ```bash
+    Error in library(RMariaDB) : there is no package called 'RMariaDB'
+    Execution halted
+    ```
+
+???+ done "Solution"
+    In RAPIDS v0.1.0 we replaced the `RMySQL` R package with `RMariaDB`. This error means your R virtual environment is out of date; to update it, run `snakemake -j1 renv_restore`.
+    
+## Unrecognized output timezone "America/New_York"
+
+???+ failure "Problem"
+    When running RAPIDS with R 4.0.3 on macOS on M1, lubridate may throw an error associated with the timezone.
+    ```bash
+    Error in C_force_tz(time, tz = tzone, roll):
+       CCTZ: Unrecognized output timezone: "America/New_York"
+    Calls: get_timestamp_filter ... .parse_date_time -> .strptime -> force_tz -> C_force_tz
+    ```
+???+ done "Solution"
+    This is because the R timezone library is not set. Please add `Sys.setenv("TZDIR" = file.path(R.home(), "share", "zoneinfo"))` to the file `activate.R` in the renv folder to set the timezone library. For further details on how to test if `TZDIR` is properly set, please refer to `https://github.com/tidyverse/lubridate/issues/928#issuecomment-720059233`.
+   
+## Unimplemented MAX_NO_FIELD_TYPES
+
+???+ failure "Problem"
+    You get the following error when downloading Fitbit data:
+    ```bash
+    Error: Unimplemented MAX_NO_FIELD_TYPES
+    Execution halted
+    ```
+???+ done "Solution"
+    At the moment RMariaDB [cannot handle](https://github.com/r-dbi/RMariaDB/issues/127) MySQL columns of JSON type. Change the type of your Fitbit data column to `longtext` (note that the content will not change and will still be a JSON object just interpreted as a string).
+    
+## Running RAPIDS on Apple Silicon M1 Mac
+
+???+ failure "Problem"
+     You get the following error when installing pandoc or running RAPIDS:
+     ```bash
+     MoSHI/rapids/renv/staging/1/00LOCK-KernSmooth/00new/KernSmooth/libs/KernSmooth.so: mach-0, but wrong architecture
+     ```
+???+ done "Solution"
+    As of Feb 2021, on M1 Macs R needs to be installed via brew under Rosetta (x86 arch) due to incompatibilities with some R libraries. To do this, run your terminal [via Rosetta](https://www.youtube.com/watch?v=nv2ylxro7rM&t=138s), then proceed with the usual brew installation command. The x86 Homebrew should be installed in `/usr/local/bin/brew`; you can check which brew you are using by typing `which brew`. Then use the x86 Homebrew to install R and restore RAPIDS packages (`renv_restore`).
--- a/docs/conf.py
+++ b/docs/conf.py
@ -1,244 +0,0 @@
-# -*- coding: utf-8 -*-
-#
-# RAPIDS documentation build configuration file, created by
-# sphinx-quickstart.
-#
-# This file is execfile()d with the current directory set to its containing dir.
-#
-# Note that not all possible configuration values are present in this
-# autogenerated file.
-#
-# All configuration values have a default; values that are commented out
-# serve to show the default.
-
-import os
-import sys
-
-# If extensions (or modules to document with autodoc) are in another directory,
-# add these directories to sys.path here. If the directory is relative to the
-# documentation root, use os.path.abspath to make it absolute, like shown here.
-# sys.path.insert(0, os.path.abspath('.'))
-
-# -- General configuration -----------------------------------------------------
-
-# If your documentation needs a minimal Sphinx version, state it here.
-# needs_sphinx = '1.0'
-
-# Add any Sphinx extension module names here, as strings. They can be extensions
-# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
-extensions = []
-
-# Add any paths that contain templates here, relative to this directory.
-templates_path = ['_templates']
-
-# The suffix of source filenames.
-source_suffix = '.rst'
-
-# The encoding of source files.
-# source_encoding = 'utf-8-sig'
-
-# The master toctree document.
-master_doc = 'index'
-
-# General information about the project.
-project = u'RAPIDS'
-
-# The version info for the project you're documenting, acts as replacement for
-# |version| and |release|, also used in various other places throughout the
-# built documents.
-#
-# The short X.Y version.
-version = '0.1'
-# The full version, including alpha/beta/rc tags.
-release = '0.1'
-
-# The language for content autogenerated by Sphinx. Refer to documentation
-# for a list of supported languages.
-# language = None
-
-# There are two options for replacing |today|: either, you set today to some
-# non-false value, then it is used:
-# today = ''
-# Else, today_fmt is used as the format for a strftime call.
-# today_fmt = '%B %d, %Y'
-
-# List of patterns, relative to source directory, that match files and
-# directories to ignore when looking for source files.
-exclude_patterns = ['_build']
-
-# The reST default role (used for this markup: `text`) to use for all documents.
-# default_role = None
-
-# If true, '()' will be appended to :func: etc. cross-reference text.
-# add_function_parentheses = True
-
-# If true, the current module name will be prepended to all description
-# unit titles (such as .. function::).
-# add_module_names = True
-
-# If true, sectionauthor and moduleauthor directives will be shown in the
-# output. They are ignored by default.
-# show_authors = False
-
-# The name of the Pygments (syntax highlighting) style to use.
-pygments_style = 'sphinx'
-
-# A list of ignored prefixes for module index sorting.
-# modindex_common_prefix = []
-
-
-# -- Options for HTML output ---------------------------------------------------
-
-# The theme to use for HTML and HTML Help pages.  See the documentation for
-# a list of builtin themes.
-html_theme = 'sphinx_rtd_theme'
-
-# Theme options are theme-specific and customize the look and feel of a theme
-# further.  For a list of options available for each theme, see the
-# documentation.
-# html_theme_options = {}
-
-# Add any paths that contain custom themes here, relative to this directory.
-# html_theme_path = []
-
-# The name for this set of Sphinx documents.  If None, it defaults to
-# "<project> v<release> documentation".
-# html_title = None
-
-# A shorter title for the navigation bar.  Default is the same as html_title.
-# html_short_title = None
-
-# The name of an image file (relative to this directory) to place at the top
-# of the sidebar.
-# html_logo = None
-
-# The name of an image file (within the static path) to use as favicon of the
-# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
-# pixels large.
-# html_favicon = None
-
-# Add any paths that contain custom static files (such as style sheets) here,
-# relative to this directory. They are copied after the builtin static files,
-# so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
-
-# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
-# using the given strftime format.
-# html_last_updated_fmt = '%b %d, %Y'
-
-# If true, SmartyPants will be used to convert quotes and dashes to
-# typographically correct entities.
-# html_use_smartypants = True
-
-# Custom sidebar templates, maps document names to template names.
-# html_sidebars = {}
-
-# Additional templates that should be rendered to pages, maps page names to
-# template names.
-# html_additional_pages = {}
-
-# If false, no module index is generated.
-# html_domain_indices = True
-
-# If false, no index is generated.
-# html_use_index = True
-
-# If true, the index is split into individual pages for each letter.
-# html_split_index = False
-
-# If true, links to the reST sources are added to the pages.
-# html_show_sourcelink = True
-
-# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
-# html_show_sphinx = True
-
-# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
-# html_show_copyright = True
-
-# If true, an OpenSearch description file will be output, and all pages will
-# contain a <link> tag referring to it.  The value of this option must be the
-# base URL from which the finished HTML is served.
-# html_use_opensearch = ''
-
-# This is the file name suffix for HTML files (e.g. ".xhtml").
-# html_file_suffix = None
-
-# Output file base name for HTML help builder.
-htmlhelp_basename = 'rapidsdoc'
-
-
-# -- Options for LaTeX output --------------------------------------------------
-
-latex_elements = {
-    # The paper size ('letterpaper' or 'a4paper').
-    # 'papersize': 'letterpaper',
-
-    # The font size ('10pt', '11pt' or '12pt').
-    # 'pointsize': '10pt',
-
-    # Additional stuff for the LaTeX preamble.
-    # 'preamble': '',
-}
-
-# Grouping the document tree into LaTeX files. List of tuples
-# (source start file, target name, title, author, documentclass [howto/manual]).
-latex_documents = [
-    ('index',
-     'rapids.tex',
-     u'RAPIDS Documentation',
-     u"RAPIDS", 'manual'),
-]
-
-# The name of an image file (relative to this directory) to place at the top of
-# the title page.
-# latex_logo = None
-
-# For "manual" documents, if this is true, then toplevel headings are parts,
-# not chapters.
-# latex_use_parts = False
-
-# If true, show page references after internal links.
-# latex_show_pagerefs = False
-
-# If true, show URL addresses after external links.
-# latex_show_urls = False
-
-# Documents to append as an appendix to all manuals.
-# latex_appendices = []
-
-# If false, no module index is generated.
-# latex_domain_indices = True
-
-
-# -- Options for manual page output --------------------------------------------
-
-# One entry per manual page. List of tuples
-# (source start file, name, description, authors, manual section).
-man_pages = [
-    ('index', 'RAPIDS', u'RAPIDS Documentation',
-     [u"RAPIDS"], 1)
-]
-
-# If true, show URL addresses after external links.
-# man_show_urls = False
-
-
-# -- Options for Texinfo output ------------------------------------------------
-
-# Grouping the document tree into Texinfo files. List of tuples
-# (source start file, target name, title, author,
-#  dir menu entry, description, category)
-texinfo_documents = [
-    ('index', 'RAPIDS', u'RAPIDS Documentation',
-     u"RAPIDS", 'RAPIDS',
-     'Reproducible Analysis Pipeline for Data Streams', 'Miscellaneous'),
-]
-
-# Documents to append as an appendix to all manuals.
-# texinfo_appendices = []
-
-# If false, no module index is generated.
-# texinfo_domain_indices = True
-
-# How to display URL addresses: 'footnote', 'no', or 'inline'.
-# texinfo_show_urls = 'footnote'
--- a/docs/contributing.md
+++ b/docs/contributing.md
@ -0,0 +1,56 @@
+# Contributing
+
+Thank you for taking the time to contribute! 
+
+All changes, small or big, are welcome, and regardless of who you are, we are always happy to work together to make your contribution as strong as possible. We follow the [Covenant Code of Conduct](../code_of_conduct), so we ask you to uphold it. Be kind to everyone in the community, and please report unacceptable behavior to moshiresearch@gmail.com.
+
+## Questions, Feature Requests, and Discussions
+
+Post any questions, feature requests, or discussions in our [GitHub Discussions tab](https://github.com/carissalow/rapids/discussions).
+
+## Bug Reports
+
+Report any bugs in our [GitHub issue tracker](https://github.com/carissalow/rapids/issues), keeping in mind to:
+
+- Debug and simplify the problem to create a minimal example. For example, reduce the problem to a single participant, sensor, and a few rows of data.
+- Provide a clear and succinct description of the problem (expected behavior vs. actual behavior).
+- Attach your `config.yaml`, time segments file, and time zones file if appropriate.
+- Attach test data if possible and any screenshots or extra resources that will help us debug the problem.
+- Share the commit you are running: `git rev-parse --short HEAD`
+- Share your OS version (e.g., Windows 10)
+- Share the device/sensor you are processing (e.g., phone accelerometer)
+
+## Documentation Contributions
+
+If you want to fix a typo or make any other minor change, you can edit the file online by clicking on the pencil icon at the top right of any page and opening a pull request using [GitHub's website](https://docs.github.com/en/github/managing-files-in-a-repository/editing-files-in-your-repository).
+
+If your changes are more complex, clone RAPIDS' repository, set up the dev environment for our documentation with this [tutorial](../developers/documentation), and submit any changes on a new *feature branch* following our [git flow](../developers/git-flow).
+
+## Code Contributions
+
+!!! hint "Hints for any code changes"
+    - To submit any new code, use a new *feature branch* following our [git flow](../developers/git-flow).
+    - If you need a new Python or R package in RAPIDS' virtual environments, follow this [tutorial](../developers/virtual-environments/).
+    - If you need to change the `config.yaml`, you will need to update its validation schema with this [tutorial](../developers/validation-schema-config/).
+
+### New Data Streams
+
+*New data containers.* If you want to process data from a device RAPIDS supports ([see this table](../datastreams/data-streams-introduction/)) but it's stored in a database engine or file type we don't support yet, [implement a new data stream container and format](../datastreams/add-new-data-streams/). You can copy and paste the `format.yaml` of one of the other streams of the device you are targeting.
+
+*New sensing apps.* If you want to add support for new smartphone sensing apps like Beiwe, [implement a new data stream container and format](../datastreams/add-new-data-streams/).
+
+*New wearable devices.* If you want to add support for a new wearable, open a [GitHub discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
+
+### New Behavioral Features
+
+If you want to add new [behavioral features](../features/feature-introduction/) for mobile sensors RAPIDS already supports, follow this [tutorial](../features/add-new-features/). A sensor is supported if it has a configuration section in `config.yaml`.
+
+If you want to add new [behavioral features](../features/feature-introduction/) for mobile sensors RAPIDS does not support yet, open a [GitHub discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
+
+### New Tests
+
+If you want to add new tests for existing behavioral features, follow this [tutorial](../developers/testing).
+
+### New Visualizations
+
+Open a [GitHub discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
--- a/docs/datastreams/add-new-data-streams.md
+++ b/docs/datastreams/add-new-data-streams.md
@ -0,0 +1,350 @@
+# Add New Data Streams
+
+A data stream is a set of sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. RAPIDS is agnostic to data streams' formats and containers; see the [Data Streams Introduction](../data-streams-introduction) for a list of supported streams.
+
+**A container** is queried with an R or Python script that connects to the database, API or file where your stream's raw data is stored. 
+
+**A format** is described using a `format.yaml` file that specifies how to map and mutate your stream's raw data to match the data and format RAPIDS needs.
+
+The most common cases when you would want to implement a new data stream are:
+
+- You collected data with a mobile sensing app RAPIDS does not support yet. For example, [Beiwe](https://www.beiwe.org/) data stored in MySQL. You will need to define a new format file and a new container script.
+- You collected data with a mobile sensing app RAPIDS supports, but this data is stored in a container that RAPIDS can't connect to yet. For example, AWARE data stored in PostgreSQL. In this case, you can reuse the format file of the `aware_mysql` stream, but you will need to implement a new container script.
+
+!!! hint
+    Both the `container.[R|py]` and the `format.yaml` are stored in `./src/data/streams/[stream_name]` where `[stream_name]` can be `aware_mysql` for example.
+
+## Implement a Container
+
+The `container` script of a data stream can be implemented in R (strongly recommended) or Python. This script must have two functions if you are implementing a stream for phone data or one function otherwise. The script can contain other auxiliary functions.
+
+First of all, add any parameters your script might need in `config.yaml` under `(device)_DATA_STREAMS`. These parameters will be available in the `stream_parameters` argument of the one or two functions you implement.  For example, if you are adding support for `Beiwe` data stored in `PostgreSQL` and your container needs a set of credentials to connect to a database, your new data stream configuration would be:
+
+```yaml hl_lines="7 8"
+PHONE_DATA_STREAMS:
+  USE: aware_python
+  
+  # AVAILABLE:
+  aware_mysql: 
+    DATABASE_GROUP: MY_GROUP
+  beiwe_postgresql: 
+    DATABASE_GROUP: MY_GROUP # users define this group (user, password, host, etc.) in credentials.yaml
+```
+
+Then implement one or both of the following functions:
+
+=== "pull_data"
+
+    This function returns the data columns for a specific sensor and participant. It has the following parameters:
+
+    | Param              | Description                                                                                           |   
+    |--------------------|-------------------------------------------------------------------------------------------------------|
+    | stream_parameters | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` | 
+    | sensor_container   | The value set by the user in any `[DEVICE_SENSOR][CONTAINER]` key of `config.yaml`. It can be a table, file path, or whatever data source you want to support that contains the **data from a single sensor for all participants**. For example, `[PHONE_ACCELEROMETER][CONTAINER]`|
+    | device             | The device id that you need to get the data for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device id is a uuid|
+    | columns            | A list of the columns that you need to get from `sensor_container`. You specify these columns in your stream's `format.yaml`|
+
+
+    !!! example
+        This is the `pull_data` function we implemented for `aware_mysql`. Note that we can use `message`, `warning`, or `stop` to notify the user during execution.
+
+        ```r
+        pull_data <- function(stream_parameters, device, sensor_container, columns){
+            # get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
+            dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
+            query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
+            # Letting the user know what we are doing
+            message(paste0("Executing the following query to download data: ", query)) 
+            sensor_data <- dbGetQuery(dbEngine, query)
+            
+            dbDisconnect(dbEngine)
+            
+            if(nrow(sensor_data) == 0)
+                warning(paste0("The device '", device,"' did not have data in ", sensor_container))
+
+            return(sensor_data)
+        }
+        ```
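+
+    For a container written in Python, an equivalent function might look like the sketch below. It is hypothetical and not taken from RAPIDS: it assumes `sensor_container` is the path to a CSV file with a `device_id` column, and that returning a pandas data frame is acceptable. Check an existing stream before relying on either assumption.
+
+    ```python
+    import warnings
+
+    import pandas as pd
+
+
+    def pull_data(stream_parameters, device, sensor_container, columns):
+        # Assumption: sensor_container is a CSV path and rows are tagged by device_id.
+        # Read the requested columns plus device_id (without duplicating it).
+        wanted = list(dict.fromkeys(list(columns) + ["device_id"]))
+        sensor_data = pd.read_csv(sensor_container, usecols=wanted)
+        # Keep only the rows that belong to the requested device.
+        sensor_data = sensor_data[sensor_data["device_id"] == device]
+        if sensor_data.empty:
+            # Let the user know that nothing was found for this device.
+            warnings.warn(f"The device '{device}' did not have data in {sensor_container}")
+        return sensor_data[list(columns)]
+    ```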
+
+=== "infer_device_os"
+
+    !!! warning
+        This function is only necessary for phone data streams. 
+    
+    RAPIDS allows users to use the keyword `infer` (previously `multiple`) to [automatically infer](../../setup/configuration/#structure-of-participants-files) the mobile operating system a phone was running.
+    
+    If you have a way to infer the OS of a device id, implement this function. For example, for AWARE data we use the `aware_device` table.
+ 
+    If you don't have a way to infer the OS, call `stop("Error Message")` so other users know they can't use `infer` or the inference failed, and they have to assign the OS manually in the participant file.
+    
+    This function returns the operating system (`android` or `ios`) for a specific phone device id. It has the following parameters:
+
+    | Param              | Description                                                                                           |   
+    |--------------------|-------------------------------------------------------------------------------------------------------|
+    | stream_parameters | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` | 
+    | device             | The device id that you need to infer the OS for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device id is a uuid|
+
+
+    !!! example
+        This is the `infer_device_os` function we implemented for `aware_mysql`. Note that we can use `message`, `warning`, or `stop` to notify the user during execution.
+
+        ```r
+        infer_device_os <- function(stream_parameters, device){
+            # get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
+            group <- stream_parameters$DATABASE_GROUP
+            
+            dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)
+            query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
+            message(paste0("Executing the following query to infer phone OS: ", query)) 
+            os <- dbGetQuery(dbEngine, query)
+            dbDisconnect(dbEngine)
+            
+            if(nrow(os) > 0)
+                return(os %>% mutate(os = ifelse(brand == "iPhone", "ios", "android")) %>% pull(os))
+            else
+                stop(paste("We cannot infer the OS of the following device id because it does not exist in the aware_device table:", device))
+            
+            return(os)
+        }
+        ```
+
+## Implement a Format
+
+A format file `format.yaml` describes the mapping between your stream's raw data and the data that RAPIDS needs. This file has a section per sensor (e.g. `PHONE_ACCELEROMETER`), and each section has two attributes (keys):
+
+1. `RAPIDS_COLUMN_MAPPINGS` are mappings between the columns RAPIDS needs and the columns your raw data already has. 
+
+    1. The reserved keyword `FLAG_TO_MUTATE` flags columns that RAPIDS requires but that are not initially present in your container (database, CSV file). These columns have to be created by your mutation scripts.
+
+2. `MUTATION`. Sometimes your raw data needs to be transformed to match the format RAPIDS can handle (including creating columns marked as `FLAG_TO_MUTATE`)
+    
+    2. `COLUMN_MAPPINGS` are mappings between the columns a mutation `SCRIPT` needs and the columns your raw data has.
+
+    2. `SCRIPTS` are a collection of R or Python scripts that transform one or more raw data columns into the format RAPIDS needs.
+
+!!! hint
+    `[RAPIDS_COLUMN_MAPPINGS]` and `[MUTATE][COLUMN_MAPPINGS]` have a `key` (left-hand side string) and a `value` (right-hand side string). The `values` are the names used to pull columns from a container (e.g., columns in a database table). All `values` are renamed to their `keys` in lower case. The renamed columns are sent to every mutation script within the `data` argument, and the final output is the input that RAPIDS processes further.
+
+    For example, let's assume we are implementing `beiwe_mysql` and defining the following format for `PHONE_FAKESENSOR`:
+
+    ```yaml
+    PHONE_FAKESENSOR:
+        ANDROID:
+            RAPIDS_COLUMN_MAPPINGS:
+                TIMESTAMP: beiwe_timestamp
+                DEVICE_ID: beiwe_deviceID
+                MAGNITUDE_SQUARED: FLAG_TO_MUTATE
+            MUTATE:
+                COLUMN_MAPPINGS:
+                    MAGNITUDE: beiwe_value
+                SCRIPTS:
+                  - src/data/streams/mutations/phone/square_magnitude.py
+    ```
+
+    RAPIDS will:
+
+    1. Download `beiwe_timestamp`, `beiwe_deviceID`, and `beiwe_value` from the container of `beiwe_mysql` (MySQL DB)
+    2. Rename these columns to `timestamp`, `device_id`, and `magnitude`, respectively.
+    3. Execute `square_magnitude.py` with a data frame as an argument containing the renamed columns. This script will square `magnitude` and rename it to `magnitude_squared`
+    4. Verify the data frame returned by `square_magnitude.py` has the columns RAPIDS needs `timestamp`, `device_id`, and `magnitude_squared`.
+    5. Use this data frame as the input to be processed in the pipeline.
+
+    Note that although `RAPIDS_COLUMN_MAPPINGS` and `[MUTATE][COLUMN_MAPPINGS]` keys are in capital letters for readability (e.g. `MAGNITUDE_SQUARED`), the names of the final columns you mutate in your scripts should be lower case.
+    
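+As a rough sketch of step 3 above, a mutation script such as `square_magnitude.py` could look like the following. This is an illustration under assumptions, not the script shipped with RAPIDS: it assumes `data` arrives as a pandas data frame whose columns were already renamed to `timestamp`, `device_id`, and `magnitude`.
+
+```python
+import pandas as pd
+
+
+def main(data, stream_parameters):
+    # Assumption: data already has the renamed, lower-case columns
+    # timestamp, device_id, and magnitude.
+    data = data.copy()
+    # Create the column flagged as FLAG_TO_MUTATE in RAPIDS_COLUMN_MAPPINGS.
+    data["magnitude_squared"] = data["magnitude"] ** 2
+    # Drop the helper column so only the columns RAPIDS needs remain.
+    return data.drop(columns=["magnitude"])
+```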
+
+Let's explain this column mapping in more depth with examples.
+
+### Name mapping
+
+The mapping for some sensors is straightforward. For example, accelerometer data most of the time has a timestamp, three axes (x,y,z), and a device id that produced it. AWARE and a different sensing app like Beiwe likely logged accelerometer data in the same way but with different column names. In this case, we only need to match Beiwe data columns to RAPIDS columns one-to-one:
+
+```yaml hl_lines="4 5 6 7 8"
+PHONE_ACCELEROMETER:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: beiwe_timestamp
+      DEVICE_ID: beiwe_deviceID
+      DOUBLE_VALUES_0: beiwe_x
+      DOUBLE_VALUES_1: beiwe_y
+      DOUBLE_VALUES_2: beiwe_z
+    MUTATE:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # it's ok if this is empty
+```
+
+### Value mapping
+For some sensors, we need to map column names and values. For example, screen data has ON and OFF events; let's suppose Beiwe represents an ON event with the number `1`, but RAPIDS identifies ON events with the number `2`. In this case, we need to mutate the raw data coming from Beiwe and replace all `1`s with `2`s.
+
+We do this by listing one or more R or Python scripts in `MUTATION_SCRIPTS` that will be executed in order. We usually store all mutation scripts under `src/data/streams/mutations/[device]/[platform]/` and they can be reused across data streams.
+
+```yaml hl_lines="10"
+PHONE_SCREEN:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: beiwe_timestamp
+      DEVICE_ID: beiwe_deviceID
+      EVENT: beiwe_event
+    MUTATE:
+      COLUMN_MAPPINGS:
+      SCRIPTS:
+        - src/data/streams/mutations/phone/beiwe/beiwe_screen_map.py
+```
+
+!!! hint
+    - A `MUTATION_SCRIPT` can also be used to clean/preprocess your data before extracting behavioral features.
+    - A mutation script has to have a `main` function that receives two arguments, `data` and `stream_parameters`.
+    - The `stream_parameters` argument contains the `config.yaml` key/values of your data stream (this is the same argument that your `container.[py|R]` script receives, see [Implement a Container](#implement-a-container)). 
+
+    === "python"
+        Example of a Python mutation script
+        ```python
+        import pandas as pd
+
+        def main(data, stream_parameters):
+            # mutate data
+            return(data)
+        ```
+    === "R"
+        Example of an R mutation script
+        ```r
+        source("renv/activate.R") # needed to use RAPIDS renv environment
+        library(dplyr)
+
+        main <- function(data, stream_parameters){
+            # mutate data
+            return(data)
+        }
+        ```
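+
+For the screen scenario described in this section, a mutation script along the lines of `beiwe_screen_map.py` might be sketched as follows. The codes `1` and `2` come from the supposition above, and the lower-case column name `event` is an assumption based on the `EVENT` key of the mapping; this is not an actual RAPIDS script.
+
+```python
+import pandas as pd
+
+
+def main(data, stream_parameters):
+    # Assumption: after renaming, the ON/OFF code lives in the lower-case column "event".
+    data = data.copy()
+    # Replace the supposed Beiwe ON code (1) with the code RAPIDS expects (2);
+    # all other values are left untouched.
+    data["event"] = data["event"].replace({1: 2})
+    return data
+```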
+
+### Complex mapping
+Sometimes, your raw data doesn't even have the same columns RAPIDS expects for a sensor. For example, let's pretend Beiwe stores `PHONE_ACCELEROMETER` axis data in a single column called `acc_col` instead of three. You have to create a `MUTATION_SCRIPT` to split `acc_col` into three columns `x`, `y`, and `z`. 
+
+For this, you mark the three axes columns RAPIDS needs in `[RAPIDS_COLUMN_MAPPINGS]` with the word `FLAG_TO_MUTATE`, map `acc_col` in `[MUTATION][COLUMN_MAPPINGS]`, and list a Python script under `[MUTATION][SCRIPTS]` with the code to split `acc_col`. See an example below.
+
+RAPIDS expects that every column mapped as `FLAG_TO_MUTATE` will be generated by your mutation script, so it won't try to retrieve them from your container (database, CSV file, etc.). 
+
+In our example, `acc_col` will be fetched from the stream's container and renamed to `joined_axes` (the lower-case form of its `JOINED_AXES` key) so that `beiwe_split_acc.py` can split it into `double_values_0`, `double_values_1`, and `double_values_2`.
+
+```yaml hl_lines="6 7 8 11 13"
+PHONE_ACCELEROMETER:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: beiwe_timestamp
+      DEVICE_ID: beiwe_deviceID
+      DOUBLE_VALUES_0: FLAG_TO_MUTATE
+      DOUBLE_VALUES_1: FLAG_TO_MUTATE
+      DOUBLE_VALUES_2: FLAG_TO_MUTATE
+    MUTATE:
+      COLUMN_MAPPINGS:
+        JOINED_AXES: acc_col
+      SCRIPTS:
+        - src/data/streams/mutations/phone/beiwe/beiwe_split_acc.py
+```
+
+This is a draft of `beiwe_split_acc.py` `MUTATION_SCRIPT`:
+```python
+import pandas as pd
+
+def main(data, stream_parameters):
+    # data contains the acc_col values (renamed to JOINED_AXES by the column mappings)
+    # split that column into three columns: double_values_0, double_values_1, double_values_2 to match RAPIDS format
+    # remove the joined column since we don't need it anymore
+    return data
+```
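+
+One possible way to fill in this draft is sketched below. It is only an illustration under stated assumptions: that the script receives the mapped column under the lowercase name `joined_axes`, and that each value is a comma-separated string such as `"0.1,0.2,9.8"`. Neither detail is confirmed by the text above, and `split_axes` is a hypothetical helper, so adjust both to your actual Beiwe data.
+
+```python
+import math
+
+# ASSUMPTION: the name under which the script sees the mapped column (hypothetical).
+JOINED_COLUMN = "joined_axes"
+AXES = ["double_values_0", "double_values_1", "double_values_2"]
+
+def split_axes(joined):
+    """Split an "x,y,z" string into three floats; return NaNs if it is malformed."""
+    parts = str(joined).split(",")
+    if len(parts) != 3:
+        return (math.nan, math.nan, math.nan)
+    try:
+        return tuple(float(part) for part in parts)
+    except ValueError:
+        return (math.nan, math.nan, math.nan)
+
+def main(data, stream_parameters):
+    # data is expected to be a pandas DataFrame that still holds the joined column
+    axes = [split_axes(value) for value in data[JOINED_COLUMN]]
+    # drop() returns a new DataFrame, so the input is left untouched
+    data = data.drop(columns=[JOINED_COLUMN])
+    for position, column in enumerate(AXES):
+        data[column] = [row[position] for row in axes]
+    return data
+```
+
+Rows that do not contain exactly three numeric values become `NaN` instead of raising an error, which keeps one malformed row from stopping the whole run; raise an exception instead if you would rather fail fast.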
+
+### OS complex mapping
+There is a special case for a complex mapping scenario for smartphone data streams. The Android and iOS sensor APIs return data in different formats for certain sensors (like screen, activity recognition, battery, among others). 
+
+In case you didn't notice, the examples we have used so far are grouped under an `ANDROID` key, which means they will be applied to data collected by Android phones. Additionally, each sensor has an `IOS` key for a similar purpose. We use the complex mapping described above to transform iOS data into an Android format (it's always iOS to Android and any new phone data stream must do the same).
+
+For example, this is the `format.yaml` key for `PHONE_ACTIVITY_RECOGNITION`. Note that the `ANDROID` mapping is simple (one-to-one) but the `IOS` mapping is complex with three `FLAG_TO_MUTATE` columns, two `[MUTATION][COLUMN_MAPPINGS]` mappings, and one `[MUTATION][SCRIPTS]` entry.
+
+```yaml hl_lines="16 17 18 21 22 24"
+PHONE_ACTIVITY_RECOGNITION:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      ACTIVITY_TYPE: activity_type
+      ACTIVITY_NAME: activity_name
+      CONFIDENCE: confidence
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS:
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      ACTIVITY_TYPE: FLAG_TO_MUTATE
+      ACTIVITY_NAME: FLAG_TO_MUTATE
+      CONFIDENCE: FLAG_TO_MUTATE
+    MUTATION:
+      COLUMN_MAPPINGS:
+        ACTIVITIES: activities
+        CONFIDENCE: confidence
+      SCRIPTS:
+        - "src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R"
+```
+
+??? "Example activity_recogniton_ios_unification.R"
+    In this `MUTATION_SCRIPT` we create `ACTIVITY_NAME` and `ACTIVITY_TYPE` based on `activities`, and map `confidence` iOS values to Android values.
+    ```R
+    source("renv/activate.R")
+    library("dplyr", warn.conflicts = F)
+    library(stringr)
+
+    clean_ios_activity_column <- function(ios_gar){
+        ios_gar <- ios_gar %>%
+            mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))
+
+        existent_multiple_activities <- ios_gar %>%
+            filter(str_detect(activities, ",")) %>% 
+            group_by(activities) %>%
+            summarise(mutiple_activities = unique(activities), .groups = "drop_last") %>% 
+            pull(mutiple_activities)
+
+        known_multiple_activities <- c("stationary,automotive")
+        unkown_multiple_actvities <- setdiff(existent_multiple_activities, known_multiple_activities)
+        if(length(unkown_multiple_actvities) > 0){
+            stop(paste0("There are unknown combinations of iOS activities, you need to implement the decision of the ones to keep: ", paste(unkown_multiple_actvities, collapse = "; ")))
+        }
+
+        ios_gar <- ios_gar %>%
+            mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))
+        
+        return(ios_gar)
+    }
+
+    unify_ios_activity_recognition <- function(ios_gar){
+        # We only need to unify Google Activity Recognition data for iOS
+        # discard rows where activities column is blank
+        ios_gar <- ios_gar[!(ios_gar$activities %in% ""), ]
+        # clean "activities" column of ios_gar
+        ios_gar <- clean_ios_activity_column(ios_gar)
+
+        # make it compatible with android version: generate "activity_name" and "activity_type" columns
+        ios_gar  <-  ios_gar %>% 
+            mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
+                                            activities == "cycling" ~ "on_bicycle",
+                                            activities == "walking" ~ "walking",
+                                            activities == "running" ~ "running",
+                                            activities == "stationary" ~ "still"),
+                    activity_type = case_when(activities == "automotive" ~ 0,
+                                            activities == "cycling" ~ 1,
+                                            activities == "walking" ~ 7,
+                                            activities == "running" ~ 8,
+                                            activities == "stationary" ~ 3,
+                                            activities == "unknown" ~ 4),
+                    confidence = case_when(confidence == 0 ~ 0,
+                                          confidence == 1 ~ 50,
+                                          confidence == 2 ~ 100)
+                                        ) %>% 
+            select(-activities)
+        
+        return(ios_gar)
+    }
+
+    main <- function(data, stream_parameters){
+        return(unify_ios_activity_recognition(data))
+    }
+    ```
--- a/docs/datastreams/aware-csv.md
+++ b/docs/datastreams/aware-csv.md
@ -0,0 +1,32 @@
+# `aware_csv`
+
+This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in CSV files.
+
+!!! warning
+    The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.
+
+    See examples in the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/)
+
+    ??? example "Example of a valid CSV file"
+        ```csv
+        "_id","timestamp","device_id","activities","confidence","stationary","walking","running","automotive","cycling","unknown","label"
+        1,1587528000000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,""
+        2,1587528060000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+        3,1587528120000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+        4,1587528180000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+        5,1587528240000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+        6,1587528300000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+        7,1587528360000,"13dbc8a3-dae3-4834-823a-4bc96a7d459d","[\"stationary\"]",2,1,0,0,0,0,0,"supplement"
+        ```
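+
+If you produce these CSV files yourself, the rules in the warning above can be met with Python's standard `csv` module. The snippet below is a minimal sketch, not part of RAPIDS: `to_rapids_csv` is a hypothetical helper name and the column subset is only illustrative.
+
+```python
+import csv
+import io
+
+def to_rapids_csv(header, rows):
+    """Serialize rows using `,` as separator, `\\` as escape character, and quoted strings."""
+    buffer = io.StringIO()
+    writer = csv.writer(
+        buffer,
+        delimiter=",",
+        quotechar='"',
+        escapechar="\\",               # a quote inside a string is written as \"
+        doublequote=False,             # never escape " as ""
+        quoting=csv.QUOTE_NONNUMERIC,  # wrap every string column in quotes
+        lineterminator="\n",
+    )
+    writer.writerow(header)
+    writer.writerows(rows)
+    return buffer.getvalue()
+
+print(to_rapids_csv(["_id", "activities", "label"], [[1, '["stationary"]', ""]]))
+```
+
+Numeric values are left unquoted, while strings (including empty ones) are wrapped in `"`, which matches the example file above.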
+
+## Container
+A CSV file per sensor, each containing the data for all participants. 
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/aware_csv/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/aware_format.md"
--- a/docs/datastreams/aware-influxdb.md
+++ b/docs/datastreams/aware-influxdb.md
@ -0,0 +1,18 @@
+# `aware_influxdb (beta)`
+
+!!! warning
+    This data stream is being released in beta while we test it thoroughly. 
+
+This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in an InfluxDB database.
+
+## Container
+An InfluxDB database with a table per sensor, each containing the data for all participants.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/aware_influxdb/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/aware_format.md"
--- a/docs/datastreams/aware-micro-mysql.md
+++ b/docs/datastreams/aware-micro-mysql.md
@ -0,0 +1,15 @@
+# `aware_micro_mysql`
+
+This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework's](https://awareframework.com/) [AWARE Micro](https://github.com/denzilferreira/aware-micro) server and stored in a MySQL database.
+
+## Container
+A MySQL database with a table per sensor, each containing the data for all participants. Sensor data is stored in a JSON field called `data` within each table.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/aware_micro_mysql/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/aware_format.md"
--- a/docs/datastreams/aware-mysql.md
+++ b/docs/datastreams/aware-mysql.md
@ -0,0 +1,15 @@
+# `aware_mysql`
+
+This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in a MySQL database.
+
+## Container
+A MySQL database with a table per sensor, each containing the data for all participants. This is the default database created by the old PHP AWARE server (as opposed to the new JavaScript Micro server).
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/aware_mysql/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/aware_format.md"
--- a/docs/datastreams/data-streams-introduction.md
+++ b/docs/datastreams/data-streams-introduction.md
@ -0,0 +1,26 @@
+# Data Streams Introduction
+
+A data stream is a set of sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**.
+
+For example, the `aware_mysql` data stream handles smartphone data (**device**) collected with the [AWARE Framework](https://awareframework.com/) (**format**) stored in a MySQL database (**container**). Similarly, smartphone data collected with [Beiwe](https://www.beiwe.org/) will have a different format and could be stored in a container like a PostgreSQL database or a CSV file.
+
+If you want to process a data stream using RAPIDS, make sure that your data is stored in a supported **format** and **container** (see table below). 
+
+If RAPIDS doesn't support your data stream yet (e.g. Beiwe data stored in PostgreSQL, or AWARE data stored in SQLite), you can always [implement a new data stream](../add-new-data-streams). If it's something you think other people might be interested in, we will be happy to include your new data stream in RAPIDS, so get in touch!
+
+!!! hint
+    Currently, you can add new data streams for smartphones, Fitbit, and Empatica devices. If you need RAPIDS to process data from **other devices**, like Oura Rings or Actigraph wearables, get in touch. It is a more complicated process that could take a couple of days to implement for someone familiar with R or Python, but we would be happy to work on it together.
+
+For reference, these are the data streams we currently support: 
+
+| Data Stream | Device | Format | Container | Docs
+|--|--|--|--|--|
+| `aware_mysql`| Phone | AWARE app | MySQL | [link](../aware-mysql)
+| `aware_micro_mysql`| Phone | AWARE Micro server | MySQL | [link](../aware-micro-mysql)
+| `aware_csv`| Phone | AWARE app | CSV files | [link](../aware-csv)
+| `aware_influxdb` (beta)| Phone | AWARE app | InfluxDB | [link](../aware-influxdb)
+| `fitbitjson_mysql`| Fitbit | JSON (per [Fitbit's API](https://dev.fitbit.com/build/reference/web-api/)) | MySQL | [link](../fitbitjson-mysql)
+| `fitbitjson_csv`| Fitbit | JSON (per [Fitbit's API](https://dev.fitbit.com/build/reference/web-api/)) | CSV files | [link](../fitbitjson-csv)
+| `fitbitparsed_mysql`| Fitbit | Parsed (parsed API data) | MySQL | [link](../fitbitparsed-mysql)
+| `fitbitparsed_csv`| Fitbit | Parsed (parsed API data)  | CSV files | [link](../fitbitparsed-csv)
+| `empatica_zip`| Empatica | [E4 Connect](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) | ZIP files | [link](../empatica-zip)
--- a/docs/datastreams/empatica-zip.md
+++ b/docs/datastreams/empatica-zip.md
@ -0,0 +1,136 @@
+# `empatica_zip`
+This [data stream](../../datastreams/data-streams-introduction) handles Empatica sensor data downloaded as zip files using [E4 Connect](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-).
+
+## Container
+
+You need to create a subfolder for every participant named after their `device id` inside the folder specified by `[EMPATICA_DATA_STREAMS][empatica_zip][FOLDER]`. You can add one or more Empatica zip files to any subfolder.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/empatica_zip/container.R
+```
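+
+Before running RAPIDS, it can help to confirm that the folder follows the layout described above. The following is a minimal sketch rather than anything RAPIDS provides: `list_empatica_zips` is a hypothetical helper, and the folder you pass in is whatever you configured for this data stream.
+
+```python
+from pathlib import Path
+
+def list_empatica_zips(folder):
+    """Map each participant subfolder (named after a device id) to its zip file names."""
+    root = Path(folder)
+    return {
+        subfolder.name: sorted(path.name for path in subfolder.glob("*.zip"))
+        for subfolder in sorted(root.iterdir())
+        if subfolder.is_dir()
+    }
+```
+
+A subfolder mapped to an empty list has no zip files yet, which is a quick way to spot participants whose data still needs to be copied.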
+
+## Format
+
+
+The `format.yaml` maps and transforms columns in your raw data stream to the [mandatory columns RAPIDS needs for Empatica sensors](../mandatory-empatica-format). This file is at:
+
+```bash
+src/data/streams/empatica_zip/format.yaml
+```
+
+All columns are mutated from the raw data in the zip files so you don't need to modify any column mappings.
+
+??? info "EMPATICA_ACCELEROMETER"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    | TIMESTAMP | timestamp|
+    | DEVICE_ID | device_id|
+    | DOUBLE_VALUES_0 | double_values_0|
+    | DOUBLE_VALUES_1 | double_values_1|
+    | DOUBLE_VALUES_2 | double_values_2|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
+
+??? info "EMPATICA_HEARTRATE"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    |TIMESTAMP | timestamp|
+    |DEVICE_ID | device_id|
+    |HEARTRATE | heartrate|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
+
+??? info "EMPATICA_TEMPERATURE"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    |TIMESTAMP | timestamp|
+    |DEVICE_ID | device_id|
+    |TEMPERATURE | temperature|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
+
+??? info "EMPATICA_ELECTRODERMAL_ACTIVITY"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    |TIMESTAMP | timestamp|
+    |DEVICE_ID | device_id|
+    |ELECTRODERMAL_ACTIVITY | electrodermal_activity|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
+
+??? info "EMPATICA_BLOOD_VOLUME_PULSE"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    |TIMESTAMP | timestamp|
+    |DEVICE_ID | device_id|
+    |BLOOD_VOLUME_PULSE | blood_volume_pulse|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
+
+??? info "EMPATICA_INTER_BEAT_INTERVAL"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    |TIMESTAMP | timestamp|
+    |DEVICE_ID | device_id|
+    |INTER_BEAT_INTERVAL | inter_beat_interval|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
+
+??? info "EMPATICA_TAGS"
+
+    
+    **RAPIDS_COLUMN_MAPPINGS**
+
+    | RAPIDS column   | Stream column   |
+    |-----------------|-----------------|
+    |TIMESTAMP | timestamp|
+    |DEVICE_ID | device_id|
+    |TAGS | tags|
+
+    **MUTATION**
+
+    - **COLUMN_MAPPINGS** (None)
+    - **SCRIPTS** (None)
--- a/docs/datastreams/fitbitjson-csv.md
+++ b/docs/datastreams/fitbitjson-csv.md
@ -0,0 +1,23 @@
+# `fitbitjson_csv`
+This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/) and stored in a CSV file. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your sensor data in a CSV file, RAPIDS can process it.
+
+!!! warning
+    The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.
+
+    ??? example "Example of a valid CSV file"
+        ```csv
+        "timestamp","device_id","label","fitbit_id","fitbit_data_type","fitbit_data"
+        1587614400000,"a748ee1a-1d0b-4ae9-9074-279a2b6ba524","5S","5ZKN9B","steps","{\"activities-steps\":[{\"dateTime\":\"2020-04-23\",\"value\":\"7881\"}]}"
+        ```
+
+## Container
+The container should be a CSV file per Fitbit sensor, each containing all participants' data.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/fitbitjson_csv/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/jsonfitbit_format.md"
--- a/docs/datastreams/fitbitjson-mysql.md
+++ b/docs/datastreams/fitbitjson-mysql.md
@ -0,0 +1,14 @@
+# `fitbitjson_mysql`
+This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/) and stored in a MySQL database. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your sensor data in a MySQL database, RAPIDS can process it.
+
+## Container
+The container should be a MySQL database with a table per sensor, each containing all participants' data.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/fitbitjson_mysql/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/jsonfitbit_format.md"
--- a/docs/datastreams/fitbitparsed-csv.md
+++ b/docs/datastreams/fitbitparsed-csv.md
@ -0,0 +1,29 @@
+# `fitbitparsed_csv`
+This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/), **parsed**, and stored in a CSV file. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your parsed sensor data in a CSV file, RAPIDS can process it.
+
+!!! info "What is the difference between JSON and parsed data streams?"
+    Most people will only need `fitbitjson_*` because they downloaded and stored their data directly from Fitbit's API. However, if, for some reason, you don't have access to that JSON data and instead only have the parsed data (columns and rows), you can use this data stream.
+
+!!! warning
+    The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.
+
+    ??? example "Example of a valid CSV file"
+        ```csv
+        "device_id","heartrate","heartrate_zone","local_date_time","timestamp"
+        "a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:00:00",0
+        "a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:01:00",0
+        "a748ee1a-1d0b-4ae9-9074-279a2b6ba524",67,"outofrange","2020-04-23 00:02:00",0
+        "a748ee1a-1d0b-4ae9-9074-279a2b6ba524",69,"outofrange","2020-04-23 00:03:00",0
+        ```
+
+## Container
+The container should be a CSV file per sensor, each containing all participants' data.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/fitbitparsed_csv/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/parsedfitbit_format.md"
--- a/docs/datastreams/fitbitparsed-mysql.md
+++ b/docs/datastreams/fitbitparsed-mysql.md
@ -0,0 +1,17 @@
+# `fitbitparsed_mysql`
+This [data stream](../../datastreams/data-streams-introduction) handles Fitbit sensor data downloaded using the [Fitbit Web API](https://dev.fitbit.com/build/reference/web-api/), **parsed**, and stored in a MySQL database. Please note that RAPIDS cannot query the API directly; you need to use other available tools or implement your own. Once you have your parsed sensor data in a MySQL database, RAPIDS can process it.
+
+!!! info "What is the difference between JSON and parsed data streams?"
+    Most people will only need `fitbitjson_*` because they downloaded and stored their data directly from Fitbit's API. However, if, for some reason, you don't have access to that JSON data and instead only have the parsed data (columns and rows), you can use this data stream.
+
+## Container
+The container should be a MySQL database with a table per sensor, each containing all participants' data.
+
+The script to connect and download data from this container is at:
+```bash
+src/data/streams/fitbitparsed_mysql/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/parsedfitbit_format.md"
--- a/docs/datastreams/mandatory-empatica-format.md
+++ b/docs/datastreams/mandatory-empatica-format.md
@ -0,0 +1,61 @@
+# Mandatory Empatica Format
+
+This is a description of the format RAPIDS needs to process data for the following Empatica sensors.
+
+??? info "EMPATICA_ACCELEROMETER"
+
+    | RAPIDS column   | Description                                                  |
+    |-----------------|--------------------------------------------------------------|
+    | TIMESTAMP       | A UNIX timestamp (13 digits) when a row of data was logged   |
+    | DEVICE_ID       | A string that uniquely identifies a device                   |
+    | DOUBLE_VALUES_0 | x axis of acceleration                                       |
+    | DOUBLE_VALUES_1 | y axis of acceleration                                       |
+    | DOUBLE_VALUES_2 | z axis of acceleration                                       |
+
+??? info "EMPATICA_HEARTRATE"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | HEARTRATE |  Intraday heartrate |
+
+??? info "EMPATICA_TEMPERATURE"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | TEMPERATURE |  temperature |
+
+??? info "EMPATICA_ELECTRODERMAL_ACTIVITY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | ELECTRODERMAL_ACTIVITY |  electrical conductance |
+
+??? info "EMPATICA_BLOOD_VOLUME_PULSE"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | BLOOD_VOLUME_PULSE |  blood volume pulse |
+
+??? info "EMPATICA_INTER_BEAT_INTERVAL"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | INTER_BEAT_INTERVAL |  inter beat interval |
+
+??? info "EMPATICA_TAGS"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | TAGS |  tags |
--- a/docs/datastreams/mandatory-fitbit-format.md
+++ b/docs/datastreams/mandatory-fitbit-format.md
@ -0,0 +1,75 @@
+# Mandatory Fitbit Format
+
+This is a description of the format RAPIDS needs to process data for the following Fitbit sensors.
+
+??? info "FITBIT_HEARTRATE_SUMMARY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | LOCAL_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss` |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | HEARTRATE_DAILY_RESTINGHR |  Daily resting heartrate |
+    | HEARTRATE_DAILY_CALORIESOUTOFRANGE |  Calories spent while heartrate was outside a heartrate [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
+    | HEARTRATE_DAILY_CALORIESFATBURN |  Calories spent while heartrate was inside the fat burn [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
+    | HEARTRATE_DAILY_CALORIESCARDIO |  Calories spent while heartrate was inside the cardio [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
+    | HEARTRATE_DAILY_CALORIESPEAK |  Calories spent while heartrate was inside the peak [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) |
+
+??? info "FITBIT_HEARTRATE_INTRADAY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | LOCAL_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss` |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | HEARTRATE |  Intraday heartrate |
+    | HEARTRATE_ZONE |  Heartrate [zone](https://help.fitbit.com/articles/en_US/Help_article/1565.htm#) that HEARTRATE belongs to. It is based on the heartrate zone ranges of each device |
+
+??? info "FITBIT_SLEEP_SUMMARY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | LOCAL_DATE_TIME       |  Date time string with format `yyyy-mm-dd 00:00:00`, the date is the same as the start date of a daily sleep episode if its time is after SLEEP_SUMMARY_LAST_NIGHT_END, otherwise it is the day before the start date of that sleep episode |
+    | LOCAL_START_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss` representing the start of a daily sleep episode |
+    | LOCAL_END_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss`  representing the end of a daily sleep episode|
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | EFFICIENCY | Sleep efficiency computed by Fitbit as time asleep / (total time in bed - time to fall asleep)|
+    | MINUTES_AFTER_WAKEUP | Minutes the participant spent in bed after waking up|
+    | MINUTES_ASLEEP | Minutes the participant was asleep |
+    | MINUTES_AWAKE | Minutes the participant was awake |
+    | MINUTES_TO_FALL_ASLEEP | Minutes the participant spent in bed before falling asleep|
+    | MINUTES_IN_BED | Minutes the participant spent in bed across the sleep episode|
+    | IS_MAIN_SLEEP | 0 if this episode is a nap, or 1 if it is a main sleep episode|
+    | TYPE | stages or classic [sleep data](https://dev.fitbit.com/build/reference/web-api/sleep/)|
+
+??? info "FITBIT_SLEEP_INTRADAY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS)|
+    | LOCAL_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss`, this either is a copy of LOCAL_START_DATE_TIME or LOCAL_END_DATE_TIME depending on which column is used to assign an episode to a specific day|
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | TYPE_EPISODE_ID | An id for each unique main or nap episode. Main and nap episodes have different levels, each row in this table is one of such levels, so multiple rows can have the same TYPE_EPISODE_ID|
+    | DURATION | Duration of the episode level in minutes|
+    | IS_MAIN_SLEEP | 0 if this episode level belongs to a nap, or 1 if it belongs to a main sleep episode|
+    | TYPE | type of level: stages or classic [sleep data](https://dev.fitbit.com/build/reference/web-api/sleep/)|
+    | LEVEL | For stages levels one of `wake`, `deep`, `light`, or `rem`. For classic levels one of `awake`, `restless`, and `asleep`|
+
+??? info "FITBIT_STEPS_SUMMARY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | LOCAL_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss` |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | STEPS |  Daily step count |
+
+??? info "FITBIT_STEPS_INTRADAY"
+
+    | RAPIDS column   | Description   |
+    |-----------------|-----------------|
+    | TIMESTAMP       |  A UNIX timestamp (13 digits) when a row of data was logged (automatically created by RAPIDS) |
+    | LOCAL_DATE_TIME       |  Date time string with format `yyyy-mm-dd hh:mm:ss` |
+    | DEVICE_ID       |  A string that uniquely identifies a device |
+    | STEPS |  Intraday step count (usually every minute)|
--- a/docs/datastreams/mandatory-phone-format.md
+++ b/docs/datastreams/mandatory-phone-format.md
@ -0,0 +1,202 @@
+# Mandatory Phone Format
+
+This is a description of the format RAPIDS needs to process data for the following PHONE sensors.
+
+See examples in the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/)
+
+??? info "PHONE_ACCELEROMETER"
+
+    | RAPIDS column   | Description                                                  |
+    |-----------------|--------------------------------------------------------------|
+    | TIMESTAMP       | A UNIX timestamp (13 digits) when a row of data was logged   |
+    | DEVICE_ID       | A string that uniquely identifies a device                   |
+    | DOUBLE_VALUES_0 | x axis of acceleration                                       |
+    | DOUBLE_VALUES_1 | y axis of acceleration                                       |
+    | DOUBLE_VALUES_2 | z axis of acceleration                                       |
+
+
+??? info "PHONE_ACTIVITY_RECOGNITION"
+
+    | RAPIDS column   | Description                                                               |
+    |-----------------|---------------------------------------------------------------------------|
+    | TIMESTAMP       | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID       | A string that uniquely identifies a device                                |
+    | ACTIVITY_NAME   | A string that denotes the current activity name: `in_vehicle`, `on_bicycle`, `on_foot`, `still`, `unknown`, `tilting`, `walking` or `running`   |
+    | ACTIVITY_TYPE   | An integer (ranging from 0 to 8) that denotes the current activity type   |
+    | CONFIDENCE      | An integer (ranging from 0 to 100) that denotes the prediction accuracy   |
+
+
+??? info "PHONE_APPLICATIONS_CRASHES"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | PACKAGE_NAME       | Application’s package name                                                |
+    | APPLICATION_NAME   | Application’s localized name                                              |
+    | APPLICATION_VERSION| Application’s version code                                                |
+    | ERROR_SHORT        | Short description of the error                                            |
+    | ERROR_LONG         | More verbose version of the error description                             |
+    | ERROR_CONDITION    | 1 = code error; 2 = non-responsive (ANR error)                            |
+    | IS_SYSTEM_APP      | Device’s pre-installed application                                        |
+
+
+??? info "PHONE_APPLICATIONS_FOREGROUND"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | PACKAGE_NAME       | Application’s package name                                                |
+    | APPLICATION_NAME   | Application’s localized name                                              |
+    | IS_SYSTEM_APP      | Device’s pre-installed application                                        |
+
+
+??? info "PHONE_APPLICATIONS_NOTIFICATIONS"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | PACKAGE_NAME       | Application’s package name                                                |
+    | APPLICATION_NAME   | Application’s localized name                                              |
+    | TEXT               | Notification’s header text, not the content                               |
+    | SOUND              | Notification’s sound source (if applicable)                               |
+    | VIBRATE            | Notification’s vibration pattern (if applicable)                          |
+    | DEFAULTS           | If notification was delivered according to device’s default settings      |
+    | FLAGS              | An integer that denotes [Android notification flag](https://developer.android.com/reference/android/app/Notification.html)  |
+
+
+??? info "PHONE_BATTERY"
+
+    | RAPIDS column        | Description                                                                                                            |
+    |----------------------|------------------------------------------------------------------------------------------------------------------------|
+    | TIMESTAMP            | A UNIX timestamp (13 digits) when a row of data was logged                                                             |
+    | DEVICE_ID            | A string that uniquely identifies a device                                                                             |
+    | BATTERY_STATUS       | An integer that denotes battery status: 0 or 1 = unknown, 2 = charging, 3 = discharging, 4 = not charging, 5 = full    |
+    | BATTERY_LEVEL        | An integer that denotes battery level, between 0 and `BATTERY_SCALE`                                                   |
+    | BATTERY_SCALE        | An integer that denotes the maximum battery level                                                                      |
+
+
+??? info "PHONE_BLUETOOTH"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | BT_ADDRESS         | MAC address of the device’s Bluetooth sensor                              |
+    | BT_NAME            | User-assigned name of the device’s Bluetooth sensor                       |
+    | BT_RSSI            | The RSSI to the scanned device, in dB                                     |
+
+
+??? info "PHONE_CALLS"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | CALL_TYPE          | An integer that denotes call type: 1 = incoming, 2 = outgoing, 3 = missed |
+    | CALL_DURATION      | Length of the call session                                                |
+    | TRACE              | SHA-1 one-way hash of the source/target of the call                       |
+
+
+??? info "PHONE_CONVERSATION"
+
+    | RAPIDS column        | Description                                                                          |
+    |----------------------|--------------------------------------------------------------------------------------|
+    | TIMESTAMP            | A UNIX timestamp (13 digits) when a row of data was logged                           |
+    | DEVICE_ID            | A string that uniquely identifies a device                                           |
+    | DOUBLE_ENERGY        | A number that denotes the amplitude of an audio sample (L2-norm of the audio frame)  |
+    | INFERENCE            | An integer (ranging from 0 to 3) that denotes the type of an audio sample: 0 = silence, 1 = noise, 2 = voice, 3 = unknown      |
+    | DOUBLE_CONVO_START   | UNIX timestamp (13 digits) of the beginning of a conversation                        |
+    | DOUBLE_CONVO_END     | UNIX timestamp (13 digits) of the end of a conversation                              |
+
+
+??? info "PHONE_KEYBOARD"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | PACKAGE_NAME       | Package name of the application where the keyboard interaction occurred   |
+    | BEFORE_TEXT        | The previous keyboard input (empty if password)                           |
+    | CURRENT_TEXT       | The current keyboard input (empty if password)                            |
+    | IS_PASSWORD        | An integer: 0 = not password; 1 = password                                |
+
+
+??? info "PHONE_LIGHT"
+
+    | RAPIDS column      | Description                                                                                                          |
+    |--------------------|----------------------------------------------------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                                                           |
+    | DEVICE_ID          | A string that uniquely identifies a device                                                                           |
+    | DOUBLE_LIGHT_LUX   | The ambient luminance in lux units                                                                                   |
+    | ACCURACY           | An integer that denotes the sensor's accuracy level: 3 = maximum accuracy, 2 = medium accuracy, 1 = low accuracy     |
+
+
+??? info "PHONE_LOCATIONS"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | DOUBLE_LATITUDE    | The location’s latitude, in degrees                                       |
+    | DOUBLE_LONGITUDE   | The location’s longitude, in degrees                                      |
+    | DOUBLE_BEARING     | The location’s bearing, in degrees                                        |
+    | DOUBLE_SPEED       | The speed if available, in meters/second over ground                      |
+    | DOUBLE_ALTITUDE    | The altitude if available, in meters above sea level                      |
+    | PROVIDER           | A string that denotes the provider: `gps`, `fused` or `network`           |
+    | ACCURACY           | The estimated location accuracy                                           |
+
+
+??? info "PHONE_LOG"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | LOG_MESSAGE        | A string that contains the log message                                    |
+
+
+??? info "PHONE_MESSAGES"
+
+    | RAPIDS column      | Description                                                               |
+    |--------------------|---------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                |
+    | DEVICE_ID          | A string that uniquely identifies a device                                |
+    | MESSAGE_TYPE       | An integer that denotes message type: 1 = received, 2 = sent              |
+    | TRACE              | SHA-1 one-way hash of the source/target of the message                    |
+
+
+??? info "PHONE_SCREEN"
+
+    | RAPIDS column      | Description                                                                       |
+    |--------------------|-----------------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                        |
+    | DEVICE_ID          | A string that uniquely identifies a device                                        |
+    | SCREEN_STATUS      | An integer that denotes screen status: 0 = off, 1 = on, 2 = locked, 3 = unlocked  |
+
+
+??? info "PHONE_WIFI_CONNECTED"
+
+    | RAPIDS column      | Description                                                                       |
+    |--------------------|-----------------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                        |
+    | DEVICE_ID          | A string that uniquely identifies a device                                        |
+    | MAC_ADDRESS        | Device’s MAC address                                                              |
+    | SSID               | Currently connected access point network name                                     |
+    | BSSID              | Currently connected access point MAC address                                      |
+
+
+??? info "PHONE_WIFI_VISIBLE"
+
+    | RAPIDS column      | Description                                                                       |
+    |--------------------|-----------------------------------------------------------------------------------|
+    | TIMESTAMP          | A UNIX timestamp (13 digits) when a row of data was logged                        |
+    | DEVICE_ID          | A string that uniquely identifies a device                                        |
+    | SSID               | Detected access point network name                                                |
+    | BSSID              | Detected access point MAC address                                                 |
+    | SECURITY           | Active security protocols                                                         |
+    | FREQUENCY          | Wi-Fi band frequency (e.g., 2427, 5180), in MHz                                   |
+    | RSSI               | RSSI to the scanned device, in dB                                                 |
+
--- a/docs/develop/contributors.rst
+++ b/docs/develop/contributors.rst
@ -1,83 +0,0 @@
-RAPIDS Contributors
-====================
-
-Currently, RAPIDS is being developed by the Mobile Sensing + Health Institute (MoSHI), but if you are interested in contributing, feel free to submit a pull request or contact us.
-
-
-Julio Vega, PhD
-""""""""""""""""""
-**Postdoctoral Associate**
-
-vegaju@upmc.edu
-
-Julio Vega is a postdoctoral associate at the Mobile Sensing + Health Institute. He is interested in personalized methodologies to monitor chronic conditions that affect daily human behavior using mobile and wearable data. In the long term, his goal is to explore how we can enable patients to inform, amend, and evaluate their health tracking algorithms to improve disease self-management.
-
-`Julio Vega Personal Website`_
-
-
-
-Meng Li, MS
-"""""""""""""
-**Data Scientist**
-
-lim11@upmc.edu
-
-Meng Li received her Master of Science degree in Information Science from the University of Pittsburgh. She is interested in applying machine learning algorithms to the medical field.
-
-`Meng Li Linkedin Profile`_
-
-`Meng Li Github Profile`_ 
-
-
-
-
-Kwesi Aguillera, BS
-""""""""""""""""""""
-**Intern**
-
-Kwesi Aguillera is currently in his first year at the University of Pittsburgh pursuing a Master of Science in Information Science specializing in Big Data Analytics. He received his Bachelor of Science degree in Computer Science and Management from the University of the West Indies. Kwesi considers himself a full stack developer and looks forward to applying this knowledge to big data analysis.
-
-`Kwesi Aguillera Linkedin Profile`_
-
-
-Echhit Joshi, BS
-"""""""""""""""""
-**Intern**
-
-Echhit Joshi is a Master's student at the School of Computing and Information at the University of Pittsburgh. His areas of interest are Machine/Deep Learning, Data Mining, and Analytics.
-
-`Echhit Joshi Linkedin Profile`_
-
-Nicolas Leo, BS
-""""""""""""""""
-**Intern**
-
-Nicolas is a rising senior studying computer science at the University of Pittsburgh. His academic interests include databases, machine learning, and application development. After completing his undergraduate degree, he plans to attend graduate school for an MS in Computer Science with a focus on Intelligent Systems.
-
-
-Nikunj Goel, BS
-""""""""""""""""
-**Intern**
-
-Nik is a graduate student at the University of Pittsburgh pursuing a Master of Science in Information Science. He earned his Bachelor of Technology degree in Information Technology in India. He is a data enthusiast and is passionate about finding meaning in raw data. In the long term, his goal is to create a breakthrough in Data Science and Deep Learning.
-
-`Nikunj Goel Linkedin Profile`_
-
-Agam Kumar, BS
-""""""""""""""""
-**Research Assistant at CMU**
-
-Agam is a junior at Carnegie Mellon University studying Statistics and Machine Learning and pursuing an additional major in Computer Science.  He is a member of the Data Science team in the Health and Human Performance Lab at CMU and has keen interests in software development and data science.  His research interests include ML applications in medicine.
-
-`Agam Kumar Linkedin Profile`_
-
-`Agam Kumar Github Profile`_
-
-.. _`Julio Vega Personal Website`: https://juliovega.info/
-.. _`Meng Li Linkedin Profile`: https://www.linkedin.com/in/meng-li-57238414a
-.. _`Meng Li Github Profile`: https://github.com/Meng6
-.. _`Kwesi Aguillera Linkedin Profile`: https://www.linkedin.com/in/kwesi-aguillera-29529823
-.. _`Echhit Joshi Linkedin Profile`: https://www.linkedin.com/in/echhitjoshi/
-.. _`Nikunj Goel Linkedin Profile`: https://www.linkedin.com/in/nikunjgoel95/
-.. _`Agam Kumar Linkedin Profile`: https://www.linkedin.com/in/agam-kumar
-.. _`Agam Kumar Github Profile`: https://github.com/agam-kumar
--- a/docs/develop/documentation.rst
+++ b/docs/develop/documentation.rst
@ -1,237 +0,0 @@
-How to Edit Documentation
-============================
-
-The following is a basic guide for editing the documentation for this project. The documentation is rendered using the Sphinx_ documentation builder.
-
-Quick start up
----------------------------------
-
-#. Install Sphinx on macOS with ``brew install sphinx-doc`` or on Linux (Ubuntu) with ``apt-get install python3-sphinx``
-
-#. Go to the docs folder ``cd docs``
-
-#. Change any ``.rst`` file you need to modify
-
-#. To preview the results locally, run ``make dirhtml`` and check the HTML files in the ``_build/dirhtml`` directory
-
-#. When you are done, push your changes to the git repo.
-
-
-Sphinx Workspace Structure
----------------------------
-
-All of the files concerned with documentation can be found in the ``docs`` directory. At the top level there is the ``conf.py`` file and an ``index.rst`` file, among others. There should be no need to change the ``conf.py`` file. The ``index.rst`` file is known as the master document and defines the structure of the documentation (i.e. the menu or table of contents). It contains the root of the "table of contents" tree (or toctree) that connects the multiple files into a single hierarchy of documents. The TOC is defined using the ``toctree`` directive, which is used as follows::
-
-    .. toctree::
-       :maxdepth: 2
-       :caption: Getting Started
-
-       usage/introduction
-       usage/installation
-
-The ``toctree`` directive inserts a TOC tree at the current location using the individual TOCs of the documents given in the directive body. In other words, if the files listed in the above example contain ``toctree`` directives of their own, those are also included in the resulting TOC. Relative document names (not beginning with a slash) are relative to the document the directive occurs in; absolute names are relative to the source directory. Thus, in the example above, the ``usage`` directory is relative to the ``index.rst`` page. The ``:maxdepth:`` option defines the depth of the tree for that particular menu. The ``:caption:`` option gives a caption for that menu tree at that level. Note that the titles of the menu items under that header are taken from the titles of the referenced documents. For example, the menu item title for ``usage/introduction`` is taken from the main header specified in the ``introduction.rst`` document in the ``usage`` directory. Also note that the document name does not include the extension (i.e. ``.rst``).
-
-Thus the directory structure for the above example is shown below::
-
-    ├── index.rst
-    └── usage
-        ├── introduction.rst
-        └── installation.rst
-
-
-Basic reStructuredText Syntax
-------------------------------
-
-Now we will look at some basic reStructuredText syntax necessary to start editing the .rst files that are used to generate documentation. 
-
-Headers
-""""""""
-
-**Section Header**
-
-The following was used to make the header at the top of this page:
-::
-
-    How to Edit Documentation
-    ==========================
-
-**Subsection Header**
-
-The following was used to create the secondary header (e.g. the Sphinx Workspace Structure section header):
-::
-
-    Sphinx Workspace Structure
-    ---------------------------
-
-..... 
-
-
-Lists
-""""""
-**Bullets List**
-::
-
-    - This is a bullet
-    - This is a bullet
-
-Will produce the following:
-
-- This is a bullet
-- This is a bullet
-
-
-**Numbered List**
-::
-
-    #. This is a numbered list item
-    #. This is a numbered list item
-
-Will produce the following:
-
-#. This is a numbered list item
-#. This is a numbered list item
-
-.....
-
-Inline Markup
-""""""""""""""
-**Emphasis/Italics**
-::
-
-    *This is for emphasis*
-
-Will produce the following:
-
-*This is for emphasis*
-
-
-**Bold**
-::
-
-    **This is bold text**
-
-Will produce the following:
-
-**This is bold text**
-
-..... 
-
-**Code Sample**
-::
-    
-    ``Backquotes = code sample``
-
-Will produce the following:
-
-``Backquotes = code sample``
-
-**Apostrophes in Text**
-::
-
-    `don't know`
-
-Will produce the following:
-
-`don't know`
-
-
-**Literal blocks**
-
-Literal code blocks are introduced by ending a paragraph with the special marker ``::``. The literal block must be indented (and, like all paragraphs, separated from the surrounding ones by blank lines)::
-
-    This is a normal text paragraph. The next paragraph is a code sample::
-
-        It is not processed in any way, except
-        that the indentation is removed.
-
-        It can span multiple lines.
-
-    This is a normal text paragraph again.
-
-
-The following is produced:
-
-.....
-
-This is a normal text paragraph. The next paragraph is a code sample::
-
-    It is not processed in any way, except
-    that the indentation is removed.
-
-    It can span multiple lines.
-
-This is a normal text paragraph again.
-
-.....
-
-**Doctest blocks**
-
-Doctest blocks are interactive Python sessions cut-and-pasted into docstrings. They do not require the literal blocks syntax. The doctest block must end with a blank line and should not end with an unused prompt:
-
->>> 1 + 1
-2
-
-**External links**
-
-Use ```Link text <https://domain.invalid/>`_`` for inline web links, for example `Link text <https://domain.invalid/>`_. If the link text should be the web address, you don’t need special markup at all; the parser finds links and mail addresses in ordinary text. *Important:* There must be a space between the link text and the opening ``<`` for the URL.
-
-You can also separate the link and the target definition, like this:
-::
-
-    This is a paragraph that contains `a link`_.
-
-    .. _a link: https://domain.invalid/
-
-
-Will produce the following:
-
-This is a paragraph that contains `a link`_.
-
-.. _a link: https://domain.invalid/
-
-
-
-**Internal links**
-
-Internal linking is done via a special reST role provided by Sphinx to cross-reference arbitrary locations. For this to work label names must be unique throughout the entire documentation. There are two ways in which you can refer to labels:
-
-- If you place a label directly before a section title, you can refer to it with ``:ref:`label-name```. For example::
-
-    .. _my-reference-label:
-
-    Section to cross-reference
-    --------------------------
-
-    This is the text of the section.
-
-    It refers to the section itself, see :ref:`my-reference-label`.
-
-The ``:ref:`` role would then generate a link to the section, with the link title being “Section to cross-reference”. This works just as well when section and reference are in different source files. The above produces the following:
-
-.....
-
-.. _my-reference-label:
-
-Section to cross-reference
-"""""""""""""""""""""""""""
-
-This is the text of the section.
-
-It refers to the section itself, see :ref:`my-reference-label`.
-
-.....
-
-- Labels that aren’t placed before a section title can still be referenced, but you must give the link an explicit title, using this syntax: ``:ref:`Link title <label-name>```.
-
-
-**Comments**
-
-Every explicit markup block which isn’t a valid markup construct is regarded as a comment. For example::
-
-    .. This is a comment.
-
-Go to Sphinx_ for more documentation. 
-
-.. _Sphinx: https://www.sphinx-doc.org
-.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html
-
--- a/docs/develop/environments.rst
+++ b/docs/develop/environments.rst
@ -1,18 +0,0 @@
-Manage virtual environments
-=============================
-
-**Add new packages**
-
-Try to install any new package using ``conda install my_package``. If a package is not available in one of conda's channels, you can install it with pip, but make sure your virtual environment is active.
-
-**Update your conda environment.yml**
-
-After installing a new package, you can use the following command in your terminal to update your ``environment.yml`` before publishing your pipeline. Note that we ignore the package version for ``libgfortran`` to keep compatibility with Linux:
-
-    ``conda env export --no-builds | sed 's/^.*libgfortran.*$/  - libgfortran/' >  environment.yml``
-
-**Update and prune your conda environment from an environment.yml file**
-
-Execute the following command in your terminal. See https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#updating-an-environment
-
-    ``conda env update --prefix ./env --file environment.yml  --prune``
--- a/docs/develop/features.rst
+++ b/docs/develop/features.rst
@ -1,28 +0,0 @@
-Add new features to RAPIDS
-============================
-
-Take accelerometer features as an example.
-
-#. Add your script to the accelerometer_ folder
-
-    - Copy the signature of the base_accelerometer_features() function_ for your own feature function
-
-#. Add any parameters you need for your function
-
-    - Add your parameters to the settings_ of the accelerometer sensor in the config file
-    - Add your parameters to the params_ of the accelerometer_features rule in features.snakefile
-
-#. Merge your new features with the existing features
-
-    - Call the function you just created below this line (LINK_) of the accelerometer_features.py script
-
-#. Update the config file
-
-    - Add your new feature names to the ``FEATURES`` list for accelerometer in the config_ file
-
-.. _accelerometer: https://github.com/carissalow/rapids/tree/master/src/features/accelerometer
-.. _function: https://github.com/carissalow/rapids/blob/master/src/features/accelerometer/accelerometer_base.py#L35
-.. _settings: https://github.com/carissalow/rapids/blob/master/config.yaml#L100
-.. _params: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile#L146
-.. _LINK: https://github.com/carissalow/rapids/blob/master/src/features/accelerometer_features.py#L10
-.. _config: https://github.com/carissalow/rapids/blob/master/config.yaml#L102
--- a/docs/develop/remotesupport.rst
+++ b/docs/develop/remotesupport.rst
@ -1,16 +0,0 @@
-Remote Support
-======================================
-
-We use the Live Share extension of Visual Studio Code to debug issues when sharing data or database credentials is not possible.
-
-#. Install `Visual Studio Code <https://code.visualstudio.com/>`_
-
-#. Open your rapids folder in a new VSCode window
-
-#. Open a new Terminal ``Terminal > New terminal``
-
-#. Install the `Live Share extension pack <https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare-pack>`_
-
-#. Press ``Ctrl+P``/``Cmd+P`` and run this command ``>live share: start collaboration session`` 
-
-#. Follow the instructions and share the session link you receive
--- a/docs/develop/test_cases.rst
+++ b/docs/develop/test_cases.rst
@ -1,110 +0,0 @@
-.. _test-cases:
-
-Test Cases
-----------
-
-As new sensors and features are added to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a work in progress, this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed, a description of the data used for testing (test cases) is outlined. Currently, ``tests/data/raw/test01/`` contains all the test data files for testing Android data formats and ``tests/data/raw/test02/`` contains all the test data files for testing iOS data formats. The expected (verified) output is contained in ``tests/data/processed/test01/`` and ``tests/data/processed/test02/`` for Android and iOS respectively. ``tests/data/raw/test03/`` and ``tests/data/raw/test04/`` contain data files for testing empty raw data files for Android and iOS respectively.
-
-List of Sensors with Tests
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-The following is a list of the sensors for which testing is currently available.
-
-
-Messages (SMS)
-"""""""""""""""
-
-    - The raw message data file contains data for 2 separate days.
-    - The data for the first day contains 5 records for every ``epoch``.
-    - The second day's data contains 6 records for each of only 2 epochs (currently ``morning`` and ``evening``).
-    - The raw message data contains records for both ``message_types`` (i.e. ``received`` and ``sent``) in both days in all epochs. The number of records with each ``message_types`` per epoch is randomly distributed. There is at least one record with each ``message_types`` per epoch.
-    - There is one raw message data file each, as described above, for testing both iOS and Android data.
-    - There is also an additional empty data file for both Android and iOS for testing empty data files.
-
-Calls
-"""""""
-
-    Due to the difference in the format of the raw call data for iOS and Android (see the **Assumptions/Observations** section of :ref:`Calls<call-sensor-doc>`), the following describes the expected results in ``calls_with_datetime_unified.csv``. This gives a better idea of the use cases being tested, since ``calls_with_datetime_unified.csv`` makes the iOS and Android data comparable.
-
-    - The call data contains data for 2 days.
-    - The data for the first day contains 6 records for every ``epoch``.
-    - The second day's data contains 6 records for each of only 2 epochs (currently ``morning`` and ``evening``).
-    - The call data contains records for all ``call_types`` (i.e. ``incoming``, ``outgoing`` and ``missed``) in both days in all epochs. The number of records with each of the ``call_types`` per epoch is randomly distributed. There is at least one record with each of the ``call_types`` per epoch.
-    - There is one call data file each, as described above, for testing both iOS and Android data.
-    - There is also an additional empty data file for both Android and iOS for testing empty data files.
-
-Screen
-""""""""
-
-    Due to the difference in the format of the raw screen data for iOS and Android (see the **Assumptions/Observations** section of :ref:`Screen<screen-sensor-doc>`), the following describes the expected results in ``screen_deltas.csv``. This gives a better idea of the use cases being tested, since ``screen_deltas.csv`` makes the iOS and Android data comparable. These files are used to calculate the features for the screen sensor.
-
-    - The screen delta data file contains data for 1 day.
-    - The screen delta data contains 1 record to represent an ``unlock`` episode that falls within an ``epoch`` for every ``epoch``.
-    - The screen delta data contains 1 record to represent an ``unlock`` episode that falls across the boundary of 2 epochs. Namely, the ``unlock`` episode starts in one epoch and ends in the next; thus there is a record for ``unlock`` episodes that fall across ``night`` to ``morning``, ``morning`` to ``afternoon`` and finally ``afternoon`` to ``night``.
-    - The testing is done for the ``unlock`` episode type.
-    - There is one screen data file each for testing both iOS and Android data formats.
-    - There is also an additional empty data file for both Android and iOS for testing empty data files.
-
-Battery
-"""""""""
-
-    Due to the difference in the format of the raw battery data for iOS and Android, as well as between versions of iOS (see the **Assumptions/Observations** section of :ref:`Battery<battery-sensor-doc>`), the following describes the expected results in ``battery_deltas.csv``. This gives a better idea of the use cases being tested, since ``battery_deltas.csv`` makes the iOS and Android data comparable. These files are used to calculate the features for the battery sensor.
-
-    - The battery delta data file contains data for 1 day.
-    - The battery delta data contains 1 record each for a ``charging`` and a ``discharging`` episode that falls within an ``epoch`` for every ``epoch``. Thus, for the ``daily`` epoch there are multiple ``charging`` and ``discharging`` episodes.
-    - Since either a ``charging`` episode or a ``discharging`` episode, but not both, can occur across epochs, the battery delta data contains alternating ``charging`` and ``discharging`` episodes that fall across ``night`` to ``morning``, ``morning`` to ``afternoon`` and finally ``afternoon`` to ``night``. This starts with a ``discharging`` episode that begins in ``night`` and ends in ``morning``.
-    - There is one battery data file each for testing both iOS and Android data formats.
-    - There is also an additional empty data file for both Android and iOS for testing empty data files.
-
-Bluetooth
-""""""""""
-
-    - The raw Bluetooth data file contains data for 1 day. 
-    - The raw Bluetooth data contains at least 2 records for each ``epoch``. Each ``epoch`` has a record with a ``timestamp`` for the beginning boundary for that ``epoch`` and a record with a ``timestamp`` for the ending boundary for that ``epoch``. (e.g. For the ``morning`` epoch there is a record with a ``timestamp`` for ``6:00AM`` and another record with a ``timestamp`` for ``11:59:59AM``. These are to test edge cases) 
-    - An option of 5 Bluetooth devices are randomly distributed throughout the data records.
-    - There is one raw Bluetooth data file each, for testing both iOS and Android data formats.
-    - There is also an additional empty data file for both android and iOS for testing empty data files.
-
-WIFI
-"""""
-
-    - There are 2 data files (``wifi_raw.csv`` and ``sensor_wifi_raw.csv``) for each fake participant for each phone platform. (see the **Assumptions/Observations** section of :ref:`WIFI<wifi-sensor-doc>`)
-    - The raw WIFI data files contain data for 1 day. 
-    - The ``sensor_wifi_raw.csv`` data contains at least 2 records for each ``epoch``. Each ``epoch`` has a record with a ``timestamp`` for the beginning boundary for that ``epoch`` and a record with a ``timestamp`` for the ending boundary for that ``epoch``. (e.g. For the ``morning`` epoch there is a record with a ``timestamp`` for ``6:00AM`` and another record with a ``timestamp`` for ``11:59:59AM``. These are to test edge cases) 
-    - The ``wifi_raw.csv`` data contains 3 records with random timestamps for each ``epoch`` to represent visible broadcasting WIFI network. This file is empty for the iOS phone testing data.
-    - An option of 10 access point devices is randomly distributed throughout the data records. 5 each for ``sensor_wifi_raw.csv`` and ``wifi_raw.csv``.
-    - There data files for testing both iOS and Android data formats.
-    - There are also additional empty data files for both android and iOS for testing empty data files.
-
-Light
-"""""""
-
-    - The raw light data file contains data for 1 day. 
-    - The raw light data contains 3 or 4 rows of data for each ``epoch`` except ``night``. The single row of data for ``night`` is for testing features for single values inputs. (Example testing the standard deviation of one input value)
-    - Since light is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files.
-
-Application Foreground 
-"""""""""""""""""""""""
-
-    - The raw application foreground data file contains data for 1 day. 
-    - The raw application foreground data contains 7 - 9 rows of data for each ``epoch``. The records for each ``epoch`` contains apps that are randomly selected from a list of apps that are from the ``MULTIPLE_CATEGORIES`` and ``SINGLE_CATEGORIES`` (See `testing_config.yaml`_). There are also records in each epoch that have apps randomly selected from a list of apps that are from the ``EXCLUDED_CATEGORIES`` and ``EXCLUDED_APPS``. This is to test that these apps are actually being excluded from the calculations of features. There are also records to test ``SINGLE_APPS`` calculations. 
-    - Since application foreground is only available for Android there is only one file that contains data for Android. All other files (i.e. for iPhone) are empty data files.
-
-Activity Recognition
-""""""""""""""""""""""
-
-    - The raw Activity Recognition data file contains data for 1 day. 
-    - The raw Activity Recognition data each ``epoch`` period contains rows that records 2 - 5 different ``activity_types``. The is such that durations of activities can be tested. Additionally, there are records that mimic the duration of an activity over the time boundary of neighboring epochs. (For example, there a set of records that mimic the participant ``in_vehicle`` from ``afternoon`` into ``evening``) 
-    - There is one file each with raw Activity Recognition data for testing both iOS and Android data formats. (plugin_google_activity_recognition_raw.csv for android and plugin_ios_activity_recognition_raw.csv for iOS)
-    - There is also an additional empty data file for both android and iOS for testing empty data files.
-
-Conversation
-"""""""""""""
-
-    - The raw conversation data file contains data for 2 day. 
-    - The raw conversation data contains records with a sample of both ``datatypes`` (i.e. ``voice/noise`` = ``0``, and ``conversation`` = ``2`` ) as well as rows with for samples of each of the ``inference`` values (i.e. ``silence`` = ``0``, ``noise`` = ``1``, ``voice`` = ``2``, and ``unknown`` = ``3``) for each ``epoch``. The different ``datatype`` and ``inference`` records are randomly distributed throughout the ``epoch``. 
-    - Additionally there are 2 - 5 records for conversations (``datatype`` = 2, and ``inference`` = -1) in each ``epoch`` and for each ``epoch`` except night, there is a conversation record that has a ``double_convo_start`` ``timestamp`` that is from the previous ``epoch``. This is to test the calculations of features across ``epochs``.
-    - There is a raw conversation data file for both android and iOS platforms (``plugin_studentlife_audio_android_raw.csv`` and ``plugin_studentlife_audio_raw.csv`` respectively).
-    - Finally, there are also additional empty data files for both android and iOS for testing empty data files
-
-
- .. _`testing_config.yaml`: https://github.com/carissalow/rapids/blob/c498b8d2dfd7cc29d1e4d53e978d30cff6cdf3f2/tests/settings/testing_config.yaml#L70
--- a/docs/develop/testing.rst
+++ b/docs/develop/testing.rst
@ -1,67 +0,0 @@
-Testing 
-==========
-
-The following is a simple guide to testing RAPIDS. All files necessary for testing are stored in the ``tests`` directory:
-
-::
-
-    ├── tests
-    │   ├── data                        <- Replica of the project root data directory for testing.
-    │   │   ├── external                <- Contains the fake testing participant files. 
-    │   │   ├── interim                 <- The expected intermediate data that has been transformed.
-    │   │   ├── processed               <- The expected final data, canonical data sets for modeling used to test/validate feature calculations.
-    │   │   └── raw                     <- The specially created raw input datasets (fake data) that will be used for testing.
-    │   │   
-    │   ├── scripts                     <- Scripts for testing. Add test scripts in this directory.
-    │   │   ├── run_tests.sh            <- The shell script to runs RAPIDS pipeline test data and test the results
-    │   │   ├── test_sensor_features.py <- The default test script for testing RAPIDS builting sensor features. 
-    │   │   └── utils.py                <- Contains any helper functions and methods.
-    │   │
-    │   ├── settings                    <- The directory contains the config and settings files for testing snakemake.
-    │   │   ├── config.yaml             <- Defines the testing profile configurations for running snakemake.
-    │   │   └── testing_config.yaml     <- Contains the actual snakemake configuration settings for testing.
-    │   │
-    │   └── Snakefile                   <- The Snakefile for testing only. It contains the rules that you would be testing.
-    │
-
-
-Steps for Testing
-""""""""""""""""""
-
-#. To begin testing  RAPIDS place the fake raw input data ``csv`` files in ``tests/data/raw/``. The fake participant files should be placed in ``tests/data/external/``. The expected output files of RAPIDS after processing the input data should be placed in ``tests/data/processesd/``. 
-
-#. The Snakemake rule(s) that are to be tested must be placed in the ``tests/Snakemake`` file. The current ``tests/Snakemake`` is a good example of how to define them. (At the time of writing this documentation the snakefile contains rules messages (SMS), calls and screen)
-
-#. Edit the ``tests/settings/config.yaml``. Add and/or remove the rules to be run for testing from the ``forcerun`` list.
-
-#. Edit the ``tests/settings/testing_config.yaml`` with the necessary configuration settings for running the rules to be tested. 
-
-#. Add any additional testscripts in ``tests/scripts``.
-
-#. Uncomment or comment off lines in the testing shell script ``tests/scripts/run_tests.sh``.
-
-#. Run the testing shell script.
-
-::
-
-    $ tests/scripts/run_tests.sh
-
-
-The following is a snippet of the output you should see after running your test. 
-
-::
-
-    test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... ok
-    test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... FAIL
-
-    ======================================================================
-    FAIL: test_sensors_features_calculations (test_sensor_features.TestSensorFeatures)
-    ----------------------------------------------------------------------
-
-The results above show that the first test ``test_sensors_files_exist`` passed while ``test_sensors_features_calculations`` failed. In addition you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest please see `Unittest Documentation`_
-
-Testing of the RAPIDS sensors and features is a work-in-progess. Please see :ref:`test-cases` for a list of sensors and features that have testing currently available. 
-
-Currently the repository is set up to test a number of senssors out of the box by simply running the ``tests/scripts/run_tests.sh`` command once the RAPIDS python environment is active. 
-
-.. _`Unittest Documentation`: https://docs.python.org/3.7/library/unittest.html#command-line-interface
--- a/docs/developers/documentation.md
+++ b/docs/developers/documentation.md
@ -0,0 +1,41 @@
+# Documentation
+
+We use [mkdocs](https://www.mkdocs.org/) with the [material theme](https://squidfunk.github.io/mkdocs-material/) to write these docs. Whenever you make any changes, just push them back to the repo and the documentation will be deployed automatically.
+
+## Set up development environment
+
+1. Make sure your conda environment is active
+2. `pip install mkdocs`
+3. `pip install mkdocs-material`
+
+## Preview
+
+Run the following command in RAPIDS root folder and go to [http://127.0.0.1:8000](http://127.0.0.1:8000):
+
+```bash
+mkdocs serve
+```
+
+## File Structure
+
+The documentation config file is `/mkdocs.yml`. If you are adding new `.md` files to the docs, modify the `nav` attribute at the bottom of that file. You can use the hierarchy there to find all the files that appear in the documentation.
+
+## Reference
+
+Check this [page](https://squidfunk.github.io/mkdocs-material/reference/abbreviations/) to get familiar with the different visual elements we can use in the docs (admonitions, code blocks, tables, etc.). You can also refer to `/docs/setup/installation.md` and `/docs/setup/configuration.md` to see practical examples of these elements.
+
+!!! hint
+    Any links to internal pages should be relative to the current page. For example, any link from this page (documentation) which is inside `./developers` should begin with `../` to go one folder level up like:
+    ```md
+    [mylink](../setup/installation.md)
+    ```
+
+## Extras
+
+You can insert [emojis](https://facelessuser.github.io/pymdown-extensions/extensions/emoji/) using this syntax `:[SOURCE]-[ICON_NAME]` from the following sources:
+
+- https://materialdesignicons.com/
+- https://fontawesome.com/icons/tasks?style=solid
+- https://primer.style/octicons/
+
+You can use this [page](https://www.tablesgenerator.com/markdown_tables) to create markdown tables more easily
--- a/docs/developers/git-flow.md
+++ b/docs/developers/git-flow.md
@ -0,0 +1,161 @@
+# Git Flow
+
+We use the `develop/master` variation of the [OneFlow](https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow) git flow.
+
+## Add New Features
+We use feature (topic) branches to implement new features.
+
+=== "Internal Developer"
+    You are an internal developer if you have write permissions to the repository.
+    
+    Most feature branches are never pushed to the repo; only do so if you expect that development will take days (to avoid losing your work if your computer is damaged). Otherwise, follow these instructions to locally rebase your feature branch onto `develop` and push those rebased changes online.
+
+    **Starting your feature branch**
+
+    1. Pull the latest develop 
+    ```bash
+    git checkout develop
+    git pull
+    ```
+    1. Create your feature branch
+    ```bash
+    git checkout -b feature/feature1
+    ```
+    1. Add, modify or delete the necessary files to add your new feature
+    1. Update the [change log](../../change-log) (`docs/change-log.md`)
+    2. Stage and commit your changes using VS Code git GUI or the following commands
+    ```bash
+    git add modified-file1 modified-file2
+    git commit -m "Add my new feature" # use a concise description
+    ```
+
+    **Merging back your feature branch**
+
+    If your changes took time to be implemented it is possible that there are new commits in our `develop` branch, so we need to rebase your feature branch.
+
+    1. Fetch the latest changes to develop
+    ```bash
+    git fetch origin develop
+    ```
+
+    1. Rebase your feature branch
+    ```bash
+    git checkout feature/feature1
+    git rebase -i origin/develop # rebase onto the commits fetched in the previous step
+    ```
+
+    1. Integrate your new feature to `develop`
+    ```bash
+    git checkout develop
+    git merge --no-ff feature/feature1 # (use the default merge message)
+    git push origin develop
+    git branch -d feature/feature1
+    ```
+
+=== "External Developer"
+    You are an external developer if you do NOT have write permissions to the repository.
+
+    **Starting your feature branch**
+
+    1. Fork and clone our repository on GitHub
+    1. Switch to the latest develop 
+    ```bash
+    git checkout develop
+    ```
+    1. Create your feature branch
+    ```bash
+    git checkout -b feature/external-test
+    ```
+    1. Add, modify or delete the necessary files to add your new feature
+    2. Stage and commit your changes using VS Code git GUI or the following commands
+    ```bash
+    git add modified-file1 modified-file2
+    git commit -m "Add my new feature" # use a concise description
+    ```
+    
+    **Merging back your feature branch**
+    
+    If your changes took time to be implemented, it is possible that there are new commits in our `develop` branch, so we need to rebase your feature branch.
+
+    1. Add our repo as another `remote`
+    ```bash
+    git remote add upstream https://github.com/carissalow/rapids/
+    ```
+
+    1. Fetch the latest changes to develop
+    ```bash
+    git fetch upstream develop 
+    ```
+    
+    1. Rebase your feature branch
+    ```bash
+    git checkout feature/external-test
+    git rebase -i upstream/develop # rebase onto the commits fetched in the previous step
+    ```
+    
+    1. Push your feature branch online
+    ```bash
+    git push --set-upstream origin feature/external-test
+    ```
+    
+    1. Open a pull request to the `develop` branch using GitHub's GUI
+
+## Release a New Version
+
+1. Pull the latest develop 
+```bash
+git checkout develop
+git pull
+```
+1. Create a new release branch
+```bash
+git describe --abbrev=0 --tags # Bump the release (0.1.0 to 0.2.0 => NEW_RELEASE)
+git checkout -b release/v[NEW_RELEASE] develop
+```
+1. Add new tag
+```bash
+git tag v[NEW_RELEASE]
+```
+1. Merge and push the release branch
+```bash
+git checkout develop
+git merge release/v[NEW_RELEASE]
+git push --tags origin develop
+git branch -d release/v[NEW_RELEASE]
+```
+1. Fast-forward master
+```bash
+git checkout master
+git merge --ff-only develop
+git push # Unlock the master branch before merging
+```
+1. Release happens automatically after passing the tests
+
+## Release a Hotfix
+1. Pull the latest master
+```bash
+git checkout master
+git pull
+```
+1. Start a hotfix branch
+```bash
+git describe --abbrev=0 --tags # Bump the hotfix (0.1.0 to 0.1.1 => NEW_HOTFIX)
+git checkout -b hotfix/v[NEW_HOTFIX] master
+```
+1. Fix whatever needs to be fixed
+1. Update the change log
+1. Tag and merge the hotfix
+```bash
+git tag v[NEW_HOTFIX]
+git checkout develop
+git merge hotfix/v[NEW_HOTFIX]
+git push --tags origin develop
+git branch -d hotfix/v[NEW_HOTFIX]
+```
+1. Fast-forward master
+```bash
+git checkout master
+git merge --ff-only v[NEW_HOTFIX]
+git push # Unlock the master branch before merging
+```
+1. Release happens automatically after passing the tests
--- a/docs/developers/remote-support.md
+++ b/docs/developers/remote-support.md
@ -0,0 +1,15 @@
+# Remote Support
+
+We use the Live Share extension of Visual Studio Code to troubleshoot bugs when sharing data or database credentials is not possible.
+
+1.  Install [Visual Studio Code](https://code.visualstudio.com/)
+2.  Open your RAPIDS root folder in a new VSCode window
+3.  Open a new terminal in Visual Studio Code `Terminal > New terminal`
+4.  Install the [Live Share extension pack](https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare-pack)
+5.  Press ++ctrl+p++ or ++cmd+p++ and run this command:
+    
+    ```bash
+    >live share: start collaboration session
+    ```
+
+6.  Follow the instructions and share the session link you receive
--- a/docs/developers/test-cases.md
+++ b/docs/developers/test-cases.md
@ -0,0 +1,587 @@
+# Test Cases
+
+Along with the continued development and the addition of new sensors and features to the RAPIDS pipeline, tests for the currently available sensors and features are being implemented. Since this is a work in progress, this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed, a description of the data used for testing (test cases) is outlined. Currently, for testing purposes, `tests/data/raw/test01/` contains all the test data files for testing Android data formats and `tests/data/raw/test02/` contains all the test data files for testing iOS data formats. It follows that the expected (verified) outputs are contained in `tests/data/processed/test01/` and `tests/data/processed/test02/` for Android and iOS respectively. `tests/data/raw/test03/` and `tests/data/raw/test04/` contain data files for testing empty raw data files for Android and iOS respectively.
+
+The following is a list of the sensors for which testing is currently available.
+
+
+| Sensor                        | Provider | Periodic | Frequency | Event |
+|-------------------------------|----------|----------|-----------|-------|
+| Phone Accelerometer           | Panda    | Y        | Y         | Y     |
+| Phone Accelerometer           | RAPIDS   | Y        | Y         | Y     |
+| Phone Activity Recognition    | RAPIDS   | Y        | Y         | Y     |
+| Phone Applications Foreground | RAPIDS   | Y        | Y         | Y     |
+| Phone Battery                 | RAPIDS   | Y        | Y         | Y     |
+| Phone Bluetooth               | Doryab   | Y        | Y         | Y     |
+| Phone Bluetooth               | RAPIDS   | Y        | Y         | Y     |
+| Phone Calls                   | RAPIDS   | Y        | Y         | Y     |
+| Phone Conversation            | RAPIDS   | Y        | Y         | Y     |
+| Phone Data Yield              | RAPIDS   | Y        | Y         | Y     |
+| Phone Light                   | RAPIDS   | Y        | Y         | Y     |
+| Phone Locations               | Doryab   | Y        | Y         | Y     |
+| Phone Locations               | Barnett  | N        | N         | N     |
+| Phone Messages                | RAPIDS   | Y        | Y         | Y     |
+| Phone Screen                  | RAPIDS   | Y        | Y         | Y     |
+| Phone WiFi Connected          | RAPIDS   | Y        | Y         | Y     |
+| Phone WiFi Visible            | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Calories Intraday      | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Data Yield             | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Heart Rate Summary     | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Heart Rate Intraday    | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Sleep Summary          | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Sleep Intraday         | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Sleep Intraday         | PRICE    | Y        | Y         | Y     |
+| Fitbit Steps Summary          | RAPIDS   | Y        | Y         | Y     |
+| Fitbit Steps Intraday         | RAPIDS   | Y        | Y         | Y     |
+
+
+## Accelerometer
+
+Description
+
+- The raw accelerometer data file, `phone_accelerometer_raw.csv`, contains data for 4 separate days
+- One episode for each daily segment (night, morning, afternoon and evening)
+- Two episodes located in the same 30-min segment (`Fri 00:15:00` and `Fri 00:21:21`)
+- Two episodes located in the same daily segment (`Fri 00:15:00` and `Fri 18:12:00`)
+- One episode before the time switch (`Sun 00:02:00`) and one episode after the time switch (`Sun 04:18:00`)
+- Multiple episodes within one minute, which cause variance in magnitude (`Fri 00:10:25`, `Fri 00:10:27` and `Fri 00:10:46`)
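+
+The "same 30-min segment" case can be checked by hand with a short sketch. This is an illustration only, not RAPIDS code: the helper name `thirty_min_bin` is hypothetical, and it assumes that 30-min segments start on the hour and on the half hour.
+
+```python
+from datetime import time
+
+
+def thirty_min_bin(t: time) -> int:
+    """Return the index (0-47) of the 30-minute bin of the day that contains `t`."""
+    return (t.hour * 60 + t.minute) // 30
+
+
+# The two Friday rows that the list above places in one 30-min segment
+first = time(0, 15, 0)
+second = time(0, 21, 21)
+same_bin = thirty_min_bin(first) == thirty_min_bin(second)
+print(same_bin)
+```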
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android, ios|
+|morning|OK|OK|android, ios|
+|daily|OK|OK|android, ios|
+|threeday|OK|OK|android, ios|
+|weekend|OK|OK|android, ios|
+|beforeMarchEvent|OK|OK|android, ios|
+|beforeNovemberEvent|OK|OK|android, ios|
+
+## Messages (SMS)
+
+Description
+
+- The raw message data file, `phone_messages_raw.csv`, contains data for 4 separate days
+- One episode for each daily segment (night, morning, afternoon and evening)
+- Two `sent` episodes located in the same 30-min segment (`Fri 16:08:03.000` and `Fri 16:19:35.000`)
+- Two `received` episodes located in the same 30-min segment (`Sat 06:45:05.000` and `Fri 06:45:05.000`)
+- Two episodes located in the same daily segment (`Fri 11:57:56.385` and `Sat 10:54:10.000`)
+- One episode before the time switch (`Sun 00:48:01.000`) and one episode after the time switch (`Sun 06:21:01.000`)
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## Calls
+
+Due to the difference in the format of the raw data for iOS and Android, the following describes the expected results 
+for `phone_calls.csv`. 
+
+Description
+
+- One missed episode, one outgoing episode and one incoming episode on Friday night, morning, afternoon and evening
+- There is at least one episode of each type of phone call on each day
+- One incoming episode crossing two 30-min segments
+- One outgoing episode crossing two 30-min segments
+- One missed episode before, during and after the `event`
+- There is one incoming episode before, during or after the `event`
+- There is one outgoing episode before, during or after the `event`
+- There is one missed episode before, during or after the `event`
+
+Data format
+
+| Device | Missed | Outgoing | Incoming |
+|-|-|-|-|
+|android| 3 | 2 | 1 |
+|ios| 1,4 or 3,4 | 3,2,4 | 1,2,4 |
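+
+The table can be read as a lookup from raw codes to call types. The sketch below is an illustration only, not RAPIDS code. It assumes that the numbers are raw call type codes and that each comma-separated group in the iOS row is the ordered sequence of codes belonging to one episode. The names `ANDROID_CALL_TYPES`, `IOS_CALL_SEQUENCES` and `label_ios_episode` are hypothetical.
+
+```python
+# Assumption: values copied from the "Data format" table above.
+ANDROID_CALL_TYPES = {3: "missed", 2: "outgoing", 1: "incoming"}
+
+# Assumption: each tuple is the ordered sequence of codes for one iOS episode.
+IOS_CALL_SEQUENCES = {
+    (1, 4): "missed",
+    (3, 4): "missed",
+    (3, 2, 4): "outgoing",
+    (1, 2, 4): "incoming",
+}
+
+
+def label_ios_episode(codes):
+    """Return the call type for one iOS episode, or None if the sequence is unknown."""
+    return IOS_CALL_SEQUENCES.get(tuple(codes))
+
+
+print(label_ios_episode([3, 2, 4]))
+```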
+
+Note
+
+When generating test data, all traces for an iOS device need to be unique; otherwise, the episode with the duplicate trace will be dropped.
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android, iOS|
+|morning|OK|OK|android, iOS|
+|daily|OK|OK|android, iOS|
+|threeday|OK|OK|android, iOS|
+|weekend|OK|OK|android, iOS|
+|beforeMarchEvent|OK|OK|android, iOS|
+|beforeNovemberEvent|OK|OK|android, iOS|
+
+## Screen
+
+Due to the difference in the format of the raw screen data for iOS and Android, the following describes the expected results for `phone_screen.csv`. 
+
+Description
+
+- The screen data file contains data for 4 days.
+- The screen data contains 1 record to represent an `unlock`
+    episode that falls within an `epoch` for every `epoch`.
+- The screen data contains 1 record to represent an `unlock`
+    episode that falls across the boundary of 2 epochs. Namely the
+    `unlock` episode starts in one epoch and ends in the next, thus
+    there is a record for `unlock` episodes that fall across `night`
+    to `morning`, `morning` to `afternoon` and finally `afternoon` to
+    `night`
+- One episode that crosses two `30-min` segments
+
+Data format
+
+| Device | unlock |
+|-|-|
+| Android | 3, 0|
+| iOS | 3, 2|
+
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android, iOS|
+|morning|OK|OK|android, iOS|
+|daily|OK|OK|android, iOS|
+|threeday|OK|OK|android, iOS|
+|weekend|OK|OK|android, iOS|
+|beforeMarchEvent|OK|OK|android, iOS|
+|beforeNovemberEvent|OK|OK|android, iOS|
+
+## Battery
+
+Description
+
+- The 4-day raw data is contained in `phone_battery_raw.csv`
+- One discharge episode crossing two 30-min time segments (`Fri 05:57:30.123` to `Fri 06:04:32.456`)
+- One charging episode crossing two 30-min time segments (`Fri 11:55:58.416` to `Fri 12:08:07.876`)
+- One discharge episode and one charging episode located within the same 30-min time segment (`Fri 21:30:00` to `Fri 22:00:00`)
+- One episode before the time switch (`Sun 00:24:00.000`) and one episode after the time switch (`Sun 21:58:00`)
+- Two episodes located in the same daily segment
+  
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## Bluetooth
+
+Description 
+
+- The 4-day raw data is contained in `phone_bluetooth_raw.csv`
+- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
+- Two episodes located in the same 30-min segment (`Fri 23:38:45.789` and `Fri 23:59:59.465`)
+- Two episodes located in the same daily segment (`Fri 00:00:00.798` and `Fri 00:49:04.132`)
+- One episode before the time switch (`Sun 00:24:00.000`) and one episode after the time switch (`Sun 17:32:00.000`)
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## WIFI
+
+There are two wifi features (`phone wifi connected` and `phone wifi visible`). The raw test data are stored separately in `phone_wifi_connected_raw.csv` and `phone_wifi_visible_raw.csv`.
+
+Description 
+
+- One episode for each `epoch` (`night`, `morning`, `afternoon` and `evening`)
+- Two episodes in the same time segment (`daily` and `30-min`)
+- Two episodes around the transition of `epochs` (e.g. one at the end of `night` and one at the beginning of `morning`) 
+- One episode before and after the time switch on Sunday
+
+phone wifi connected
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android, iOS|
+|morning|OK|OK|android, iOS|
+|daily|OK|OK|android, iOS|
+|threeday|OK|OK|android, iOS|
+|weekend|OK|OK|android, iOS|
+|beforeMarchEvent|OK|OK|android, iOS|
+|beforeNovemberEvent|OK|OK|android, iOS|
+
+phone wifi visible
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## Light
+
+Description
+
+- The 4-day raw light data is contained in `phone_light_raw.csv`
+- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
+- Two episodes located in the same 30-min segment (`Fri 00:07:27.000` and `Fri 00:12:00.000`)
+- Two episodes located in the same daily segment (`Fri 01:00:00` and `Fri 03:59:59.654`)
+- One episode before the time switch (`Sun 00:08:00.000`) and one episode after the time switch (`Sun 05:36:00.000`)
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## Locations
+
+Description
+
+- The participant's home location is (latitude=1, longitude=1).
+- From Sat 10:56:00 to Sat 11:04:00, the center of the cluster is (latitude=-100, longitude=-100).
+- From Sun 03:30:00 to Sun 03:47:00, the center of the cluster is (latitude=1, longitude=1). Home location is extracted from this period.
+- From Sun 11:30:00 to Sun 11:38:00, the center of the cluster is (latitude=100, longitude=100).
+
+## Application Foreground
+
+- The 4-day raw application data is contained in `phone_applications_foreground_raw.csv`
+- One episode for each daily segment (night, morning, afternoon and evening)
+- Two episodes located in the same 30-min segment (`Fri 10:12:56.385` and `Fri 10:18:48.895`)
+- Two episodes located in the same daily segment (`Fri 11:57:56.385` and `Fri 12:02:56.385`)
+- One episode before the time switch (`Sun 00:07:48.001`) and one episode after the time switch (`Sun 05:10:30.001`)
+- Two custom category (`Dating`) episodes, one at `Fri 06:05:10.385` and another at `Fri 11:53:00.385`
+
+Checklist:
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## Activity Recognition
+
+Description
+
+- The 4-day raw activity data is contained in `plugin_google_activity_recognition_raw.csv` and `plugin_ios_activity_recognition_raw.csv`.
+- Two episodes located in the same 30-min segment (`Fri 04:01:54` and `Fri 04:13:52`)
+- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
+- Two episodes located in the same daily segment (`Fri 05:03:09` and `Fri 05:50:36`)
+- Two episodes with a time difference below the `5 mins` threshold (`Fri 07:14:21` and `Fri 07:18:50`)
+- One episode before the time switch (`Sun 00:46:00`) and one episode after the time switch (`Sun 03:42:00`)
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android, iOS|
+|morning|OK|OK|android, iOS|
+|daily|OK|OK|android, iOS|
+|threeday|OK|OK|android, iOS|
+|weekend|OK|OK|android, iOS|
+|beforeMarchEvent|OK|OK|android, iOS|
+|beforeNovemberEvent|OK|OK|android, iOS|
+
+## Conversation
+
+The 4-day raw conversation data is contained in `phone_conversation_raw.csv`. The different `inference` records are 
+randomly distributed throughout the `epoch`. 
+
+Description
+
+- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`) on each day
+- Two episodes near the transition of the daily segment, one starts at the end of the afternoon, `Fri 17:10:00` and another one starts at the beginning of the evening, `Fri 18:01:00`
+- One episode across two segments, `daily` and `30-mins`, (from `Fri 05:55:00` to `Fri 06:00:41`)
+- Two episodes located in the same daily segment (`Sat 12:45:36` and `Sat 16:48:22`)
+- One episode before the time switch, `Sun 00:15:06`, and one episode after the time switch, `Sun 06:01:00`
+
+Data format
+
+| inference | type |
+| - | - |
+| 0 | silence |
+| 1 | noise | 
+| 2 | voice |
+| 3 | unknown | 
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android|
+|morning|OK|OK|android|
+|daily|OK|OK|android|
+|threeday|OK|OK|android|
+|weekend|OK|OK|android|
+|beforeMarchEvent|OK|OK|android|
+|beforeNovemberEvent|OK|OK|android|
+
+## Keyboard
+
+- The raw keyboard data file contains data for 4 days.
+- The raw keyboard data contains records whose `timestamp` differences range from
+  milliseconds to seconds.
+
+- Consecutive records that are more than 5 seconds apart create separate
+  sessions within the usage of the same app. This verifies the case where sessions have to be different.
+
+- The raw keyboard data contains consecutive records that are less than 5 seconds apart, which would
+  normally make them one session, but that belong to different apps, so a new session starts.
+  This edge case checks the behaviour when the app changes within 5 seconds.
+
+- The raw keyboard data also contains records where the length of `current_text` varies between consecutive rows. This helps us test the cases where text is entered through auto-suggest
+  or auto-correct operations.
+
+- One three-minute episode with 1-minute rows on Sun 08:59:54.65 and 09:00:00, and another on Sun 12:01:02, that are considered a single episode in multi-timezone event segments to showcase how
+  inferring time zone data for Keyboard from phone data can produce inaccurate results around the tz change. This happens because the device was on LA time until 11:59 and switched to NY time at 12pm. In terms of actual time, 09 am LA and 12 pm NY represent the same moment, so 09:00 LA and 12:01 NY are consecutive minutes.
+
+## Application Episodes
+
+-   The feature requires a raw application foreground data file and a raw phone screen data file.
+-   The raw data files contain data for 4 days.
+-   The raw data contains records with differences in `timestamp` ranging from milliseconds to minutes.
+-   An app episode starts when an app is launched and ends when another app is launched (marking the end of the first episode)
+or when the screen locks. Thus, we take the screen unlock episodes into account.
+-   There are multiple app usages within each screen unlock episode to verify the creation of different app episodes in each
+screen unlock session. The screen unlock episodes starting at Fri 05:56:51, Fri 10:00:24, Sat 17:48:01, Sun 22:02:00, and Mon 21:05:00 contain multiple apps, both system and non-system, to check this.
+-   The 22-minute chunk starting at Fri 10:03:56 checks app episodes for system apps only.
+-   The screen unlock episodes starting at Mon 21:05:00 and Sat 17:48:01 check whether the screen lock marks the end of the episode for an app that was launched between a few milliseconds and 8 minutes before the screen lock.
+-   Finally, since application foreground data is only available for Android devices, this feature is also for Android devices only. All other files are empty data files.
+
+
+## Data Yield
+
+Description
+
+- Two sensors were picked for testing, `phone_screen` and `phone_light`. `phone_screen` is event based and `phone_light` samples at a regular frequency.
+- A 31-min episode (from `Fri 01:00:00` to `Fri 01:30:00`) in `phone_light` data, which counts towards `validyieldedhours`
+
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|android, ios|
+|morning|OK|OK|android, ios|
+|daily|OK|OK|android, ios|
+|threeday|OK|OK|android, ios|
+|weekend|OK|OK|android, ios|
+|beforeMarchEvent|OK|OK|android, ios|
+|beforeNovemberEvent|OK|OK|android, ios|
+
+
+## Fitbit Calories Intraday
+
+Description
+
+- A five-minute sedentary episode on Fri 11:00:00
+- A one-minute sedentary episode on Sun 02:00:00. It exists in November but not in February in STZ
+- A five-minute sedentary episode on Fri 11:58:00. It is split across two 30-min segments and the end of the morning segment
+- A three-minute lightly active episode on Fri 11:10:00, a one-minute at 11:18:00 and a one-minute 11:24:00. These check for start and end times of first/last/longest episode
+- A three-minute fairly active episode on Fri 11:40:00, a one-minute at 11:48:00 and a one-minute 11:54:00. These check for start and end times of first/last/longest episode
+- A three-minute very active episode on Fri 12:10:00, a one-minute at 12:18:00 and a one-minute 12:24:00. These check for start and end times of first/last/longest episode
+- An eight-minute MVPA episode with intertwined fairly and very active rows on Fri 12:30:00
+- The above episodes contain six highmet (>= 3 MET) episodes and nine lowmet episodes.
+- One two-minute sedentary episode with a 1-minute row on Sun 09:00:00 and another on Sun 12:01:01 that are considered a single episode in multi-timezone event segments to showcase how inferring time zone data for Fitbit from phone data can produce inaccurate results around the tz change. This happens because the device was on LA time until 11:59 and switched to NY time at 12pm. In terms of actual time, 09 am LA and 12 pm NY represent the same moment, so 09:00 LA and 12:01 NY are consecutive minutes.
+- A three-minute sedentary episode on Sat 08:59 that will be ignored for multi-timezone event segments.
+- A three-minute sedentary episode on Sat 12:59 of which the first minute will be ignored for multi-timezone event segments since the test segment starts at 13:00
+- A three-minute sedentary episode on Sat 16:00
+- A four-minute sedentary episode on Sun 10:01 that will be ignored for November's multi-timezone event segments since the test segment ends at 10am on that weekend.
+- A three-minute very active episode on Sat 16:03. This episode and the one at 16:00 are counted as one for lowmet episodes
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+
+## Fitbit Heartrate intraday 
+
+Description:
+
+- The 4-day raw heartrate data is contained in `fitbit_heartrate_intraday_raw.csv`
+- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`)
+- Two episodes located in the same 30-min segment (`Fri 00:49:00` and `Fri 00:52:00`)
+- Two different types of heartrate zone episodes located in the same 30-min segment (`Fri 05:49:00 outofrange` and `Fri 05:57:00 fatburn`)
+- Two episodes located in the same daily segment (`Fri 12:02:00` and `Fri 19:38:00`)
+- One episode before the time switch, `Sun 00:08:00`, and one episode after the time switch, `Sun 07:28:00`
+
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+## Fitbit Sleep Summary
+
+Description
+
+- A main sleep episode that starts on Fri 20:00:00 and ends on Sat 02:00:00. This episode starts after 11am (Last Night End) which will be considered as today's (Fri) data.
+- A nap that starts on Sat 04:00:00 and ends on Sat 06:00:00. This episode starts before 11am (Last Night End) which will be considered as yesterday's (Fri) data.
+- A nap that starts on Sat 13:00:00 and ends on Sat 15:00:00. This episode starts after 11am (Last Night End) which will be considered as today's (Sat) data.
+- A main sleep that starts on Sun 01:00:00 and ends on Sun 12:00:00. This episode starts before 11am (Last Night End) which will be considered as yesterday's (Sat) data.
+- A main sleep that starts on Sun 23:00:00 and ends on Mon 07:00:00. This episode starts after 11am (Last Night End) which will be considered as today's (Sun) data.
+- Any segment shorter than one day will be ignored for sleep RAPIDS features.
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+## Fitbit Sleep Intraday
+
+Description
+
+- A five-minute main sleep episode with asleep-classic level on Fri 11:00:00.
+- An eight-hour main sleep episode on Fri 17:00:00. It is split into 2 parts for the daily segment: a seven-hour sleep episode on Fri 17:00:00 and a one-hour sleep episode on Sat 00:00:00.
+- A two-hour nap on Sat 01:00:00 that will be ignored for main sleep features.
+- A one-hour nap on Sat 13:00:00 that will be ignored for main sleep features.
+- An eight-hour main sleep episode on Sat 22:00:00. This episode ends on Sun 08:00:00 (NY) for March and Sun 06:00:00 (NY) for November due to daylight savings. It will be considered for the `beforeMarchEvent` segment and ignored for the `beforeNovemberEvent` segment.
+- A nine-hour main sleep episode on Sun 11:00:00. Start time will be assigned as NY time zone and converted to 14:00:00.
+- A seven-hour main sleep episode on Mon 06:00:00. This episode will be split into two parts: a five-hour sleep episode on Mon 06:00:00 and a two-hour sleep episode on Mon 11:00:00. The first part will be discarded as it is before 11am (Last Night End)
+- Any segment shorter than one day will be ignored for sleep PRICE features.
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+
+## Fitbit Heartrate Summary
+
+Description
+
+- The 4-day raw heartrate summary data is contained in `fitbit_heartrate_summary_raw.csv`.
+- As heartrate summary data is periodic, it only generates results for periodic segments. There will be no results for frequency and event segments.
+
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+## Fitbit Step Intraday
+
+Description
+
+- The 4-day raw steps intraday data is contained in `fitbit_steps_intraday_raw.csv`
+- One episode for each daily segment (`night`, `morning`, `afternoon` and `evening`) on each day
+- Two episodes within the same 30-min segment (`Fri 05:58:00` and `Fri 05:59:00`)
+- A one-min episode at `2020-03-07 09:00:00` that will be converted to New York time `2020-03-07 12:00:00`
+- One episode before the time switch, `Sun 00:19:00`, and one episode after the time switch, `Sun 09:01:00`
+- Episodes that cross two 30-min segments (`Fri 11:59:00` and `Fri 12:00:00`)
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+
+## Fitbit Step Summary
+
+Description
+
+- The 4-day raw steps summary data is contained in `fitbit_steps_summary_raw.csv`.
+- As steps summary data is periodic, it only generates results for periodic segments. There will be no results for frequency and event segments.
+
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
+
+## Fitbit Data Yield
+
+Checklist
+
+|time segment| single tz | multi tz|platform|
+|-|-|-|-|
+|30min|OK|OK|fitbit|
+|morning|OK|OK|fitbit|
+|daily|OK|OK|fitbit|
+|threeday|OK|OK|fitbit|
+|weekend|OK|OK|fitbit|
+|beforeMarchEvent|OK|OK|fitbit|
+|beforeNovemberEvent|OK|OK|fitbit|
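
The multi-timezone cases described for Keyboard and Fitbit Calories Intraday rely on 09:00 in Los Angeles and 12:00 in New York being the same instant on both test Sundays. The following is a minimal sketch to check that arithmetic, assuming Python 3.9+ with the standard `zoneinfo` module and the March and November 2020 test dates. The helper name `minutes_between` is hypothetical and is not part of RAPIDS.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

LA = ZoneInfo("America/Los_Angeles")
NY = ZoneInfo("America/New_York")


def minutes_between(la_local, ny_local):
    """Elapsed minutes from an LA wall-clock time to a NY wall-clock time."""
    # Attach the time zones, then let datetime compare the two instants in UTC.
    delta = ny_local.replace(tzinfo=NY) - la_local.replace(tzinfo=LA)
    return delta.total_seconds() / 60


# Sun 09:00 (LA) versus Sun 12:01 (NY) on both test weekends.
march_gap = minutes_between(datetime(2020, 3, 8, 9, 0), datetime(2020, 3, 8, 12, 1))
november_gap = minutes_between(datetime(2020, 11, 1, 9, 0), datetime(2020, 11, 1, 12, 1))
print(march_gap, november_gap)
```

Both gaps come out as one minute, which is why the two rows end up in the same episode once the time zone is inferred from phone data.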
--- a/docs/developers/testing.md
+++ b/docs/developers/testing.md
@ -0,0 +1,177 @@
+# Testing
+
+The following is a simple guide to running RAPIDS' tests. All files necessary for testing are stored in the `./tests/` directory.
+
+## Steps for Testing
+
+??? check "**Testing Overview**"
+    1. You have to create a single four-day test dataset for the sensor you are working on.
+    2. You will adjust your dataset with `tests/scripts/assign_test_timestamps.py` to fit `Fri March 6th 2020 - Mon March 9th 2020` and `Fri Oct 30th 2020 - Mon Nov 2nd 2020`. We test daylight saving times with these dates.
+    3. We have one test participant per platform (`pids`: `android`, `ios`, `fitbit`, `empatica`, `empty`). The data `device_id` should be equal to the `pid`.
+    4. We will run this test dataset against six test pipelines: three for `frequency`, `periodic`, and `event` time segments in a `single` time zone, and the same three in `multiple` time zones.
+    5. You will have to create your test data to cover as many corner cases as possible. These cases depend on the sensor you are working on.
+    6. The time segments and time zones to be tested are:
+
+    ??? example "Frequency"
+        - 30 minutes (`30min,30`)
+
+    ??? example "Periodic"
+        - morning (`morning,06:00:00,5H 59M 59S,every_day,0`)
+        - daily (`daily,00:00:00,23H 59M 59S,every_day,0`)
+        - three-day segments that repeat every day (`threeday,00:00:00,71H 59M 59S,every_day,0`)
+        - three-day segments that repeat every Friday (`weekend,00:00:00,71H 59M 59S,wday,5`)
+
+    ??? example "Event"
+        - A segment that starts 3 hours before an event (Sat Mar 07 2020 19:00:00 EST) and lasts for 22 hours. Note that the last part of this segment will happen during a daylight saving change on Sunday at 2am when the clock moves forward and the period 2am-3am does not exist. In this case, the segment would start on Sat Mar 07 2020 16:00:00 EST (timestamp: 1583614800000) and end on Sun Mar 08 2020 15:00:00 EDT (timestamp: 1583694000000). (`beforeMarchEvent,1583625600000,22H,3H,-1,android`)
+        - A segment that starts 3 hours before an event (Sat Oct 31 2020 19:00:00 EDT) and lasts for 22 hours. Note that the last part of this segment will happen during a daylight saving change on Sunday at 2am when the clock moves back and the period 1am-2am exists twice. In this case, the segment would start on Sat Oct 31 2020 16:00:00 EDT (timestamp: 1604174400000) and end on Sun Nov 01 2020 13:00:00 EST (timestamp: 1604253600000). (`beforeNovemberEvent,1604185200000,22H,3H,-1,android`)
+
+    ??? example "Single time zone to test"
+        America/New_York
+
+    ??? example "Multi time zones to test"
+        - America/New_York starting at `0`
+        - America/Los_Angeles starting at `1583600400000` (Sat Mar 07 2020 12:00:00 EST)
+        - America/New_York starting at `1583683200000` (Sun Mar 08 2020 12:00:00 EDT)
+        - America/Los_Angeles starting at `1604160000000` (Sat Oct 31 2020 12:00:00 EDT)
+        - America/New_York starting at `1604250000000` (Sun Nov 01 2020 12:00:00 EST)
+    
+    ??? hint "Understanding event segments with multi timezones"
+        <figure>
+            <img src="../../img/testing_eventsegments_mtz.png" max-width="100%" />
+        </figure>
+
+??? check "**Document your tests**"
+
+    - Before you start implementing any test data you need to document your tests. 
+    - The documentation of your tests should be added to `docs/developers/test-cases.md` under the corresponding sensor. 
+    - You will need to add two subsections `Description` and the `Checklist`
+    - The amount of data you need depends on each sensor, but you can be efficient by creating data that covers corner cases in more than one time segment. For example, a battery episode from 11am to 1pm covers the case when an episode has to be split for 30min frequency segments and for morning segments.
+    - As a rule of thumb, think about corner cases for 30min segments as they will give you the most flexibility.
+    - Only add tests for iOS if the raw data format is different from Android's (for example for screen).
+    - Create specific tests for Sunday before and after 02:00. These will test daylight saving switches: in March 02:00 to 02:59 do not exist, and in November 01:00 to 01:59 exist twice (read below how `tests/scripts/assign_test_timestamps.py` handles this).
+
+
+    ??? example "Example of Description"
+        `Description` is a list and every item describes the different scenarios your test data is covering. For example, if we are testing PHONE_BATTERY:
+
+        ```
+        - We test 24 discharge episodes, 24 charge episodes and 2 episodes with a 0 discharge rate
+        - One episode is shorter than 30 minutes (`start timestamp` to `end timestamp`)
+        - One episode is 120 minutes long from 11:00 to 13:00 (`start timestamp` to `end timestamp`). This one covers the case when an episode has to be chunked for 30min frequency segments and for morning segments
+        - One episode is 60 minutes long from 23:30 to 00:30 (`start timestamp` to `end timestamp`). This one covers the case when an episode has to be chunked for 30min frequency segments and for daily segments (overnight)
+        - One 0 discharge rate episode 10 minutes long that happens within a 30-minute segment (10:00 to 10:29) (`start timestamp` to `end timestamp`)
+        - Three discharge episodes that happen during beforeMarchEvent (start/end timestamps of those discharge episodes)
+        - Three charge episodes that happen during beforeMarchEvent (start/end timestamps of those charge episodes)
+        - One discharge episode that happens between 00:30 and 04:00 to test for daylight saving times in March and November 2020.
+        - ... any other test corner cases you can think of
+        ```
+
+        Describe your test cases in as much detail as possible so in the future if we find a bug in RAPIDS, we know what test case we did not include and should add.
+
+    
+    ??? example "Example of Checklist"
+        `Checklist` is a table where you confirm you have verified the output of your dataset for the different time segments and time zones
+
+        |time segment| single tz | multi tz|platform|
+        |-|-|-|-|
+        |30min|OK|OK|android and iOS|
+        |morning|OK|OK|android and iOS|
+        |daily|OK|OK|android and iOS|
+        |threeday|OK|OK|android and iOS|
+        |weekend|OK|OK|android and iOS|
+        |beforeMarchEvent|OK|OK|android and iOS|
+        |beforeNovemberEvent|OK|OK|android and iOS|
+
+
+??? check "**Add raw input data.**"
+    1. Add the raw test data to the corresponding sensor CSV file in `tests/data/manual/aware_csv/SENSOR_raw.csv`. Create the CSV if it does not exist.
+    2. The test data you create will have the same columns as normal raw data except `test_time` replaces `timestamp`. To make your life easier, you can place a test data row in time using the `test_time` column with the following format: `Day HH:MM:SS.XXX`, for example `Fri 22:54:30.597`.
+    2. You can convert your manual test data to actual raw test data with the following commands:
+        
+        - For the selected files: (It could be a single file name or multiple file names separated by whitespace(s))
+            ```
+            python tests/scripts/assign_test_timestamps.py -f file_name_1 file_name_2
+            ```
+
+        - For all files under the `tests/data/manual/aware_csv` folder: 
+            ```
+            python tests/scripts/assign_test_timestamps.py -a
+            ```
+    
+    2. The script `assign_test_timestamps.py` converts your `test_time` column into a `timestamp`. For example, `Fri 22:54:30.597` is converted to `1583553270597` (`Fri Mar 06 2020 22:54:30 GMT-0500`) and to `1604112870597` (`Fri Oct 30 2020 22:54:30 GMT-0400`). Note you can include milliseconds.
+    2. The `device_id` should be the same as `pid`.
+
+    ??? example "Example of test data you need to create"
+        The `test_time` column will be automatically converted to a timestamp that fits our testing periods in March and November by `tests/scripts/assign_test_timestamps.py`
+
+        ```
+        test_time,device_id,battery_level,battery_scale,battery_status
+        Fri 01:00:00.000,ios,90,100,4
+        Fri 01:00:30.500,ios,89,100,4
+        Fri 01:01:00.000,ios,80,100,4
+        Fri 01:01:45.500,ios,79,100,4
+        ...
+        Sat 08:00:00.000,ios,78,100,4
+        Sat 08:01:00.000,ios,50,100,4
+        Sat 08:02:00.000,ios,49,100,4
+        ```
+
+??? check "**Add expected output data.**"
+    1. Add or update the expected output feature file of the participant and sensor you are testing:
+    ```bash
+    tests/data/processed/features/{type_of_time_segment}/{pid}/device_sensor.csv 
+    
+    # this example is expected output data for battery tests for periodic segments in a single timezone
+    tests/data/processed/features/stz_periodic/android/phone_sensor.csv 
+
+    # this example is expected output data for battery tests for periodic segments in multi timezones
+    tests/data/processed/features/mtz_periodic/android/phone_sensor.csv 
+    ```
+
+??? check "**Edit the config file(s).**"
+    1. Activate the sensor provider you are testing if it isn't already. Set `[SENSOR][PROVIDER][COMPUTE]` to `TRUE` in the `config.yaml` of the time segments and time zones you are testing:
+    ```yaml
+    - tests/settings/stz_frequency_config.yaml # For single-timezone frequency time segments
+    - tests/settings/stz_periodic_config.yaml # For single-timezone periodic time segments
+    - tests/settings/stz_event_config.yaml # For single-timezone event time segments
+
+    - tests/settings/mtz_frequency_config.yaml # For multi-timezone frequency time segments
+    - tests/settings/mtz_periodic_config.yaml # For multi-timezone periodic time segments
+    - tests/settings/mtz_event_config.yaml # For multi-timezone event time segments
+    ```
+??? check "**Run the pipeline and tests.**"
+    1. You can run all six segment pipelines and their tests
+    ```bash
+    bash tests/scripts/run_tests.sh -t all
+    ```
+    2. You can run only the pipeline of a specific time segment and its tests
+    ```bash
+    bash tests/scripts/run_tests.sh -t stz_frequency -a both # swap stz_frequency for mtz_frequency, stz_event, mtz_event, etc
+    ```
+    2. Or, if you are working on your tests and you want to run a pipeline and its tests independently
+    ```bash
+    bash tests/scripts/run_tests.sh -t stz_frequency -a run
+    bash tests/scripts/run_tests.sh -t stz_frequency -a test
+    ```
+
+    ??? hint "How does the test execution work?"
+        This bash script `tests/scripts/run_tests.sh` executes one or all test pipelines for different time segment types (`frequency`, `periodic`, and `events`) and single or multiple timezones.
+
+        The python script `tests/scripts/run_tests.py` runs the tests. It parses the involved participants and active sensor providers in the `config.yaml` file of the time segment type and time zone being tested. We test that the output file we expect exists and that its content matches the expected values.
+
+    ??? example "Output Example"
+        The following is a snippet of the output you should see after running your test.
+
+        ```bash
+        test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... stz_periodic
+        ok
+        test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... stz_periodic
+        ok
+
+        test_sensors_files_exist (test_sensor_features.TestSensorFeatures) ... stz_frequency
+        ok
+        test_sensors_features_calculations (test_sensor_features.TestSensorFeatures) ... stz_frequency
+        FAIL
+        ```
+
+        The results above show that for stz_periodic, both `test_sensors_files_exist` and `test_sensors_features_calculations` passed. For stz_frequency, the first test `test_sensors_files_exist` passed while `test_sensors_features_calculations` failed. Additionally, you should get the traceback of the failure (not shown here).
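
The `test_time` conversion described under "Add raw input data" can be illustrated with a short sketch. It is not the code of `assign_test_timestamps.py`. It only assumes that each `Day HH:MM:SS.XXX` value is placed relative to the Friday of a test weekend in the `America/New_York` time zone, and the names `to_timestamp` and `DAY_OFFSETS` are hypothetical. It needs Python 3.9+ for `zoneinfo`.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Days after the Friday that starts each four-day test period (assumed layout).
DAY_OFFSETS = {"Fri": 0, "Sat": 1, "Sun": 2, "Mon": 3}
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp(test_time, friday, tz_name="America/New_York"):
    """Convert 'Day HH:MM:SS[.XXX]' into a Unix timestamp in milliseconds."""
    day, clock = test_time.split()
    fmt = "%H:%M:%S.%f" if "." in clock else "%H:%M:%S"
    wall_clock = datetime.strptime(clock, fmt).time()
    local_day = friday + timedelta(days=DAY_OFFSETS[day])
    local = datetime.combine(local_day, wall_clock, tzinfo=ZoneInfo(tz_name))
    # Integer division keeps the millisecond value exact.
    return (local - EPOCH) // timedelta(milliseconds=1)


march_ts = to_timestamp("Fri 22:54:30.597", date(2020, 3, 6))
november_ts = to_timestamp("Fri 22:54:30.597", date(2020, 10, 30))
print(march_ts, november_ts)
```

With the two Fridays used for testing, this reproduces the values quoted in the documentation (`1583553270597` and `1604112870597`). How the real script resolves the Sunday hours that do not exist or exist twice is not covered by this sketch.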
--- a/docs/developers/validation-schema-config.md
+++ b/docs/developers/validation-schema-config.md
@ -0,0 +1,175 @@
+# Validation schema of `config.yaml`
+
+!!! hint "Why do we need to validate the `config.yaml`?"
+    Most of the key/values in the `config.yaml` are constrained to a set of possible values or types. For example `[TIME_SEGMENTS][TYPE]` can only be one of `["FREQUENCY", "PERIODIC", "EVENT"]`, and `[TIMEZONE]` has to be a string. 
+    
+    We should show the user an error if that's not the case. We could validate this in Python or R but since we reuse scripts and keys in multiple places, tracking these validations can be time consuming and get out of control. Thus, we do these validations through a schema and check that schema before RAPIDS starts processing any data so the user can see the error right away.
+
+    Keep in mind these validations can only cover certain base cases. Some validations that require more complex logic should still be done in the respective script. For example, we can check that a CSV file path actually ends in `.csv` but we can only check that the file actually exists in a Python script.
+ 
+The structure and values of the `config.yaml` file are validated using a YAML schema stored in `tools/config.schema.yaml`. Each key in `config.yaml`, for example `PIDS`, has a corresponding entry in the schema where we can validate its type, possible values, required properties, min and max values, among other things. 
+
+The `config.yaml` is validated against the schema every time RAPIDS runs (see the top of the `Snakefile`):
+
+```python
+validate(config, "tools/config.schema.yaml")
+```
+
+## Structure of the schema
+
+The schema has three main sections `required`, `definitions`, and `properties`. All of them are just nested key/value YAML pairs, where the value can be a primitive type (`integer`, `string`, `boolean`, `number`) or can be another key/value pair (`object`).
+
+### required
+`required` lists `properties` that should be present in the `config.yaml`. We will almost always add every `config.yaml` key to this list (meaning that the user cannot delete any of those keys like `TIMEZONE` or `PIDS`). 
+
+### definitions
+`definitions` lists key/values that are common to different `properties` so we can reuse them. You can define a key/value under `definitions` and use `$ref` to refer to it in any `property`. 
+
+For example, every sensor like `[PHONE_ACCELEROMETER]` has one or more providers like `RAPIDS` and `PANDA`, these providers have some common properties like the `COMPUTE` flag or the `SRC_SCRIPT` string. Therefore we define a shared provider "template" that is used by every provider and extended with properties exclusive to each one of them. For example:
+
+=== "provider definition (template)"
+    The `PROVIDER` definition will be used later on different `properties`.
+
+    ```yaml
+    PROVIDER:
+        type: object
+        required: [COMPUTE, SRC_SCRIPT, FEATURES]
+        properties:
+            COMPUTE:
+                type: boolean
+            FEATURES:
+                type: [array, object]
+            SRC_SCRIPT:
+                type: string
+                pattern: "^.*\\.(py|R)$"
+    ```
+
+=== "provider reusing and extending the template"
+    Notice that `RAPIDS` (a provider) uses and extends the `PROVIDER` template in this example. The `FEATURES` key overrides the `FEATURES` key from the `#/definitions/PROVIDER` template but keeps the validation for `COMPUTE` and `SRC_SCRIPT`. For more details about reusing properties, see the [JSON Schema guide on structuring a schema](http://json-schema.org/understanding-json-schema/structuring.html#reuse).
+
+    ```yaml hl_lines="9 10"
+    PHONE_ACCELEROMETER:
+        type: object
+        # .. other properties
+        PROVIDERS:
+            type: ["null", object]
+            properties:
+                RAPIDS:
+                    allOf:
+                        - $ref: "#/definitions/PROVIDER"
+                        - properties:
+                            FEATURES:
+                                type: array
+                                uniqueItems: True
+                                items:
+                                    type: string
+                                    enum: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
+    ```
+
+
+
+### properties
+
+`properties` are nested key/values that describe the different components of our `config.yaml` file. Values can be of one or more primitive types like `string`, `number`, `array`, `boolean` and `null`. Values can also be another key/value pair (of type `object`) that are similar to a dictionary in Python.
+
+For example, the following property validates the `PIDS` of our `config.yaml`. It checks that `PIDS` is an `array` with unique items of type `string`.
+
+```yaml
+PIDS:
+    type: array
+    uniqueItems: True
+    items:
+      type: string
+```
+
+## Modifying the schema
+
+!!! hint "Validating the `config.yaml` during development"
+    If you updated the schema and want to check the `config.yaml` is compliant, you can run the command `snakemake --list-params-changes`. You will see `Building DAG of jobs...` if there are no problems or an error message otherwise (try setting any `COMPUTE` flag to a string like `test` instead of `False/True`).
+    
+    You can use this command without having to configure RAPIDS to process any participants or sensors.
+
+You can validate different aspects of each key/value in our `config.yaml` file:
+
+=== "number/integer"
+    Including min and max values
+    ```yaml
+    MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS:
+        type: number
+        minimum: 0
+        maximum: 1
+
+    FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD:
+        type: integer
+        exclusiveMinimum: 0
+    ```
+=== "string"
+    Including valid values (`enum`)
+    ```yaml
+    items:
+        type: string
+        enum: ["count", "maxlux", "minlux", "avglux", "medianlux", "stdlux"]
+    ```
+=== "boolean"
+    ```yaml
+    MINUTES_DATA_USED:
+        type: boolean
+    ```
+=== "array"
+    Including whether or not it should have unique values, the type of the array's elements (`strings`, `numbers`) and valid values (`enum`).
+    ```yaml
+    MESSAGES_TYPES:
+        type: array
+        uniqueItems: True
+        items:
+            type: string
+            enum: ["received", "sent"]
+    ```
+=== "object"
+    `PARENT` is an object that has two properties. `KID1` is one of those properties and is, in turn, another object that reuses the `"#/definitions/PROVIDER"` `definition` **AND** also includes (extends) two extra properties: `GRAND_KID1` of type `array` and `GRAND_KID2` of type `number`. `KID2` is another property of `PARENT` of type `boolean`.
+
+    The schema validation looks like this
+    ```yaml
+    PARENT:
+        type: object
+        properties:
+          KID1:
+            allOf:
+              - $ref: "#/definitions/PROVIDER"
+              - properties:
+                  GRAND_KID1:
+                    type: array
+                    uniqueItems: True
+                  GRAND_KID2:
+                    type: number
+          KID2:
+            type: boolean
+    ```
+
+    The `config.yaml` key that the previous schema validates looks like this:
+    ```yaml
+    PARENT:
+        KID1:
+            # These four come from the `PROVIDER` definition (template)
+            COMPUTE: False
+            FEATURES: [x, y] # an array
+            SRC_SCRIPT: "a path to a py or R script"
+
+            # These two come from the extension
+            GRAND_KID1: [a, b] # an array
+            GRAND_KID2: 5.1 # a number
+        KID2: True # a boolean
+    ```
+
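To make the keywords in the tabs above concrete, here is a minimal, hand-rolled sketch of what `type`, `minimum`, `maximum`, `exclusiveMinimum`, `enum`, `uniqueItems`, and `items` each check. The function name `violations` and the messages it returns are hypothetical; this is only an illustration of the keywords' meaning, not the validator that RAPIDS or Snakemake actually runs.

```python
# Hand-rolled illustration of a few JSON Schema keywords; not a real validator.
NUMBER = (int, float)
PY_TYPES = {"number": NUMBER, "integer": int, "string": str,
            "boolean": bool, "array": list}


def violations(value, schema):
    """Return a list of messages describing how `value` breaks `schema`."""
    expected = schema.get("type")
    if expected in PY_TYPES:
        ok = isinstance(value, PY_TYPES[expected])
        # Python treats bool as a kind of int; JSON Schema does not.
        if expected in ("number", "integer") and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"expected {expected}, got {type(value).__name__}"]

    found = []
    if isinstance(value, NUMBER) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            found.append(f"{value} is below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            found.append(f"{value} is above maximum {schema['maximum']}")
        if "exclusiveMinimum" in schema and value <= schema["exclusiveMinimum"]:
            found.append(f"{value} must be greater than {schema['exclusiveMinimum']}")
    if "enum" in schema and value not in schema["enum"]:
        found.append(f"{value!r} is not one of {schema['enum']}")
    if isinstance(value, list):
        # repr() lets us compare items that are not hashable.
        if schema.get("uniqueItems") and len(value) != len(set(map(repr, value))):
            found.append("array items are not unique")
        for item in value:
            found.extend(violations(item, schema.get("items", {})))
    return found


messages_types = {"type": "array", "uniqueItems": True,
                  "items": {"type": "string", "enum": ["received", "sent"]}}
print(violations(["received", "sent"], messages_types))
print(violations(["sent", "sent"], messages_types))
print(violations(123, {"type": "boolean"}))
```

The `MESSAGES_TYPES` schema from the "array" tab accepts the first list and rejects the second because of `uniqueItems`; the last call mirrors the `COMPUTE: 123` check described in the next section.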
+## Verifying the schema is correct
+Before you start modifying the schema, we recommend setting the `config.yaml` key that you want to validate to an invalid value. For example, if you want to validate that `COMPUTE` is boolean, set `COMPUTE: 123`. Then create your validation, run `snakemake --list-params-changes`, and make sure your validation fails (123 is not `boolean`) before setting the key back to a correct value. In other words, make sure it's broken first so that you know that your validation works.
+
+!!! warning
+    **Be careful**. You can check that the schema `config.schema.yaml` has a valid format by running `python tools/check_schema.py`. You will see this message if its structure is correct: `Schema is OK`. However, we don't have a way to detect typos: for example, `allOf` will work but `allOF` (capital `F`) won't, and no error will be shown. That's why we recommend starting with an invalid key/value in your `config.yaml` so that you can be sure the schema validation finds the problem.
+
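As a complement to the "break it first" check, a small script can flag keys that are not recognised schema keywords, which is how a typo such as `allOF` would show up. The sketch below is hypothetical and not part of RAPIDS: `unknown_keywords` and the keyword sets are our own names, the keyword list is deliberately partial, and the schema is assumed to be already loaded into a Python dictionary (loading the YAML file is left out).

```python
# Keywords this sketch recognises; extend the set to match what your schema uses.
KNOWN_KEYWORDS = {
    "$schema", "$ref", "definitions", "description", "type", "properties",
    "required", "additionalProperties", "items", "uniqueItems", "enum",
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
    "allOf", "anyOf", "oneOf", "not",
}
NAME_MAPS = {"properties", "definitions"}   # arbitrary names -> sub-schemas
SCHEMA_LISTS = {"allOf", "anyOf", "oneOf"}  # lists of sub-schemas
SINGLE_SCHEMAS = {"items", "not", "additionalProperties"}


def unknown_keywords(schema, path="#"):
    """Return the paths of keys that are not in KNOWN_KEYWORDS."""
    found = []
    if not isinstance(schema, dict):
        return found
    for key, value in schema.items():
        here = f"{path}/{key}"
        if key not in KNOWN_KEYWORDS:
            found.append(here)
        elif key in NAME_MAPS and isinstance(value, dict):
            # The names under `properties` are config keys, not keywords.
            for name, sub in value.items():
                found.extend(unknown_keywords(sub, f"{here}/{name}"))
        elif key in SCHEMA_LISTS and isinstance(value, list):
            for index, sub in enumerate(value):
                found.extend(unknown_keywords(sub, f"{here}/{index}"))
        elif key in SINGLE_SCHEMAS:
            found.extend(unknown_keywords(value, here))
    return found


schema = {
    "type": "object",
    "properties": {
        "KID1": {"allOF": [{"$ref": "#/definitions/PROVIDER"}]},  # typo: capital F
        "KID2": {"type": "boolean"},
    },
}
print(unknown_keywords(schema))  # ['#/properties/KID1/allOF']
```

A report like this only narrows down where to look; a legitimate keyword that is missing from `KNOWN_KEYWORDS` would be flagged too.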
+## Useful resources
+
+Read the following links to learn more about what we can validate with schemas. They are based on `JSON` instead of `YAML` schemas but the same concepts apply.
+
+- [Understanding JSON Schemas](http://json-schema.org/understanding-json-schema/index.html)
+- [Specification of the JSON schema we use](https://tools.ietf.org/html/draft-handrews-json-schema-01)
--- a/docs/developers/virtual-environments.md
+++ b/docs/developers/virtual-environments.md
@ -0,0 +1,43 @@
+## Python Virtual Environment
+
+### Add new packages
+Try to install any new package using `conda install -c CHANNEL PACKAGE_NAME` (you can use `pip` if the package is only available there). Make sure your Python virtual environment is active (`conda activate YOUR_ENV`).
+
+### Remove packages
+Uninstall packages using the same manager you used to install them: `conda remove PACKAGE_NAME` or `pip uninstall PACKAGE_NAME`.
+
+### Updating all packages
+Make sure your Python virtual environment is active (`conda activate YOUR_ENV`), then run
+```bash
+conda update --all
+```
+
+### Update your conda `environment.yml`
+After installing or removing a package you can use the following command in your terminal to update your `environment.yml` before publishing your pipeline. Note that we ignore the package version for `libgfortran` and `mkl` to keep compatibility with Linux:
+```bash
+conda env export --no-builds | sed 's/^.*libgfortran.*$/  - libgfortran/' | sed 's/^.*mkl=.*$/  - mkl/' >  environment.yml
+```
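To see what the two `sed` substitutions in the command above do, here is a Python sketch of the same rewriting applied to a string. The function name `unpin` and the sample export text (including its version numbers) are made up for illustration; in practice the text would come from `conda env export --no-builds`.

```python
import re


def unpin(export_text):
    """Replace the libgfortran and mkl lines with entries that have no version."""
    text = re.sub(r"^.*libgfortran.*$", "  - libgfortran", export_text,
                  flags=re.MULTILINE)
    # `mkl=` only matches the mkl package itself, not e.g. `mkl-service=...`.
    return re.sub(r"^.*mkl=.*$", "  - mkl", text, flags=re.MULTILINE)


# Made-up sample of exported dependencies.
sample = """dependencies:
  - libgfortran=3.0.1
  - mkl=2019.4
  - mkl-service=2.3.0
  - numpy=1.19.2
"""
print(unpin(sample))
```

Only the `libgfortran` and `mkl` lines lose their version; every other dependency stays pinned.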
+
+## R Virtual Environment
+
+### Add new packages
+1. Open your terminal and navigate to RAPIDS' root folder
+2. Run `R` to open an R interactive session
+3. Run `renv::install("PACKAGE_NAME")`
+
+### Remove packages
+1. Open your terminal and navigate to RAPIDS' root folder
+2. Run `R` to open an R interactive session
+3. Run `renv::remove("PACKAGE_NAME")`
+
+### Updating all packages
+1. Open your terminal and navigate to RAPIDS' root folder
+2. Run `R` to open an R interactive session
+3. Run `renv::update()`
+
+### Update your R `renv.lock`
+After installing or removing a package, follow these steps to update your `renv.lock` before publishing your pipeline.
+
+1. Open your terminal and navigate to RAPIDS' root folder
+2. Run `R` to open an R interactive session
+3. Run `renv::snapshot()` (renv will ask you to confirm any updates to this file)
+
--- a/docs/features/add-new-features.md
+++ b/docs/features/add-new-features.md
@ -0,0 +1,183 @@
+# Add New Features
+
+!!! hint
+    - We recommend reading the [Behavioral Features Introduction](../feature-introduction/) before reading this page.
+    - You can implement new features in Python or R scripts.
+    - You won't have to deal with time zones, dates, times, data cleaning, or preprocessing. The data that RAPIDS pipes to your feature extraction code are ready to process.
+
+## New Features for Existing Sensors
+
+You can add new features to any existing sensors (see list below) by adding a new provider in three steps:
+
+1. [Modify](#modify-the-configyaml-file) the `config.yaml` file 
+2. [Create](#create-a-feature-provider-script) your feature provider script
+3. [Implement](#implement-your-feature-extraction-code) your feature extraction code
+   
+As a tutorial, we will add a new provider for `PHONE_ACCELEROMETER` called `VEGA` that extracts `feature1`, `feature2`, `feature3` with a Python script that requires a parameter from the user called `MY_PARAMETER`.
+
+??? info "Existing Sensors"
+    An existing sensor of any device with a configuration entry in `config.yaml`:
+
+    Smartphone (AWARE)
+
+    - Phone Accelerometer
+    - Phone Activity Recognition
+    - Phone Applications Crashes
+    - Phone Applications Foreground
+    - Phone Applications Notifications
+    - Phone Battery
+    - Phone Bluetooth
+    - Phone Calls
+    - Phone Conversation
+    - Phone Data Yield
+    - Phone Keyboard
+    - Phone Light
+    - Phone Locations
+    - Phone Log
+    - Phone Messages
+    - Phone Screen
+    - Phone WiFi Connected
+    - Phone WiFi Visible
+
+    Fitbit
+
+    - Fitbit Data Yield
+    - Fitbit Heart Rate Summary
+    - Fitbit Heart Rate Intraday
+    - Fitbit Sleep Summary
+    - Fitbit Sleep Intraday
+    - Fitbit Steps Summary
+    - Fitbit Steps Intraday
+
+    Empatica
+
+    - Empatica Accelerometer
+    - Empatica Heart Rate
+    - Empatica Temperature
+    - Empatica Electrodermal Activity
+    - Empatica Blood Volume Pulse
+    - Empatica Inter Beat Interval
+    - Empatica Tags
+
+
+### Modify the `config.yaml` file
+
+In this step, you need to add your provider configuration section under the relevant sensor in `config.yaml`. See our example for our tutorial's `VEGA` provider for  `PHONE_ACCELEROMETER`:
+
+??? example "Example configuration for a new accelerometer provider `VEGA`"
+    ```yaml hl_lines="12 13 14 15 16"
+    PHONE_ACCELEROMETER:
+        CONTAINER: accelerometer
+        PROVIDERS:
+            RAPIDS: # this is a feature provider
+                COMPUTE: False
+                ...
+            
+            PANDA: # this is another feature provider
+                COMPUTE: False
+                ...
+
+            VEGA: # this is our new feature provider
+                COMPUTE: False
+                FEATURES: ["feature1", "feature2", "feature3"]
+                MY_PARAMETER: a_string
+                SRC_SCRIPT: src/features/phone_accelerometer/vega/main.py
+            
+    ```
+
+| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description
+|---|---|
+|`[COMPUTE]`| Flag to activate/deactivate your provider
+|`[FEATURES]`| List of features your provider supports. Your provider code should only return the features on this list
+|`[MY_PARAMETER]`| An arbitrary parameter that our example provider `VEGA` needs. This can be a boolean, integer, float, string, or an array of any of these types.
+|`[SRC_SCRIPT]`| The relative path from RAPIDS' root folder to a script that computes the features for this provider. It can be implemented in R or Python.
+
+### Create a feature provider script
+
+Create your feature Python or R script called `main.py` or `main.R` in the correct folder, `src/features/[sensorname]/[providername]/`. RAPIDS automatically loads and executes it based on the config key `[SRC_SCRIPT]` you added in the last step. For our example, this script is:
+```bash
+src/features/phone_accelerometer/vega/main.py
+```
+
+### Implement your feature extraction code
+Every feature script (`main.[py|R]`) needs a `[providername]_features` function with specific parameters. RAPIDS calls this function with the sensor data ready to process and with other functions and arguments you will need.
+
+=== "Python function"
+    ```python
+    def [providername]_features(sensor_data_files, time_segment, provider, filter_data_by_segment, *args, **kwargs):
+        # empty for now
+        return(your_features_df)
+    ```
+
+=== "R function"
+    ```r
+    [providername]_features <- function(sensor_data, time_segment, provider){
+        # empty for now
+        return(your_features_df)
+    }
+    ```
+
+| Parameter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description
+|---|---|
+|`sensor_data_files`| Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the `[PIDS]` array in `config.yaml`) 
+|`time_segment`| The label of the time segment that should be processed.
+|`provider`| The parameters you configured for your provider in `config.yaml` will be available in this variable as a dictionary in Python or a list in R. In our example, this dictionary contains `{MY_PARAMETER:"a_string"}`
+|`filter_data_by_segment`| Python only. A function that you will use to filter your data. In R, this function is already available in the environment.
+|`*args`| Python only. Not used for now
+|`**kwargs`| Python only. Not used for now
+
+
+The next step is to implement the code that computes your behavioral features in your provider script's function. As with any other script, this function can call other auxiliary methods, but in general terms, it should have three stages:
+
+??? info "1. Read a participant's data by loading the CSV data stored in the file pointed to by `sensor_data_files`"
+    ``` python
+    acc_data = pd.read_csv(sensor_data_files["sensor_data"])
+    ```
+
+    Note that the phone's battery, screen, and activity recognition data are given as episodes instead of event rows (for example, start and end timestamps of the periods the phone screen was on)
+
+
+??? info "2. Filter your data to process only those rows that belong to `time_segment`"
+
+    This step is only one line of code, but keep reading to understand why we need it.
+    ```python
+    acc_data = filter_data_by_segment(acc_data, time_segment)
+    ```
+
+    You should use the `filter_data_by_segment()` function to process and group those rows that belong to each of the [time segments RAPIDS could be configured with](../../setup/configuration/#time-segments).
+
+    Let's understand the `filter_data_by_segment()` function with an example. A RAPIDS user can extract features on any arbitrary [time segment](../../setup/configuration/#time-segments). A time segment is a period that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and weekend basis for `p01`. The labels are arbitrary, and the instances depend on the days a participant was monitored for: 
+
+     - the daily segment could be named `my_days` and if `p01` was monitored for 14 days, it would have 14 instances
+     - the weekly segment could be named `my_weeks` and if `p01` was monitored for 14 days, it would have 2 instances.
+     - the weekend segment could be named `my_weekends` and if `p01` was monitored for 14 days, it would have 2 instances.
+    
+    For this example, RAPIDS will call your provider function three times for `p01`: once where `time_segment` is `my_days`, once where it is `my_weeks`, and once where it is `my_weekends`. Not every row in `p01`'s data takes part in the feature computation for every segment (weekday rows are irrelevant to `my_weekends`, for instance), **and** the rows need to be grouped differently for each one.
+    
+    This is where `filter_data_by_segment()` comes in handy: it returns a data frame that contains the rows that were logged during a time segment plus an extra column called `local_segment`. This new column will have as many unique values as time segment instances exist (14, 2, and 2 for our `p01`'s `my_days`, `my_weeks`, and `my_weekends` examples). After filtering, **you should group the data frame by this column and compute any desired features**, for example:
+
+    ```python
+    acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
+    ```
+
+    RAPIDS does not filter the participant's dataset for you because your code might need to compute something based on a participant's complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from that number.
+
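The instance counts quoted in step 2 (14, 2, and 2) can be reproduced with a few lines of standard-library Python. This is only an illustration under stated assumptions: monitoring starts on a Monday, weeks run Monday to Sunday, and the function name `segment_instances` is hypothetical. The real instances are derived by RAPIDS from your time segments file.

```python
from datetime import date, timedelta


def segment_instances(first_day, n_days):
    """Count daily, weekly, and weekend instances in a monitoring period."""
    days = [first_day + timedelta(days=i) for i in range(n_days)]
    # isocalendar()[:2] is (ISO year, ISO week); ISO weeks start on Monday.
    weeks = {d.isocalendar()[:2] for d in days}
    # weekday() is 5 for Saturday and 6 for Sunday.
    weekends = {d.isocalendar()[:2] for d in days if d.weekday() >= 5}
    return {"my_days": len(days), "my_weeks": len(weeks), "my_weekends": len(weekends)}


# 2021-01-04 was a Monday, so 14 days cover exactly two calendar weeks.
print(segment_instances(date(2021, 1, 4), 14))
```

If monitoring started mid-week instead, the same 14 days would touch three calendar weeks, which is why the number of instances depends on the days a participant was monitored.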
+??? info "3. Return a data frame with your features"
+    After filtering, grouping your data, and computing your features, your provider function should return a data frame that has:
+    
+    -  One row per time segment instance (e.g., 14 in our `p01`'s `my_days` example)
+    -  The `local_segment` column added by `filter_data_by_segment()`
+    -  One column per feature. By convention, the name of your features should only contain letters or numbers (`feature1`). RAPIDS automatically adds the correct sensor and provider prefix; in our example, this prefix is `phone_accelerometer_vega_`.
+
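The shape of the returned data described in step 3 can be sketched without pandas. The snippet below is a plain-Python stand-in for the `groupby(["local_segment"])` call shown in step 2, using made-up rows and hypothetical segment labels (`day_01`, `day_02`); in a real provider you would work with the data frame returned by `filter_data_by_segment()`.

```python
from collections import defaultdict

# Made-up rows shaped like filtered accelerometer data; labels are hypothetical.
rows = [
    {"local_segment": "day_01", "magnitude": 9.8},
    {"local_segment": "day_01", "magnitude": 10.4},
    {"local_segment": "day_02", "magnitude": 9.7},
    {"local_segment": "day_02", "magnitude": 9.9},
]


def max_magnitude_per_segment(rows):
    """Group rows by local_segment and keep the largest magnitude of each group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["local_segment"]].append(row["magnitude"])
    # One output row per segment instance, one column per feature.
    return [{"local_segment": segment, "maxmagnitude": max(values)}
            for segment, values in grouped.items()]


for feature_row in max_magnitude_per_segment(rows):
    print(feature_row)
```

Each output row keeps the `local_segment` value and uses a bare feature name (`maxmagnitude`), leaving the sensor and provider prefix to RAPIDS.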
+??? example "`PHONE_ACCELEROMETER` Provider Example"
+    For your reference, this is our own provider (`RAPIDS`) for `PHONE_ACCELEROMETER` that computes five acceleration features
+
+    ```python
+
+    --8<---- "src/features/phone_accelerometer/rapids/main.py"
+
+    ```
+
+## New Features for Non-Existing Sensors
+
+If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [open a new discussion](https://github.com/carissalow/rapids/discussions) on GitHub and we can add the necessary code so you can follow the instructions above.
--- a/docs/features/empatica-accelerometer.md
+++ b/docs/features/empatica-accelerometer.md
@ -0,0 +1,42 @@
+# Empatica Accelerometer
+
+Sensor parameters description for `[EMPATICA_ACCELEROMETER]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing accelerometer data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+## DBDP provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/empatica_accelerometer_raw.csv
+    - data/raw/{pid}/empatica_accelerometer_with_datetime.csv
+    - data/interim/{pid}/empatica_accelerometer_features/empatica_accelerometer_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/empatica_accelerometer.csv
+    ```
+
+
+Parameters description for `[EMPATICA_ACCELEROMETER][PROVIDERS][DBDP]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `EMPATICA_ACCELEROMETER` features from the `DBDP` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+
+
+Features description for `[EMPATICA_ACCELEROMETER][PROVIDERS][DBDP]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|maxmagnitude      |m/s^2^    |The maximum magnitude of acceleration ($\|acceleration\| = \sqrt{x^2 + y^2 + z^2}$).
+|minmagnitude      |m/s^2^    |The minimum magnitude of acceleration.
+|avgmagnitude      |m/s^2^    |The average magnitude of acceleration.
+|medianmagnitude   |m/s^2^    |The median magnitude of acceleration.
+|stdmagnitude      |m/s^2^    |The standard deviation of the magnitude of acceleration.
+
+!!! note "Assumptions/Observations"
+    1. Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes, it is likely because the accelerometer dataset for a participant is too big to fit in memory. We are considering different alternatives to overcome this problem; if this is something you need, get in touch and we can discuss how to implement it.
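
The five features in the table above all derive from the magnitude formula. As a minimal sketch, assuming each sample is an `(x, y, z)` tuple in m/s^2 and that the standard deviation is the sample one (dividing by `n - 1`, which is an assumption and not confirmed by this page), they could be computed for the rows of one time segment like this. The function name `magnitude_features` is hypothetical.

```python
import math
import statistics


def magnitude_features(samples):
    """Compute the five magnitude features for the samples of one time segment."""
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    return {
        "maxmagnitude": max(magnitudes),
        "minmagnitude": min(magnitudes),
        "avgmagnitude": statistics.mean(magnitudes),
        "medianmagnitude": statistics.median(magnitudes),
        # Sample standard deviation (n - 1 in the denominator): an assumption.
        "stdmagnitude": statistics.stdev(magnitudes),
    }


# Three made-up samples whose magnitudes are 5, 1, and 3.
print(magnitude_features([(3, 4, 0), (0, 0, 1), (1, 2, 2)]))
```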
--- a/docs/features/empatica-blood-volume-pulse.md
+++ b/docs/features/empatica-blood-volume-pulse.md
@ -0,0 +1,46 @@
+# Empatica Blood Volume Pulse
+
+Sensor parameters description for `[EMPATICA_BLOOD_VOLUME_PULSE]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing blood volume pulse data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+## DBDP provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/empatica_blood_volume_pulse_raw.csv 
+    - data/raw/{pid}/empatica_blood_volume_pulse_with_datetime.csv
+    - data/interim/{pid}/empatica_blood_volume_pulse_features/empatica_blood_volume_pulse_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/empatica_blood_volume_pulse.csv
+    ```
+
+
+Parameters description for `[EMPATICA_BLOOD_VOLUME_PULSE][PROVIDERS][DBDP]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `EMPATICA_BLOOD_VOLUME_PULSE` features from the `DBDP` provider|
+|`[FEATURES]` |         Features to be computed from blood volume pulse intraday data, see table below          |
+
+
+Features description for `[EMPATICA_BLOOD_VOLUME_PULSE][PROVIDERS][DBDP]`:
+
+|Feature                    |Units          |Description|
+|-------------------------- |-------------- |---------------------------|
+|maxbvp                      |-     |The maximum blood volume pulse during a time segment.
+|minbvp                      |-     |The minimum blood volume pulse during a time segment.
+|avgbvp                      |-     |The average blood volume pulse during a time segment.
+|medianbvp                   |-     |The median of blood volume pulse during a time segment.
+|modebvp                     |-     |The mode of blood volume pulse during a time segment.
+|stdbvp                      |-     |The standard deviation of blood volume pulse during a time segment.
+|diffmaxmodebvp              |-     |The difference between the maximum and mode blood volume pulse during a time segment.
+|diffminmodebvp              |-     |The difference between the mode and minimum blood volume pulse during a time segment.
+|entropybvp                  |nats           |Shannon’s entropy measurement based on blood volume pulse during a time segment.
+
+!!! note "Assumptions/Observations"
+    For more information about BVP read [this](https://support.empatica.com/hc/en-us/articles/360029719792-E4-data-BVP-expected-signal).
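
The less obvious features in the table above are the mode-based differences and the entropy. The sketch below shows one plausible reading of them, under loud assumptions: ties for the mode are broken by first occurrence, and the entropy is Shannon's entropy (natural logarithm, hence nats) over the relative frequency of each distinct value. The `DBDP` provider may build the distribution differently, so treat `mode_and_entropy_features` (a hypothetical name) as an illustration only.

```python
import math
from collections import Counter


def mode_and_entropy_features(values, suffix):
    """Compute mode, mode differences, and entropy for one time segment."""
    counts = Counter(values)
    mode = counts.most_common(1)[0][0]  # ties: first value encountered
    n = len(values)
    # Shannon entropy in nats over the empirical frequencies (an assumption).
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return {
        f"mode{suffix}": mode,
        f"diffmaxmode{suffix}": max(values) - mode,
        f"diffminmode{suffix}": mode - min(values),
        f"entropy{suffix}": entropy,
    }


# Made-up blood volume pulse readings.
print(mode_and_entropy_features([1, 2, 2, 3, 5], "bvp"))
```

With these readings the mode is 2, so `diffmaxmodebvp` is 3 and `diffminmodebvp` is 1.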
--- a/docs/features/empatica-electrodermal-activity.md
+++ b/docs/features/empatica-electrodermal-activity.md
@ -0,0 +1,46 @@
+# Empatica Electrodermal Activity
+
+Sensor parameters description for `[EMPATICA_ELECTRODERMAL_ACTIVITY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing electrodermal activity data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+## DBDP provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/empatica_electrodermal_activity_raw.csv
+    - data/raw/{pid}/empatica_electrodermal_activity_with_datetime.csv
+    - data/interim/{pid}/empatica_electrodermal_activity_features/empatica_electrodermal_activity_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/empatica_electrodermal_activity.csv
+    ```
+
+
+Parameters description for `[EMPATICA_ELECTRODERMAL_ACTIVITY][PROVIDERS][DBDP]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `EMPATICA_ELECTRODERMAL_ACTIVITY` features from the `DBDP` provider|
+|`[FEATURES]` |         Features to be computed from electrodermal activity intraday data, see table below          |
+
+
+Features description for `[EMPATICA_ELECTRODERMAL_ACTIVITY][PROVIDERS][DBDP]`:
+
+|Feature                    |Units          |Description|
+|-------------------------- |-------------- |---------------------------|
+|maxeda                      |microsiemens     |The maximum electrical conductance during a time segment.
+|mineda                      |microsiemens     |The minimum electrical conductance during a time segment.
+|avgeda                      |microsiemens     |The average electrical conductance during a time segment.
+|medianeda                   |microsiemens     |The median of electrical conductance during a time segment.
+|modeeda                     |microsiemens     |The mode of electrical conductance during a time segment.
+|stdeda                      |microsiemens     |The standard deviation of electrical conductance during a time segment.
+|diffmaxmodeeda              |microsiemens     |The difference between the maximum and mode electrical conductance during a time segment.
+|diffminmodeeda              |microsiemens     |The difference between the mode and minimum electrical conductance during a time segment.
+|entropyeda                  |nats           |Shannon’s entropy measurement based on electrical conductance during a time segment.
+
+!!! note "Assumptions/Observations"
+    None
--- a/docs/features/empatica-heartrate.md
+++ b/docs/features/empatica-heartrate.md
@ -0,0 +1,46 @@
+# Empatica Heart Rate
+
+Sensor parameters description for `[EMPATICA_HEARTRATE]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing heart rate data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+## DBDP provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/empatica_heartrate_raw.csv
+    - data/raw/{pid}/empatica_heartrate_with_datetime.csv
+    - data/interim/{pid}/empatica_heartrate_features/empatica_heartrate_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/empatica_heartrate.csv
+    ```
+
+
+Parameters description for `[EMPATICA_HEARTRATE][PROVIDERS][DBDP]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `EMPATICA_HEARTRATE` features from the `DBDP` provider|
+|`[FEATURES]` |         Features to be computed from heart rate intraday data, see table below          |
+
+
+Features description for `[EMPATICA_HEARTRATE][PROVIDERS][DBDP]`:
+
+|Feature                    |Units          |Description|
+|-------------------------- |-------------- |---------------------------|
+|maxhr                      |beats/min     |The maximum heart rate during a time segment.
+|minhr                      |beats/min     |The minimum heart rate during a time segment.
+|avghr                      |beats/min     |The average heart rate during a time segment.
+|medianhr                   |beats/min     |The median of heart rate during a time segment.
+|modehr                     |beats/min     |The mode of heart rate during a time segment.
+|stdhr                      |beats/min     |The standard deviation of heart rate during a time segment.
+|diffmaxmodehr              |beats/min     |The difference between the maximum and mode heart rate during a time segment.
+|diffminmodehr              |beats/min     |The difference between the mode and minimum heart rate during a time segment.
+|entropyhr                  |nats           |Shannon’s entropy measurement based on heart rate during a time segment.
+
+!!! note "Assumptions/Observations"
+    We extract the previous features based on the average heart rate values computed in [10-second windows](https://support.empatica.com/hc/en-us/articles/360029469772-E4-data-HR-csv-explanation).
--- a/docs/features/empatica-inter-beat-interval.md
+++ b/docs/features/empatica-inter-beat-interval.md
@ -0,0 +1,46 @@
+# Empatica Inter Beat Interval
+
+Sensor parameters description for `[EMPATICA_INTER_BEAT_INTERVAL]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing inter beat interval data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+## DBDP provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/empatica_inter_beat_interval_raw.csv
+    - data/raw/{pid}/empatica_inter_beat_interval_with_datetime.csv
+    - data/interim/{pid}/empatica_inter_beat_interval_features/empatica_inter_beat_interval_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/empatica_inter_beat_interval.csv
+    ```
+
+
+Parameters description for `[EMPATICA_INTER_BEAT_INTERVAL][PROVIDERS][DBDP]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `EMPATICA_INTER_BEAT_INTERVAL` features from the `DBDP` provider|
+|`[FEATURES]` |         Features to be computed from inter beat interval intraday data, see table below          |
+
+
+Features description for `[EMPATICA_INTER_BEAT_INTERVAL][PROVIDERS][DBDP]`:
+
+|Feature                    |Units          |Description|
+|-------------------------- |-------------- |---------------------------|
+|maxibi                      |seconds     |The maximum inter beat interval during a time segment.
+|minibi                      |seconds     |The minimum inter beat interval during a time segment.
+|avgibi                      |seconds     |The average inter beat interval during a time segment.
+|medianibi                   |seconds     |The median of inter beat interval during a time segment.
+|modeibi                     |seconds     |The mode of inter beat interval during a time segment.
+|stdibi                      |seconds     |The standard deviation of inter beat interval during a time segment.
+|diffmaxmodeibi              |seconds     |The difference between the maximum and mode inter beat interval during a time segment.
+|diffminmodeibi              |seconds     |The difference between the mode and minimum inter beat interval during a time segment.
+|entropyibi                  |nats           |Shannon’s entropy measurement based on inter beat interval during a time segment.
+
+!!! note "Assumptions/Observations"
+    For more information about IBI read [this](https://support.empatica.com/hc/en-us/articles/360030058011-E4-data-IBI-expected-signal).
--- a/docs/features/empatica-tags.md
+++ b/docs/features/empatica-tags.md
@ -0,0 +1,11 @@
+# Empatica Tags
+
+Sensor parameters description for `[EMPATICA_TAGS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing tags data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+!!! Note
+    - No feature providers have been implemented for this sensor yet; however, you can [implement your own features](../add-new-features).
+    - To know more about tags read [this](https://support.empatica.com/hc/en-us/articles/204578699-Event-Marking-with-the-E4-wristband).
--- a/docs/features/empatica-temperature.md
+++ b/docs/features/empatica-temperature.md
@ -0,0 +1,46 @@
+# Empatica Temperature
+
+Sensor parameters description for `[EMPATICA_TEMPERATURE]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Name of the CSV file containing temperature data that is compressed inside an Empatica zip file. Since these zip files are created [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) by Empatica, there is no need to change the value of this attribute.
+
+## DBDP provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/empatica_temperature_raw.csv
+    - data/raw/{pid}/empatica_temperature_with_datetime.csv
+    - data/interim/{pid}/empatica_temperature_features/empatica_temperature_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/empatica_temperature.csv
+    ```
+
+
+Parameters description for `[EMPATICA_TEMPERATURE][PROVIDERS][DBDP]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `EMPATICA_TEMPERATURE` features from the `DBDP` provider|
+|`[FEATURES]` |         Features to be computed from temperature intraday data, see table below          |
+
+
+Features description for `[EMPATICA_TEMPERATURE][PROVIDERS][DBDP]`:
+
+|Feature                    |Units          |Description|
+|-------------------------- |-------------- |---------------------------|
+|maxtemp                      |degrees C     |The maximum temperature during a time segment.
+|mintemp                      |degrees C     |The minimum temperature during a time segment.
+|avgtemp                      |degrees C     |The average temperature during a time segment.
+|mediantemp                   |degrees C     |The median of temperature during a time segment.
+|modetemp                     |degrees C     |The mode of temperature during a time segment.
+|stdtemp                      |degrees C     |The standard deviation of temperature during a time segment.
+|diffmaxmodetemp              |degrees C     |The difference between the maximum and mode temperature during a time segment.
+|diffminmodetemp              |degrees C     |The difference between the mode and minimum temperature during a time segment.
+|entropytemp                  |nats           |Shannon’s entropy measurement based on temperature during a time segment.
+
+!!! note "Assumptions/Observations"
+    None
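+
+As a minimal sketch, assuming the `EMPATICA_TEMPERATURE` section follows the same layout as the other sensor sections in `config.yaml`, enabling the `DBDP` provider could look like the fragment below. Only keys documented on this page are shown, the feature selection is illustrative, and any other keys already present in your file (such as `[CONTAINER]`) are assumed to stay unchanged.
+
+```yaml
+EMPATICA_TEMPERATURE:
+    # CONTAINER is intentionally not shown: keep the value already in your config.yaml
+    PROVIDERS:
+        DBDP:
+            COMPUTE: True
+            # Any subset of the features listed in the table above (illustrative choice)
+            FEATURES: ["maxtemp", "mintemp", "avgtemp", "mediantemp", "stdtemp"]
+```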
--- a/docs/features/extracted.rst
+++ b/docs/features/extracted.rst
--- a/docs/features/feature-introduction.md
+++ b/docs/features/feature-introduction.md
@ -0,0 +1,45 @@
+# Behavioral Features Introduction
+
+A behavioral feature is a metric computed from raw sensor data that quantifies the behavior of a participant, for example, the time spent at home computed from location data. These are also known as digital biomarkers.
+
+RAPIDS' `config.yaml` has a section for each supported device/sensor (e.g., `PHONE_ACCELEROMETER`, `FITBIT_STEPS`, `EMPATICA_HEARTRATE`). These sections follow a similar structure, and they can have one or more feature `PROVIDERS` that compute one or more behavioral features. You will modify the parameters of these `PROVIDERS` to obtain features from different mobile sensors. We'll use `PHONE_ACCELEROMETER` as an example to explain this further.
+
+!!! hint
+    - We recommend reading this page if you are using RAPIDS for the first time.
+    - All computed sensor features are stored under `/data/processed/features` in files per sensor, per participant, and per study (all participants).
+    - Every time you change any sensor parameters, provider parameters, or provider features, all the necessary files will be updated as soon as you execute RAPIDS.
+    - In short, to extract features offered by a provider, you need to set its `[COMPUTE]` flag to `True`, configure any of its parameters, and [execute](../../setup/execution) RAPIDS.
+
+
+### Explaining the config.yaml sensor sections with an example
+
+Each sensor section follows the same structure. Click on the numbered markers to learn more.
+
+``` { .yaml .annotate }
+PHONE_ACCELEROMETER: # (1)
+
+    CONTAINER: accelerometer # (2)
+
+    PROVIDERS: # (3)
+        RAPIDS:
+            COMPUTE: False # (4)
+            FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
+
+            SRC_SCRIPT: src/features/phone_accelerometer/rapids/main.py
+        
+        PANDA:
+            COMPUTE: False
+            VALID_SENSED_MINUTES: False
+            FEATURES: # (5)
+                exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
+                nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
+
+                        # (6)
+            SRC_SCRIPT: src/features/phone_accelerometer/panda/main.py
+```
+
+--8<--- "docs/snippets/feature_introduction_example.md"
+
+These are the descriptions of each marker for accessibility:
+
+--8<--- "docs/snippets/feature_introduction_example.md"
--- a/docs/features/fitbit-calories-intraday.md
+++ b/docs/features/fitbit-calories-intraday.md
@ -0,0 +1,68 @@
+# Fitbit Calories Intraday
+
+Sensor parameters description for `[FITBIT_CALORIES_INTRADAY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your calories intraday data is stored. Depending on the data stream you are using, this can be a database table, a CSV file, etc. |
+
+
+## RAPIDS provider
+
+!!! info "Available time segments"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_calories_intraday_raw.csv
+    - data/raw/{pid}/fitbit_calories_intraday_with_datetime.csv
+    - data/interim/{pid}/fitbit_calories_intraday_features/fitbit_calories_intraday_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_calories_intraday.csv
+    ```
+
+
+Parameters description for `[FITBIT_CALORIES_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `FITBIT_CALORIES_INTRADAY` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed from calories intraday data, see table below          |
+|`[EPISODE_TYPE]` |    RAPIDS will compute features for any episodes in this list. There are seven types of episodes defined as consecutive appearances of a label. Four are based on the activity level labels provided by Fitbit: `sedentary`, `lightly active`, `fairly active`, and `very active`. One is defined by RAPIDS as moderate to vigorous physical activity (`MVPA`) episodes that are based on all `fairly active` and `very active` labels. Two are defined by the user based on a threshold that divides low and high MET (metabolic equivalent) episodes.        |
+|`[EPISODE_TIME_THRESHOLD]` | Any consecutive rows of the same `[EPISODE_TYPE]` will be considered a single episode if the time difference between them is less than or equal to this threshold in minutes.|
+|`[EPISODE_MET_THRESHOLD]` |    Any 1-minute calorie data chunk with a MET value equal to or higher than this threshold will be considered a high MET episode, and a low MET episode otherwise. The default value is 3.|
+|`[EPISODE_MVPA_CATEGORIES]` |    The Fitbit level labels that are considered part of a moderate to vigorous physical activity episode. One or more of `sedentary`, `lightly active`, `fairly active`, and `very active`. The defaults are `fairly active` and `very active`.|
+|`[EPISODE_REFERENCE_TIME]` |   Reference time for the start/end time features. `MIDNIGHT` sets the reference time to 00:00 of each day; `START_OF_THE_SEGMENT` sets the reference time to the start of the time segment (useful when a segment is shorter than a day or spans multiple days).|
+
+
+Features description for `[FITBIT_CALORIES_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|starttimefirstepisode`EPISODE_TYPE`               |minutes     |Start time of the first episode of type `[EPISODE_TYPE]`
+|endtimefirstepisode`EPISODE_TYPE`               |minutes     |End time of the first episode of type `[EPISODE_TYPE]`
+|starttimelastepisode`EPISODE_TYPE`               |minutes     |Start time of the last episode of type `[EPISODE_TYPE]`
+|endtimelastepisode`EPISODE_TYPE`               |minutes     |End time of the last episode of type `[EPISODE_TYPE]`
+|starttimelongestepisode`EPISODE_TYPE`               |minutes     |Start time of the longest episode of type `[EPISODE_TYPE]`
+|endtimelongestepisode`EPISODE_TYPE`               |minutes     |End time of the longest episode of type `[EPISODE_TYPE]`
+|countepisode`EPISODE_TYPE`               |episodes     |The number of episodes of type `[EPISODE_TYPE]`
+|sumdurationepisode`EPISODE_TYPE`               |minutes     |The sum of the duration of episodes of type `[EPISODE_TYPE]`
+|avgdurationepisode`EPISODE_TYPE`               |minutes     |The average of the duration of episodes of type `[EPISODE_TYPE]`
+|maxdurationepisode`EPISODE_TYPE`               |minutes     |The maximum of the duration of episodes of type `[EPISODE_TYPE]`
+|mindurationepisode`EPISODE_TYPE`               |minutes     |The minimum of the duration of episodes of type `[EPISODE_TYPE]`
+|stddurationepisode`EPISODE_TYPE`               |minutes     |The standard deviation of the duration of episodes of type `[EPISODE_TYPE]`
+|summet`EPISODE_TYPE`               |METs     |The sum of all METs during episodes of type `[EPISODE_TYPE]`
+|avgmet`EPISODE_TYPE`               |METs     |The average of all METs during episodes of type `[EPISODE_TYPE]`
+|maxmet`EPISODE_TYPE`               |METs     |The maximum of all METs during episodes of type `[EPISODE_TYPE]`
+|minmet`EPISODE_TYPE`               |METs     |The minimum of all METs during episodes of type `[EPISODE_TYPE]`
+|stdmet`EPISODE_TYPE`               |METs     |The standard deviation of all METs during episodes of type `[EPISODE_TYPE]`
+|sumcalories`EPISODE_TYPE`               |calories     |The sum of all calories during episodes of type `[EPISODE_TYPE]`
+|avgcalories`EPISODE_TYPE`               |calories     |The average of all calories during episodes of type `[EPISODE_TYPE]`
+|maxcalories`EPISODE_TYPE`               |calories     |The maximum of all calories during episodes of type `[EPISODE_TYPE]`
+|mincalories`EPISODE_TYPE`               |calories     |The minimum of all calories during episodes of type `[EPISODE_TYPE]`
+|stdcalories`EPISODE_TYPE`               |calories     |The standard deviation of all calories during episodes of type `[EPISODE_TYPE]`
+
+
+!!! note "Assumptions/Observations"
+    - These features are based on intraday calories data that is usually obtained in 1-minute chunks from Fitbit's API.
+    - The MET value returned by Fitbit is divided by 10.
+    - Take into account that the [intraday data returned by Fitbit](https://dev.fitbit.com/build/reference/web-api/activity/#get-activity-intraday-time-series) can contain time series for calories burned inclusive of BMR, tracked activity, and manually logged activities.
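+
+The parameters described above could be combined as in the following sketch. It assumes the section uses the same layout as the other sensor sections in `config.yaml`; the episode types and the 5-minute time threshold are illustrative choices rather than documented defaults, and `[CONTAINER]` and `[FEATURES]` are left as they already appear in your file.
+
+```yaml
+FITBIT_CALORIES_INTRADAY:
+    # CONTAINER and FEATURES are not shown: keep the values already in your config.yaml
+    PROVIDERS:
+        RAPIDS:
+            COMPUTE: True
+            # Illustrative subset of the Fitbit activity level labels
+            EPISODE_TYPE: ["sedentary", "very active"]
+            # Minutes; illustrative value, not a documented default
+            EPISODE_TIME_THRESHOLD: 5
+            # Default value stated in the parameter table
+            EPISODE_MET_THRESHOLD: 3
+            # Default labels stated in the parameter table
+            EPISODE_MVPA_CATEGORIES: ["fairly active", "very active"]
+            # Either MIDNIGHT or START_OF_THE_SEGMENT
+            EPISODE_REFERENCE_TIME: MIDNIGHT
+```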
--- a/docs/features/fitbit-data-yield.md
+++ b/docs/features/fitbit-data-yield.md
@ -0,0 +1,62 @@
+# Fitbit Data Yield
+
+We use Fitbit **heart rate intraday** data to extract data yield features. Fitbit data yield features can be used to remove rows ([time segments](../../setup/configuration/#time-segments)) that do not contain enough Fitbit data. You should decide what your "enough" threshold is depending on the time a participant was supposed to be wearing their Fitbit, the length of your study, and the rates of missing data that your analysis can handle.
+
+!!! hint "Why is Fitbit data yield important?"
+    Imagine that you want to extract `FITBIT_STEPS_SUMMARY` features on daily segments (`00:00` to `23:59`). Let's say that on day 1 the Fitbit logged 6k as the total step count and the heart rate sensor logged 24 hours of data, while on day 2 the Fitbit logged 101 as the total step count and the heart rate sensor logged 2 hours of data. It’s very likely that on day 2 you walked during the other 22 hours, so including this day in your analysis could bias your results.
+
+Sensor parameters description for `[FITBIT_DATA_YIELD]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;          | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[SENSORS]`| The Fitbit sensor we considered for calculating the Fitbit data yield features. We only support `FITBIT_HEARTRATE_INTRADAY` since sleep data is commonly collected only overnight, and step counts are 0 even when not wearing the Fitbit device.
+
+## RAPIDS provider
+
+Before explaining the data yield features, let's define the following relevant concepts:
+
+- A valid minute is any 60-second window in which the Fitbit heart rate intraday sensor logged at least 1 row of data.
+- A valid hour is any 60-minute window with at least X valid minutes. The threshold X is given by `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]`, expressed as a proportion of the 60 minutes.
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
+    - data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv
+    - data/interim/{pid}/fitbit_data_yield_features/fitbit_data_yield_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_data_yield.csv
+    ```
+
+
+Parameters description for `[FITBIT_DATA_YIELD][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `FITBIT_DATA_YIELD` features from the `RAPIDS` provider|
+|`[FEATURES]` |  Features to be computed, see table below
+|`[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` | The proportion `[0.0, 1.0]` of valid minutes in a 60-minute window necessary to flag that window as valid.
+
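+As a minimal sketch of how the parameters above fit together, assuming the section uses the same layout as the other sensor sections in `config.yaml`, the fragment below enables both data yield features. The threshold of 0.5 is an illustrative value, and writing `[SENSORS]` as a list is an assumption.
+
+```yaml
+FITBIT_DATA_YIELD:
+    # Assumed to be a list; FITBIT_HEARTRATE_INTRADAY is the only supported sensor
+    SENSORS: ["FITBIT_HEARTRATE_INTRADAY"]
+    PROVIDERS:
+        RAPIDS:
+            COMPUTE: True
+            FEATURES: ["ratiovalidyieldedminutes", "ratiovalidyieldedhours"]
+            # Illustrative value: an hour is valid if at least half of its minutes are valid
+            MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS: 0.5
+```
+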
+
+Features description for `[FITBIT_DATA_YIELD][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|ratiovalidyieldedminutes   |-          | The ratio between the number of valid minutes and the duration in minutes of a time segment.
+|ratiovalidyieldedhours     |-          | The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1.
+
+
+!!! note "Assumptions/Observations"
+    
+    1. We recommend using `ratiovalidyieldedminutes` on time segments that are shorter than two or three hours and `ratiovalidyieldedhours` for longer segments. This is because relying on yielded minutes only can be misleading when a big chunk of those missing minutes are clustered together. 
+    
+        For example, let's assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur: 
+
+        <ol type="A">
+        <li>the 12 missing hours are from the beginning of the segment or </li>
+        <li>30 minutes could be missing from every hour (24 * 30 minutes = 12 hours).</li>
+        </ol>
+        
+        `ratiovalidyieldedminutes` would be 0.5 for both `A` and `B` (hinting that the missing circumstances are similar). However, `ratiovalidyieldedhours` would be 0.5 for `A` and 1.0 for `B` if `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` is between 0.0 and 0.49 (hinting that the missing circumstances might be more favorable for `B`). In other words, sensed data for `B` is more evenly spread compared to `A`.
+    
+    2. We assume your Fitbit intraday data was sampled (requested from the Fitbit API) at 1-minute intervals. If the interval is longer, for example 15 minutes, you need to take into account that the valid minutes and valid hours ratios are going to be small (for example, you would have at most 4 “minutes” of data per hour because you would have four 15-minute windows), so you should adjust your thresholds to include and exclude rows accordingly. If you are in this situation, get in touch with us. We could implement this use case, but we are not sure there is enough demand for it at the moment since you can control the sampling rate of the data you request from the Fitbit API.
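+
+To make the 24-hour example in observation 1 concrete, the arithmetic can be written out as follows, assuming 1-minute sampling and a `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` of 0.4 (an illustrative value inside the range discussed above):
+
+$$
+\text{ratiovalidyieldedminutes}_A = \frac{12 \times 60}{24 \times 60} = 0.5, \qquad
+\text{ratiovalidyieldedminutes}_B = \frac{24 \times 30}{24 \times 60} = 0.5
+$$
+
+$$
+\text{ratiovalidyieldedhours}_A = \frac{12}{24} = 0.5, \qquad
+\text{ratiovalidyieldedhours}_B = \frac{24}{24} = 1.0
+$$
+
+In case A, only the 12 fully sensed hours reach the threshold, while in case B every hour has a valid-minute ratio of $30/60 = 0.5 \geq 0.4$ and therefore counts as valid.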
--- a/docs/features/fitbit-heartrate-intraday.md
+++ b/docs/features/fitbit-heartrate-intraday.md
@ -0,0 +1,49 @@
+# Fitbit Heart Rate Intraday
+
+Sensor parameters description for `[FITBIT_HEARTRATE_INTRADAY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your heart rate intraday data is stored. Depending on the data stream you are using, this can be a database table, a CSV file, etc. |
+
+
+## RAPIDS provider
+
+!!! info "Available time segments"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_heartrate_intraday_raw.csv
+    - data/raw/{pid}/fitbit_heartrate_intraday_with_datetime.csv
+    - data/interim/{pid}/fitbit_heartrate_intraday_features/fitbit_heartrate_intraday_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_heartrate_intraday.csv
+    ```
+
+
+Parameters description for `[FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `FITBIT_HEARTRATE_INTRADAY` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed from heart rate intraday data, see table below          |
+
+
+Features description for `[FITBIT_HEARTRATE_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units          |Description|
+|-------------------------- |-------------- |---------------------------|
+|maxhr                      |beats/min      |The maximum heart rate during a time segment.
+|minhr                      |beats/min      |The minimum heart rate during a time segment.
+|avghr                      |beats/min      |The average heart rate during a time segment.
+|medianhr                   |beats/min      |The median of heart rate during a time segment.
+|modehr                     |beats/min      |The mode of heart rate during a time segment.
+|stdhr                      |beats/min      |The standard deviation of heart rate during a time segment.
+|diffmaxmodehr              |beats/min      |The difference between the maximum and mode heart rate during a time segment.
+|diffminmodehr              |beats/min      |The difference between the mode and minimum heart rate during a time segment.
+|entropyhr                  |nats           |Shannon’s entropy measurement based on heart rate during a time segment.
+|minutesonZONE              |minutes        |Number of minutes the user’s heart rate fell within each `heartrate_zone` during a time segment.
+
+!!! note "Assumptions/Observations"
+    
+    1. There are four heart rate zones (ZONE): ``outofrange``, ``fatburn``, ``cardio``, and ``peak``. Please refer to [Fitbit documentation](https://help.fitbit.com/articles/en_US/Help_article/1565.htm) for more information about the way they are computed.
--- a/docs/features/fitbit-heartrate-summary.md
+++ b/docs/features/fitbit-heartrate-summary.md
@ -0,0 +1,57 @@
+# Fitbit Heart Rate Summary
+
+Sensor parameters description for `[FITBIT_HEARTRATE_SUMMARY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your heart rate summary data is stored. Depending on the data stream you are using, this can be a database table, a CSV file, etc. |
+
+
+## RAPIDS provider
+
+!!! info "Available time segments"
+    - Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_heartrate_summary_raw.csv
+    - data/raw/{pid}/fitbit_heartrate_summary_with_datetime.csv
+    - data/interim/{pid}/fitbit_heartrate_summary_features/fitbit_heartrate_summary_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_heartrate_summary.csv
+    ```
+
+
+Parameters description for `[FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `FITBIT_HEARTRATE_SUMMARY` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed from heart rate summary data, see table below          |
+
+
+Features description for `[FITBIT_HEARTRATE_SUMMARY][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|maxrestinghr               |beats/min      |The maximum daily resting heart rate during a time segment.
+|minrestinghr               |beats/min      |The minimum daily resting heart rate during a time segment.
+|avgrestinghr               |beats/min      |The average daily resting heart rate during a time segment.
+|medianrestinghr            |beats/min      |The median of daily resting heart rate during a time segment.
+|moderestinghr              |beats/min      |The mode of daily resting heart rate during a time segment.
+|stdrestinghr               |beats/min      |The standard deviation of daily resting heart rate during a time segment.
+|diffmaxmoderestinghr       |beats/min      |The difference between the maximum and mode daily resting heart rate during a time segment.
+|diffminmoderestinghr       |beats/min      |The difference between the mode and minimum daily resting heart rate during a time segment.
+|entropyrestinghr           |nats           |Shannon’s entropy measurement based on daily resting heart rate during a time segment.
+|sumcaloriesZONE            |cals           |The total daily calories burned within `heartrate_zone` during a time segment.
+|maxcaloriesZONE            |cals           |The maximum daily calories burned within `heartrate_zone` during a time segment.
+|mincaloriesZONE            |cals           |The minimum daily calories burned within `heartrate_zone` during a time segment.
+|avgcaloriesZONE            |cals           |The average daily calories burned within `heartrate_zone` during a time segment.
+|mediancaloriesZONE         |cals           |The median of daily calories burned within `heartrate_zone` during a time segment.
+|stdcaloriesZONE            |cals           |The standard deviation of daily calories burned within `heartrate_zone` during a time segment.
+|entropycaloriesZONE        |nats           |Shannon’s entropy measurement based on daily calories burned within `heartrate_zone` during a time segment.
+
+!!! note "Assumptions/Observations"
+    
+    1. There are four heart rate zones (ZONE): ``outofrange``, ``fatburn``, ``cardio``, and ``peak``. Please refer to [Fitbit documentation](https://help.fitbit.com/articles/en_US/Help_article/1565.htm) for more information about the way they are computed.
+
+    2. The accuracy of the calories data depends on the user’s Fitbit profile (weight, height, etc.).
--- a/docs/features/fitbit-sleep-intraday.md
+++ b/docs/features/fitbit-sleep-intraday.md
@ -0,0 +1,156 @@
+# Fitbit Sleep Intraday
+
+Sensor parameters description for `[FITBIT_SLEEP_INTRADAY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your sleep intraday data is stored. Depending on the data stream you are using, this can be a database table, a CSV file, etc. |
+
+## RAPIDS provider
+
+!!! hint "Understanding RAPIDS features"
+    [This diagram](../../img/sleep_intraday_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments for the RAPIDS provider.
+
+
+!!! info "Available time segments"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_sleep_intraday_raw.csv
+    - data/raw/{pid}/fitbit_sleep_intraday_with_datetime.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_episodes.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_sleep_intraday.csv
+    ```
+
+
+Parameters description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`                                  | Set to `True` to extract `FITBIT_SLEEP_INTRADAY` features from the `RAPIDS` provider|
+|`[FEATURES]`                                 |         Features to be computed from sleep intraday data, see table below           |
+|`[SLEEP_LEVELS]`                             | Fitbit’s sleep API Version 1 only provides `CLASSIC` records. However, Version 1.2 provides 2 types of records: `CLASSIC` and `STAGES`. `STAGES` is only available in devices with a heart rate sensor, and even those devices will fail to report it if the battery is low or the device is not tight enough. While `CLASSIC` contains 3 sleep levels (`awake`, `restless`, and `asleep`), `STAGES` contains 4 sleep levels (`wake`, `deep`, `light`, `rem`). To make them consistent, RAPIDS groups them into 2 `UNIFIED` sleep levels: `awake` (`CLASSIC`: `awake` and `restless`; `STAGES`: `wake`) and `asleep` (`CLASSIC`: `asleep`; `STAGES`: `deep`, `light`, and `rem`). In this section, there is a boolean flag named `INCLUDE_ALL_GROUPS` that, if set to `True`, computes `LEVELS_AND_TYPES` features grouping all levels together in a single `all` category.
+|`[SLEEP_TYPES]`                              | Types of sleep to be included in the feature extraction computation. There are three sleep types: `main`, `nap`, and `all`. The `all` type means both main sleep and naps are considered.
+
+
+Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS][LEVELS_AND_TYPES]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                    |Units          |Description                                                  |
+|------------------------------- |-------------- |-------------------------------------------------------------|
+|countepisode`[LEVEL][TYPE]`     |episodes       |Number of `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+|sumduration`[LEVEL][TYPE]`      |minutes        |Total duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+|maxduration`[LEVEL][TYPE]`      |minutes        | Longest duration of any `[LEVEL][TYPE]` sleep episode. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+|minduration`[LEVEL][TYPE]`      |minutes        | Shortest duration of any `[LEVEL][TYPE]` sleep episode. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+|avgduration`[LEVEL][TYPE]`      |minutes        | Average duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+|medianduration`[LEVEL][TYPE]`   |minutes        | Median duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+|stdduration`[LEVEL][TYPE]`      |minutes        | Standard deviation of the duration of all `[LEVEL][TYPE]` sleep episodes. `[LEVEL]` is one of `[SLEEP_LEVELS]` (e.g. awake-classic or rem-stages) and `[TYPE]` is one of `[SLEEP_TYPES]` (e.g. main). `[LEVEL]` can also be `all` when `INCLUDE_ALL_GROUPS` is True, which ignores the levels and groups by sleep types.
+
+
+Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[ACROSS_LEVELS]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                    |Units |Description                                                        |
+|-------------------------- |-------------- |-------------------------------------------------------------|
+|ratiocount`[LEVEL]`         |-|Ratio between the **count** of episodes of a single sleep `[LEVEL]` and the **count** of all episodes of all levels during both `main` and `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` episodes were `rem`? (e.g., $countepisode[remstages][all] / countepisode[all][all]$)
+|ratioduration`[LEVEL]`      |-|Ratio between the **duration** of episodes of a single sleep `[LEVEL]` and the **duration** of all episodes of all levels during both `main` and `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` time was `rem`? (e.g., $sumduration[remstages][all] / sumduration[all][all]$)
+
+
+Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[ACROSS_TYPES]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                    |Units          |Description                                                  |
+|-------------------------- |-------------- |-------------------------------------------------------------|
+|ratiocountmain             |-              |Ratio of the **count** of all `main` episodes (independently of the levels inside) to the **count** of all `main` and `nap` episodes. This answers the question: what percentage of all sleep episodes (`main` and `nap`) were `main`? We do not provide the ratio for `nap` because it is complementary. ($countepisode[all][main] / countepisode[all][all]$)
+|ratiodurationmain          |-              |Ratio of the **duration** of all `main` episodes (independently of the levels inside) to the **duration** of all `main` and `nap` episodes. This answers the question: what percentage of all sleep time (`main` and `nap`) was `main`? We do not provide the ratio for `nap` because it is complementary. ($sumduration[all][main] / sumduration[all][all]$)
+
+
+Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[WITHIN_LEVELS]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                           |Units          |Description                                                  |
+|--------------------------------- |-------------- |-------------------------------------------------------------|
+|ratiocountmainwithin`[LEVEL]`    |-              |Ratio of the **count** of episodes of a single sleep `[LEVEL]` during `main` sleep to the **count** of episodes of the same sleep `[LEVEL]` during `main` **and** `nap`. This answers the question: are `rem` episodes more frequent during `main` than `nap` sleep? We do not provide the ratio for `nap` because it is complementary. ($countepisode[remstages][main] / countepisode[remstages][all]$)
+|ratiodurationmainwithin`[LEVEL]` |-              |Ratio of the **duration** of episodes of a single sleep `[LEVEL]` during `main` sleep to the **duration** of episodes of the same sleep `[LEVEL]` during `main` **and** `nap`. This answers the question: is more `rem` time spent during `main` than `nap` sleep? We do not provide the ratio for `nap` because it is complementary. ($sumduration[remstages][main] / sumduration[remstages][all]$)
+
+
+Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][RAPIDS]` RATIOS `[WITHIN_TYPES]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|Units|Description|
+| - |- | - |
+|ratiocount`[LEVEL]`within`[TYPE]`    |-|Ratio between the **count** of episodes of a single sleep `[LEVEL]` and the **count** of all episodes of all levels during either `main` or `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` episodes were `rem` during `main`/`nap` sleep time? (e.g., $countepisode[remstages][main] / countepisode[all][main]$)
+|ratioduration`[LEVEL]`within`[TYPE]` |-|Ratio between the **duration** of episodes of a single sleep `[LEVEL]` and the **duration** of all episodes of all levels during either `main` or `nap` sleep types. This answers the question: what percentage of all `wake`, `deep`, `light`, and `rem` time was `rem` during `main`/`nap` sleep time? (e.g., $sumduration[remstages][main] / sumduration[all][main]$)
+
+
+
+!!! note "Assumptions/Observations"
+    1. [This diagram](../../img/sleep_intraday_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments for the RAPIDS provider.
+    1. Features listed in `[LEVELS_AND_TYPES]` are computed for any levels and types listed in `[SLEEP_LEVELS]` or `[SLEEP_TYPES]`. For example if `STAGES` only contains `[rem, light]` you will not get `countepisode[wake|deep][TYPE]` or sum, max, min, avg, median, or std `duration`. Levels or types in these lists do not influence `RATIOS` or `ROUTINE` features.
+    2. Any `[LEVEL]` grouping is done within the elements of each class `CLASSIC`, `STAGES`, and `UNIFIED`. That is, we never combine `CLASSIC` or `STAGES` types to compute features.
+    3. The categories for `all` levels (when `INCLUDE_ALL_GROUPS` is `True`) and `all` `SLEEP_TYPES` are not considered for `RATIOS` features as they are always 1.
+    3. These features can be computed in time segments of any length, but only the 1-minute sleep chunks within each segment instance will be used.
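+
+As a worked illustration of how the four `RATIOS` groups above relate to each other, consider one time segment with the following counts. The numbers are hypothetical; only the formulas come from the tables above: `countepisode[remstages][main]` = 4, `countepisode[remstages][all]` = 5, `countepisode[all][main]` = 20, and `countepisode[all][all]` = 25.
+
+$$
+\begin{aligned}
+\text{across levels:}\quad & \frac{\text{countepisode[remstages][all]}}{\text{countepisode[all][all]}} = \frac{5}{25} = 0.2 \\
+\text{across types:}\quad & \frac{\text{countepisode[all][main]}}{\text{countepisode[all][all]}} = \frac{20}{25} = 0.8 \\
+\text{within levels:}\quad & \frac{\text{countepisode[remstages][main]}}{\text{countepisode[remstages][all]}} = \frac{4}{5} = 0.8 \\
+\text{within types:}\quad & \frac{\text{countepisode[remstages][main]}}{\text{countepisode[all][main]}} = \frac{4}{20} = 0.2
+\end{aligned}
+$$
+
+In this made-up case, 20% of all episodes were `rem`, 80% of all episodes belonged to `main` sleep, 80% of `rem` episodes happened during `main` sleep, and 20% of `main` episodes were `rem`. The duration ratios follow the same pattern with `sumduration` in place of `countepisode`.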
+
+
+
+## PRICE provider
+
+!!! hint "Understanding PRICE features"
+    [This diagram](../../img/sleep_intraday_price.png) will help you understand how sleep episodes are chunked and grouped within time segments and `LNE-LNE` intervals for the PRICE provider.
+
+!!! info "Available time segments"
+    - Available for any time segment longer than or equal to one day
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_sleep_intraday_raw.csv
+    - data/raw/{pid}/fitbit_sleep_intraday_parsed.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_episodes_resampled_with_datetime.csv
+    - data/interim/{pid}/fitbit_sleep_intraday_features/fitbit_sleep_intraday_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_sleep_intraday.csv
+    ```
+
+
+Parameters description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][PRICE]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`                                  | Set to `True` to extract `FITBIT_SLEEP_INTRADAY` features from the `PRICE` provider                                      |
+|`[FEATURES]`                                 | Features to be computed from sleep intraday data, see table below |
+|`[SLEEP_LEVELS]`                             | Fitbit’s sleep API Version 1 only provides `CLASSIC` records. However, Version 1.2 provides 2 types of records: `CLASSIC` and `STAGES`. `STAGES` is only available in devices with a heart rate sensor, and even those devices will fail to report it if the battery is low or the device is not worn tightly enough. While `CLASSIC` contains 3 sleep levels (`awake`, `restless`, and `asleep`), `STAGES` contains 4 sleep levels (`wake`, `deep`, `light`, `rem`). To make it consistent, RAPIDS groups them into 2 `UNIFIED` sleep levels: `awake` (`CLASSIC`: `awake` and `restless`; `STAGES`: `wake`) and `asleep` (`CLASSIC`: `asleep`; `STAGES`: `deep`, `light`, and `rem`). In this section, there is a boolean flag named `INCLUDE_ALL_GROUPS` that, if set to `True`, computes avgdurationallmain`[DAY_TYPE]` features grouping all levels together in a single `all` category. |
+|`[DAY_TYPE]`                                 | The features of this provider can be computed using daily averages/standard deviations that were extracted on `WEEKEND` days only, `WEEK` days only, or `ALL` days|
+|`[LAST_NIGHT_END]`                    | Only `main` sleep episodes that start within the `LNE-LNE` interval [`LAST_NIGHT_END`, `LAST_NIGHT_END` + 23H 59M 59S] are taken into account to compute the features described below. `[LAST_NIGHT_END]` is a number ranging from 0 (midnight) to 1439 (23:59). |
+
+
+Features description for `[FITBIT_SLEEP_INTRADAY][PROVIDERS][PRICE]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                                  |Units          |Description                                                  |
+|------------------------------------- |----------------- |-------------------------------------------------------------|
+|avgduration`[LEVEL]`main`[DAY_TYPE]`             |minutes           | Average duration of daily sleep chunks of a `LEVEL`. Use the `DAY_TYPE` flag to include daily durations from weekend days only, weekdays, or both. `[LEVEL]` can also be `all` (when `INCLUDE_ALL_GROUPS` is `True`) to group all levels in a single category.
+|avgratioduration`[LEVEL]`withinmain`[DAY_TYPE]`  |-                 | Average of the daily ratio between the duration of sleep chunks of a `LEVEL` and the total duration of all `main` sleep episodes in a day. When `INCLUDE_ALL_GROUPS` is `True` the `all` `LEVEL` is ignored since this feature is always 1. Use the `DAY_TYPE` flag to include daily ratios from weekend days only, weekdays, or both.
+|avgstarttimeofepisodemain`[DAY_TYPE]` |minutes           | Average of all start times of the first `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include start times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
+|avgendtimeofepisodemain`[DAY_TYPE]`   |minutes           | Average of all end times of the last `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include end times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
+|avgmidpointofepisodemain`[DAY_TYPE]`  |minutes           | Average of all the midpoints between the start time of the first and the end time of the last `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include midpoints from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
+|stdstarttimeofepisodemain`[DAY_TYPE]` |minutes           | Standard deviation of all start times of the first `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include start times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
+|stdendtimeofepisodemain`[DAY_TYPE]`   |minutes           | Standard deviation of all end times of the last `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include end times from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
+|stdmidpointofepisodemain`[DAY_TYPE]`  |minutes           | Standard deviation of all the midpoints between the start time of the first and the end time of the last `main` sleep episode within each `LNE-LNE` interval in a time segment. Use the `DAY_TYPE` flag to include midpoints from `LNE-LNE` intervals that start on weekend days only, weekdays, or both.
+|socialjetlag                          |minutes           | Difference in minutes between the `avgmidpointofepisodemain` of weekends and weekdays that belong to each time segment instance. If your time segment does not contain at least one weekday and one weekend day this feature will be NA.
+|rmssdmeanstarttimeofepisodemain       |minutes           | Square root of the **mean** squared successive difference (RMSSD) between today's and yesterday's `starttimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the mean of how someone's `starttimeofepisodemain` (bedtime) changed from night to night.
+|rmssdmeanendtimeofepisodemain         |minutes           | Square root of the **mean** squared successive difference (RMSSD) between today's and yesterday's `endtimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the mean of how someone's `endtimeofepisodemain` (wake time) changed from night to night.
+|rmssdmeanmidpointofepisodemain        |minutes           | Square root of the **mean** squared successive difference (RMSSD) between today's and yesterday's `midpointofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the mean of how someone's `midpointofepisodemain` (mid time between bedtime and wake time) changed from night to night.
+|rmssdmedianstarttimeofepisodemain     |minutes           | Square root of the **median** squared successive difference (RMSSD) between today's and yesterday's `starttimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the median of how someone's `starttimeofepisodemain` (bedtime) changed from night to night.
+|rmssdmedianendtimeofepisodemain       |minutes           | Square root of the **median** squared successive difference (RMSSD) between today's and yesterday's `endtimeofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the median of how someone's `endtimeofepisodemain` (wake time) changed from night to night.
+|rmssdmedianmidpointofepisodemain      |minutes           | Square root of the **median** squared successive difference (RMSSD) between today's and yesterday's `midpointofepisodemain` values across the entire participant's sleep data grouped per time segment instance. It represents the median of how someone's `midpointofepisodemain` (average mid time between bedtime and wake time) changed from night to night.
+
+
+
+!!! note "Assumptions/Observations"
+    1. [This diagram](../../img/sleep_intraday_price.png) will help you understand how sleep episodes are chunked and grouped within time segments and `LNE-LNE` intervals for the PRICE provider.
+    1. We recommend you use periodic segments that start in the morning so RAPIDS can chunk and group sleep episodes overnight. Shifted segments (as any other segments) are labelled based on their start and end date times.
+    5. `avgstarttime...` and `avgendtime...` are roughly equivalent to an average bed and awake time only if you are using shifted segments.
+    1. The features of this provider are only available on time segments that are longer than 24 hours because they are based on descriptive statistics computed across daily values.
+    2. Even though Fitbit provides 2 types of sleep episodes (`main` and `nap`), only `main` sleep episodes are considered.
+    4. The reference point for all times is 00:00 of the first day in the LNE-LNE interval.
+    5. Sleep episodes are formed by 1-minute chunks that we group overnight starting from today’s LNE and ending on tomorrow’s LNE or the end of that segment (whichever comes first).
+    5. The features `avgstarttime...` and `avgendtime...` are the average of the first and last sleep episode across every LNE-LNE interval within a segment (`avgmidpoint...` is the mid point between start and end). Therefore, only segments longer than 24 hours will be averaged across more than one LNE-LNE interval.
+    5. `socialjetlag` is only available on segment instances equal to or longer than 48 hours that contain at least one weekday and one weekend day, for example seven-day (weekly) segments.
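+
+The `rmssdmean...` and `rmssdmedian...` descriptions above can be written out as follows. This is a sketch of the standard RMSSD definition, not a transcription of the PRICE provider code. The symbols $x_i$ and $n$ are introduced here only for illustration: $x_i$ is the value of interest (for example `starttimeofepisodemain`, in minutes) on the $i$-th of $n$ consecutive days in a time segment instance.
+
+$$
+\text{rmssdmean} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n-1}\left(x_{i+1}-x_i\right)^2}
+\qquad
+\text{rmssdmedian} = \sqrt{\operatorname{median}_{1 \le i \le n-1}\left(x_{i+1}-x_i\right)^2}
+$$
+
+For instance, with hypothetical start times of 1380, 1410, 1350, and 1360 minutes, the successive differences are 30, -60, and 10, and their squares are 900, 3600, and 100. The mean version gives $\sqrt{4600/3} \approx 39.16$ minutes, while the median version gives $\sqrt{900} = 30$ minutes, so the median variant is less sensitive to a single unusually late or early night.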
--- a/docs/features/fitbit-sleep-summary.md
+++ b/docs/features/fitbit-sleep-summary.md
@ -0,0 +1,70 @@
+# Fitbit Sleep Summary
+
+Sensor parameters description for `[FITBIT_SLEEP_SUMMARY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your sleep summary data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
+
+
+## RAPIDS provider
+
+!!! hint "Understanding RAPIDS features"
+    [This diagram](../../img/sleep_summary_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments using `SLEEP_SUMMARY_LAST_NIGHT_END` for the RAPIDS provider.
+
+!!! info "Available time segments"
+    - Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_sleep_summary_raw.csv
+    - data/raw/{pid}/fitbit_sleep_summary_with_datetime.csv
+    - data/interim/{pid}/fitbit_sleep_summary_features/fitbit_sleep_summary_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_sleep_summary.csv
+    ```
+
+
+Parameters description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`     | Set to `True` to extract `FITBIT_SLEEP_SUMMARY` features from the `RAPIDS` provider                                                |
+|`[SLEEP_TYPES]` | Types of sleep to be included in the feature extraction computation. There are three sleep types: `main`, `nap`, and `all`. The `all` type means both main sleep and naps are considered.       |
+|`[FEATURES]`    |         Features to be computed from sleep summary data, see table below                                                           |
+|`[FITBIT_DATA_STREAMS][data stream][SLEEP_SUMMARY_LAST_NIGHT_END]`    |  As an exception, the `LAST_NIGHT_END` parameter for this provider is in the data stream configuration section. This parameter controls how sleep episodes are assigned to different days and affects wake and bedtimes.|
+
+
+Features description for `[FITBIT_SLEEP_SUMMARY][PROVIDERS][RAPIDS]`:
+
+|Feature                        |Units      |Description                                  |
+|------------------------------ |---------- |-------------------------------------------- |
+|firstwaketimeTYPE              |minutes    |First wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode's end time.
+|lastwaketimeTYPE               |minutes    |Last wake time for a certain sleep type during a time segment. Wake time is the number of minutes after midnight of a sleep episode's end time.
+|firstbedtimeTYPE               |minutes    |First bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode's start time.
+|lastbedtimeTYPE                |minutes    |Last bedtime for a certain sleep type during a time segment. Bedtime is the number of minutes after midnight of a sleep episode's start time.
+|countepisodeTYPE               |episodes   |Number of sleep episodes for a certain sleep type during a time segment.
+|avgefficiencyTYPE              |scores     |Average sleep efficiency for a certain sleep type during a time segment.
+|sumdurationafterwakeupTYPE     |minutes    |Total duration the user stayed in bed after waking up for a certain sleep type during a time segment.
+|sumdurationasleepTYPE          |minutes    |Total sleep duration for a certain sleep type during a time segment.
+|sumdurationawakeTYPE           |minutes    |Total duration the user stayed awake but still in bed for a certain sleep type during a time segment.
+|sumdurationtofallasleepTYPE    |minutes    |Total duration the user spent to fall asleep for a certain sleep type during a time segment.
+|sumdurationinbedTYPE           |minutes    |Total duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
+|avgdurationafterwakeupTYPE     |minutes    |Average duration the user stayed in bed after waking up for a certain sleep type during a time segment.
+|avgdurationasleepTYPE          |minutes    |Average sleep duration for a certain sleep type during a time segment.
+|avgdurationawakeTYPE           |minutes    |Average duration the user stayed awake but still in bed for a certain sleep type during a time segment.
+|avgdurationtofallasleepTYPE    |minutes    |Average duration the user spent to fall asleep for a certain sleep type during a time segment.
+|avgdurationinbedTYPE           |minutes    |Average duration the user stayed in bed (sumdurationtofallasleep + sumdurationawake + sumdurationasleep + sumdurationafterwakeup) for a certain sleep type during a time segment.
+
+
+
+!!! note "Assumptions/Observations"
+    1. [This diagram](../../img/sleep_summary_rapids.png) will help you understand how sleep episodes are chunked and grouped within time segments using `LNE` for the RAPIDS provider.
+    1. There are three sleep types (TYPE): `main`, `nap`, `all`. The `all` type groups both `main` sleep and `naps`. All types are based on Fitbit's labels.
+    2. There are two versions of Fitbit’s sleep API ([version 1](https://dev.fitbit.com/build/reference/web-api/sleep-v1/) and [version 1.2](https://dev.fitbit.com/build/reference/web-api/sleep/)), and each provides raw sleep data in a different format:
+        - _Count & duration summaries_. `v1` contains `count_awake`, `duration_awake`, `count_awakenings`, `count_restless`, and `duration_restless` fields for every sleep record but `v1.2` does not.
+    3. _API columns_. Most features are computed based on the values provided by Fitbit’s API: `efficiency`, `minutes_after_wakeup`, `minutes_asleep`, `minutes_awake`, `minutes_to_fall_asleep`, `minutes_in_bed`, `is_main_sleep` and `type`.
+    4. Bedtime and sleep duration are based on episodes that started between today’s LNE and tomorrow’s LNE, while wake time is based on the episodes that started between yesterday’s LNE and today’s LNE.
+    5. The reference point for bed/wake times is today’s 00:00. You can have bedtimes larger than 24 hours and wake times smaller than 0.
+    6. These features are only available for time segments that span midnight to midnight of the same or different day.
+    7. We include first and last wake and bedtimes because, when `LAST_NIGHT_END` is 10 am, the first bedtime could match a nap at 2 pm, and the last bedtime could match a main overnight sleep episode that starts at 10pm.
+    5. Set the value for `SLEEP_SUMMARY_LAST_NIGHT_END` in the config parameter `[FITBIT_DATA_STREAMS][data stream][SLEEP_SUMMARY_LAST_NIGHT_END]`.
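+
+To illustrate the reference point described in the notes above, here are two made-up times. This sketch assumes that bed and wake times are signed offsets in minutes from today's 00:00, as the feature units suggest:
+
+$$
+\begin{aligned}
+\text{bedtime at 01:30 tomorrow:}\quad & 24 \times 60 + 90 = 1530 \text{ minutes} \\
+\text{wake time at 23:00 yesterday:}\quad & -(24 \times 60 - 23 \times 60) = -60 \text{ minutes}
+\end{aligned}
+$$
+
+Values above 1440 or below 0 are therefore expected whenever an episode assigned to today starts after the next midnight or ends before the previous one.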
--- a/docs/features/fitbit-steps-intraday.md
+++ b/docs/features/fitbit-steps-intraday.md
@ -0,0 +1,64 @@
+# Fitbit Steps Intraday
+
+Sensor parameters description for `[FITBIT_STEPS_INTRADAY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your steps intraday data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
+|`[EXCLUDE_SLEEP]` | Step data will be excluded if it was logged during sleep periods when at least one `[EXCLUDE]` flag is set to `True`. Sleep can be delimited by (1) a fixed period that repeats every day if `[TIME_BASED][EXCLUDE]` is True or (2) by Fitbit summary sleep episodes if `[FITBIT_BASED][EXCLUDE]` is True. If both are True (3), we use all Fitbit sleep episodes as well as the time-based episodes that do not overlap with any Fitbit episodes. If `[FITBIT_BASED][EXCLUDE]` is True, make sure the Fitbit sleep summary container points to a valid table or file. |
+
+## RAPIDS provider
+
+!!! info "Available time segments"
+    - Available for all time segments
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_steps_intraday_raw.csv
+    - data/raw/{pid}/fitbit_steps_intraday_with_datetime.csv
+    - data/raw/{pid}/fitbit_sleep_summary_raw.csv (Only when [EXCLUDE_SLEEP][EXCLUDE]=True and [EXCLUDE_SLEEP][TYPE]=FITBIT_BASED)
+    - data/interim/{pid}/fitbit_steps_intraday_with_datetime_exclude_sleep.csv (Only when [EXCLUDE_SLEEP][EXCLUDE]=True)
+    - data/interim/{pid}/fitbit_steps_intraday_features/fitbit_steps_intraday_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_steps_intraday.csv
+    ```
+
+
+Parameters description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`                | Set to `True` to extract `FITBIT_STEPS_INTRADAY` features from the `RAPIDS` provider|
+|`[FEATURES]`               |         Features to be computed from steps intraday data, see table below           |
+|`[REFERENCE_HOUR]`         | The reference point from which `firststeptime` or `laststeptime` is to be computed, default is midnight |
+|`[THRESHOLD_ACTIVE_BOUT]`  | Every minute with Fitbit steps data will be labelled as `sedentary` if its step count is below this threshold, otherwise, `active`.    |
+|`[INCLUDE_ZERO_STEP_ROWS]` | Whether or not to include time segments with a 0 step count during the whole day.                          |
+
+
+Features description for `[FITBIT_STEPS_INTRADAY][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units          |Description                                                  |
+|-------------------------- |-------------- |-------------------------------------------------------------|
+|sumsteps                   |steps          |The total step count during a time segment.
+|maxsteps                   |steps          |The maximum step count during a time segment.
+|minsteps                   |steps          |The minimum step count during a time segment.
+|avgsteps                   |steps          |The average step count during a time segment.
+|stdsteps                   |steps          |The standard deviation of step count during a time segment.
+|firststeptime              |minutes        |Minutes from `[REFERENCE_HOUR]` until the first non-zero step count.
+|laststeptime               |minutes        |Minutes from `[REFERENCE_HOUR]` until the last non-zero step count.
+|countepisodesedentarybout  |bouts          |Number of sedentary bouts during a time segment.
+|sumdurationsedentarybout   |minutes        |Total duration of all sedentary bouts during a time segment.
+|maxdurationsedentarybout   |minutes        |The maximum duration of any sedentary bout during a time segment.
+|mindurationsedentarybout   |minutes        |The minimum duration of any sedentary bout during a time segment.
+|avgdurationsedentarybout   |minutes        |The average duration of sedentary bouts during a time segment.
+|stddurationsedentarybout   |minutes        |The standard deviation of the duration of sedentary bouts during a time segment.
+|countepisodeactivebout     |bouts          |Number of active bouts during a time segment.
+|sumdurationactivebout      |minutes        |Total duration of all active bouts during a time segment.
+|maxdurationactivebout      |minutes        |The maximum duration of any active bout during a time segment.
+|mindurationactivebout      |minutes        |The minimum duration of any active bout during a time segment.
+|avgdurationactivebout      |minutes        |The average duration of active bouts during a time segment.
+|stddurationactivebout      |minutes        |The standard deviation of the duration of active bouts during a time segment.
+
+!!! note "Assumptions/Observations"
+    
+    1. _Active and sedentary bouts_. If the step count per minute is smaller than `THRESHOLD_ACTIVE_BOUT` (default value is 10), that minute is labelled as sedentary; otherwise, it is labelled as active. Active and sedentary bouts are periods of consecutive minutes labelled as `active` or `sedentary`.
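+
+The labelling rule in the note above can be stated compactly. The symbols are introduced here only for illustration: $s(m)$ is the step count of minute $m$ and $T$ is the value of `THRESHOLD_ACTIVE_BOUT`.
+
+$$
+\text{label}(m) =
+\begin{cases}
+\text{sedentary} & \text{if } s(m) < T \\
+\text{active} & \text{otherwise}
+\end{cases}
+$$
+
+As a hypothetical example with $T = 10$, the per-minute step counts 0, 3, 12, 15, 4, 20 are labelled sedentary, sedentary, active, active, sedentary, active. Grouping consecutive minutes with the same label yields two sedentary bouts (2 and 1 minutes) and two active bouts (2 and 1 minutes), so in this sketch `countepisodesedentarybout` would be 2, `sumdurationsedentarybout` 3, and `maxdurationsedentarybout` 2, with the same values for the active bout features.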
+
--- a/docs/features/fitbit-steps-summary.md
+++ b/docs/features/fitbit-steps-summary.md
@ -0,0 +1,44 @@
+# Fitbit Steps Summary
+
+Sensor parameters description for `[FITBIT_STEPS_SUMMARY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Container where your steps summary data is stored, depending on the data stream you are using this can be a database table, a CSV file, etc. |
+
+
+## RAPIDS provider
+
+!!! info "Available time segments"
+    - Only available for segments that span 1 or more complete days (e.g. Jan 1st 00:00 to Jan 3rd 23:59)
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/fitbit_steps_summary_raw.csv
+    - data/raw/{pid}/fitbit_steps_summary_with_datetime.csv
+    - data/interim/{pid}/fitbit_steps_summary_features/fitbit_steps_summary_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/fitbit_steps_summary.csv
+    ```
+
+
+Parameters description for `[FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`  | Set to `True` to extract `FITBIT_STEPS_SUMMARY` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed from steps summary data, see table below          |
+
+
+Features description for `[FITBIT_STEPS_SUMMARY][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description                                  |
+|-------------------------- |---------- |-------------------------------------------- |
+|maxsumsteps                |steps      |The maximum daily step count during a time segment.
+|minsumsteps                |steps      |The minimum daily step count during a time segment.
+|avgsumsteps                |steps      |The average daily step count during a time segment.
+|mediansumsteps             |steps      |The median of daily step count during a time segment.
+|stdsumsteps                |steps      |The standard deviation of daily step count during a time segment.
+
+!!! note "Assumptions/Observations"
+    
+    NA
--- a/docs/features/phone-accelerometer.md
+++ b/docs/features/phone-accelerometer.md
@ -0,0 +1,83 @@
+# Phone Accelerometer
+
+Sensor parameters description for `[PHONE_ACCELEROMETER]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the accelerometer data is stored
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_accelerometer_raw.csv
+    - data/raw/{pid}/phone_accelerometer_with_datetime.csv
+    - data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_accelerometer.csv
+    ```
+
+
+Parameters description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_ACCELEROMETER` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+
+
+Features description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|maxmagnitude      |m/s^2^    |The maximum magnitude of acceleration ($\|acceleration\| = \sqrt{x^2 + y^2 + z^2}$).
+|minmagnitude      |m/s^2^    |The minimum magnitude of acceleration.
+|avgmagnitude      |m/s^2^    |The average magnitude of acceleration.
+|medianmagnitude   |m/s^2^    |The median magnitude of acceleration.
+|stdmagnitude      |m/s^2^    |The standard deviation of the magnitude of acceleration.
+
+!!! note "Assumptions/Observations"
+    1. Analyzing accelerometer data is a memory-intensive task. If RAPIDS crashes, it is likely because the accelerometer dataset for a participant is too big to fit in memory. We are considering different alternatives to overcome this problem.
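+
+As a rough illustration of the magnitude formula in the features table, the sketch below computes the five summary statistics for a handful of samples. The function names and sample readings are hypothetical, and the choice of the sample (rather than population) standard deviation is an assumption, not a statement about what RAPIDS does.
+
+```python
+import math
+import statistics
+
+
+def magnitude(x, y, z):
+    """Euclidean norm of one accelerometer sample: sqrt(x^2 + y^2 + z^2)."""
+    return math.sqrt(x * x + y * y + z * z)
+
+
+def magnitude_features(samples):
+    """Summarise the magnitudes of the (x, y, z) samples of one time segment."""
+    magnitudes = [magnitude(x, y, z) for x, y, z in samples]
+    return {
+        "maxmagnitude": max(magnitudes),
+        "minmagnitude": min(magnitudes),
+        "avgmagnitude": statistics.mean(magnitudes),
+        "medianmagnitude": statistics.median(magnitudes),
+        # Assumption: sample standard deviation; undefined for a single sample
+        "stdmagnitude": statistics.stdev(magnitudes) if len(magnitudes) > 1 else float("nan"),
+    }
+
+
+# Hypothetical readings in m/s^2
+samples = [(3.0, 4.0, 0.0), (0.0, 0.0, 9.8), (1.0, 2.0, 2.0)]
+features = magnitude_features(samples)
+```
+
+Because the sketch holds every magnitude in a list, it shares the memory limitation described in the note; a streaming computation would be needed for very large datasets.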
+
+## PANDA provider
+
+These features are based on the work by [Panda et al](../../citation#panda-accelerometer).
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_accelerometer_raw.csv
+    - data/raw/{pid}/phone_accelerometer_with_datetime.csv
+    - data/interim/{pid}/phone_accelerometer_features/phone_accelerometer_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_accelerometer.csv
+    ```
+
+
+Parameters description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_ACCELEROMETER` features from the `PANDA` provider|
+|`[FEATURES]` |         Features to be computed for exertional and non-exertional activity episodes, see table below
+
+
+Features description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+| sumduration    | minutes | Total duration of all exertional or non-exertional activity episodes.                     |
+| maxduration    | minutes | Longest duration of any exertional or non-exertional activity episode.                    |
+| minduration    | minutes | Shortest duration of any exertional or non-exertional activity episode.                   |
+| avgduration    | minutes | Average duration of any exertional or non-exertional activity episode.                    |
+| medianduration | minutes | Median duration of any exertional or non-exertional activity episode.                     |
+| stdduration    | minutes | Standard deviation of the duration of all exertional or non-exertional activity episodes. |
+
+!!! note "Assumptions/Observations"
+    1. Analyzing accelerometer data is a memory-intensive task. If RAPIDS crashes, it is likely because the accelerometer dataset for a participant is too big to fit in memory. We are considering different alternatives to overcome this problem.
+    2. See [Panda et al](../../citation#panda-accelerometer) for a definition of exertional and non-exertional activity episodes.
--- a/docs/features/phone-activity-recognition.md
+++ b/docs/features/phone-activity-recognition.md
@ -0,0 +1,63 @@
+# Phone Activity Recognition
+
+Sensor parameters description for `[PHONE_ACTIVITY_RECOGNITION]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER][ANDROID]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the activity data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS)
+|`[CONTAINER][IOS]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the activity data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)
+|`[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Maximum difference in minutes between any two rows for them to be considered part of the same activity episode
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_activity_recognition_raw.csv
+    - data/raw/{pid}/phone_activity_recognition_with_datetime.csv
+    - data/interim/{pid}/phone_activity_recognition_episodes.csv
+    - data/interim/{pid}/phone_activity_recognition_episodes_resampled.csv
+    - data/interim/{pid}/phone_activity_recognition_episodes_resampled_with_datetime.csv
+    - data/interim/{pid}/phone_activity_recognition_features/phone_activity_recognition_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_activity_recognition.csv
+    ```
+
+
+Parameters description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_ACTIVITY_RECOGNITION` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+|`[ACTIVITY_CLASSES][STATIONARY]` | An array of the activity labels to be considered in the `STATIONARY` category. Choose any of `still`, `tilting`
+|`[ACTIVITY_CLASSES][MOBILE]` | An array of the activity labels to be considered in the `MOBILE` category. Choose any of `on_foot`, `walking`, `running`, `on_bicycle`
+|`[ACTIVITY_CLASSES][VEHICLE]` | An array of the activity labels to be considered in the `VEHICLE` category. Choose any of `in_vehicle`
+
+
+Features description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|count                   |rows             | Number of episodes.
+|mostcommonactivity      |activity type   | The most common activity type (e.g. `still`, `on_foot`, etc.). If there is a tie, the first one is chosen.
+|countuniqueactivities   |activity type   | Number of unique activities.
+|durationstationary      |minutes          | The total duration of `[ACTIVITY_CLASSES][STATIONARY]` episodes of still and tilting activities
+|durationmobile          |minutes          | The total duration of `[ACTIVITY_CLASSES][MOBILE]` episodes of on foot, walking, running, and on bicycle activities
+|durationvehicle         |minutes          | The total duration of `[ACTIVITY_CLASSES][VEHICLE]` episodes of in vehicle activity
+
+!!! note "Assumptions/Observations"
+    1. iOS Activity Recognition names and types are unified with Android labels: 
+
+        | iOS Activity Name | Android Activity Name | Android Activity Type |
+        |----|----|----|
+        |`walking`|  `walking` |  `7`
+        |`running`|  `running` |  `8`
+        |`cycling`|  `on_bicycle` |  `1`
+        |`automotive`|  `in_vehicle` |  `0`
+        |`stationary`|  `still` |  `3`
+        |`unknown`|  `unknown` |  `4`
+
+    2. In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables. RAPIDS automatically infers what platform each participant belongs to based on their [participant file](../../setup/configuration/#participant-files).
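+
+The label unification and the per-class durations can be sketched as follows. This is a minimal illustration, not the RAPIDS code: the dictionary and function names are hypothetical, the mapping is copied from the table in the note, the vehicle class uses the Android label `in_vehicle` from that table, and episodes are assumed to be already computed as `(activity_name, minutes)` pairs.
+
+```python
+# iOS name -> (Android activity name, Android activity type), copied from the table in the note
+IOS_TO_ANDROID = {
+    "walking": ("walking", 7),
+    "running": ("running", 8),
+    "cycling": ("on_bicycle", 1),
+    "automotive": ("in_vehicle", 0),
+    "stationary": ("still", 3),
+    "unknown": ("unknown", 4),
+}
+
+# Class membership following the parameters table (vehicle label taken from the mapping table)
+ACTIVITY_CLASSES = {
+    "STATIONARY": ["still", "tilting"],
+    "MOBILE": ["on_foot", "walking", "running", "on_bicycle"],
+    "VEHICLE": ["in_vehicle"],
+}
+
+
+def unify_ios_activity(ios_name):
+    """Translate an iOS activity name into its Android name and type."""
+    return IOS_TO_ANDROID[ios_name]
+
+
+def duration_per_class(episodes, classes=ACTIVITY_CLASSES):
+    """Sum episode durations (minutes) per activity class."""
+    totals = {f"duration{name.lower()}": 0.0 for name in classes}
+    for activity, minutes in episodes:
+        for name, labels in classes.items():
+            if activity in labels:
+                totals[f"duration{name.lower()}"] += minutes
+    return totals
+
+
+# Hypothetical episodes: (Android activity name, duration in minutes)
+episodes = [("still", 30.0), ("walking", 12.5), ("in_vehicle", 20.0), ("running", 5.0), ("unknown", 3.0)]
+totals = duration_per_class(episodes)
+```
+
+Activities that belong to no class, such as `unknown` here, do not contribute to any of the three duration features in this sketch.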
--- a/docs/features/phone-applications-crashes.md
+++ b/docs/features/phone-applications-crashes.md
@ -0,0 +1,14 @@
+# Phone Applications Crashes
+
+Sensor parameters description for `[PHONE_APPLICATIONS_CRASHES]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the applications crashes data is stored
+|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
+|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-crashes) in `data/external/stachl_application_genre_catalogue.csv`
+|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
+|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
+
+!!! note
+    No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_APPLICATIONS_CRASHES`) to improve [`PHONE_DATA_YIELD`](../phone-data-yield) or [implement your own features](../add-new-features).
--- a/docs/features/phone-applications-foreground.md
+++ b/docs/features/phone-applications-foreground.md
@ -0,0 +1,68 @@
+# Phone Applications Foreground
+
+Sensor parameters description for `[PHONE_APPLICATIONS_FOREGROUND]` (these parameters are used by the only provider available at the moment, RAPIDS):
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the applications foreground data is stored
+|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
+|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-foreground) in `data/external/stachl_application_genre_catalogue.csv`
+|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
+|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
+
+## RAPIDS provider
+
+The app category (genre) catalogue used in these features was originally created by [Stachl et al](../../citation#stachl-applications-foreground).
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_applications_foreground_raw.csv
+    - data/raw/{pid}/phone_applications_foreground_with_datetime.csv
+    - data/raw/{pid}/phone_applications_foreground_with_datetime_with_categories.csv
+    - data/interim/{pid}/phone_applications_foreground_features/phone_applications_foreground_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_applications_foreground.csv
+    ```
+
+
+Parameters description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_APPLICATIONS_FOREGROUND` features from the `RAPIDS` provider|
+|`[INCLUDE_EPISODE_FEATURES]`| Set to `True` to extract features from application usage episodes using Screen data |
+|`[FEATURES]` |         Features to be computed, see table below
+|`[SINGLE_CATEGORIES]`     | An array of app categories to be *included* in the feature extraction computation. The special keyword `all` represents a category with all the apps from each participant. By default, we use the category catalog pointed to by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
+|`[CUSTOM_CATEGORIES]`   | An array of collections representing your own app categories. The key of each element is the name of the custom category, and the value is an array of the package names (apps) included in that category.
+|`[MULTIPLE_CATEGORIES]`   | An array of collections representing meta-categories (a group of categories). The key of each element is the name of the `meta-category` and the value is an array of member app categories. By default, we use the category catalog pointed to by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
+|`[SINGLE_APPS]`           | An array of apps to be *included* in the feature extraction computation. Use their package name (e.g. `com.google.android.youtube`) or the reserved keyword `top1global` (the most used app by a participant over the whole monitoring study)
+|`[EXCLUDED_CATEGORIES]`   | An array of app categories to be *excluded* from the feature extraction computation. By default, we use the category catalog pointed to by `[APPLICATION_CATEGORIES][CATALOGUE_FILE]` (see the Sensor parameters description table above)
+|`[EXCLUDED_APPS]`         | An array of apps to be excluded from the feature extraction computation. Use their package name, for example: `com.google.android.youtube`
+
+Features description for `[PHONE_APPLICATIONS_FOREGROUND][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|countevent              |apps      | Number of times a single app or apps within a category were used (i.e. they were brought to the foreground either by tapping their icon or switching to them from another app)
+|timeoffirstuse     |minutes   | The time in minutes between 12:00am (midnight) and the first use of a single app or apps within a category during a `time_segment`
+|timeoflastuse      |minutes   | The time in minutes between 12:00am (midnight) and the last use of a single app or apps within a category during a `time_segment`
+|frequencyentropy   |nats      | The entropy of the used apps within a category during a `time_segment` (each app is seen as a unique event, the more apps were used, the higher the entropy). This is especially relevant when computed over all apps. Entropy cannot be obtained for a single app
+|countepisode              |apps      | Number of times a usage episode of a single app or apps within a category was logged. In contrast to `countevent`, if an app was used across more than one time segment (for example, across more than one 30-minute segment), the `countepisode` will be one on each time segment instance.
+|minduration        |minutes   | For a `time_segment`, the minimum duration an application was used in minutes
+|maxduration        |minutes   | For a `time_segment`, the maximum duration an application was used in minutes
+|meanduration       |minutes   | For a `time_segment`, the mean duration of all the applications used in minutes
+|sumduration        |minutes   | For a `time_segment`, the sum duration of all the applications used in minutes
+
+!!! note "Assumptions/Observations"
+    1. Features can be computed by app, by apps grouped under a single category (genre), by your own categories, or by multiple categories grouped together (meta-categories). For example, we can get features for `Facebook` (single app), for `Social Network` apps (a category including Facebook and other social media apps), for `Traditional Social Media` (a custom category that includes Twitter and Facebook), or for `Social` (a meta-category formed by `Social Network` and `Social Media Tools` categories).
+
+    2. Apps installed by default like YouTube are considered system apps on some phones. We do an exact match to exclude apps where "genre" == `EXCLUDED_CATEGORIES` or "package_name" == `EXCLUDED_APPS`.
+
+    3. We provide four ways of classifying an app within a category (genre): a) by automatically scraping its official category from the Google Play Store, b) by using the catalog created by Stachl et al., which we provide in RAPIDS (`data/external/stachl_application_genre_catalogue.csv`), c) by manually creating a personalized catalog, or d) by defining a custom category in `config.yaml`. You can choose a, b, or c by modifying the `[APPLICATION_CATEGORIES]` keys and values (see the first table of this page).
+
+    4. We count `episodes` and `events` separately. Events are single app logs (when an app was opened), but episodes span from the time an app was opened until a new app is in the foreground or the screen is locked. Episodes will be chunked across any overlapping time segments. The `top1global` of `episodes` might not be the same as the `top1global` of `events`.
+
+    5. The application episodes are calculated using the application foreground and screen unlock episode data. An application episode starts when the application is launched and ends when a new application is launched or the screen is locked.
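+
+To make `countevent` and `frequencyentropy` concrete, here is a small sketch of both for the events of one time segment. It is illustrative only: the function names and package names are hypothetical, and the entropy is the standard Shannon entropy with natural logarithms (hence nats), which is assumed rather than confirmed to match the RAPIDS computation.
+
+```python
+import math
+from collections import Counter
+
+
+def count_events(package_names):
+    """Number of foreground events per app (one list entry per event)."""
+    return Counter(package_names)
+
+
+def frequency_entropy(package_names):
+    """Shannon entropy, in nats, of the relative usage frequency of each app."""
+    counts = Counter(package_names)
+    total = sum(counts.values())
+    return -sum((c / total) * math.log(c / total) for c in counts.values())
+
+
+# Hypothetical foreground events within one time segment
+events = ["com.example.chat", "com.example.chat", "com.example.maps", "com.example.maps"]
+counts = count_events(events)
+entropy = frequency_entropy(events)
+```
+
+With a single app the sum has one term with probability 1, so the result carries no information; this is consistent with the table's remark that entropy cannot be obtained for a single app.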
--- a/docs/features/phone-applications-notifications.md
+++ b/docs/features/phone-applications-notifications.md
@ -0,0 +1,14 @@
+# Phone Applications Notifications
+
+Sensor parameters description for `[PHONE_APPLICATIONS_NOTIFICATIONS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the applications notifications data is stored
+|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `GOOGLE`, app categories (genres) are scraped from the Play Store
+|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-notifications) in `data/external/stachl_application_genre_catalogue.csv`
+|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | If `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all scraped genres will be saved to `[CATALOGUE_FILE]`
+|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
+
+!!! note
+    No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_APPLICATIONS_NOTIFICATIONS`) to improve [`PHONE_DATA_YIELD`](../phone-data-yield) or [implement your own features](../add-new-features).
--- a/docs/features/phone-battery.md
+++ b/docs/features/phone-battery.md
@ -0,0 +1,48 @@
+# Phone Battery
+
+Sensor parameters description for `[PHONE_BATTERY]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the battery data is stored
+|`[EPISODE_THRESHOLD_BETWEEN_ROWS]` | Maximum difference in minutes between any two rows for them to be considered part of the same battery charge or discharge episode
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_battery_raw.csv
+    - data/interim/{pid}/phone_battery_episodes.csv
+    - data/interim/{pid}/phone_battery_episodes_resampled.csv
+    - data/interim/{pid}/phone_battery_episodes_resampled_with_datetime.csv
+    - data/interim/{pid}/phone_battery_features/phone_battery_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_battery.csv
+    ```
+
+
+Parameters description for `[PHONE_BATTERY][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_BATTERY` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+
+
+Features description for `[PHONE_BATTERY][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|countdischarge         |episodes           | Number of discharging episodes.
+|sumdurationdischarge   |minutes            | The total duration of all discharging episodes.
+|countcharge            |episodes           | Number of battery charging episodes.
+|sumdurationcharge      |minutes            | The total duration of all charging episodes.
+|avgconsumptionrate     |episodes/minutes   | The average of all episodes' consumption rates. An episode's consumption rate is defined as the ratio between its battery delta and duration
+|maxconsumptionrate     |episodes/minutes   | The highest of all episodes' consumption rates. An episode's consumption rate is defined as the ratio between its battery delta and duration
+
+!!! note "Assumptions/Observations"
+    1. We convert battery data collected with iOS client v1 (autodetected because battery status `4` does not exist) to match the Android battery format: we swap status `3` for `5` and `1` for `3`.
+    2. We group battery data into discharge or charge episodes considering any contiguous rows with consecutive reductions or increases of the battery level if they are logged within `[EPISODE_THRESHOLD_BETWEEN_ROWS]` minutes from each other.
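+
+The grouping rule in observation 2 can be sketched as below. This is not the RAPIDS implementation: the function names, the threshold value, and the sample rows are hypothetical; rows are assumed to be `(minute, battery_level)` pairs sorted by time; and computing consumption rates over discharge episodes only is an assumption made for the illustration.
+
+```python
+EPISODE_THRESHOLD_BETWEEN_ROWS = 30  # minutes; hypothetical value for illustration
+
+
+def battery_episodes(rows, threshold=EPISODE_THRESHOLD_BETWEEN_ROWS):
+    """Group time-sorted (minute, level) rows into charge and discharge episodes."""
+    episodes, current = [], None
+    for (t0, l0), (t1, l1) in zip(rows, rows[1:]):
+        # A pair of rows continues an episode only if the level changed within the threshold
+        if l1 == l0 or t1 - t0 > threshold:
+            kind = None
+        else:
+            kind = "discharge" if l1 < l0 else "charge"
+        if current is not None and kind == current["kind"]:
+            current["end"], current["end_level"] = t1, l1
+        else:
+            if current is not None:
+                episodes.append(current)
+            current = None
+            if kind is not None:
+                current = {"kind": kind, "start": t0, "end": t1, "start_level": l0, "end_level": l1}
+    if current is not None:
+        episodes.append(current)
+    return episodes
+
+
+def battery_features(episodes):
+    """Compute the six features of the table from a list of episodes."""
+    discharge = [e for e in episodes if e["kind"] == "discharge"]
+    charge = [e for e in episodes if e["kind"] == "charge"]
+    # Assumption: consumption rate = battery delta / duration, over discharge episodes only
+    rates = [(e["start_level"] - e["end_level"]) / (e["end"] - e["start"]) for e in discharge]
+    return {
+        "countdischarge": len(discharge),
+        "sumdurationdischarge": sum(e["end"] - e["start"] for e in discharge),
+        "countcharge": len(charge),
+        "sumdurationcharge": sum(e["end"] - e["start"] for e in charge),
+        "avgconsumptionrate": sum(rates) / len(rates) if rates else float("nan"),
+        "maxconsumptionrate": max(rates) if rates else float("nan"),
+    }
+
+
+# Hypothetical rows: the 80-minute gap between minute 20 and minute 100 splits the episodes
+rows = [(0, 90), (10, 88), (20, 85), (100, 84), (110, 90), (120, 95)]
+features = battery_features(battery_episodes(rows))
+```
+
+In this example the first three rows form one discharge episode and the last three form one charge episode; the drop from 85 to 84 is not part of any episode because the two rows are more than the threshold apart.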
--- a/docs/features/phone-bluetooth.md
+++ b/docs/features/phone-bluetooth.md
@ -0,0 +1,161 @@
+# Phone Bluetooth
+
+Sensor parameters description for `[PHONE_BLUETOOTH]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the bluetooth data is stored
+
+## RAPIDS provider
+
+!!! warning
+    The features of this provider are deprecated in favor of `DORYAB` provider (see below).
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_bluetooth_raw.csv
+    - data/raw/{pid}/phone_bluetooth_with_datetime.csv
+    - data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_bluetooth.csv
+    ```
+
+
+Parameters description for `[PHONE_BLUETOOTH][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_BLUETOOTH` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+
+
+Features description for `[PHONE_BLUETOOTH][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+| {--countscans--}                 | devices | Number of scanned devices during a time segment, a device can be detected multiple times over time and these appearances are counted separately |
+| {--uniquedevices--}              | devices | Number of unique devices during a time segment as identified by their hardware address (`bt_address`)                                                          |
+| {--countscansmostuniquedevice--} | scans   | Number of scans of the most sensed device within each time segment instance                                              |
+
+!!! note "Assumptions/Observations"
+    - From `v0.2.0` `countscans`, `uniquedevices`, `countscansmostuniquedevice` were deprecated because they overlap with the respective features for `ALL` devices of the `PHONE_BLUETOOTH` `DORYAB` provider
+
+## DORYAB provider
+This provider is adapted from the work by [Doryab et al](../../citation#doryab-bluetooth). 
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_bluetooth_raw.csv
+    - data/raw/{pid}/phone_bluetooth_with_datetime.csv
+    - data/interim/{pid}/phone_bluetooth_features/phone_bluetooth_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_bluetooth.csv
+    ```
+
+
+Parameters description for `[PHONE_BLUETOOTH][PROVIDERS][DORYAB]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_BLUETOOTH` features from the `DORYAB` provider|
+|`[FEATURES]` |         Features to be computed, see table below. These features are computed for three device categories: `all` devices, `own` devices and `other` devices.
+
+
+Features description for `[PHONE_BLUETOOTH][PROVIDERS][DORYAB]`:
+
+|Feature&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;                     |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+| countscans                 | scans | Number of scans (rows) from the devices sensed during a time segment instance. The more scans a bluetooth device has the longer it remained within range of the participant's phone |
+| uniquedevices              | devices | Number of unique bluetooth devices sensed during a time segment instance as identified by their hardware addresses (`bt_address`) |
+| meanscans | scans| Mean of the scans of every sensed device within each time segment instance|
+| stdscans | scans| Standard deviation of the scans of every sensed device within each time segment instance|
+| countscans{==most==}frequentdevice{==within==}segments | scans   | Number of scans of the **most** sensed device **within** each time segment instance|
+| countscans{==least==}frequentdevice{==within==}segments| scans| Number of scans of the **least** sensed device **within** each time segment instance |
+| countscans{==most==}frequentdevice{==across==}segments | scans   | Number of scans of the **most** sensed device **across** time segment instances of the same type|
+| countscans{==least==}frequentdevice{==across==}segments| scans| Number of scans of the **least** sensed device **across** time segment instances of the same type per device|
+| countscans{==most==}frequentdevice{==acrossdataset==} | scans   | Number of scans of the **most** sensed device **across** the entire dataset of every participant|
+| countscans{==least==}frequentdevice{==acrossdataset==}| scans| Number of scans of the **least** sensed device **across** the entire dataset of every participant |
+
+
+!!! note "Assumptions/Observations"
+    - Devices are classified as belonging to the participant (`own`) or to other people (`others`) using k-means based on the number of times and the number of days each device was detected across each participant's dataset. See [Doryab et al](../../citation#doryab-bluetooth) for more details.
+    - If ownership cannot be computed because all devices were detected on only one day, they are all considered as `other`. Thus `all` and `other` features will be equal. The likelihood of this scenario decreases the more days of data you have.
+    - When searching for the most frequent device across 30-minute segments, the search range is equivalent to the sum of all segments of the same time period. For instance, the `countscansmostfrequentdeviceacrosssegments` for the time segment (`Fri 00:00:00, Fri 00:29:59`) will get the count in that segment of the most frequent device found within all (`00:00:00, 00:29:59`) time segments. To find `countscansmostfrequentdeviceacrosssegments` for `other` devices, the search range needs to filter out all `own` devices. This filtering is not needed for `countscansmostfrequentdeviceacrossdataset`: the most frequent device across the dataset stays the same for `countscansmostfrequentdeviceacrossdatasetall`, `countscansmostfrequentdeviceacrossdatasetown` and `countscansmostfrequentdeviceacrossdatasetother`. The same rule applies to the least frequent device across the dataset.
+    - The most and least frequent devices will be the same across time segment instances and across the entire dataset when every time segment instance covers every hour of a dataset. For example, daily segments (00:00 to 23:59) fall in this category but morning segments (06:00am to 11:59am) or periodic 30-minute segments don't.
+
+    ??? info "Example"
+        
+        ??? example "Simplified raw bluetooth data"
+            The following is a simplified example with bluetooth data from three days and two time segments: morning and afternoon. There are two `own` devices: `55C836F5-487E-405F-8E28-21DBD40FA4FF` detected seven times across two days and `499A1EAF-DDF1-4657-986C-EA5032104448` detected eight times on a single day.
+            ```csv
+            local_date	segment	    bt_address                              own_device
+            2016-11-29	morning	    55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2016-11-29	morning	    55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2016-11-29	morning	    55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2016-11-29	morning	    55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2016-11-29	morning	    48872A52-68DE-420D-98DA-73339A1C4685              0
+            2016-11-29	afternoon	55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2016-11-29	afternoon	48872A52-68DE-420D-98DA-73339A1C4685              0
+            2016-11-30	morning	    55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2016-11-30	morning	    48872A52-68DE-420D-98DA-73339A1C4685              0
+            2016-11-30	morning	    25262DC7-780C-4AD5-AD3A-D9776AEF7FC1              0
+            2016-11-30	morning	    5B1E6981-2E50-4D9A-99D8-67AED430C5A8              0
+            2016-11-30	morning	    5B1E6981-2E50-4D9A-99D8-67AED430C5A8              0
+            2016-11-30	afternoon	55C836F5-487E-405F-8E28-21DBD40FA4FF              1
+            2017-05-07	morning	    5C5A9C41-2F68-4CEB-96D0-77DE3729B729              0
+            2017-05-07	morning	    25262DC7-780C-4AD5-AD3A-D9776AEF7FC1              0
+            2017-05-07	morning	    5B1E6981-2E50-4D9A-99D8-67AED430C5A8              0
+            2017-05-07	morning	    6C444841-FE64-4375-BC3F-FA410CDC0AC7              0
+            2017-05-07	morning	    4DC7A22D-9F1F-4DEF-8576-086910AABCB5              0
+            2017-05-07	afternoon	5B1E6981-2E50-4D9A-99D8-67AED430C5A8              0
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            2017-05-07  afternoon   499A1EAF-DDF1-4657-986C-EA5032104448              1
+            ```
+        
+
+        
+
+        ??? example "The most and least frequent `OTHER` devices (`own_device == 0`) during morning segments"
+            The most and least frequent `ALL`|`OWN`|`OTHER` devices are computed within each time segment instance, across time segment instances of the same type and across the entire dataset of each person. These are the most and least frequent devices for `OTHER` devices during morning segments.
+            ```csv
+            most frequent device across 2016-11-29 morning:   '48872A52-68DE-420D-98DA-73339A1C4685'  (this device is the only one in this instance)
+            least frequent device across 2016-11-29 morning:  '48872A52-68DE-420D-98DA-73339A1C4685'  (this device is the only one in this instance)
+            most frequent device across 2016-11-30 morning:   '5B1E6981-2E50-4D9A-99D8-67AED430C5A8'
+            least frequent device across 2016-11-30 morning:  '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1'  (when tied, the first occurrence is chosen)
+            most frequent device across 2017-05-07 morning:   '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1'  (when tied, the first occurrence is chosen)
+            least frequent device across 2017-05-07 morning:  '25262DC7-780C-4AD5-AD3A-D9776AEF7FC1'  (when tied, the first occurrence is chosen)
+            
+            most frequent across morning segments:            '5B1E6981-2E50-4D9A-99D8-67AED430C5A8'
+            least frequent across morning segments:           '6C444841-FE64-4375-BC3F-FA410CDC0AC7' (when tied, the first occurrence is chosen)
+            
+            most frequent across dataset:                     '499A1EAF-DDF1-4657-986C-EA5032104448' (only taking into account "morning" segments)
+            least frequent across dataset:                    '4DC7A22D-9F1F-4DEF-8576-086910AABCB5' (when tied, the first occurrence is chosen)
+            ```
+
+        ??? example "Bluetooth features for  `OTHER` devices and morning segments"
+            For brevity we only show the following features for morning segments:
+            ```yaml
+            OTHER: 
+                DEVICES: ["countscans", "uniquedevices", "meanscans", "stdscans"]
+                SCANS_MOST_FREQUENT_DEVICE: ["withinsegments", "acrosssegments", "acrossdataset"]
+            ```
+
+            Note that `countscansmostfrequentdeviceacrossdatasetothers` is all `0`s because `499A1EAF-DDF1-4657-986C-EA5032104448` is excluded from the count, as it is labelled as an `own` device (not `other`).
+            ```csv
+            local_segment       countscansothers	uniquedevicesothers	meanscansothers	stdscansothers	countscansmostfrequentdevicewithinsegmentsothers	countscansmostfrequentdeviceacrosssegmentsothers	countscansmostfrequentdeviceacrossdatasetothers
+            2016-11-29-morning	1	                1	                1.000000	    NaN             1	                                                0.0	                                                0.0
+            2016-11-30-morning	4	                3	                1.333333	    0.57735	        2	                                                2.0	                                                2.0
+            2017-05-07-morning	5	                5	                1.000000	    0.00000	        1	                                                1.0	                                                1.0
+            ```
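The `DEVICES` features in the table above can be reproduced with a short sketch. This is a minimal illustration in Python, not the provider's actual code: the function name `device_scan_features` is hypothetical, and it assumes `stdscans` is a sample standard deviation, which is what the `0.57735` value for `2016-11-30-morning` suggests.

```python
from collections import Counter
from statistics import mean, stdev


def device_scan_features(addresses):
    """Compute scan features from the bt_address values of one time segment instance.

    `addresses` holds one entry per scan (row). Hypothetical helper for illustration.
    """
    scans_per_device = list(Counter(addresses).values())
    nan = float("nan")
    return {
        "countscans": sum(scans_per_device),
        "uniquedevices": len(scans_per_device),
        "meanscans": mean(scans_per_device) if scans_per_device else nan,
        # A sample standard deviation is undefined for a single device
        "stdscans": stdev(scans_per_device) if len(scans_per_device) > 1 else nan,
    }


# OTHER devices (own_device == 0) scanned on 2016-11-30 morning in the example data
morning_2016_11_30 = [
    "48872A52-68DE-420D-98DA-73339A1C4685",
    "25262DC7-780C-4AD5-AD3A-D9776AEF7FC1",
    "5B1E6981-2E50-4D9A-99D8-67AED430C5A8",
    "5B1E6981-2E50-4D9A-99D8-67AED430C5A8",
]
features = device_scan_features(morning_2016_11_30)
```

With a single device, as on `2016-11-29-morning`, the sketch returns `NaN` for `stdscans`, matching the table.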
--- a/docs/features/phone-calls.md
+++ b/docs/features/phone-calls.md
@ -0,0 +1,64 @@
+# Phone Calls
+
+Sensor parameters description for `[PHONE_CALLS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the calls data is stored
+
+## RAPIDS Provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_calls_raw.csv
+    - data/raw/{pid}/phone_calls_with_datetime.csv
+    - data/interim/{pid}/phone_calls_features/phone_calls_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_calls.csv
+    ```
+
+
+Parameters description for `[PHONE_CALLS][PROVIDERS][RAPIDS]`:
+
+| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;        | Description |
+|-------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+|`[COMPUTE]`| Set to `True` to extract `PHONE_CALLS` features from the `RAPIDS` provider|
+|`[FEATURES_TYPE]`| Set to `EPISODES` to extract features based on call episodes or `EVENTS` to extract features based on events.|
+| `[CALL_TYPES]`   | The particular call_type that will be analyzed. The options for this parameter are incoming, outgoing or missed.                                                                                                                                                 |
+| `[FEATURES]`    | Features to be computed for `outgoing`, `incoming`, and `missed` calls. Note that the same features are available for both incoming and outgoing calls, while missed calls have their own set of features. See the tables below. |
+
+
+Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` incoming and outgoing calls:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|count                    |calls      |Number of calls of a particular `call_type` that occurred during a particular `time_segment`.
+|distinctcontacts         |contacts   |Number of distinct contacts that are associated with a particular `call_type` for a particular `time_segment`
+|meanduration             |seconds    |The mean duration of all calls of a particular `call_type` during a particular `time_segment`.
+|sumduration              |seconds    |The sum of the duration of all calls of a particular `call_type` during a particular `time_segment`.
+|minduration              |seconds    |The duration of the shortest call of a particular `call_type` during a particular `time_segment`.
+|maxduration              |seconds    |The duration of the longest call of a particular `call_type` during a particular `time_segment`.
+|stdduration              |seconds    |The standard deviation of the duration of all the calls of a particular `call_type` during a particular `time_segment`.
+|modeduration             |seconds    |The mode of the duration of all the calls of a particular `call_type` during a particular `time_segment`.
+|entropyduration          |nats       |The estimate of the Shannon entropy for the duration of all the calls of a particular `call_type` during a particular `time_segment`.
+|timefirstcall            |minutes    |The time in minutes between 12:00am (midnight) and the first call of `call_type`.
+|timelastcall             |minutes    |The time in minutes between 12:00am (midnight) and the last call of `call_type`.
+|countmostfrequentcontact |calls      |The number of calls of a particular `call_type` during a particular `time_segment` of the most frequent contact throughout the monitored period.
+
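As a rough illustration of the count and duration features above, the following Python sketch computes them for the calls of one `call_type` in one time segment. The function name `call_features` and the `(contact, duration)` tuple layout are assumptions made for this example, and using a sample standard deviation for `stdduration` is also an assumption rather than something the table states.

```python
from collections import Counter
from statistics import mean, stdev


def call_features(calls):
    """calls: list of (contact_id, duration_seconds) tuples for ONE call_type
    within ONE time segment. Hypothetical layout, for illustration only."""
    if not calls:
        return {"count": 0, "distinctcontacts": 0}
    durations = [duration for _, duration in calls]
    return {
        "count": len(calls),
        "distinctcontacts": len({contact for contact, _ in calls}),
        "meanduration": mean(durations),
        "sumduration": sum(durations),
        "minduration": min(durations),
        "maxduration": max(durations),
        # Assumption: sample standard deviation; undefined for a single call
        "stdduration": stdev(durations) if len(durations) > 1 else float("nan"),
        # Most common duration; ties resolve to the first duration seen
        "modeduration": Counter(durations).most_common(1)[0][0],
    }


outgoing = [("contact_a", 60), ("contact_b", 120), ("contact_a", 60)]
result = call_features(outgoing)
```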
+Features description for `[PHONE_CALLS][PROVIDERS][RAPIDS]` missed calls:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|count                      |calls      |Number of `missed` calls that occurred during a particular `time_segment`.
+|distinctcontacts           |contacts   |Number of distinct contacts that are associated with `missed` calls for a particular `time_segment`
+|timefirstcall              |minutes    |The time in minutes between 12:00am (midnight) and the first `missed` call.
+|timelastcall               |minutes    |The time in minutes between 12:00am (midnight) and the last `missed` call.
+|countmostfrequentcontact   |calls      |The number of `missed` calls during a particular `time_segment` of the most frequent contact throughout the monitored period.
+
+!!! note "Assumptions/Observations"
+    1. Traces for iOS calls are unique even for the same contact calling a participant more than once which renders `countmostfrequentcontact` meaningless and `distinctcontacts` equal to the total number of traces. 
+    2. `[CALL_TYPES]` and `[FEATURES]` keys in `config.yaml` need to match. For example, `[CALL_TYPES]` `outgoing` matches the `[FEATURES]` key `outgoing`
+    3. iOS calls data is transformed to match Android calls data format.
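Observation 2 can be checked before running the pipeline. The sketch below is a hypothetical helper, not part of RAPIDS; it assumes the provider section of `config.yaml` has already been loaded into a Python dictionary with a `CALL_TYPES` list and a `FEATURES` mapping.

```python
def check_call_config(provider_config):
    """Return (call types without a FEATURES entry, FEATURES keys not in CALL_TYPES)."""
    call_types = set(provider_config["CALL_TYPES"])
    feature_keys = set(provider_config["FEATURES"])
    return sorted(call_types - feature_keys), sorted(feature_keys - call_types)


# Illustrative provider section; the feature lists are shortened
provider_config = {
    "CALL_TYPES": ["missed", "incoming"],
    "FEATURES": {
        "missed": ["count", "distinctcontacts"],
        "incoming": ["count", "meanduration"],
        "outgoing": ["count", "meanduration"],
    },
}
missing_features, unused_features = check_call_config(provider_config)
```

Here `outgoing` has features configured but is not listed in `[CALL_TYPES]`, so it shows up in `unused_features`.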
--- a/docs/features/phone-conversation.md
+++ b/docs/features/phone-conversation.md
@ -0,0 +1,70 @@
+# Phone Conversation
+
+Sensor parameters description for `[PHONE_CONVERSATION]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER][ANDROID]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the conversation data from Android devices is stored (the AWARE client saves this data on different tables for Android and iOS)
+|`[CONTAINER][IOS]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the conversation data from iOS devices is stored (the AWARE client saves this data on different tables for Android and iOS)
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_conversation_raw.csv
+    - data/raw/{pid}/phone_conversation_with_datetime.csv
+    - data/interim/{pid}/phone_conversation_features/phone_conversation_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_conversation.csv
+    ```
+
+
+Parameters description for `[PHONE_CONVERSATION][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_CONVERSATION` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+|`[RECORDING_MINUTES]` | Minutes the plugin was recording audio (default 1 min)
+|`[PAUSED_MINUTES]` |  Minutes the plugin was NOT recording audio (default 3 min)
+
+
+Features description for `[PHONE_CONVERSATION][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+| minutessilence          | minutes | Minutes labeled as silence                                                                                                                                                                 |
+| minutesnoise            | minutes | Minutes labeled as noise                                                                                                                                                                   |
+| minutesvoice            | minutes | Minutes labeled as voice                                                                                                                                                                   |
+| minutesunknown          | minutes | Minutes labeled as unknown                                                                                                                                                                 |
+| sumconversationduration | minutes | Total duration of all conversations                                                                                                                                                        |
+| maxconversationduration | minutes | Longest duration of all conversations                                                                                                                                                      |
+| minconversationduration | minutes | Shortest duration of all conversations                                                                                                                                                     |
+| avgconversationduration | minutes | Average duration of all conversations                                                                                                                                                      |
+| sdconversationduration  | minutes | Standard Deviation of the duration of all conversations                                                                                                                                    |
+| timefirstconversation   | minutes | Minutes since midnight when the first conversation for a time segment was detected                                                                                                          |
+| timelastconversation    | minutes | Minutes since midnight when the last conversation for a time segment was detected                                                                                                           |
+| noisesumenergy          | L2-norm | Sum of all energy values when inference is noise                                                                                                                                           |
+| noiseavgenergy          | L2-norm | Average of all energy values when inference is noise                                                                                                                                       |
+| noisesdenergy           | L2-norm | Standard Deviation of all energy values when inference is noise                                                                                                                            |
+| noiseminenergy          | L2-norm | Minimum of all energy values when inference is noise                                                                                                                                       |
+| noisemaxenergy          | L2-norm | Maximum of all energy values when inference is noise                                                                                                                                       |
+| voicesumenergy          | L2-norm | Sum of all energy values when inference is voice                                                                                                                                           |
+| voiceavgenergy          | L2-norm | Average of all energy values when inference is voice                                                                                                                                       |
+| voicesdenergy           | L2-norm | Standard Deviation of all energy values when inference is voice                                                                                                                            |
+| voiceminenergy          | L2-norm | Minimum of all energy values when inference is voice                                                                                                                                       |
+| voicemaxenergy          | L2-norm | Maximum of all energy values when inference is voice                                                                                                                                       |
+| silencesensedfraction   |   -      | Ratio between minutessilence and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)                                                                                   |
+| noisesensedfraction     |   -      | Ratio between minutesnoise and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)                                                                                     |
+| voicesensedfraction     |   -      | Ratio between minutesvoice and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)                                                                                     |
+| unknownsensedfraction   |   -      | Ratio between minutesunknown and the sum of (minutessilence, minutesnoise, minutesvoice, minutesunknown)                                                                                   |
+| silenceexpectedfraction |   -      | Ratio between minutessilence and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
+| noiseexpectedfraction   |   -      | Ratio between minutesnoise and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes))   |
+| voiceexpectedfraction   |   -      | Ratio between minutesvoice and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes))   |
+| unknownexpectedfraction |   -      | Ratio between minutesunknown and the number of minutes that in theory should have been sensed based on the record and pause cycle of the plugin (1440 / (recordingMinutes + pausedMinutes)) |
+
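The two families of fraction features can be sketched as follows. This is one reading of the table's formulas, written in Python with hypothetical function names; in particular, it assumes the expected-minutes term is `1440 / (recordingMinutes + pausedMinutes)`, using the default record and pause cycle of 1 and 3 minutes.

```python
def sensed_fraction(minutes, label):
    """Share of `label` minutes among all minutes that were actually labeled."""
    total = sum(minutes.values())
    return minutes[label] / total if total else float("nan")


def expected_fraction(minutes, label, recording_minutes=1, paused_minutes=3):
    """Share of `label` minutes relative to the minutes expected in a full day."""
    # Assumption: expected minutes follow the formula quoted in the table
    expected_minutes = 1440 / (recording_minutes + paused_minutes)
    return minutes[label] / expected_minutes


# Illustrative values only
minutes = {"silence": 180, "noise": 90, "voice": 60, "unknown": 30}
silence_sensed = sensed_fraction(minutes, "silence")
voice_expected = expected_fraction(minutes, "voice")
```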
+!!! note "Assumptions/Observations"
+    1. The timestamp of conversation rows in iOS is in seconds so we convert it to milliseconds to match Android's format
--- a/docs/features/phone-data-yield.md
+++ b/docs/features/phone-data-yield.md
@ -0,0 +1,85 @@
+# Phone Data Yield
+
+This is a combinatorial sensor, which means that we use the data from multiple sensors to extract data yield features. Data yield features can be used to remove rows ([time segments](../../setup/configuration/#time-segments)) that do not contain enough data. You should decide what your "enough" threshold is depending on the type of sensors you collected (frequency vs event based, e.g. accelerometer vs calls), the length of your study, and the rates of missing data that your analysis could handle.
+
+!!! hint "Why is data yield important?"
+    Imagine that you want to extract `PHONE_CALLS` features on daily segments (`00:00` to `23:59`). Let's say that on day 1 the phone logged 10 calls and 23 hours of data from other sensors and on day 2 the phone logged 10 calls and only 2 hours of data from other sensors. It's more likely that other calls were placed on the 22 hours of data that you didn't log on day 2 than on the 1 hour of data you didn't log on day 1, and so including day 2 in your analysis could bias your results.
+
+Sensor parameters description for `[PHONE_DATA_YIELD]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;          | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[SENSORS]`| One or more phone sensor config keys (e.g. `PHONE_MESSAGES`). The more keys you include the more accurately RAPIDS can approximate the time a smartphone was sensing data. The supported phone sensors you can include in this list are outlined below (**do NOT include Fitbit sensors, ONLY include phone sensors**).
+
+!!! info "Supported phone sensors for `[PHONE_DATA_YIELD][SENSORS]`"
+    ```yaml
+    PHONE_ACCELEROMETER
+    PHONE_ACTIVITY_RECOGNITION
+    PHONE_APPLICATIONS_CRASHES
+    PHONE_APPLICATIONS_FOREGROUND
+    PHONE_APPLICATIONS_NOTIFICATIONS
+    PHONE_BATTERY
+    PHONE_BLUETOOTH
+    PHONE_CALLS
+    PHONE_CONVERSATION
+    PHONE_KEYBOARD
+    PHONE_LIGHT
+    PHONE_LOCATIONS
+    PHONE_LOG
+    PHONE_MESSAGES
+    PHONE_SCREEN
+    PHONE_WIFI_CONNECTED
+    PHONE_WIFI_VISIBLE
+    ```
+
+## RAPIDS provider
+
+Before explaining the data yield features, let's define the following relevant concepts:
+
+- A valid minute is any 60 second window when any phone sensor logged at least 1 row of data
+- A valid hour is any 60 minute window with at least X valid minutes. The X or threshold is given by `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]`
+
+The timestamps of all sensors are concatenated and then grouped per time segment. Minute and hour windows are created from the beginning of each time segment instance and these windows are marked as valid based on the definitions above. The duration of each time segment is taken into account to compute the features described below.
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/{sensor}_raw.csv # one for every [PHONE_DATA_YIELD][SENSORS]
+    - data/interim/{pid}/phone_yielded_timestamps.csv
+    - data/interim/{pid}/phone_yielded_timestamps_with_datetime.csv
+    - data/interim/{pid}/phone_data_yield_features/phone_data_yield_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_data_yield.csv
+    ```
+
+
+Parameters description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_DATA_YIELD` features from the `RAPIDS` provider|
+|`[FEATURES]` |  Features to be computed, see table below
+|`[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` | The proportion `[0.0 ,1.0]` of valid minutes in a 60-minute window necessary to flag that window as valid.
+
+
+Features description for `[PHONE_DATA_YIELD][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|ratiovalidyieldedminutes   |-          | The ratio between the number of valid minutes and the duration in minutes of a time segment.
+|ratiovalidyieldedhours     |-          | The ratio between the number of valid hours and the duration in hours of a time segment. If the time segment is shorter than 1 hour this feature will always be 1.
+
+
+!!! note "Assumptions/Observations"
+    1. We recommend using `ratiovalidyieldedminutes` on time segments that are shorter than two or three hours and `ratiovalidyieldedhours` for longer segments. This is because relying on yielded minutes only can be misleading when a big chunk of those missing minutes are clustered together. 
+    
+        For example, let's assume we are working with a 24-hour time segment that is missing 12 hours of data. Two extreme cases can occur: 
+
+        <ol type="A">
+        <li>the 12 missing hours are from the beginning of the segment or </li>
+        <li>30 minutes could be missing from every hour (24 * 30 minutes = 12 hours).</li>
+        </ol>
+        
+        `ratiovalidyieldedminutes` would be 0.5 for both `a` and `b` (hinting the missing circumstances are similar). However, `ratiovalidyieldedhours` would be 0.5 for `a` and 1.0 for `b` if `[MINUTE_RATIO_THRESHOLD_FOR_VALID_YIELDED_HOURS]` is between [0.0 and 0.49] (hinting that the missing circumstances might be more favorable for `b`). In other words, sensed data for `b` is more evenly spread compared to `a`.
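The definitions of valid minutes and valid hours, together with cases A and B above, can be sketched in Python. This is a simplified illustration and not the RAPIDS implementation: the function name `data_yield` is hypothetical, timestamps are assumed to be in milliseconds, an hour is assumed to be valid when its share of valid minutes is greater than or equal to the threshold, and segments are assumed to last a whole number of hours when they are at least one hour long.

```python
def data_yield(timestamps_ms, segment_start_ms, segment_minutes, minute_ratio_threshold):
    """Return (ratiovalidyieldedminutes, ratiovalidyieldedhours) for one segment instance."""
    # A valid minute is any 60-second window with at least one row from any sensor
    valid_minutes = {
        (ts - segment_start_ms) // 60_000
        for ts in timestamps_ms
        if 0 <= ts - segment_start_ms < segment_minutes * 60_000
    }
    ratio_minutes = len(valid_minutes) / segment_minutes
    if segment_minutes < 60:
        # Per the feature table, segments shorter than one hour always yield 1
        return ratio_minutes, 1.0
    valid_minutes_per_hour = {}
    for minute in valid_minutes:
        hour = minute // 60
        valid_minutes_per_hour[hour] = valid_minutes_per_hour.get(hour, 0) + 1
    # Assumption: ">=" comparison against the threshold
    valid_hours = sum(
        1 for n in valid_minutes_per_hour.values() if n / 60 >= minute_ratio_threshold
    )
    return ratio_minutes, valid_hours / (segment_minutes / 60)


# Case A: the first 12 hours of a 24-hour segment are missing
case_a = [minute * 60_000 for minute in range(720, 1440)]
# Case B: 30 minutes are missing from every hour
case_b = [(hour * 60 + minute) * 60_000 for hour in range(24) for minute in range(30)]

yield_a = data_yield(case_a, 0, 1440, 0.4)  # (0.5, 0.5)
yield_b = data_yield(case_b, 0, 1440, 0.4)  # (0.5, 1.0)
```

Both cases yield the same minute ratio, but only the hour ratio separates clustered missing data (A) from evenly spread missing data (B).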
--- a/docs/features/phone-keyboard.md
+++ b/docs/features/phone-keyboard.md
@ -0,0 +1,40 @@
+# Phone Keyboard
+
+Sensor parameters description for `[PHONE_KEYBOARD]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the keyboard data is stored
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_keyboard_raw.csv
+    - data/raw/{pid}/phone_keyboard_with_datetime.csv
+    - data/interim/{pid}/phone_keyboard_features/phone_keyboard_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_keyboard.csv
+    ```
+
+Features description for `[PHONE_KEYBOARD]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|sessioncount                                            | -    |Number of typing sessions in a time segment. A session begins with any keypress and finishes when 5 seconds have elapsed since the last key was pressed or when the application that the user was typing on changes.
+|averagesessionlength                                           | milliseconds          | Average length of all sessions in a time segment instance
+|averageinterkeydelay                                                |milliseconds        |The average time between keystrokes measured in milliseconds.
+|changeintextlengthlessthanminusone                                                 |         | Number of times a keyboard typing or swiping event shortened the current text by more than one character.
+|changeintextlengthequaltominusone                                                 |         | Number of times a keyboard typing or swiping event shortened the current text by exactly one character.
+|changeintextlengthequaltoone                                                 |         | Number of times a keyboard typing or swiping event lengthened the current text by exactly one character.
+|changeintextlengthmorethanone                                                 |         | Number of times a keyboard typing or swiping event lengthened the current text by more than one character.
+|maxtextlength                                                      |        | Length in characters of the longest sentence(s) contained in the typing text box of any app during the time segment.
+|lastmessagelength                                                  |       | Length of the last text in characters of the sentence(s) contained in the typing text box of any app during the time segment.
+|totalkeyboardtouches                                               |       | Average number of typing events across all sessions in a time segment instance.
+
+!!! note
+    We did not find a reliable way to distinguish between AutoCorrect or AutoComplete changes, since both can be applied with a single touch or swipe event and can decrease or increase the length of the text by an arbitrary number of characters.
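
A minimal sketch of the session rule described in the `sessioncount` row above, assuming key events arrive as time-ordered `(timestamp_ms, app)` pairs. `count_sessions` and the event layout are hypothetical, and treating a pause of exactly 5 seconds as a session boundary is an assumption.

```python
def count_sessions(events, max_gap_ms=5000):
    """Count typing sessions in a list of (timestamp_ms, app) key events.

    events must be sorted by timestamp.
    """
    sessions = 0
    previous = None
    for timestamp_ms, app in events:
        if previous is None:
            sessions += 1  # the first keypress opens the first session
        else:
            prev_ts, prev_app = previous
            # A new session starts after a 5-second pause or an app change.
            if timestamp_ms - prev_ts >= max_gap_ms or app != prev_app:
                sessions += 1
        previous = (timestamp_ms, app)
    return sessions


events = [(0, "chat"), (1000, "chat"), (7000, "chat"), (7500, "mail")]
print(count_sessions(events))  # 3
```

In this example the 6-second pause opens a second session and the switch from `chat` to `mail` opens a third.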
+    
--- a/docs/features/phone-light.md
+++ b/docs/features/phone-light.md
@ -0,0 +1,44 @@
+# Phone Light
+
+Sensor parameters description for `[PHONE_LIGHT]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the light data is stored
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_light_raw.csv
+    - data/raw/{pid}/phone_light_with_datetime.csv
+    - data/interim/{pid}/phone_light_features/phone_light_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_light.csv
+    ```
+
+
+Parameters description for `[PHONE_LIGHT][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_LIGHT` features from the `RAPIDS` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+
+
+Features description for `[PHONE_LIGHT][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|count       |rows    | Number of light sensor rows recorded.
+|maxlux      |lux     | The maximum ambient luminance.
+|minlux      |lux     | The minimum ambient luminance.
+|avglux      |lux     | The average ambient luminance.
+|medianlux   |lux     | The median ambient luminance.
+|stdlux      |lux     | The standard deviation of ambient luminance.
+
+!!! note "Assumptions/Observations"
+    NA
--- a/docs/features/phone-locations.md
+++ b/docs/features/phone-locations.md
@ -0,0 +1,201 @@
+# Phone Locations
+
+Sensor parameters description for `[PHONE_LOCATIONS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the location data is stored
+|`[LOCATIONS_TO_USE]`| Type of location data to use, one of `ALL`, `GPS`, `ALL_RESAMPLED` or `FUSED_RESAMPLED`. This filter is based on the `provider` column of the locations table, `ALL` includes every row, `GPS` only includes rows where the provider is gps, `ALL_RESAMPLED` includes all rows after being resampled, and `FUSED_RESAMPLED` only includes rows where the provider is fused after being resampled.
+|`[FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD]`| If `ALL_RESAMPLED` or `FUSED_RESAMPLED` is used, the original fused data has to be resampled. A location row is resampled to the next valid timestamp (see the Assumptions/Observations below) only if the time difference between them is less than or equal to this threshold (in minutes).
+|`[FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION]`| If `ALL_RESAMPLED` or `FUSED_RESAMPLED` is used, the original fused data has to be resampled. A location row is resampled at most for this long (in minutes).
+|`[ACCURACY_LIMIT]` | An integer in meters. Any location row with an accuracy value higher than or equal to this limit is dropped. The accuracy value means there's a 68% probability the actual location is within this radius.
+
+!!! note "Assumptions/Observations"
+    **Types of location data to use**
+    Android and iOS clients can collect location coordinates through the phone's GPS, the network cellular towers around the phone, or Google's fused location API. 
+    
+    - If you want to use only the GPS provider, set `[LOCATIONS_TO_USE]` to `GPS`
+    - If you want to use all providers, set `[LOCATIONS_TO_USE]` to `ALL`
+    - If you collected location data from different providers, including the fused API, use `ALL_RESAMPLED`
+    - If your mobile client was configured to use fused location only or want to focus only on this provider, set `[LOCATIONS_TO_USE]` to `FUSED_RESAMPLED`.
+    
+    `ALL_RESAMPLED` and `FUSED_RESAMPLED` take the original location coordinates and replicate each pair forward in time as long as the phone was sensing data as indicated by the joined timestamps of [`[PHONE_DATA_YIELD][SENSORS]`](../phone-data-yield/). This is done because Google's API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one and because GPS and network providers can log data at variable rates.
+
+    There are two parameters associated with resampling fused location.
+    
+    1. `FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD` (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair. For example, if participant A's phone did not collect data between 10:30 am and 10:50 am and between 11:05 am and 11:40 am, the last known coordinate pair is replicated during the first period but not the second. In other words, we assume that we can no longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes.
+    2. `FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION` (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer than this threshold even if the phone was sensing data continuously. For example, if participant A went home at 9 pm and their phone was sensing data without gaps until 11 am the next morning, the last known location is replicated only until 9 am.
+    
+    If you have suggestions to modify or improve this resampling, let us know.
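
The two resampling thresholds above can be sketched as follows. This is a simplified illustration, not the RAPIDS implementation: it works on minute indexes, only emits rows at minutes where the phone sensed data, and `resample_locations` with its arguments is a hypothetical name.

```python
def resample_locations(locations, sensed_minutes,
                       consecutive_threshold=30,
                       time_since_valid_location=720):
    """Replicate the last known coordinate pair over sensed minutes.

    locations: dict mapping a minute index to a (lat, lon) pair.
    sensed_minutes: sorted minute indexes at which the phone sensed any data.
    Returns a dict mapping each kept minute to a (lat, lon) pair.
    """
    resampled = {}
    last_pair = None          # last known coordinate pair
    last_pair_minute = None   # minute at which it was logged
    previous_minute = None    # previous minute with sensed data

    for minute in sensed_minutes:
        if minute in locations:
            last_pair = locations[minute]
            last_pair_minute = minute
            resampled[minute] = last_pair
        elif last_pair is not None:
            # Gap since the previous sensed minute (first threshold).
            gap_ok = minute - previous_minute <= consecutive_threshold
            # Age of the last logged pair (second threshold).
            age_ok = minute - last_pair_minute <= time_since_valid_location
            if gap_ok and age_ok:
                resampled[minute] = last_pair
            else:
                last_pair = None  # wait for a new coordinate pair
        previous_minute = minute
    return resampled


home = (40.44, -79.99)
sensed = [0, 1, 2, 22, 23, 60, 61]
print(sorted(resample_locations({0: home}, sensed)))  # [0, 1, 2, 22, 23]
```

The 20-minute gap before minute 22 is bridged, while the 37-minute gap before minute 60 stops the replication until a new pair is logged.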
+
+## BARNETT provider
+
+These features are based on the original open-source implementation by [Barnett et al](../../citation#barnett-locations) and some features created by [Canzian et al](../../citation#barnett-locations).
+
+
+!!! info "Available time segments and platforms"
+    - Available only for segments that start at 00:00:00 and end at 23:59:59 of the same or a different day (daily, weekly, weekend, etc.)
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_locations_raw.csv
+    - data/interim/{pid}/phone_locations_processed.csv
+    - data/interim/{pid}/phone_locations_processed_with_datetime.csv
+    - data/interim/{pid}/phone_locations_barnett_daily.csv
+    - data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_locations.csv
+    ```
+
+
+Parameters description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `BARNETT` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+|`[IF_MULTIPLE_TIMEZONES]` |    Currently, `USE_MOST_COMMON` is the only value supported. If the location data for a participant belongs to multiple time zones, we select the most common because Barnett's algorithm can only handle one time zone 
+|`[MINUTES_DATA_USED]` |    Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes; the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
+
+
+
+Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]` adapted from [Beiwe Summary Statistics](http://wiki.beiwe.org/wiki/Summary_Statistics):
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|hometime                              |minutes     | Time at home. Time spent at home in minutes. Home is the most visited significant location between 8 pm and 8 am, including any pauses within a 200-meter radius.
+|disttravelled                         |meters      | Total distance traveled over a day (flights).
+|rog                                   |meters      | The Radius of Gyration (rog) is a measure in meters of the area covered by a person over a day. A centroid is calculated for all the places (pauses) visited during a day, and a weighted distance between all the places and that centroid is computed. The weights are proportional to the time spent in each place.
+|maxdiam                               |meters      | The maximum diameter is the largest distance between any two pauses.
+|maxhomedist                           |meters      | The maximum distance from home in meters.
+|siglocsvisited                        |locations   | The number of significant locations visited during the day. Significant locations are computed using k-means clustering over pauses found in the whole monitoring period. The number of clusters is found by iterating k from 1 to 200, stopping when the centroids of two significant locations are within 400 meters of one another.
+|avgflightlen                          |meters      | Mean length of all flights.
+|stdflightlen                          |meters      | Standard deviation of the length of all flights.
+|avgflightdur                          |seconds     | Mean duration of all flights.
+|stdflightdur                           |seconds     | The standard deviation of the duration of all flights. 
+|probpause                              |     -      | The fraction of a day spent in a pause (as opposed to a flight)
+|siglocentropy                          |nats        | Shannon's entropy measurement is based on the proportion of time spent at each significant location visited during a day.
+|circdnrtn                              |      -     |   A continuous metric quantifying a person's circadian routine that can take any value between 0 and 1, where 0 represents a daily routine completely different from any other sensed days and 1 a routine the same as every other sensed day.
+|wkenddayrtn                            |       -    | Same as circdnrtn but computed separately for weekends and weekdays.
+
+!!! note "Assumptions/Observations"
+    **Multi day segment features**
+    Barnett's features are only available on time segments that span entire days (00:00:00 to 23:59:59). Such segments can be one-day long (daily) or multi-day (weekly, for example). Multi-day segment features are computed based on daily features summarized the following way:
+
+    - sum for `hometime`, `disttravelled`, `siglocsvisited`, and `minutes_data_used`
+    - max for `maxdiam`, and `maxhomedist`
+    - mean for `rog`, `avgflightlen`, `stdflightlen`, `avgflightdur`, `stdflightdur`, `probpause`, `siglocentropy`, `circdnrtn`, `wkenddayrtn`, and `minsmissing`
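
A minimal sketch of the multi-day summary listed above, assuming each day is a dictionary of feature values with no missing entries. `summarize_days` and the `SUMMARY` mapping are hypothetical names; handling of missing days is not covered.

```python
from statistics import mean

# Summary function per feature, following the list above.
SUMMARY = {
    "hometime": sum, "disttravelled": sum, "siglocsvisited": sum,
    "minutes_data_used": sum,
    "maxdiam": max, "maxhomedist": max,
    "rog": mean, "avgflightlen": mean, "stdflightlen": mean,
    "avgflightdur": mean, "stdflightdur": mean, "probpause": mean,
    "siglocentropy": mean, "circdnrtn": mean, "wkenddayrtn": mean,
    "minsmissing": mean,
}


def summarize_days(daily_rows):
    """Collapse a list of daily feature dicts into one multi-day row."""
    features = daily_rows[0].keys()
    return {name: SUMMARY[name]([row[name] for row in daily_rows])
            for name in features}


days = [
    {"hometime": 600, "maxdiam": 1200, "rog": 300.0},
    {"hometime": 480, "maxdiam": 5000, "rog": 500.0},
]
print(summarize_days(days))
# {'hometime': 1080, 'maxdiam': 5000, 'rog': 400.0}
```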
+
+    **Computation speed**
+    The process to extract these features can be slow compared to other sensors and providers due to the required simulation.
+
+    **How are these features computed?**
+    These features are based on a Pause-Flight model. A pause is defined as a mobility trace (location pings) within a certain duration and distance (by default, 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See [Barnett et al](../../citation#barnett-locations) for more information. In RAPIDS, we only expose one parameter for these features (accuracy limit). You can change other parameters in `src/features/phone_locations/barnett/library/MobilityFeatures.R`.
+
+    **Significant Locations**
+    Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) count as a visit to that significant location. This description was adapted from the Supplementary Materials of [Barnett et al](../../citation#barnett-locations).
+
+    **The Circadian Calculation**
+    For a detailed description of how this is calculated, see [Canzian et al](../../citation#barnett-locations).
+
+## DORYAB provider
+
+These features are based on the original implementation by [Doryab et al.](../../citation#doryab-locations).
+
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android and iOS
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_locations_raw.csv
+    - data/interim/{pid}/phone_locations_processed.csv
+    - data/interim/{pid}/phone_locations_processed_with_datetime.csv
+    - data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes.csv
+    - data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes_resampled.csv
+    - data/interim/{pid}/phone_locations_processed_with_datetime_with_doryab_columns_episodes_resampled_with_datetime.csv
+    - data/interim/{pid}/phone_locations_features/phone_locations_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_locations.csv
+    ```
+
+
+Parameters description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `DORYAB` provider|
+|`[FEATURES]` |         Features to be computed, see table below
+| `[DBSCAN_EPS]`             | The maximum distance in meters between two samples for one to be considered as in the neighborhood of the other. This is not a maximum bound on the distances of points within a cluster. This is the most important DBSCAN parameter to choose appropriately for your data set and distance function.
+| `[DBSCAN_MINSAMPLES]`      | The number of samples (or total weight) in a neighborhood for a point to be considered as a core point of a cluster. This includes the point itself.
+| `[THRESHOLD_STATIC]`       | It is the threshold value in km/hr which labels a row as Static or Moving.
+| `[MAXIMUM_ROW_GAP]`   | The maximum gap (in seconds) allowed between any two consecutive rows for them to be considered part of the same displacement. If this threshold is too high, it can throw speed and distance calculations off for periods when the phone was not sensing. This value must be larger than your GPS sampling interval when `[LOCATIONS_TO_USE]` is `ALL` or `GPS`, otherwise all the stationary-related features will be NA. If `[LOCATIONS_TO_USE]` is `ALL_RESAMPLED` or `FUSED_RESAMPLED`, you can use the default value as every row will be resampled at 1-minute intervals.
+| `[MINUTES_DATA_USED]`     | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each time segment. Use this for quality control purposes; the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
+| `[CLUSTER_ON]`             | Set this flag to `PARTICIPANT_DATASET` to create clusters based on the entire participant's dataset or to `TIME_SEGMENT` to create clusters based on all the instances of the corresponding time segment (e.g. all mornings) or to `TIME_SEGMENT_INSTANCE` to create clusters based on a single instance (e.g. 2020-05-20's morning).
+|`[INFER_HOME_LOCATION_STRATEGY]`          | The strategy applied to infer home locations. Set to `DORYAB_STRATEGY` to infer one home location for the entire dataset of each participant or to `SUN_LI_VEGA_STRATEGY` to infer one home location per day per participant. See Observations below to know more.
+|`[MINIMUM_DAYS_TO_DETECT_HOME_CHANGES]`   | The minimum number of consecutive days a new home location candidate has to repeat before it is considered the participant's new home. This parameter will be used only when `[INFER_HOME_LOCATION_STRATEGY]` is set to `SUN_LI_VEGA_STRATEGY`.
+| `[CLUSTERING_ALGORITHM]`   | The original Doryab et al. implementation uses `DBSCAN`, `OPTICS` is also available with similar (but not identical) clustering results and lower memory consumption.
+| `[RADIUS_FOR_HOME]`        | All location coordinates within this distance (meters) from the home location coordinates are considered a homestay (see `timeathome` feature).
+
+
+Features description for `[PHONE_LOCATIONS][PROVIDERS][DORYAB]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|locationvariance                                            |$meters^2$    |The sum of the variances of the latitude and longitude columns. 
+|loglocationvariance                                           | -          | Log of the sum of the variances of the latitude and longitude columns.
+|totaldistance                                                |meters        |Total distance traveled in a time segment using the haversine formula.
+|avgspeed                                                 |km/hr         |Average speed in a time segment considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment.
+|varspeed                                                      |km/hr         |Speed variance in a time segment considering only the instances labeled as Moving. This feature is 0 when the participant is stationary during a time segment.
+|{--circadianmovement--}                                      |-             | Deprecated, see Observations below. "It encodes the extent to which a person's location patterns follow a 24-hour circadian cycle." [Doryab et al.](../../citation#doryab-locations).
+|numberofsignificantplaces                                    |places        |Number of significant locations visited. It is calculated using the DBSCAN/OPTICS clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place.
+|numberlocationtransitions                                    |transitions   |Number of movements between any two clusters in a time segment.
+|radiusgyration                                               |meters        |Quantifies the area covered by a participant
+|timeattop1location                                           |minutes       |Time spent at the most significant location.
+|timeattop2location                                           |minutes       |Time spent at the 2nd most significant location.
+|timeattop3location                                           |minutes       |Time spent at the 3rd most significant location. 
+|movingtostaticratio                                          | -   |  Ratio between stationary time and total location sensed time. A lat/long coordinate pair is labeled as stationary if its speed (distance/time) to the next coordinate pair is less than 1km/hr. A higher value represents a more stationary routine.
+|outlierstimepercent                                          | -   | Ratio between the time spent in non-significant clusters divided by the time spent in all clusters (stationary time. Only stationary samples are clustered). A higher value represents more time spent in non-significant clusters.
+|maxlengthstayatclusters                                      |minutes       |Maximum time spent in a cluster (significant location).
+|minlengthstayatclusters                                      |minutes       |Minimum time spent in a cluster (significant location).
+|avglengthstayatclusters                                      |minutes       |Average time spent in a cluster (significant location).
+|stdlengthstayatclusters                                      |minutes       |Standard deviation of time spent in a cluster (significant location).
+|locationentropy                                              |nats          |Shannon Entropy computed over the row count of each cluster (significant location), it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location).
+|normalizedlocationentropy                                    |nats          |Shannon Entropy computed over the row count of each cluster (significant location) divided by the number of clusters; it is higher the more rows belong to a cluster (i.e., the more time a participant spent at a significant location).
+|timeathome                                                   |minutes       | Time spent at home (see Observations below for a description on how we compute home).
+|homelabel                                                    |-             | An integer that represents a different home location. It will be a constant number (1) for all participants when `[INFER_HOME_LOCATION_STRATEGY]` is set to `DORYAB_STRATEGY` or an incremental index if the strategy is set to `SUN_LI_VEGA_STRATEGY`.
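
Two of the features above, `totaldistance` and `locationentropy`, can be sketched in a few lines. This is an illustration under stated assumptions rather than the provider's code: the Earth radius constant, the function names, and the input layouts (a time-ordered list of `(lat, lon)` pairs and a list of row counts per cluster) are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def total_distance_m(points):
    """Sum of haversine distances between consecutive (lat, lon) pairs."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


def location_entropy(rows_per_cluster):
    """Shannon entropy in nats over the share of rows in each cluster."""
    total = sum(rows_per_cluster)
    return -sum((n / total) * math.log(n / total)
                for n in rows_per_cluster if n > 0)


# One degree of longitude along the equator is roughly 111 km.
print(round(total_distance_m([(0, 0), (0, 1)])))
# Two clusters with equal row counts give ln(2) nats.
print(location_entropy([10, 10]))
```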
+
+!!! note "Assumptions/Observations"
+    **Significant Locations Identified**
+    Significant locations are determined using `DBSCAN` or `OPTICS` clustering on locations that a participant visited over the course of the period of data collection. The most significant location is the place where the participant stayed for the longest time.
+
+    **Circadian Movement Calculation**
+    Note Feb 3 2021. It seems the implementation of this feature is not correct; we suggest not to use this feature until a fix is in place. For a detailed description of how this should be calculated, see [Saeb et al](https://pubmed.ncbi.nlm.nih.gov/28344895/).
+
+    **Fine-Tuning Clustering Parameters**
+    Based on an experiment where we collected fused location data for 7 days with a mean accuracy of 86 & SD of 350.874635, we determined that `EPS/MAX_EPS`=100 produced clustering results closer to reality. Higher values (>100) missed out on some significant places, like a short grocery visit, while lower values (<100) picked up traffic lights and stop signs while driving as significant locations. We recommend you set `EPS` based on your location data's accuracy (the more accurate your data is, the lower you should be able to set EPS).
+
+    **Duration Calculation**
+    To calculate the time duration component for our features, we compute the difference between consecutive rows' timestamps to take into account sampling rate variability. If this time difference is larger than a threshold (300 seconds by default), we replace it with NA and label that row as Moving.
+
+    **Home location**
+
+    - `DORYAB_STRATEGY`: home is calculated using all location data of a participant between 12 am and 6 am, then applying a clustering algorithm (`DBSCAN` or `OPTICS`) and considering the center of the biggest cluster home for that participant.
+    
+    - `SUN_LI_VEGA_STRATEGY`: home is calculated using all location data of a participant between 12 am and 6 am, then applying a clustering algorithm (`DBSCAN` or `OPTICS`). The following steps are used to infer the home location per day for that participant:
+        
+        1.  if there are records within [03:30:00, 04:30:00] for that night:<br>
+                &nbsp;&nbsp;&nbsp;&nbsp;we choose the most common cluster during that period as a home candidate for that day.<br>
+            elif there are records within [midnight, 03:30:00) for that night:<br>
+                &nbsp;&nbsp;&nbsp;&nbsp;we choose the last valid cluster during that period as a home candidate for that day.<br>
+            elif there are records within (04:30:00, 06:00:00] for that night:<br>
+                &nbsp;&nbsp;&nbsp;&nbsp;we choose the first valid cluster during that period as a home candidate for that day.<br>
+            else:<br>
+                &nbsp;&nbsp;&nbsp;&nbsp;the home location is NA (missing) for that day.
+
+        2. If the count of consecutive days with the same candidate home location cluster label is greater than or equal to `[MINIMUM_DAYS_TO_DETECT_HOME_CHANGES]`,
+            the candidate will be regarded as the home cluster; otherwise, the home cluster will be the last valid day's cluster.
+            If there are no valid clusters before that day, the first home location in the days after is used.
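
Step 1 of `SUN_LI_VEGA_STRATEGY` above can be sketched as follows. `home_candidate` is a hypothetical helper, and the input is assumed to be one night's records as time-ordered `(seconds_since_midnight, cluster_label)` pairs that already exclude invalid clusters. Ties for the most common cluster are resolved in favor of the label seen first, which is an assumption.

```python
from collections import Counter

T_0330 = 3 * 3600 + 30 * 60   # 03:30:00 in seconds since midnight
T_0430 = 4 * 3600 + 30 * 60   # 04:30:00
T_0600 = 6 * 3600             # 06:00:00


def home_candidate(records):
    """Pick the home candidate cluster for one night.

    records: list of (seconds_since_midnight, cluster_label) pairs, sorted
    by time and restricted to valid clusters between 00:00 and 06:00.
    Returns a cluster label, or None when the night has no usable records.
    """
    core = [c for t, c in records if T_0330 <= t <= T_0430]
    if core:
        # Most common cluster within [03:30:00, 04:30:00].
        return Counter(core).most_common(1)[0][0]
    before = [c for t, c in records if 0 <= t < T_0330]
    if before:
        # Last valid cluster within [midnight, 03:30:00).
        return before[-1]
    after = [c for t, c in records if T_0430 < t <= T_0600]
    if after:
        # First valid cluster within (04:30:00, 06:00:00].
        return after[0]
    return None  # home location is missing for that day


night = [(3600, 1), (7200, 2), (13000, 3), (13500, 3), (14000, 2)]
print(home_candidate(night))  # 3
```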
+
+    **Clustering algorithms**
+    [`DBSCAN`](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) and [`OPTICS`](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html#r2c55e37003fe-1) algorithms are available currently. Duplicated locations are discarded while clustering. The `DBSCAN` algorithm takes the time spent at each location into consideration. However, the `OPTICS` algorithm ignores it as it is not supported in the current [scikit-learn](https://github.com/scikit-learn/scikit-learn/issues/12394) implementation.
--- a/docs/features/phone-log.md
+++ b/docs/features/phone-log.md
@ -0,0 +1,11 @@
+# Phone Log
+
+Sensor parameters description for `[PHONE_LOG]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER][ANDROID]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where a data log is stored for Android devices
+|`[CONTAINER][IOS]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where a data log is stored for iOS devices
+
+!!! note
+    No feature providers have been implemented for this sensor yet; however, you can use its key (`PHONE_LOG`) to improve [`PHONE_DATA_YIELD`](../phone-data-yield), or you can [implement your own features](../add-new-features).
--- a/docs/features/phone-messages.md
+++ b/docs/features/phone-messages.md
@ -0,0 +1,46 @@
+# Phone Messages
+
+Sensor parameters description for `[PHONE_MESSAGES]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[CONTAINER]`| Data stream [container](../../datastreams/data-streams-introduction/) (database table, CSV file, etc.) where the messages data is stored
+
+## RAPIDS provider
+
+!!! info "Available time segments and platforms"
+    - Available for all time segments
+    - Available for Android only
+
+!!! info "File Sequence"
+    ```bash
+    - data/raw/{pid}/phone_messages_raw.csv
+    - data/raw/{pid}/phone_messages_with_datetime.csv
+    - data/interim/{pid}/phone_messages_features/phone_messages_{language}_{provider_key}.csv
+    - data/processed/features/{pid}/phone_messages.csv
+    ```
+
+
+Parameters description for `[PHONE_MESSAGES][PROVIDERS][RAPIDS]`:
+
+|Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;            | Description |
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------
+|`[COMPUTE]`| Set to `True` to extract `PHONE_MESSAGES` features from the `RAPIDS` provider|
+|`[MESSAGES_TYPES]` |  The `messages_type` that will be analyzed. The options for this parameter are `received` or `sent`.
+|`[FEATURES]` |         Features to be computed, see table below for `[MESSAGES_TYPES]` `received` and `sent`
+
+
+Features description for `[PHONE_MESSAGES][PROVIDERS][RAPIDS]`:
+
+|Feature                    |Units      |Description|
+|-------------------------- |---------- |---------------------------|
+|count                      |messages   |Number of messages of type `messages_type` that occurred during a particular `time_segment`.
+|distinctcontacts           |contacts   |Number of distinct contacts that are associated with a particular `messages_type` during a particular `time_segment`.
+|timefirstmessages          |minutes    |Number of minutes between 12:00am (midnight) and the first `message` of a particular `messages_type` during a particular `time_segment`.
+|timelastmessages           |minutes    |Number of minutes between 12:00am (midnight) and the last `message` of a particular `messages_type` during a particular `time_segment`.
+|countmostfrequentcontact   |messages   |Number of messages from the contact with the most messages of `messages_type` during a `time_segment` throughout the whole dataset of each participant.
+
+!!! note "Assumptions/Observations"
+    1. `[MESSAGES_TYPES]` and `[FEATURES]` keys in `config.yaml` need to match. For example, `[MESSAGES_TYPES]` `sent` matches the `[FEATURES]` key `sent`
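
The matching requirement above can be checked with a short sketch. The dictionary below mirrors the keys from the parameters table, but its exact layout is an assumption about how the provider section looks once `config.yaml` is loaded, and `unmatched_message_types` is a hypothetical helper.

```python
def unmatched_message_types(provider_config):
    """Return the message types that appear in only one of the two keys."""
    types = set(provider_config["MESSAGES_TYPES"])
    feature_keys = set(provider_config["FEATURES"])
    # The symmetric difference is empty when both keys list the same types.
    return sorted(types ^ feature_keys)


config = {
    "COMPUTE": True,
    "MESSAGES_TYPES": ["received", "sent"],
    "FEATURES": {
        "received": ["count", "distinctcontacts"],
        "sent": ["count", "timefirstmessages"],
    },
}
print(unmatched_message_types(config))  # []
```

An empty list means every entry in `[MESSAGES_TYPES]` has a matching `[FEATURES]` key and vice versa.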
+
+