
Update docs with contributing guide

pull/134/head v1.2.0
JulioV 1 year ago
parent
commit
b3c05128fa
  1. 32
      .github/ISSUE_TEMPLATE/bug_report.md
  2. 20
      .github/ISSUE_TEMPLATE/feature_request.md
  3. 2
      docs/change-log.md
  4. 56
      docs/contributing.md
  5. 5
      docs/developers/remote-support.md
  6. 8
      docs/features/add-new-features.md
  7. BIN
      docs/img/logos/cmu.png
  8. BIN
      docs/img/logos/dbdp.png
  9. BIN
      docs/img/logos/helsinki.jpg
  10. BIN
      docs/img/logos/manchester.png
  11. BIN
      docs/img/logos/monash.jpg
  12. BIN
      docs/img/logos/oulu.png
  13. BIN
      docs/img/logos/penn.png
  14. BIN
      docs/img/logos/pitt.png
  15. BIN
      docs/img/logos/uw.jpg
  16. BIN
      docs/img/logos/virginia.jpg
  17. 59
      docs/index.md
  18. 104
      docs/setup/configuration.md
  19. 56
      docs/setup/overview.md
  20. 18
      docs/stylesheets/extra.css
  21. 9
      mkdocs.yml

32
.github/ISSUE_TEMPLATE/bug_report.md

@ -7,28 +7,16 @@ assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
This form is only for bug reports. For questions, feature requests, or feedback use our [Github discussions](https://github.com/carissalow/rapids/discussions)
**To Reproduce**
Steps to reproduce the behavior:
1. Enable ... feature provider
2. Setup ... sensor parameters
3. Run RAPIDS
4. etc ...
Please make sure to:
**Expected behavior**
A clear and concise description of what you expected to happen.
* [ ] Debug and simplify the problem to create a minimal example. For example, reduce the problem to a single participant, sensor, and a few rows of data.
* [ ] Provide a clear and succinct description of the problem (expected behavior vs actual behavior).
* [ ] Attach your `config.yaml`, time segments file, and time zones file if appropriate.
* [ ] Attach test data if possible, and any screenshots or extra resources that will help us debug the problem.
* [ ] Share the commit you are running: `git rev-parse --short HEAD`
* [ ] Share your OS version (e.g. Windows 10)
* [ ] Share the device/sensor you are processing (e.g. phone accelerometer)
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Please complete the following information:**
- OS: [e.g. MacOS]
- RAPIDS current commit, paste the output of `git rev-parse --short HEAD`
- A link to your `config.yaml`
- Type of mobile data you are dealing with (Android/iOS)
**Additional context**
Add any other context about the problem here.
<!-- You can erase any parts of this template not applicable to your Issue. -->

20
.github/ISSUE_TEMPLATE/feature_request.md

@ -1,20 +0,0 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.

2
docs/change-log.md

@ -4,6 +4,8 @@
- Sleep summary and intraday features are more consistent.
- Add wake and bedtime features for sleep summary data.
- Fix bugs with sleep PRICE features.
- Update home page
- Add contributing guide
## v1.1.1
- Fix length of periodic segments on days with DLS
- Fix crash when scraping data for an app that does not exist

56
docs/contributing.md

@ -0,0 +1,56 @@
# Contributing
Thank you for taking the time to contribute!
All changes, small or big, are welcome, and regardless of who you are, we are always happy to work together to make your contribution as strong as possible. We follow the [Covenant Code of Conduct](../code_of_conduct), so we ask you to uphold it. Be kind to everyone in the community, and please report unacceptable behavior to moshiresearch@gmail.com.
## Questions, Feature Requests, and Discussions
Post any questions, feature requests, or discussions in our [GitHub Discussions tab](https://github.com/carissalow/rapids/discussions).
## Bug Reports
Report any bugs in our [GitHub issue tracker](https://github.com/carissalow/rapids/issues) keeping in mind to:
- Debug and simplify the problem to create a minimal example. For example, reduce the problem to a single participant, sensor, and a few rows of data.
- Provide a clear and succinct description of the problem (expected behavior vs. actual behavior).
- Attach your `config.yaml`, time segments file, and time zones file if appropriate.
- Attach test data if possible and any screenshots or extra resources that will help us debug the problem.
- Share the commit you are running: `git rev-parse --short HEAD`
- Share your OS version (e.g., Windows 10)
- Share the device/sensor you are processing (e.g., phone accelerometer)
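The commit and OS details requested in the list above can be gathered in one step. The following is a minimal sketch, not part of RAPIDS: the function name `collect_bug_report_details` and its `repo_dir` parameter are hypothetical, and it assumes `git` is on your `PATH` and that you run it from your RAPIDS folder.

```python
import platform
import subprocess


def collect_bug_report_details(repo_dir="."):
    """Return the short commit hash and OS description for a bug report.

    Hypothetical helper: it wraps `git rev-parse --short HEAD` (the command
    named in the list above) and Python's platform module.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        commit = result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the folder does not exist, or it is not a repository
        commit = None
    return {"commit": commit, "os": platform.platform()}


if __name__ == "__main__":
    for key, value in collect_bug_report_details().items():
        print(f"{key}: {value}")
```

The device/sensor, `config.yaml`, and any test data still need to be attached by hand.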
## Documentation Contributions
If you want to fix a typo or make any other minor change, you can edit the file online by clicking on the pencil icon at the top right of any page and opening a pull request using [Github's website](https://docs.github.com/en/github/managing-files-in-a-repository/editing-files-in-your-repository).
If your changes are more complex, clone RAPIDS' repository, set up the dev environment for our documentation with this [tutorial](../developers/documentation), and submit any changes on a new *feature branch* following our [git flow](../developers/git-flow).
## Code Contributions
!!! hint "Hints for any code changes"
- To submit any new code, use a new *feature branch* following our [git flow](../developers/git-flow).
- If you need a new Python or R package in RAPIDS' virtual environments, follow this [tutorial](../developers/virtual-environments/).
- If you need to change the `config.yaml`, you will need to update its validation schema with this [tutorial](../developers/validation-schema-config/).
### New Data Streams
*New data containers.* If you want to process data from a device RAPIDS supports ([see this table](../datastreams/data-streams-introduction/)) but it's stored in a database engine or file type we don't support yet, [implement a new data stream container and format](../datastreams/add-new-data-streams/). You can copy and paste the `format.yaml` of one of the other streams of the device you are targeting.
*New sensing apps.* If you want to add support for new smartphone sensing apps like Beiwe, [implement a new data stream container and format](../datastreams/add-new-data-streams/).
*New wearable devices.* If you want to add support for a new wearable, open a [Github discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
### New Behavioral Features
If you want to add new [behavioral features](../features/feature-introduction/) for mobile sensors RAPIDS already supports, follow this [tutorial](../features/add-new-features/). A sensor is supported if it has a configuration section in `config.yaml`.
If you want to add new [behavioral features](../features/feature-introduction/) for mobile sensors RAPIDS does not support yet, open a [Github discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.
### New Tests
If you want to add new tests for existing behavioral features, follow this [tutorial](../developers/testing).
### New Visualizations
Open a [Github discussion](https://github.com/carissalow/rapids/discussions), so we can add the necessary initial configuration files and code.

5
docs/developers/remote-support.md

@ -3,12 +3,13 @@
We use the Live Share extension of Visual Studio Code to debug bugs when sharing data or database credentials is not possible.
1. Install [Visual Studio Code](https://code.visualstudio.com/)
2. Open you RAPIDS root folder in a new VSCode window
3. Open a new Terminal `Terminal > New terminal`
2. Open your RAPIDS root folder in a new VSCode window
3. Open a new terminal in Visual Studio Code `Terminal > New terminal`
4. Install the [Live Share extension pack](https://marketplace.visualstudio.com/items?itemName=MS-vsliveshare.vsliveshare-pack)
5. Press ++ctrl+p++ or ++cmd+p++ and run this command:
```bash
>live share: start collaboration session
```
6. Follow the instructions and share the session link you receive

8
docs/features/add-new-features.md

@ -90,7 +90,7 @@ In this step, you need to add your provider configuration section under the rele
|`[COMPUTE]`| Flag to activate/deactivate your provider
|`[FEATURES]`| List of features your provider supports. Your provider code should only return the features on this list
|`[MY_PARAMETER]`| An arbitrary parameter that our example provider `VEGA` needs. This can be a boolean, integer, float, string, or an array of any of such types.
|`[SRC_SCRIPT]`| The relative path from RAPIDS' root folder to an script that computes the features for this provider. It can be implemented in R or Python.
|`[SRC_SCRIPT]`| The relative path from RAPIDS' root folder to a script that computes the features for this provider. It can be implemented in R or Python.
### Create a feature provider script
@ -121,8 +121,8 @@ Every feature script (`main.[py|R]`) needs a `[providername]_features` function
|---|---|
|`sensor_data_files`| Path to the CSV file containing the data of a single participant. This data has been cleaned and preprocessed. Your function will be automatically called for each participant in your study (in the `[PIDS]` array in `config.yaml`)
|`time_segment`| The label of the time segment that should be processed.
|`provider`| The parameters you configured for your provider in `config.yaml` will be available in this variable as a dictionary in Python or a list in R. In our example this dictionary contains `{MY_PARAMETER:"a_string"}`
|`filter_data_by_segment`| Python only. A function that you will use to filter your data. In R this function is already available in the environment.
|`provider`| The parameters you configured for your provider in `config.yaml` will be available in this variable as a dictionary in Python or a list in R. In our example, this dictionary contains `{MY_PARAMETER:"a_string"}`
|`filter_data_by_segment`| Python only. A function that you will use to filter your data. In R, this function is already available in the environment.
|`*args`| Python only. Not used for now
|`**kwargs`| Python only. Not used for now
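To make the parameter table above concrete, here is a rough sketch of how a Python provider function could consume those arguments. It is an illustration under stated assumptions, not the RAPIDS template: it reads the CSV with the standard library instead of a data frame, it assumes `filter_data_by_segment` takes the data and the segment label, and the feature names `rowcount` and `parameter` are made up for this example.

```python
import csv


def vega_features(sensor_data_files, time_segment, provider,
                  filter_data_by_segment, *args, **kwargs):
    """Hypothetical `[providername]_features` function for a provider named VEGA."""
    # Assumption: sensor_data_files is the path to one participant's CSV file.
    with open(sensor_data_files, newline="") as handle:
        rows = list(csv.DictReader(handle))

    # Assumption: the filter function receives the data and the segment label.
    rows = filter_data_by_segment(rows, time_segment)

    # Illustrative features only; real providers compute behavioral features.
    available = {
        "rowcount": len(rows),
        "parameter": provider.get("MY_PARAMETER"),
    }

    # Only return the features listed under [FEATURES] in config.yaml.
    return {name: available[name] for name in provider["FEATURES"] if name in available}
```

The last line reflects the rule stated for `[FEATURES]`: the provider code should only return the features on that list.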
@ -180,4 +180,4 @@ The next step is to implement the code that computes your behavioral features in
## New Features for Non-Existing Sensors
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [contact us](../../team) or request it on [Slack](http://awareframework.com:3000/) and we can add the necessary code so you can follow the instructions above.
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [open a new discussion](https://github.com/carissalow/rapids/discussions) in Github and we can add the necessary code so you can follow the instructions above.

BIN
docs/img/logos/cmu.png
Width: 600  |  Height: 600  |  Size: 20 KiB

BIN
docs/img/logos/dbdp.png
Width: 200  |  Height: 200  |  Size: 5.9 KiB

BIN
docs/img/logos/helsinki.jpg
Width: 200  |  Height: 200  |  Size: 4.4 KiB

BIN
docs/img/logos/manchester.png
Width: 543  |  Height: 230  |  Size: 79 KiB

BIN
docs/img/logos/monash.jpg
Width: 1270  |  Height: 576  |  Size: 33 KiB

BIN
docs/img/logos/oulu.png
Width: 600  |  Height: 600  |  Size: 12 KiB

BIN
docs/img/logos/penn.png
Width: 256  |  Height: 256  |  Size: 17 KiB

BIN
docs/img/logos/pitt.png
Width: 700  |  Height: 240  |  Size: 23 KiB

BIN
docs/img/logos/uw.jpg
Width: 256  |  Height: 256  |  Size: 5.0 KiB

BIN
docs/img/logos/virginia.jpg
Width: 288  |  Height: 159  |  Size: 17 KiB

59
docs/index.md

@ -2,20 +2,24 @@
Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to [extract](features/feature-introduction.md) and [create](features/add-new-features.md) **behavioral features** (a.k.a. digital biomarkers), [visualize](visualizations/data-quality-visualizations.md) mobile sensor data, and [structure](workflow-examples/analysis.md) your analysis into reproducible workflows.
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment, we support [data streams](datastreams/data-streams-introduction) logged by smartphones, Fitbit wearables, and, in collaboration with the [DBDP](https://dbdp.org/), Empatica wearables (but you can [add your own](datastreams/add-new-data-streams) too).
RAPIDS is open source, documented, multi-platform, modular, tested, and reproducible. At the moment, we support [data streams](datastreams/data-streams-introduction) logged by smartphones, Fitbit wearables, and Empatica wearables in collaboration with the [DBDP](https://dbdp.org/).
**If you want to know more head over to [Overview](setup/overview/)**
!!! tip "Where do I start?"
!!! tip
:material-slack: Questions or feedback can be posted on the \#rapids channel in AWARE Framework\'s [slack](http://awareframework.com:3000/).
:material-power-standby: New to RAPIDS? Check our [Overview + FAQ](setup/overview/) and [minimal example](workflow-examples/minimal)
:material-github: Bugs and feature requests should be posted on [Github](https://github.com/carissalow/rapids/issues).
:material-play-speed: [Install](setup/installation), [configure](setup/configuration), and [execute](setup/execution) RAPIDS to [extract](features/feature-introduction.md) and [plot](visualizations/data-quality-visualizations.md) behavioral features
:fontawesome-solid-tasks: Join our discussions on our algorithms and assumptions for feature [processing](https://github.com/carissalow/rapids/discussions).
:material-github: Bugs should be reported on [Github issues](https://github.com/carissalow/rapids/issues)
:fontawesome-solid-tasks: Questions, discussions, feature requests, and feedback can be posted on our [Github discussions](https://github.com/carissalow/rapids/discussions)
:material-twitter: Keep up to date with our [Twitter feed](https://twitter.com/RAPIDS_Science) or [Slack channel](http://awareframework.com:3000/)
:material-plus-network: Do you want to modify or add new functionality to RAPIDS? Check our [contributing guide](./contributing)
:fontawesome-solid-sync-alt: Are you upgrading from RAPIDS `0.4.x` or older? Follow this [guide](migrating-from-old-versions)
:fontawesome-solid-play: Ready? Go to [Overview](setup/overview/).
## What are the benefits of using RAPIDS?
@ -24,9 +28,48 @@ RAPIDS is open source, documented, modular, tested, and reproducible. At the mom
5. **Parallel execution**. Thanks to Snakemake, your analysis can be executed over multiple cores without changing your code.
6. **Code-free features**. Extract any of the behavioral features offered by RAPIDS without writing any code.
7. **Extensible code**. You can easily add your own data streams or behavioral features in R or Python, share them with the community, and keep authorship and citations.
8. **Timezone aware**. Your data is adjusted to one or more time zones per participant.
8. **Time zone aware**. Your data is adjusted to one or more time zones per participant.
9. **Flexible time segments**. You can extract behavioral features on time windows of any length (e.g., 5 minutes, 3 hours, 2 days), on every day or particular days (e.g., weekends, Mondays, the 1st of each month, etc.), or around events of interest (e.g., surveys or clinical relapses).
10. **Tested code**. We are continually adding tests to make sure our behavioral features are correct.
11. **Reproducible code**. If you structure your analysis within RAPIDS, you can be sure your code will run in other computers as intended, thanks to R and Python virtual environments. You can share your analysis code along with your publications without any overhead.
12. **Private**. All your data is processed locally.
## Users and Contributors
??? quote "Community Contributors"
Many thanks to our community contributors and the [whole team](../team):
- Agam Kumar (CMU)
- Yasaman S. Sefidgar (University of Washington)
- Joe Kim (Duke University)
- Brinnae Bent (Duke University)
- Stephen Price (CMU)
- Neil Singh (University of Virginia)
Many thanks to the researchers who made [their work](../citation) open source:
- Panda et al. [paper](https://pubmed.ncbi.nlm.nih.gov/31657854/)
- Stachl et al. [paper](https://www.pnas.org/content/117/30/17680)
- Doryab et al. [paper](https://arxiv.org/abs/1812.10394)
- Barnett et al. [paper](https://doi.org/10.1093/biostatistics/kxy059)
- Canzian et al. [paper](https://doi.org/10.1145/2750858.2805845)
??? quote "Publications using RAPIDS"
- Predicting Symptoms of Depression and Anxiety Using Smartphone and Wearable Data [link](https://www.frontiersin.org/articles/10.3389/fpsyt.2021.625247/full)
- Predicting Depression from Smartphone Behavioral Markers Using Machine Learning Methods, Hyper-parameter Optimization, and Feature Importance Analysis: An Exploratory Study [link](https://preprints.jmir.org/preprint/26540)
- Digital Biomarkers of Symptom Burden Self-Reported by Perioperative Patients Undergoing Pancreatic Surgery: Prospective Longitudinal Study [link](https://cancer.jmir.org/2021/2/e27975/)
- An Automated Machine Learning Pipeline for Monitoring and Forecasting Mobile Health Data [link](https://edas.info/showManuscript.php?m=1570708269&random=750318666&type=final&ext=pdf&title=PDF+file)
<div class="users">
<div><img alt="carnegie mellon university" loading="lazy" src="./img/logos/cmu.png" /></div>
<div><img alt="digital biomarker development pipeline" loading="lazy" src="./img/logos/dbdp.png" /></div>
<div><img alt="university of helsinki" loading="lazy" src="./img/logos/helsinki.jpg" /></div>
<div><img alt="university of manchester" loading="lazy" src="./img/logos/manchester.png" /></div>
<div><img alt="monash university" loading="lazy" src="./img/logos/monash.jpg" /></div>
<div><img alt="oulu university" loading="lazy" src="./img/logos/oulu.png" /></div>
<div><img alt="university of pennsylvania" loading="lazy" src="./img/logos/penn.png" /></div>
<div><img alt="university of pittsburgh" loading="lazy" src="./img/logos/pitt.png" /></div>
<div><img alt="university of virginia" loading="lazy" src="./img/logos/virginia.jpg" /></div>
<div><img alt="university of washington" loading="lazy" src="./img/logos/uw.jpg" /></div>
</div>

104
docs/setup/configuration.md

@ -1,27 +1,27 @@
# Configuration
You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features
You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features.
0. Verify RAPIDS can process your [data streams](#supported-data-streams)
3. Create your [participants files](#participant-files)
4. Select what [time segments](#time-segments) you want to extract features on
2. Choose the [timezone of your study](#timezone-of-your-study)
2. Select the [timezone of your study](#timezone-of-your-study)
5. Configure your [data streams](#data-stream-configuration)
6. Select what [sensors and features](#sensor-and-features-to-process) you want to process
When you are done with this configuration, go to [executing RAPIDS](../execution).
!!! hint
Every time you see `config["KEY"]` or `[KEY]` in these docs we are referring to the corresponding key in the `config.yaml` file.
Every time you see `config["KEY"]` or `[KEY]` in these docs, we are referring to the corresponding key in the `config.yaml` file.
---
## Supported data streams
A data stream refers to sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. For example, the `aware_mysql` data stream handles smartphone data (**device**) collected with the [AWARE Framework](https://awareframework.com/) (**format**) stored in a MySQL database (**container**).
A data stream refers to sensor data collected using a specific **device** with a specific **format** and stored in a specific **container**. For example, the `aware_mysql` data stream handles smartphone data (**device**) collected with the [AWARE Framework](https://awareframework.com/) (**format**) stored in a MySQL database (**container**).
Check the table in [introduction to data streams](../../datastreams/data-streams-introduction) to know what data streams we support. If your data stream is supported, continue to the next configuration section, **you will use its label later in this guide** (e.g. `aware_mysql`). If your steam is not supported but you want to implement it, follow the tutorial to [add support for new data streams](../../datastreams/add-new-data-streams) and get in touch by email or in Slack if you have any questions.
Check the table in [introduction to data streams](../../datastreams/data-streams-introduction) to know what data streams we support. If your data stream is supported, continue to the next configuration section; **you will use its label later in this guide** (e.g., `aware_mysql`). If your stream is not supported, but you want to implement it, follow the tutorial to [add support for new data streams](../../datastreams/add-new-data-streams) and [open a new discussion](https://github.com/carissalow/rapids/discussions) in Github with any questions.
---
@ -36,7 +36,7 @@ Participant files link together multiple devices (smartphones and wearables) to
```
??? info "Optional: Migrating participants files with the old format"
If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in `data/external/participant_files/`
If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command to convert your old files into yaml files stored in `data/external/participant_files/`
```bash
python tools/update_format_participant_files.py
@ -46,9 +46,9 @@ Participant files link together multiple devices (smartphones and wearables) to
??? example "Example of the structure of a participant file"
In this example, the participant used an android phone, an ios phone, a fitbit device, and a Empatica device throughout the study between Apr 23rd 2020 and Oct 28th 2020
In this example, the participant used an Android phone, an iOS phone, a Fitbit device, and an Empatica device throughout the study between April 23rd, 2020, and October 28th, 2020.
If your participants didn't use a `[PHONE]`, `[FITBIT]` or `[EMPATICA]` device, it is not necessary to include that section in their participant file. In other words, you can analyse data from 1 or more devices per participant.
If your participants didn't use a `[PHONE]`, `[FITBIT]` or `[EMPATICA]` device, it is not necessary to include that section in their participant file. In other words, you can analyze data from 1 or more devices per participant.
```yaml
PHONE:
@ -74,10 +74,10 @@ Participant files link together multiple devices (smartphones and wearables) to
| Key&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; | Description |
|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each smartphone, you can have more than one for when participants changed phones in the middle of the study. |
| `[PLATFORMS]` | An array that specifies the OS of each smartphone in `[DEVICE_IDS]` , use a combination of `android` or `ios` (we support participants that changed platforms in the middle of your study!). You can set `[PLATFORMS]: [infer]` and RAPIDS will infer them automatically (each phone data stream infer this differently, e.g. `aware_mysql` uses the `aware_device` table). |
| `[PLATFORMS]` | An array that specifies the OS of each smartphone in `[DEVICE_IDS]`, use a combination of `android` or `ios` (we support participants that changed platforms in the middle of your study!). You can set `[PLATFORMS]: [infer]`, and RAPIDS will infer them automatically (each phone data stream infers this differently, e.g., `aware_mysql` uses the `aware_device` table). |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *after* this date time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[END_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *before* this date time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *after* this date-time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[END_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *before* this date-time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
=== "[FITBIT]"
@ -85,8 +85,8 @@ Participant files link together multiple devices (smartphones and wearables) to
|------------------|-----------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each Fitbit, you can have more than one in case the participant changed devices in the middle of the study. |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *after* this date time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[END_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *before* this date time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *after* this date-time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[END_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *before* this date-time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
=== "[EMPATICA]"
@ -94,14 +94,14 @@ Participant files link together multiple devices (smartphones and wearables) to
|------------------|-----------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each Empatica device used by this participant. Since the most common use case involves having multiple zip files from a single device for each person, set this device id to an arbitrary string (we usually use their `pid`) |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *after* this date time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[END_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *before* this date time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[START_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *after* this date-time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| `[END_DATE]` | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. Only data collected *before* this date-time will be included in the analysis. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
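The two accepted date formats and the midnight default described in the tables above can be illustrated with a short sketch. This is not RAPIDS' own parsing code: the function names are hypothetical, and treating both bounds as strict comparisons is an assumption based on the wording "after" and "before".

```python
from datetime import datetime


def parse_participant_date(value):
    """Parse `YYYY-MM-DD HH:MM:SS` or `YYYY-MM-DD` (the latter means 00:00:00)."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            # strptime fills in 00:00:00 when the format has no time part
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported date format: {value!r}")


def is_within_study(timestamp, start_date, end_date):
    """Assumed strict bounds: after [START_DATE] and before [END_DATE]."""
    return parse_participant_date(start_date) < timestamp < parse_participant_date(end_date)
```

Under these assumptions, an `[END_DATE]` written as `2020-10-28` stops at the first instant of October 28th, so a row logged at 09:00 that day would fall outside the range. Add an explicit time if you want to include part of the last day.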
### Automatic creation of participant files
You can use a CSV file with a row per participant to automatically create participant files.
??? "`AWARE_DEVICE_TABLE` was deprecated"
In previous versions of RAPIDS, you could create participant files automatically using the `aware_device` table. We deprecated this option but you can still achieve the same results if you export the output of the following SQL query as a CSV file and follow the instructions below:
In previous versions of RAPIDS, you could create participant files automatically using the `aware_device` table. We deprecated this option, but you can still achieve the same results if you export the output of the following SQL query as a CSV file and follow the instructions below:
```sql
SELECT device_id, device_id as fitbit_id, CONCAT("p", _id) as empatica_id, CONCAT("p", _id) as pid, if(brand = "iPhone", "ios", "android") as platform, CONCAT("p", _id) as label, DATE_FORMAT(FROM_UNIXTIME((timestamp/1000)- 86400), "%Y-%m-%d") as start_date, CURRENT_DATE as end_date from aware_device order by _id;
@ -126,21 +126,21 @@ CREATE_PARTICIPANT_FILES:
IGNORED_DEVICE_IDS: []
```
Your CSV file (`[CSV_FILE_PATH]`) should have the following columns (headers) but the values within each column can be empty:
Your CSV file (`[CSV_FILE_PATH]`) should have the following columns (headers), but the values within each column can be empty:
| Column | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| device_id | Phone device id. Separate multiple ids with `;` |
| fitbit_id | Fitbit device id. Separate multiple ids with `;` |
| empatica_id | Empatica device id. Since the most common use case involves having multiple zip files from a single device for each person, set this device id to an arbitrary string (we usually use their `pid`) |
| empatica_id | Empatica device id. Since the most common use case involves having various zip files from a single device for each person, set this device id to an arbitrary string (we usually use their `pid`) |
| pid | Unique identifiers with the format pXXX (your participant files will be named with this string) |
| platform | Use `android`, `ios` or `infer` as explained above, separate values with `;` |
| label | A human readable string that is used in reports and visualizations. |
| label | A human-readable string that is used in reports and visualizations. |
| start_date | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
| end_date | A string with format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`. By default, `YYYY-MM-DD` is interpreted as `YYYY-MM-DD 00:00:00`. |
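As a way to sanity-check a CSV file against the column table above before running RAPIDS, the sketch below reads the rows and splits the `;`-separated columns into lists. It is a hypothetical helper, not the code RAPIDS uses to create participant files, and it assumes that padding white space around values should be ignored.

```python
import csv
import io

# Columns that the table above says may hold several values separated by `;`
MULTI_VALUE_COLUMNS = ("device_id", "fitbit_id", "empatica_id", "platform")


def split_values(value):
    """Split a `;`-separated cell into a list, dropping empty entries."""
    return [item.strip() for item in (value or "").split(";") if item.strip()]


def read_participants(csv_text):
    """Return one dictionary per participant row with trimmed values."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    participants = []
    for row in reader:
        # Trim headers and values; skip surplus cells that have no header
        clean = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        for column in MULTI_VALUE_COLUMNS:
            clean[column] = split_values(clean.get(column, ""))
        participants.append(clean)
    return participants
```

Empty cells come back as empty strings or empty lists, which matches the note that values within each column can be empty.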
!!! example
We added white spaces to this example to make it easy to read but you don't have to.
We added white spaces to this example to make it easy to read, but you don't have to.
```csv
device_id ,fitbit_id, empatica_id ,pid ,label ,platform ,start_date ,end_date
@ -158,11 +158,11 @@ snakemake -j1 create_participants_files
## Time Segments
Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: **frequency** (short time windows every day), **periodic** (arbitrary time windows on any day), and **event** (arbitrary time windows around events of interest). See also our [examples](#segment-examples).
Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: **frequency** (short time windows every day), **periodic** (arbitrary time windows on any day), and **event** (arbitrary time windows around events of interest). See also our [examples](#segment-examples).
=== "Frequency Segments"
These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your `config.yaml`
These segments are computed every day, and all have the same duration (for example, 30 minutes). Set the following keys in your `config.yaml`
```yaml
TIME_SEGMENTS: &time_segments
@ -171,7 +171,7 @@ Time segments (or epochs) are the time windows on which you want to extract beha
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
```
The file pointed by `[TIME_SEGMENTS][FILE]` should have the following format and can only have 1 row.
The file pointed to by `[TIME_SEGMENTS][FILE]` should have the following format and only 1 row.
| Column | Description |
|--------|----------------------------------------------------------------------|
@ -198,7 +198,7 @@ Time segments (or epochs) are the time windows on which you want to extract beha
=== "Periodic Segments"
These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your `config.yaml`.
These segments can be computed every day or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute, but they can be as long as you want. Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
@ -207,7 +207,7 @@ Time segments (or epochs) are the time windows on which you want to extract beha
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
```
If `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is set to `TRUE`, RAPIDS will consider instances of your segments back enough in the past as to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1st if `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is `TRUE` or on Sunday March 8th if `FALSE`.
If `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is set to `TRUE`, RAPIDS will consider instances of your segments back enough in the past to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday, March 7th, 2020, and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would begin on Sunday, March 1st if `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is `TRUE` or on Sunday, March 8th if `FALSE`.
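The rule described here can be sketched in Python. This is only an illustration, not RAPIDS code: the function name and its parameters are hypothetical, and it assumes that when the first row of data falls exactly on the segment's start day, the first segment begins on that same day.

```python
from datetime import date, timedelta


def first_weekly_segment_start(first_data_day, include_past, start_weekday=6):
    """Return the start date of the first weekly segment to be considered.

    start_weekday follows Python's convention (Monday=0 ... Sunday=6),
    which differs from the 1-7 `wday` values used in the segments file.
    """
    # Days elapsed since the most recent start day on or before the first data row
    days_since_start = (first_data_day.weekday() - start_weekday) % 7
    previous_start = first_data_day - timedelta(days=days_since_start)
    if include_past or days_since_start == 0:
        # Reach back far enough to cover the first row of data
        return previous_start
    # Otherwise, begin with the next full segment
    return previous_start + timedelta(days=7)


first_row = date(2020, 3, 7)  # a Saturday
print(first_weekly_segment_start(first_row, include_past=True))   # 2020-03-01
print(first_weekly_segment_start(first_row, include_past=False))  # 2020-03-08
```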
The file pointed to by `[TIME_SEGMENTS][FILE]` should have the following format and can have multiple rows.
@ -215,9 +215,9 @@ Time segments (or epochs) are the time windows on which you want to extract beha
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments. It has to be **unique** between rows |
| start_time | A string with format `HH:MM:SS` representing the starting time of this segment on any day |
| length | A string representing the length of this segment.It can have one or more of the following strings **`XXD XXH XXM XXS`** to represent days, hours, minutes and seconds. For example `7D 23H 59M 59S` |
| repeats_on | One of the follow options `every_day`, `wday`, `qday`, `mday`, and `yday`. The last four represent a week, quarter, month and year day |
| repeats_value | An integer complementing `repeats_on`. If you set `repeats_on` to `every_day` set this to `0`, otherwise `1-7` represent a `wday` starting from Mondays, `1-31` represent a `mday`, `1-91` represent a `qday`, and `1-366` represent a `yday` |
| length | A string representing the length of this segment. It can have one or more of the following strings **`XXD XXH XXM XXS`** to represent days, hours, minutes, and seconds. For example, `7D 23H 59M 59S` |
| repeats_on | One of the following options `every_day`, `wday`, `qday`, `mday`, and `yday`. The last four represent a week, quarter, month, and year day |
| repeats_value | An integer complementing `repeats_on`. If you set `repeats_on` to `every_day`, set this to `0`, otherwise `1-7` represent a `wday` starting from Mondays, `1-31` represent a `mday`, `1-91` represent a `qday`, and `1-366` represent a `yday` |
!!! example
@ -230,11 +230,11 @@ Time segments (or epochs) are the time windows on which you want to extract beha
night,00:00:00,5H 59M 59S,every_day,0
```
This configuration will create five segments instances (`daily`, `morning`, `afternoon`, `evening`, `night`) on any given day (`every_day` set to 0). The `daily` segment will start at midnight and will last `23:59:59`, the other four segments will start at 6am, 12pm, 6pm, and 12am respectively and last for `05:59:59`.
This configuration will create five segment instances (`daily`, `morning`, `afternoon`, `evening`, `night`) on any given day (`every_day` set to 0). The `daily` segment will start at midnight and last `23:59:59`; the other four segments will begin at 6am, 12pm, 6pm, and 12am, respectively, and last for `05:59:59`.
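To make the `length` format concrete, here is a minimal Python sketch that converts a `XXD XXH XXM XXS` string into seconds. The helper name `length_to_seconds` is hypothetical and this is not how RAPIDS itself parses the column; it only shows how the units add up.

```python
import re

# Seconds represented by each unit letter in a length string
UNIT_SECONDS = {"D": 86400, "H": 3600, "M": 60, "S": 1}


def length_to_seconds(length):
    """Convert a string such as '7D 23H 59M 59S' into a number of seconds."""
    total = 0
    for token in length.split():
        match = re.fullmatch(r"(\d+)([DHMS])", token)
        if match is None:
            raise ValueError(f"Unrecognized length token: {token!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total


print(length_to_seconds("23H 59M 59S"))  # 86399, one second short of a full day
print(length_to_seconds("5H 59M 59S"))   # 21599, one second short of six hours
```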
=== "Event segments"
These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your `config.yaml`.
These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute, but they can be as long as you want. The start of each segment can be shifted backward or forward from the specified timestamp. Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
@ -247,12 +247,12 @@ Time segments (or epochs) are the time windows on which you want to extract beha
| Column | Description |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the `most frequent contact` for calls (the most frequent contact will be computed across all these segments). There cannot be two *overlaping* event segments with the same label (RAPIDS will throw an error) |
| label | A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the `most frequent contact` for calls (the most frequent contact will be calculated across all these segments). There cannot be two *overlapping* event segments with the same label (RAPIDS will throw an error) |
| event_timestamp | A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using `length`, `shift`, and `shift_direction` |
| length | A string representing the length of this segment. It can have one or more of the following keys `XXD XXH XXM XXS` to represent a number of days, hours, minutes, and seconds. For example `7D 23H 59M 59S` |
| shift | A string representing the time shift from `event_timestamp`. It can have one or more of the following keys `XXD XXH XXM XXS` to represent a number of days, hours, minutes and seconds. For example `7D 23H 59M 59S`. Use this value to change the start of a segment with respect to its `event_timestamp`. For example, set this variable to `1H` to create a segment that starts 1 hour from an event of interest (`shift_direction` determines if it's before or after). |
| length | A string representing the length of this segment. It can have one or more of the following keys `XXD XXH XXM XXS` to represent days, hours, minutes, and seconds. For example, `7D 23H 59M 59S` |
| shift | A string representing the time shift from `event_timestamp`. It can have one or more of the following keys `XXD XXH XXM XXS` to represent days, hours, minutes, and seconds. For example, `7D 23H 59M 59S`. Use this value to change the start of a segment with respect to its `event_timestamp`. For example, set this variable to `1H` to create a segment that starts 1 hour from an event of interest (`shift_direction` determines if it's before or after). |
| shift_direction | An integer representing whether the `shift` is before (`-1`) or after (`1`) an `event_timestamp` |
|device_id| The device id (smartphone or fitbit) to whom this segment belongs to. You have to create a line in this event segment file for each event of a participant that you want to analyse. If you have participants with multiple device ids you can choose any of them|
|device_id| The device id (smartphone or Fitbit) to which this segment belongs. You have to create a line in this event segment file for each event of a participant that you want to analyze. If you have participants with multiple device ids, you can choose any of them|
!!! example
```csv
@ -267,9 +267,9 @@ Time segments (or epochs) are the time windows on which you want to extract beha
mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
```
This example will create eight segments for a single participant (`a748ee1a...`), five independent `stressX` segments with various lengths (1,4,3,7, and 9 hours). Segments `stress1`, `stress3`, and `stress5` are shifted forwards by 5 minutes and `stress2` and `stress4` are shifted backwards by 4 hours (that is, if the `stress4` event happened on March 15th at 1pm EST (`1584291600000`), the time segment will start on that day at 9am and end at 4pm).
This example will create eight segments for a single participant (`a748ee1a...`), five independent `stressX` segments with various lengths (1, 4, 3, 7, and 9 hours). Segments `stress1`, `stress3`, and `stress5` are shifted forward by 5 minutes, and `stress2` and `stress4` are shifted backward by 4 hours (that is, if the `stress4` event happened on March 15th at 1pm EST (`1584291600000`), the time segment will start on that day at 9am and end at 4pm).
The three `mood` segments are 1 hour, 1 day and 7 days long and have no shift. In addition, these `mood` segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some necessary information to compute a few of such features will be extracted from all three segments, for example the phone contact that called a participant the most or the location clusters visited by a participant.
The three `mood` segments are 1 hour, 1 day, and 7 days long and have no shift. In addition, these `mood` segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some information for such computation will be extracted from all three segments, for example, the phone contact that called a participant the most, or the location clusters visited by a participant.
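The arithmetic behind the `stress4` case can be checked with a short Python sketch. The function names are hypothetical and this is not RAPIDS code. It assumes the segment ends exactly `length` after its shifted start, and it uses the `7H` length and `4H` backward shift that the text gives for `stress4`.

```python
import re
from datetime import datetime, timezone

# Milliseconds represented by each unit letter in a length or shift string
UNIT_MS = {"D": 86_400_000, "H": 3_600_000, "M": 60_000, "S": 1_000}


def to_ms(duration):
    """Convert a string such as '7D 23H 59M 59S' to milliseconds ('0' gives 0)."""
    return sum(int(n) * UNIT_MS[u] for n, u in re.findall(r"(\d+)([DHMS])", duration))


def event_segment_window(event_timestamp, length, shift, shift_direction):
    """Return (start, end) UNIX timestamps in milliseconds for one event segment."""
    start = event_timestamp + shift_direction * to_ms(shift)
    end = start + to_ms(length)
    return start, end


def as_utc(ms):
    """Format a millisecond timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


start, end = event_segment_window(1584291600000, "7H", "4H", -1)
print(as_utc(start))  # 2020-03-15T13:00:00+00:00
print(as_utc(end))    # 2020-03-15T20:00:00+00:00
```

New York was four hours behind UTC on that date, so 13:00 and 20:00 UTC correspond to the 9am start and 4pm end mentioned in the example.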
??? info "Date time labels of event segments"
In the final feature file, you will find a row per event segment. The `local_segment` column of each row has a `label`, a start date-time string, and an end date-time string.
@ -280,7 +280,7 @@ Time segments (or epochs) are the time windows on which you want to extract beha
All sensor data is always segmented based on timestamps, and the date-time strings are attached for informative purposes. For example, you can plot your features based on these strings.
When you configure RAPIDS to work with a single time zone, such tz code will be used to convert start/end timestamps (the ones you typed in the event segments file) into start/end date-time strings. However, when you configure RAPIDS to work with multiple time zones, RAPIDS will use the most common time zone across all devices of every participant to do the conversion. The most common time zone is the one in which a participant spent the most time.
When you configure RAPIDS to work with a single time zone, such time zone code will be used to convert start/end timestamps (the ones you typed in the event segments file) into start/end date-time strings. However, when you configure RAPIDS to work with multiple time zones, RAPIDS will use the most common time zone across all devices of every participant to do the conversion. The most common time zone is the one in which a participant spent the most time.
In practical terms, this means that the date-time strings of event segments that happened in uncommon time zones will have shifted start/end date-time labels. However, the data within each segment was correctly filtered based on timestamps.
@ -344,7 +344,7 @@ Time segments (or epochs) are the time windows on which you want to extract beha
### Single timezone
If your study only happened in a single time zone or you want to ignore short trips of your participants to different time zones, select the appropriate code form this [list](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) and change the following config key. Double-check your timezone code pick, for example, US Eastern Time is `America/New_York` not `EST`
If your study only happened in a single time zone or you want to ignore short trips of your participants to different time zones, select the appropriate code from this [list](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) and change the following config key. Double-check your timezone code pick; for example, US Eastern Time is `America/New_York`, not `EST`.
``` yaml
TIMEZONE:
@ -376,7 +376,7 @@ Parameters for `[TIMEZONE]`
|--|--|
|`[TYPE]`| Either `SINGLE` or `MULTIPLE` as explained above |
|`[SINGLE][TZCODE]`| The time zone code from this [list](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) to be used across all devices |
|`[MULTIPLE][TZCODES_FILE]`| A CSV file containing the time zones in which participants' devices sensed data (see the required format below). Multiple devices can be linked to the same person, read more in [Participants Files](#participant-files) |
|`[MULTIPLE][TZCODES_FILE]`| A CSV file containing the time zones in which participants' devices sensed data (see the required format below). Multiple devices can be linked to the same person. Read more in [Participants Files](#participant-files) |
|`[MULTIPLE][IF_MISSING_TZCODE]`| When a device is missing from `[TZCODES_FILE]`, set this flag to `STOP` to stop RAPIDS execution and show an error, or to `USE_DEFAULT` to assign the time zone specified in `[DEFAULT_TZCODE]` to any such devices |
|`[MULTIPLE][FITBIT][ALLOW_MULTIPLE_TZ_PER_DEVICE]`| You only need to care about this flag if one or more Fitbit devices sensed data in one or more time zones, and you want RAPIDS to take into account this in its feature computation. Read more in "How does RAPIDS handle Fitbit devices?" below. |
|`[MULTIPLE][FITBIT][INFER_FROM_SMARTPHONE_TZ]`| You only need to care about this flag if one or more Fitbit devices sensed data in one or more time zones, and you want RAPIDS to take into account this in its feature computation. Read more in "How does RAPIDS handle Fitbit devices?" below. |
@ -416,7 +416,7 @@ Parameters for `[TIMEZONE]`
```bash
python tools/create_multi_timezones_file.py
```
The `TZCODES_FILE` will be saved as `data/external/multiple_timezones.csv` file.
The `TZCODES_FILE` will be saved as `data/external/multiple_timezones.csv`.
??? note "What happens if participant X lives in Los Angeles but participant Y lives in Amsterdam and they both stayed there during my study?"
Add a row per participant and set timestamp to `0`:
@ -431,21 +431,21 @@ Parameters for `[TIMEZONE]`
If `[IF_MISSING_TZCODE]` is set to `STOP`, RAPIDS will stop its execution and show you an error message.
If `[IF_MISSING_TZCODE]` is set to `USE_DEFAULT`, it will assign the time zone specified in `[DEFAULT_TZCODE]` to any devices with missing time zone information in `[TZCODES_FILE]`. This is helpful if only a few of your participants had multiple timezones and you don't want to specify the same time zone for the rest.
If `[IF_MISSING_TZCODE]` is set to `USE_DEFAULT`, it will assign the time zone specified in `[DEFAULT_TZCODE]` to any devices with missing time zone information in `[TZCODES_FILE]`. This is helpful if only a few of your participants had multiple timezones, and you don't want to specify the same time zone for the rest.
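The two behaviors of `[IF_MISSING_TZCODE]` can be pictured with the Python sketch below. It is only an illustration under stated assumptions: `resolve_tzcode` and the dictionary of known time zones are hypothetical names, not part of RAPIDS.

```python
def resolve_tzcode(device_id, tzcodes, if_missing, default_tzcode=None):
    """Pick the time zone code for a device, mimicking STOP and USE_DEFAULT.

    tzcodes is a hypothetical mapping from device id to time zone code,
    standing in for the contents of the time zones file.
    """
    if device_id in tzcodes:
        return tzcodes[device_id]
    if if_missing == "STOP":
        # Halt with an error when a device has no time zone information
        raise ValueError(f"No time zone found for device {device_id}")
    if if_missing == "USE_DEFAULT":
        # Fall back to the default time zone for any such device
        return default_tzcode
    raise ValueError(f"Unknown IF_MISSING_TZCODE value: {if_missing}")


known = {"fitbit123": "America/New_York"}
print(resolve_tzcode("fitbit123", known, "STOP"))
print(resolve_tzcode("fitbit999", known, "USE_DEFAULT", "Europe/Amsterdam"))
```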
??? note "How does RAPIDS handle Fitbit devices?"
Fitbit devices are not time zone aware and they always log data with a local date-time string.
Fitbit devices are not time zone aware, and they always log data with a local date-time string.
- When none of the Fitbit devices in your study changed time zones (e.g., `p01` was always in New York and `p02` was always in Amsterdam), you can set a single time zone per Fitbit device id along with a timestamp 0 (you can still assign multiple time zones to smartphone device ids)
- When none of the Fitbit devices in your study changed time zones (e.g., `p01` was always in New York and `p02` was always in Amsterdam), you can set a single time zone per Fitbit device id along with a timestamp of 0 (you can still assign multiple time zones to smartphone device ids)
```csv
device_id, tzcode, timestamp
fitbit123, America/New_York, 0
fitbit999, Europe/Amsterdam, 0
```
- On the other hand, when at least one of your Fitbit devices changed time zones **AND** you want RAPIDS to take into account these changes, you need to set `[ALLOW_MULTIPLE_TZ_PER_DEVICE]` to `True`. **You have to manually allow this option because you need to be aware it can produce inaccurate features around the times when time zones changed**. This is because we cannot know exactly when the Fitbit device detected and processed the time zone change.
- On the other hand, when at least one of your Fitbit devices changed time zones **AND** you want RAPIDS to take into account these changes, you need to set `[ALLOW_MULTIPLE_TZ_PER_DEVICE]` to `True`. **You have to manually allow this option because you need to be aware it can produce inaccurate features around the times when time zones changed**. This is because we cannot know precisely when the Fitbit device detected and processed the time zone change.
If you want to `ALLOW_MULTIPLE_TZ_PER_DEVICE` you will need to add any time zone changes per device in the `TZCODES_FILE` as explained above. You could obtain this data by hand but if your participants also used a smartphone during your study, you can use their time zone logs. Recall that in RAPIDS every participant is represented with a participant file `pXX.yaml`, this file links together multiple devices and we will use it to know what smartphone time zone data should be applied to Fitbit devices. Thus set `INFER_FROM_SMARTPHONE_TZ` to `TRUE`, if you have included smartphone time zone data in your `TZCODE_FILE` and you want to make a participant's Fitbit data time zone aware with their respective smartphone data.
If you want to `ALLOW_MULTIPLE_TZ_PER_DEVICE`, you will need to add any time zone changes per device in the `TZCODES_FILE` as explained above. You could obtain this data by hand, but if your participants also used a smartphone during your study, you can use their time zone logs. Recall that in RAPIDS, every participant is represented with a participant file `pXX.yaml`. This file links together multiple devices, and we will use it to know what smartphone time zone data should be applied to Fitbit devices. Thus, set `INFER_FROM_SMARTPHONE_TZ` to `TRUE` if you have included smartphone time zone data in your `TZCODES_FILE` and want to make a participant's Fitbit data time zone aware with their respective smartphone data.
---
## Data Stream Configuration
@ -518,24 +518,24 @@ Modify the following keys in your `config.yaml` depending on the [data stream](.
=== "fitbitjson_mysql"
This data stream process Fitbit data inside a JSON column as obtained from the Fitbit API and stored in a MySQL database. Read more about its column mappings and mutations in [`fitbitjson_mysql`](../../datastreams/fitbitjson-mysql#format).
This data stream processes Fitbit data inside a JSON column obtained from the Fitbit API and stored in a MySQL database. Read more about its column mappings and mutations in [`fitbitjson_mysql`](../../datastreams/fitbitjson-mysql#format).
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[DATABASE_GROUP]` | A database credentials group. Read the instructions below to set it up |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes starts between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE is regarded as today's sleep episode. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes that start between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE are regarded as today's sleep episodes. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
--8<---- "docs/snippets/database.md"
=== "fitbitjson_csv"
This data stream process Fitbit data inside a JSON column as obtained from the Fitbit API and stored in a CSV file. Read more about its column mappings and mutations in [`fitbitjson_csv`](../../datastreams/fitbitjson-csv#format).
This data stream processes Fitbit data inside a JSON column obtained from the Fitbit API and stored in a CSV file. Read more about its column mappings and mutations in [`fitbitjson_csv`](../../datastreams/fitbitjson-csv#format).
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[FOLDER]` | Folder where you have to place a CSV file **per** Fitbit sensor. Each file has to contain all the data from every participant you want to process. |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes starts between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE is regarded as today's sleep episode. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes that start between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE are regarded as today's sleep episodes. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
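The assignment rule for `[SLEEP_SUMMARY_LAST_NIGHT_END]` can be sketched in Python as follows. This is not RAPIDS code: the function name is hypothetical, and the sketch assumes the parameter is expressed as minutes after midnight (for example, `660` for 11:00), which the text above does not state.

```python
from datetime import datetime, timedelta


def sleep_episode_day(episode_start, last_night_end_minutes):
    """Return the day a sleep episode is assigned to.

    An episode belongs to the day whose LNE most recently passed, so
    subtracting the LNE offset and taking the date gives the assigned day.
    """
    return (episode_start - timedelta(minutes=last_night_end_minutes)).date()


lne = 660  # assumed to mean 11:00, expressed as minutes after midnight
print(sleep_episode_day(datetime(2020, 3, 7, 23, 30), lne))  # 2020-03-07
print(sleep_episode_day(datetime(2020, 3, 8, 2, 0), lne))    # 2020-03-07
print(sleep_episode_day(datetime(2020, 3, 8, 11, 0), lne))   # 2020-03-08
```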
=== "fitbitparsed_mysql"
@ -546,7 +546,7 @@ Modify the following keys in your `config.yaml` depending on the [data stream](.
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[DATABASE_GROUP]` | A database credentials group. Read the instructions below to set it up |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes starts between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE is regarded as today's sleep episode. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes that start between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE are regarded as today's sleep episodes. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
--8<---- "docs/snippets/database.md"
@ -557,7 +557,7 @@ Modify the following keys in your `config.yaml` depending on the [data stream](.
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[FOLDER]` | Folder where you have to place a CSV file **per** Fitbit sensor. Each file has to contain all the data from every participant you want to process. |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes starts between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE is regarded as today's sleep episode. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
| `[SLEEP_SUMMARY_LAST_NIGHT_END]` | Segments are assigned based on this parameter. Any sleep episodes that start between today's SLEEP_SUMMARY_LAST_NIGHT_END (LNE) and tomorrow's LNE are regarded as today's sleep episodes. While today's bedtime is based on today's sleep episodes, today's wake time is based on yesterday's sleep episodes. |
=== "Empatica"
@ -580,7 +580,7 @@ Modify the following keys in your `config.yaml` depending on the [data stream](.
| `[FOLDER]` | The relative path to a folder containing one subfolder per participant. The name of a participant folder should match their device_id assigned in their participant file. Each participant folder can have one or more zip files with any name; in other words, the sensor data in those zip files belong to a single participant. The zip files are [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) generated by Empatica and have a CSV file per sensor (`ACC`, `HR`, `TEMP`, `EDA`, `BVP`, `TAGS`). All CSV files of the same type contained in one or more zip files are uncompressed, parsed, sorted by timestamp, and joined together.|
??? example "Example of an EMPATICA FOLDER"
In the file tree below, we want to process three participants' data: `p01`, `p02`, and `p03`. `p01` has two zip files, `p02` has only one zip file, and `p03` has three zip files. Each zip has a CSV file per sensor that are joined together and processed by RAPIDS.
In the file tree below, we want to process three participants' data: `p01`, `p02`, and `p03`. `p01` has two zip files, `p02` has only one zip file, and `p03` has three zip files. Each zip has a CSV file per sensor that is joined together and processed by RAPIDS.
```bash
data/ # this folder exists in the root RAPIDS folder

56
docs/setup/overview.md

@ -4,49 +4,47 @@ Let's review some key concepts we use throughout these docs:
|Definition&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Description|
|--|--|
|Data Stream|Set of sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. For example, smartphone (device) data collected with the [AWARE Framework](https://awareframework.com/) (format) and stored in a MySQL database (container).|
|Device| A mobile or wearable device, like smartphones, Fitbit wrist bands, Oura Rings, etc.|
|Sensor| A physical or digital module builtin in a device that produces a data stream. For example, a smartphone's accelerometer or screen.
|Format| A file in RAPIDS that describes how sensor data from a device matches RAPIDS data representation.|
|Container|An electronic repository of data, it can be a database, a file, a Web API, etc. RAPIDS connects to containers through container scripts.|
|Data Stream|Set of sensor data collected using a specific **device** with a particular **format** and stored in a specific **container**. For example, smartphone (device) data collected with the [AWARE Framework](https://awareframework.com/) (format) and stored in a MySQL database (container).|
|Data Stream Format| Sensor data produced by a data stream have columns with specific names and types. RAPIDS can process a data stream using a `format.yaml` file that describes the raw data columns and any necessary transformations.|
|Data Stream Container|Sensor data produced by a data stream can be stored in a database, electronic files, or arbitrary electronic containers. RAPIDS can pull (download) the data from a stream using a container script implemented in R or Python.|
|Participant|A person that took part in a monitoring study|
|Behavioral feature| A metric computed from raw sensor data quantifying the behavior of a participant. For example, time spent at home computed from location data. These are also known as digital biomarkers|
|Behavioral feature| A metric computed from raw sensor data quantifying the behavior of a participant. For example, time spent at home calculated from location data. These are also known as digital biomarkers|
|Time segment| Time segments (or epochs) are the time windows on which RAPIDS extracts behavioral features. For example, you might want to compute participants' time at home every morning or only during weekends. You define time segments in a CSV file that RAPIDS processes.|
|Time zone| A string code like `America/New_York` that represents a time zone where a device logged data. You can process data collected in single or multiple time zones.|
|Provider| A script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community, which are named after its first author like [[PHONE_LOCATIONS][DORYAB]](../../features/phone-locations/#doryab-provider).|
|config.yaml| A YAML file where you can modify parameters to process data streams and behavioral features. This is the heart of RAPIDS and the file that you will modify the most.|
|Time zone| A string like `America/New_York` that represents a time zone where a device logged data. You can process data collected in single or multiple time zones for every participant.|
|Feature Provider| A script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community and are named after their first author, like [[PHONE_LOCATIONS][DORYAB]](../../features/phone-locations/#doryab-provider).|
|config.yaml| A YAML file where you can modify parameters to process data streams and behavioral features. This is the heart of RAPIDS and the file that you will change the most.|
|credentials.yaml| A YAML file where you can define credential groups (user, password, host, etc.) if your data stream needs to connect to a database or Web API|
|Participant file(s)| A YAML file that links one or more smartphone or wearable devices that a single participant used. RAPIDS needs one file per participant. |
|Participant file(s)| A YAML file that links one or more smartphone or wearable devices used by a single participant. RAPIDS needs one file per participant. |
!!! success "What can I do with RAPIDS?"
You can do one or more of these things with RAPIDS:
1. [Extract behavioral features](../../features/feature-introduction/) from smartphone, Fitbit, and Empatica's [supported data streams](../../datastreams/data-streams-introduction/)
1. [Add your own behavioral features](../../features/add-new-features/) (we can include them in RAPIDS if you want to share them with the community)
1. [Add support for new data streams](../../datastreams/add-new-data-streams/) if yours cannot be processed by RAPIDS yet
1. Create visualizations for [data quality control](../../visualizations/data-quality-visualizations/) and [feature inspection](../../visualizations/feature-visualizations/)
1. [Extending RAPIDS to organize your analysis](../../workflow-examples/analysis/) and publish a code repository along with your code
- [Extract behavioral features](../../features/feature-introduction/) from smartphone, Fitbit, and Empatica's [supported data streams](../../datastreams/data-streams-introduction/)
- [Add your own behavioral features](../../features/add-new-features/) (we can include them in RAPIDS if you want to share them with the community)
- [Add support for new data streams](../../datastreams/add-new-data-streams/) if yours cannot be processed by RAPIDS yet
- Create visualizations for [data quality control](../../visualizations/data-quality-visualizations/) and [feature inspection](../../visualizations/feature-visualizations/)
- [Extending RAPIDS to organize your analysis](../../workflow-examples/analysis/) and publish a code repository along with your code
!!! hint
- **In order to follow any of the previous tutorials, you will have to [Install](../installation/), [Configure](../configuration/), and learn how to [Execute](../execution/) RAPIDS.**
- We recommend you follow the [Minimal Example](../../workflow-examples/minimal/) tutorial to get familiar with RAPIDS
- [Email us](../../team), leave a comment in these docs, create a [Github issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions
- In order to follow any of the previous tutorials, you will have to [Install](../installation/), [Configure](../configuration/), and learn how to [Execute](../execution/) RAPIDS.
- [Open a new discussion](https://github.com/carissalow/rapids/discussions) in Github if you have any questions and [open an issue](https://github.com/carissalow/rapids/issues) to report any bugs.
## Frequently Asked Questions
### General
??? question "What exactly is RAPIDS?"
RAPIDS is a group of configuration files and R and Python scripts that are executed by [Snakemake](https://snakemake.github.io/). You can get a copy of RAPIDS by cloning our Github repository.
RAPIDS is a group of configuration files and R and Python scripts executed by [Snakemake](https://snakemake.github.io/). You can get a copy of RAPIDS by cloning our Github repository.
RAPIDS is not a web application or server; all the processing is done in your laptop, server, or computer cluster.
??? question "How does RAPIDS work?"
Most of the time, you will only have to modify configuration files in YAML format (`config.yaml`, `credentials.yaml`, and participant files `pxx.yaml`) and in CSV format (time zones and time segments).
RAPIDS pulls data from different data containers and processes it in steps. The input/output of each step is saved as a CSV file for inspection; you can check the files that are created for each sensor on its documentation page.
RAPIDS pulls data from different data containers and processes it in steps. The input/output of each stage is saved as a CSV file for inspection; you can check the files created for each sensor on its documentation page.
All data is stored in `data/`, and all processing Python and R scripts are stored in `src/`.
@ -65,7 +63,7 @@ Let's review some key concepts we use throughout these docs:
RAPIDS can connect to these containers if it has a `format.yaml` and a `container.[R|py]` script used to pull the correct data and mutate it to comply with RAPIDS' internal data representation. Once the data stream is in RAPIDS, it goes through some basic transformations (scripts), one that assigns a time segment and a time zone to each data row, and another one that creates "episodes" of data for some sensors that need it (like screen, battery, activity recognition, and sleep intraday data).
After this, RAPIDS executes the requested `PROVIDER` script that computes behavioral features per time segment instance. After every feature is computed, they are joined per sensor, per participant, and study. Visualizations are built based on raw data or based on computed features.
After this, RAPIDS executes the requested `PROVIDER` script that computes behavioral features per time segment instance. After every feature is computed, they are joined per sensor, per participant, and study. Visualizations are built based on raw data or based on calculated features.
<figure>
<img src="../../img/dataflow.png" max-width="50%" />
@ -73,7 +71,7 @@ Let's review some key concepts we use throughout these docs:
</figure>
??? question "Is my data private?"
Absolutely, you are processing your data with your own copy of RAPIDS in your laptop, server, or computer cluster, so neither we nor anyone else can have access to your datasets.
Absolutely, you are processing your data with your own copy of RAPIDS in your laptop, server, or computer cluster, so neither we nor anyone else can access your datasets.
??? question "Do I need to have coding skills to use RAPIDS?"
If you want to extract the behavioral features or visualizations that RAPIDS offers out of the box, the answer is no. However, you need to be comfortable running commands in your terminal and familiar with editing YAML files and CSV files.
@ -93,25 +91,25 @@ Let's review some key concepts we use throughout these docs:
We believe RAPIDS can benefit your analysis in several ways:
- RAPIDS has more than 250 [behavioral features](../../features/add-new-features/) available, many of them tested and used by other researchers.
- RAPIDS can extract features in dynamic [time segments](../../setup/configuration/#time-segments) (for example, every x minutes, x hours, x days, x weeks, x months, etc.). This is handy because you don't have to deal with time zones, day light saving changes, or date arithmetic.
- RAPIDS can extract features in dynamic [time segments](../../setup/configuration/#time-segments) (for example, every x minutes, x hours, x days, x weeks, x months, etc.). This is handy because you don't have to deal with time zones, daylight saving changes, or date arithmetic.
- Your analysis is less prone to errors. Every participant sensor dataset is analyzed in the same way and isolated from each other.
- If you have lots of data, out-of-the-box parallel execution will speed up your analysis and if your computer crashes, RAPIDS will start from where it left of.
- You can publish your analysis code along with your papers and be sure it will run exactly as it does in your computer.
- If you have lots of data, out-of-the-box parallel execution will speed up your analysis, and if your computer crashes, RAPIDS will start from where it left off.
- You can publish your analysis code along with your papers and be sure it will run exactly as it does on your computer.
- You can still add your own [behavioral features](../../features/add-new-features/) and [data streams](../../datastreams/add-new-data-streams/) if you need to, and the community will be able to reuse your work.
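To illustrate why the time zone and daylight saving handling mentioned in the list above matters, here is a minimal sketch using only the Python standard library. It is not RAPIDS' implementation; the helper name `local_day_length_hours` is hypothetical, and the dates are just examples chosen around the 2021 daylight saving transitions in `America/New_York`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def local_day_length_hours(year, month, day, tz_code):
    """Return how many real hours elapse between two consecutive local midnights.

    Hypothetical helper for illustration only; it is not part of RAPIDS.
    """
    tz = ZoneInfo(tz_code)
    start = datetime(year, month, day, tzinfo=tz)
    # Adding a timedelta moves the wall clock by one day; the UTC offset
    # of the result is looked up again for the new date.
    end = start + timedelta(days=1)
    # Convert both instants to UTC before subtracting. Subtracting two
    # datetimes that share the same tzinfo would compare wall-clock times
    # and hide the daylight saving shift.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed.total_seconds() / 3600


if __name__ == "__main__":
    for date in [(2021, 3, 14), (2021, 6, 1), (2021, 11, 7)]:
        print(date, local_day_length_hours(*date, "America/New_York"))
```

A "daily" segment on 2021-03-14 in New York covers 23 hours and one on 2021-11-07 covers 25 hours, so code that assumes every day has 24 hours would misalign data rows around those dates.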
### Data Streams
??? question "Can I process smartphone data collected with Beiwe, PurpleRobot, or app X?"
Yes, but you need to add a new data stream to RAPIDS (a new `format.yaml` and container script in R or Python). Follow this [tutorial](../../datastreams/add-new-data-streams/). [Email us](../../team), create a [Github issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions.
Yes, but you need to add a new data stream to RAPIDS (a new `format.yaml` and container script in R or Python). Follow this [tutorial](../../datastreams/add-new-data-streams/). [Open a new discussion](https://github.com/carissalow/rapids/discussions) in Github if you have any questions.
If you do so, let us know so we can integrate your work into RAPIDS.
??? question "Can I process data from Oura Rings, Actigraphs, or wearable X?"
The only wearables we support at the moment are Empatica and Fitbit. However, get in touch if you need to process data from a different wearable. We have limited resources so we add support for different devices on an as-needed basis, but we would be happy to collaborate with you to add new wearables. [Email us](../../team), create a [Github issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions.
The only wearables we support at the moment are Empatica and Fitbit. However, get in touch if you need to process data from a different wearable. We have limited resources, so we add support for additional devices on an as-needed basis, but we would be happy to collaborate. [Open a new discussion](https://github.com/carissalow/rapids/discussions) in Github if you have any questions.
??? question "Can I process smartphone or wearable data stored in PostgreSQL, Oracle, SQLite, CSV files, or data container X?"
Yes, but you need to add a new data stream to RAPIDS (a new `format.yaml` and container script in R or Python). Follow this [tutorial](../../datastreams/add-new-data-streams/). If you are processing data streams we already support like AWARE, Fitbit, or Empatica and are just connecting to a different container; you can reuse their `format.yaml` and only implement a new container script. [Email us](../../team), create a [Github issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions.
Yes, but you need to add a new data stream to RAPIDS (a new `format.yaml` and container script in R or Python). Follow this [tutorial](../../datastreams/add-new-data-streams/). If you are processing data streams we already support like AWARE, Fitbit, or Empatica and are just connecting to a different container, you can reuse their `format.yaml` and only implement a new container script. [Open a new discussion](https://github.com/carissalow/rapids/discussions) in Github if you have any questions.
If you do so, let us know so we can integrate your work into RAPIDS.
@ -138,4 +136,4 @@ Let's review some key concepts we use throughout these docs:
Yes, you don't need to write any code to use RAPIDS out of the box. If you need to add support for new [data streams](../../datastreams/add-new-data-streams/) or [behavioral features](../../features/add-new-features/) you can use scripts in either language.
??? question "I have scripts that clean raw data from X sensor, can I use them with RAPIDS?"
Yes, you can add them as a [`[MUTATION][SCRIPT]`](../../datastreams/add-new-data-streams/#complex-mapping) in the `format.yaml` of the [data stream](../../datastreams/data-streams-introduction/) you are using. You will add a `main` function that will receive a data frame with the raw data for that sensor that in turn will be used to compute behavioral features.
Yes, you can add them as a [`[MUTATION][SCRIPT]`](../../datastreams/add-new-data-streams/#complex-mapping) in the `format.yaml` of the [data stream](../../datastreams/data-streams-introduction/) you are using. You will add a `main` function that will receive a data frame with the raw data for that sensor that, in turn, will be used to compute behavioral features.

18
docs/stylesheets/extra.css

@ -31,3 +31,21 @@ div[data-md-component=announce]>div#announce-msg>a{
min-width: 0rem;
}
/* Users and contributors grid */
.users {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
grid-template-rows: auto;
grid-gap: 15px;
}
.users > div {
display: flex;
justify-content: center;
align-items: center;
}
.users > div > img {
max-height: 100px;
object-fit: contain;
}

9
mkdocs.yml

@ -39,7 +39,7 @@ extra:
provider: mike
social:
- icon: fontawesome/brands/twitter
link: 'https://twitter.com/julio_ui'
link: 'https://twitter.com/RAPIDS_Science'
extra_javascript:
- https://polyfill.io/v3/polyfill.min.js?features=es6
- https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
@ -73,13 +73,14 @@ extra_css:
- stylesheets/extra.css
nav:
- Home: 'index.md'
- Overview: setup/overview.md
- Minimal Example: workflow-examples/minimal.md
- Citation: citation.md
- Contributing: contributing.md
- Setup:
- Overview: setup/overview.md
- Minimal Example: workflow-examples/minimal.md
- Installation: 'setup/installation.md'
- Configuration: setup/configuration.md
- Execution: setup/execution.md
- Citation: citation.md
- Data Streams:
- Introduction: datastreams/data-streams-introduction.md
- Phone:
