Add where to start guide and update docs

pull/128/head
JulioV 2021-03-12 15:17:43 -05:00
parent d529490999
commit 57bd1a75dc
7 changed files with 140 additions and 41 deletions


@ -1,20 +0,0 @@
# File Structure
!!! tip
- Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to [Installation](../setup/installation/), then to [Configuration](../setup/configuration/), and then to [Execution](../setup/execution/)
- All paths mentioned in this page are relative to RAPIDS' root folder.
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the [`.env` file](../setup/configuration/#database-credentials), [participants files](../setup/configuration/#participant-files), [time segment files](../setup/configuration/#time-segments), and the `config.yaml` file as instructed in the [Configuration page](../setup/configuration). The `config.yaml` file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
All data is saved in `data/`. The `data/external/` folder stores any data imported or created by the user, `data/raw/` stores sensor data as imported from your database, `data/interim/` has intermediate files necessary to compute behavioral features from raw data, and `data/processed/` has all the final files with the behavioral features in folders per participant and sensor.
RAPIDS source code is saved in `src/`. The `src/data/` folder stores scripts to download, clean and pre-process sensor data, `src/features` has scripts to extract behavioral features organized in their respective sensor subfolders , `src/models/` can host any script to create models or statistical analyses with the behavioral features you extract, and `src/visualization/` has scripts to create plots of the raw and processed data. There are other files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.).
In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the `Snakefile` file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.).
<figure>
<img src="../img/files.png" max-width="100%" />
<figcaption>Interaction diagram between the user, and important files in RAPIDS</figcaption>
</figure>


@ -2,7 +2,9 @@
Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to [extract](features/feature-introduction.md) and [create](features/add-new-features.md) **behavioral features** (a.k.a. digital biomarkers), [visualize](visualizations/data-quality-visualizations.md) mobile sensor data, and [structure](workflow-examples/analysis.md) your analysis into reproducible workflows.
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment, we support data streams logged by smartphones, Fitbit wearables, and, in collaboration with the [DBDP](https://dbdp.org/), Empatica wearables. Read the [introduction to data streams](../../datastreams/data-streams-introduction) for more information on what specific data streams RAPIDS can process, and this tutorial if you want to [add support for new data streams](../../datastreams/add-new-data-streams).
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment, we support [data streams](../../datastreams/data-streams-introduction) logged by smartphones, Fitbit wearables, and, in collaboration with the [DBDP](https://dbdp.org/), Empatica wearables (but you can [add your own](../../datastreams/add-new-data-streams) too).
**If you want to know more, head over to [Where do I start?](../setup/where-do-i-start/)**
!!! tip
:material-slack: Questions or feedback can be posted on the \#rapids channel in AWARE Framework\'s [slack](http://awareframework.com:3000/).
@ -13,11 +15,7 @@ RAPIDS is open source, documented, modular, tested, and reproducible. At the mom
:fontawesome-solid-play: Ready to start? Go to [Installation](setup/installation/), then to [Configuration](setup/configuration/), and then to [Execution](setup/execution/)
:fontawesome-solid-sync-alt: Are you upgrading from RAPIDS [beta](https://rapidspitt.readthedocs.io/en/latest/)? Follow this [guide](migrating-from-old-versions)
## How does it work?
RAPIDS is formed by R and Python scripts orchestrated by [Snakemake](https://snakemake.readthedocs.io/en/stable/). We suggest you read Snakemake's docs but in short: every link in the analysis chain is atomic and has files as input and output. Behavioral features are processed per sensor and participant.
:fontawesome-solid-sync-alt: Are you upgrading from RAPIDS `0.4.x` or older? Follow this [guide](migrating-from-old-versions)
## What are the benefits of using RAPIDS?
@ -32,6 +30,3 @@ RAPIDS is formed by R and Python scripts orchestrated by [Snakemake](https://sna
11. **Reproducible code**. If you structure your analysis within RAPIDS, you can be sure your code will run on other computers as intended, thanks to R and Python virtual environments. You can share your analysis code along with your publications without any overhead.
12. **Private**. All your data is processed locally.
## How is it organized?
In broad terms the `config.yaml`, [`.env` file](setup/configuration/#database-credentials), [participants files](setup/configuration/#participant-files), and [time segment files](setup/configuration/#time-segments) are the only ones that you will have to modify. All data is stored in `data/` and all scripts are stored in `src/`. For more information see RAPIDS' [File Structure](file-structure.md).


@ -21,7 +21,7 @@ When you are done with this configuration, go to [executing RAPIDS](../execution
A data stream refers to sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. For example, the `aware_mysql` data stream handles smartphone data (**device**) collected with the [AWARE Framework](https://awareframework.com/) (**format**) stored in a MySQL database (**container**).
Check the table in [introduction to data streams](../../datastreams/data-streams-introduction) to know what data streams we support. If your data stream is supported, continue to the next configuration section, **you will use its label later in this guide** (e.g. `aware_mysql`). If your steam is not supported but you want to implement it, follow this tutorial to [add support for new data streams](../../datastreams/add-new-data-streams) and get in touch by email or in Slack if you have any questions.
Check the table in [introduction to data streams](../../datastreams/data-streams-introduction) to know what data streams we support. If your data stream is supported, continue to the next configuration section; **you will use its label later in this guide** (e.g. `aware_mysql`). If your stream is not supported but you want to implement it, follow the tutorial to [add support for new data streams](../../datastreams/add-new-data-streams) and get in touch by email or in Slack if you have any questions.
---


@ -16,19 +16,19 @@ After you have [installed](../installation) and [configured](../configuration) R
Any changes to the `config.yaml` file will be applied automatically and only the relevant files will be updated. This means that after modifying the features list for `PHONE_MESSAGE` for example, RAPIDS will execute the script that computes `MESSAGES` features and update its output file.
!!! hint "Multi-core"
You can run RAPIDS over multiple cores by modifying the `-j` argument (e.g. use `-j8` to use 8 cores). **However**, take into account that this means multiple sensor datasets for different participants will be loaded in memory at the same time. If RAPIDS crashes because it ran out of memory reduce the number of cores and try again.
You can run RAPIDS over multiple cores by modifying the `-j` argument (e.g. use `-j8` to use 8 cores). **However**, take into account that this means multiple sensor datasets for different participants will be loaded in memory at the same time. If RAPIDS crashes because it ran out of memory, reduce the number of cores and try again.
As a reference, we have run RAPIDS over 12 cores and 32 GB of RAM without problems for a study with 200 participants with 14 days of low-frequency smartphone data (no accelerometer, gyroscope, or magnetometer).
!!! hint "Deleting RAPIDS output"
If you want to delete all the output files RAPIDS produces you can execute the following command:
If you want to delete all the output files RAPIDS produces, you can execute the following command:
```bash
./rapids -j1 --delete-all-output
```
!!! hint "Forcing a complete rerun"
If you want to update your raw data or rerun the whole pipeline from scratch run the following commands:
!!! hint "Forcing a complete rerun or updating your raw data in RAPIDS"
If you want to update your raw data or rerun the whole pipeline from scratch, run the following commands:
```bash
./rapids -j1 --delete-all-output


@ -0,0 +1,124 @@
# Where do I start?
Let's review some key concepts we use throughout these docs:
|Definition&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| Description|
|--|--|
|Data Stream|Set of sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. For example, smartphone (device) data collected with the [AWARE Framework](https://awareframework.com/) (format) and stored in a MySQL database (container).|
|Device| A mobile or wearable device, like smartphones, Fitbit wrist bands, Oura Rings, etc.|
|Sensor| A physical or digital module built into a device that produces a data stream. For example, a smartphone's accelerometer or screen.|
|Format| A file in RAPIDS that describes how sensor data from a device matches RAPIDS data representation.|
|Container|An electronic repository of data, it can be a database, a file, a Web API, etc. RAPIDS connects to containers through container scripts.|
|Participant|A person that took part in a monitoring study.|
|Behavioral feature| A metric computed from raw sensor data quantifying the behavior of a participant. For example, time spent at home computed from location data. These are also known as digital biomarkers.|
|Time segment| Time segments (or epochs) are the time windows on which RAPIDS extracts behavioral features. For example, you might want to compute participants' time at home every morning or only during weekends. You define time segments in a CSV file that RAPIDS processes.|
|Time zone| A string code like `America/New_York` that represents a time zone where a device logged data. You can process data collected in single or multiple time zones.|
|Provider| A script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community and are named after their first author, like [[PHONE_LOCATIONS][DORYAB]](../../features/phone-locations/#doryab-provider).|
|config.yaml| A YAML file where you can modify parameters to process data streams and behavioral features. This is the heart of RAPIDS and the file that you will modify the most.|
|credentials.yaml| A YAML file where you can define credential groups (user, password, host, etc.) if your data stream needs to connect to a database or Web API.|
|Participant file| A YAML file that links one or more smartphone or wearable devices that a single participant used. RAPIDS needs one file per participant. |
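
To make the time segment concept more concrete, here is a minimal Python sketch of assigning a timestamp to daily time windows. It is an illustration only, not RAPIDS' implementation: the `SEGMENTS` dictionary, the `morning` window, and the `segments_for` helper are hypothetical names, and RAPIDS itself reads time segments from a CSV file.

```python
from datetime import datetime, time

# Hypothetical daily time segments: label -> (start, end), both inclusive.
SEGMENTS = {
    "night": (time(0, 0, 0), time(5, 59, 59)),
    "morning": (time(6, 0, 0), time(11, 59, 59)),
}


def segments_for(timestamp):
    """Return the labels of every segment whose window contains the timestamp."""
    t = timestamp.time().replace(microsecond=0)
    return [label for label, (start, end) in SEGMENTS.items() if start <= t <= end]


print(segments_for(datetime(2021, 3, 12, 4, 30)))   # falls inside the night window
print(segments_for(datetime(2021, 3, 12, 15, 17)))  # falls outside both windows
```

A feature such as time at home would then be computed once per segment instance, using only the rows whose timestamps fall inside that window.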
RAPIDS functionality includes:
- [Extract behavioral features](../../features/feature-introduction/) from smartphone, Fitbit, and Empatica's [supported data streams](../../datastreams/data-streams-introduction/)
- [Add your own behavioral features](../../features/add-new-features/) (we can include them in RAPIDS if you want to share them with the community)
- [Add support for new data streams](../../datastreams/add-new-data-streams/) if yours cannot be processed by RAPIDS yet
- Create visualizations for [data quality control](../../visualizations/data-quality-visualizations/) and [feature inspection](../../visualizations/feature-visualizations/)
- [Extend RAPIDS to organize your analysis](../../workflow-examples/analysis/) and publish a code repository along with your paper
!!! hint
- If you want to use RAPIDS for any of the above, you will have to [Install](../installation/), [Configure](../configuration/), and learn how to [Execute](../execution/) it.
- We also recommend you follow the [Minimal Example](../../workflow-examples/minimal/) tutorial to get familiar with RAPIDS
- [Email us](../../team), create a [GitHub issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions
## Frequently Asked Questions
### General
??? info "What exactly is RAPIDS?"
RAPIDS is a group of configuration files and R and Python scripts that are executed by [Snakemake](https://snakemake.github.io/). You can get a copy of RAPIDS by cloning our GitHub repository.
RAPIDS is not a web application or server; all the processing is done on your laptop, server, or computer cluster.
??? info "How does RAPIDS work?"
Most of the time, you will only have to modify configuration files in YAML format (`config.yaml`, `credentials.yaml`, and participant files `pxx.yaml`) and in CSV format (time zones and time segments).
RAPIDS pulls data from different data containers and processes it in steps. The input/output of each step is saved as a CSV file for inspection. All data is stored in `data/`, and all processing Python and R scripts are stored in `src/`.
In the figure below, we represent the interactions between users and files. After a user modifies the configuration files mentioned above, the `Snakefile` file will search for and execute the Snakemake rules that contain the Python or R scripts necessary to generate or update the required output files (behavioral features, plots, etc.).
<figure>
<img src="../../img/files.png" max-width="50%" />
<figcaption>Interaction diagram between the user, and important files in RAPIDS</figcaption>
</figure>
??? info "Is my data private?"
Absolutely. You are processing your data with your own copy of RAPIDS on your laptop, server, or computer cluster, so neither we nor anyone else can have access to your datasets.
??? info "Do I need to have coding skills to use RAPIDS?"
If you want to extract the behavioral features or visualizations that RAPIDS offers out of the box, the answer is no. However, you need to be comfortable running commands in your terminal and familiar with editing YAML files and CSV files.
If you want to add support for new data streams or behavioral features, you need to be familiar with R or Python.
??? info "Is RAPIDS open-source or free?"
Yes, RAPIDS is both open-source and free.
??? info "How do I cite RAPIDS?"
Please refer to our [Citation guide](../../citation/); depending on what parts of RAPIDS you used, we also ask you to cite the work of other authors that shared their work.
??? info "I have a lot of data, can RAPIDS handle it / is RAPIDS fast enough?"
Yes, we use Snakemake under the hood, so you can automatically distribute RAPIDS execution over multiple [cores](../execution/) or [clusters](https://snakemake.readthedocs.io/en/stable/executing/cluster.html). RAPIDS processes data per sensor and participant, so it can take advantage of this parallel processing.
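
As a rough illustration of why per-sensor, per-participant processing parallelizes well, the sketch below runs independent jobs in a worker pool. It is plain Python, not RAPIDS or Snakemake code: the participant labels, sensor names, and the `extract_features` function are hypothetical, and in RAPIDS it is Snakemake that schedules the work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

participants = ["p01", "p02"]  # hypothetical participant labels
sensors = ["calls", "screen"]  # hypothetical sensor names


def extract_features(job):
    """Stand-in for one processing step; it depends only on its own inputs."""
    participant, sensor = job
    return f"{participant}/{sensor}: done"


# Every (participant, sensor) pair is an independent unit of work,
# so the pairs can be handed to as many workers as are available.
jobs = list(product(participants, sensors))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(extract_features, jobs))

for line in results:
    print(line)
```

Because no job reads another job's output here, adding workers shortens the run without changing the results; the trade-off is that more datasets are held in memory at the same time.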
??? info "What are the advantages of using RAPIDS over implementing my own analysis code?"
We believe RAPIDS can benefit your analysis in several ways:
- RAPIDS has more than 250 [behavioral features](../../features/feature-introduction/) available, many of them tested and used by other researchers.
- RAPIDS can extract features in dynamic [time segments](../../setup/configuration/#time-segments) (for example, every x minutes, x hours, x days, x weeks, x months, etc.). This is handy because you don't have to deal with time zones, daylight saving changes, or date arithmetic.
- Your analysis is less prone to errors. Each participant's sensor datasets are analyzed in the same way and in isolation from one another.
- If you have lots of data, out-of-the-box parallel execution will speed up your analysis, and if your computer crashes, RAPIDS will start from where it left off.
- You can publish your analysis code along with your papers and be sure it will run exactly as it does on your computer.
- You can still add your own [behavioral features](../../features/add-new-features/) and [data streams](../../datastreams/add-new-data-streams/) if you need to, and the community will be able to reuse your work.
### Data Streams
??? info "Can I process smartphone data collected with Beiwe, PurpleRobot, or app X?"
Yes, but you need to add a new data stream to RAPIDS (a new `format.yaml` and container script in R or Python). Follow this [tutorial](../../datastreams/add-new-data-streams/). [Email us](../../team), create a [GitHub issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions.
If you do so, let us know so we can integrate your work into RAPIDS.
??? info "Can I process data from Oura Rings, Actigraphs, or wearable X?"
The only wearables we support at the moment are Empatica and Fitbit. However, get in touch if you need to process data from a different wearable. We have limited resources, so we add support for different devices on an as-needed basis, but we would be happy to collaborate with you to add new wearables. [Email us](../../team), create a [GitHub issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions.
??? info "Can I process smartphone or wearable data stored in PostgreSQL, Oracle, SQLite, CSV files, or data container X?"
Yes, but you need to add a new data stream to RAPIDS (a new `format.yaml` and container script in R or Python). Follow this [tutorial](../../datastreams/add-new-data-streams/). If you are processing data streams we already support, like AWARE, Fitbit, or Empatica, and are just connecting to a different container, you can reuse their `format.yaml` and only implement a new container script. [Email us](../../team), create a [GitHub issue](https://github.com/carissalow/rapids/issues) or text us in [Slack](http://awareframework.com:3000/) if you have any questions.
If you do so, let us know so we can integrate your work into RAPIDS.
??? info "I have participants that live in different time zones and some that travel; can RAPIDS handle this?"
Yes, RAPIDS can handle [single or multiple timezones](../../setup/configuration/#timezone-of-your-study) per participant. You can use time zone data collected by smartphones or collected by hand.
??? info "Some of my participants used more than one device during my study; can RAPIDS handle this?"
Yes, you can link more than one smartphone or wearable device to a single participant. RAPIDS will merge them and sort them automatically.
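
The merge-and-sort idea can be sketched in a few lines of Python. This is only an illustration under assumed data: the row layout (timestamp in milliseconds, device id, event) and the device ids are hypothetical, and the code is not RAPIDS' own merging logic.

```python
# Hypothetical rows logged by two devices that belong to the same participant.
old_phone = [
    (1000, "device_a", "screen_on"),
    (3000, "device_a", "screen_off"),
]
new_phone = [
    (2000, "device_b", "screen_on"),
    (4000, "device_b", "screen_off"),
]

# Concatenate the rows from both devices and order them by timestamp,
# so downstream steps see one continuous stream for the participant.
combined = sorted(old_phone + new_phone, key=lambda row: row[0])

for timestamp, device_id, event in combined:
    print(timestamp, device_id, event)
```

Keeping the device id on every row means the origin of each record is still recoverable after the merge.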
??? info "Some of my participants switched from Android to iOS or vice-versa during my study; can RAPIDS handle this?"
Yes, data from multiple smartphones can be linked to a single participant. All iOS data is converted to Android data before merging it.
### Extending RAPIDS
??? info "Can I add my own behavioral features/digital biomarkers?"
Yes, you can implement your own features in R or Python following this [tutorial](../../features/add-new-features/).
??? info "Can I extract behavioral features based on two or more sensors?"
Yes, we do this for `PHONE_DATA_YIELD` (combines all phone sensors), `PHONE_LOCATIONS` (combines location and data yield data), `PHONE_APPLICATIONS_BACKGROUND` (combines screen and app usage data), and `FITBIT_INTRADAY_STEPS` (combines Fitbit sleep and step data).
However, we haven't come up with a user-friendly way to configure this, and currently, we join sensors on a case-by-case basis. This is mainly because not enough users have needed this functionality so far. Get in touch, and we can set it up together; the more use cases we are aware of, the easier it will be to integrate this into RAPIDS.
??? info "I know how to program in Python or R but not both. Can I still use or extend RAPIDS?"
Yes, you don't need to write any code to use RAPIDS out of the box. If you need to add support for new [data streams](../../datastreams/add-new-data-streams/) or [behavioral features](../../features/add-new-features/) you can use scripts in either language.
??? info "I have scripts that clean data from X sensor, can I use them with RAPIDS?"
Yes, you can add them as a [`[MUTATION][SCRIPT]`](../../datastreams/add-new-data-streams/#complex-mapping) in the `format.yaml` of the [data stream](../../datastreams/data-streams-introduction/) you are using. You will add a `main` function that will receive a data frame with the raw data for that sensor.


@ -1,7 +1,7 @@
Minimal Working Example
=======================
This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming `call` features for `daily` (`00:00:00` to `23:59:59`) and `night` (`00:00:00` to `05:59:59`) epochs of every day of data of one participant monitored on the US East coast with an Android smartphone.
This is a quick guide for creating and running a simple pipeline to extract missed, outgoing, and incoming `call` features for `24 hr` (`00:00:00` to `23:59:59`) and `night` (`00:00:00` to `05:59:59`) time segments of every day of data of one participant that was monitored on the US East coast with an Android smartphone.
1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
3. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
@ -10,7 +10,7 @@ This is a quick guide for creating and running a simple pipeline to extract miss
??? info "Required configuration changes"
1. **Supported [data streams](../../setup/configuration#supported-data-streams).**
We identified that we will use the `aware_csv` data stream because we are processing aware data saved in a CSV file. We will use this label in a later step.
Based on the docs, we decided to use the `aware_csv` data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.
3. **Create your [participants file](../../setup/configuration#participant-files).**


@ -73,14 +73,12 @@ extra_css:
nav:
- Home: 'index.md'
- Setup:
- File Structure: file-structure.md
- Where do I start?: setup/where-do-i-start.md
- Minimal Example: workflow-examples/minimal.md
- Installation: 'setup/installation.md'
- Configuration: setup/configuration.md
- Execution: setup/execution.md
- Citation: citation.md
- Example Workflows:
- Minimal: workflow-examples/minimal.md
- Analysis: workflow-examples/analysis.md
- Data Streams:
- Introduction: datastreams/data-streams-introduction.md
- Phone:
@ -138,6 +136,8 @@ nav:
- Visualizations:
- Data Quality: visualizations/data-quality-visualizations.md
- Features: visualizations/feature-visualizations.md
- Analysis Workflows:
- Complete Example: workflow-examples/analysis.md
- Developers:
- Git Flow: developers/git-flow.md
- Remote Support: developers/remote-support.md
@ -147,7 +147,7 @@ nav:
- Test cases: developers/test-cases.md
- Validation schema of config.yaml: developers/validation-schema-config.md
- Others:
- Migrating from beta: migrating-from-old-versions.md
- Migrating from an old version: migrating-from-old-versions.md
- Code of Conduct: code_of_conduct.md
- FAQ: faq.md
- Team: team.md