Add file structure docs

pull/103/head
JulioV 2020-11-04 14:11:46 -05:00
parent 1ea2730afd
commit ce55cbda4a
5 changed files with 24 additions and 12 deletions


@@ -0,0 +1,18 @@
# File Structure
All paths mentioned in this page are relative to RAPIDS' root folder.
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the [`.env` file](https://www.rapids.science/setup/configuration/#database-credentials), [participant files](https://www.rapids.science/setup/configuration/#participant-files), [day segment files](https://www.rapids.science/setup/configuration/#day-segments), and the `config.yaml` file. The `config.yaml` file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
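As a quick orientation, here is a minimal, hypothetical sketch of the kind of entries `config.yaml` holds. The `PIDS` and `DAY_SEGMENTS` keys are referenced elsewhere in RAPIDS' docs (see the quick guide changed in this commit); the participant id, the `FILE` path, and the layout below are placeholders, not actual defaults.
```yaml
# Illustrative fragment only -- not a verbatim or complete config.yaml
PIDS: [p01]                  # participants whose data will be processed

DAY_SEGMENTS:
  TYPE: PERIODIC             # the default type used in the quick guide
  FILE: "data/external/daysegments_periodic.csv"   # placeholder path
```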
All data is saved in `data/`. The `data/external/` folder stores any data imported or created by the user, `data/raw/` stores sensor data as imported from your database, `data/interim/` has intermediate files necessary to compute behavioral features from raw data, and `data/processed/` holds the final behavioral feature files, organized in folders per participant and sensor.
All the source code is saved in `src/`. The `src/data/` folder stores scripts to download, clean and pre-process sensor data, `src/features/` has scripts to extract behavioral features organized in their respective subfolders, `src/models/` can host any script to create models or statistical analyses with the behavioral features you extract, and `src/visualization/` has scripts to create plots of the raw and processed data.
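To make the two paragraphs above concrete, here is a hypothetical listing of the top-level layout; the folder names come from the text, while the comments only summarize their roles:
```
data/
├── external/       # files imported or created by the user
├── raw/            # sensor data as imported from your database
├── interim/        # intermediate files needed to compute behavioral features
└── processed/      # final behavioral features, in folders per participant and sensor
src/
├── data/           # scripts to download, clean and pre-process sensor data
├── features/       # behavioral feature scripts, organized in per-sensor subfolders
├── models/         # optional scripts for models or statistical analyses
└── visualization/  # scripts that plot raw and processed data
```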
There are other important files and folders, but they are only relevant if you are interested in extending RAPIDS (e.g., virtual env files, docs, tests, the Dockerfile, the Snakefile, etc.). In the figure below, we represent the interactions between users and files. After a user modifies `config.yaml` and `.env`, the `Snakefile` decides which Snakemake rules have to be executed to produce the required output files (behavioral features) and which scripts are in charge of producing them. In addition, users can add or modify files in the `data/` folder (for example, to configure the [participant files](https://www.rapids.science/setup/configuration/#participant-files) or the [day segment files](https://www.rapids.science/setup/configuration/#day-segments)).
<figure>
<img src="/img/files.png" width="600" />
<figcaption>Interaction diagram between the user and important files in RAPIDS</figcaption>
</figure>
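To illustrate the Snakefile's role described above, the following is a minimal, hypothetical Snakemake rule of the kind the pipeline is built from: it declares input files, an output file, and the script that produces it. The rule name, file paths and script are invented for this sketch and do not correspond to an actual RAPIDS rule.
```python
# Snakemake syntax (a Python superset); everything below is illustrative only
rule example_sensor_features:
    input:
        sensor_data = "data/raw/{pid}/example_sensor_with_datetime.csv",
        day_segments = "data/interim/{pid}/day_segments.csv"
    output:
        "data/processed/{pid}/example_sensor_features.csv"
    script:
        "../src/features/example_sensor/main.py"
```
When a user requests an output file (typically through a top-level `all` rule), Snakemake runs a rule like this one, plus any rules its inputs depend on, only if the output is missing or older than its inputs.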

BIN docs/img/files.png 100644 (527 KiB, binary file not shown)


@@ -10,6 +10,8 @@ RAPIDS is open source, documented, modular, tested, and reproducible. At the mom
:fontawesome-solid-tasks: Join our discussions on our algorithms and assumptions for feature [processing](https://github.com/carissalow/rapids/issues?q=is%3Aissue+is%3Aopen+label%3Adiscussion).
:fontawesome-solid-play: Ready to start? Go to [Installation](https://www.rapids.science/setup/installation/) and then to [Initial Configuration](https://www.rapids.science/setup/configuration/)
## How does it work?
RAPIDS is formed by R and Python scripts orchestrated by [Snakemake](https://snakemake.readthedocs.io/en/stable/). We suggest you read Snakemake's docs but in short: every link in the analysis chain is atomic and has files as input and output. Behavioral features are processed per sensor and per participant.
@@ -26,15 +28,6 @@ RAPIDS is formed by R and Python scripts orchestrated by [Snakemake](https://sna
8. **Reproducible code**. You can be sure your code will run in other computers as intended thanks to R and Python virtual environments. You can share your analysis code along your publications without any overhead.
9. **Private**. All your data is processed locally.
## How is it organized?
The `config.yaml` file is the only file that you will have to modify. It includes parameters to manage participants, data sources, sensor data, visualizations and more.
In broad terms the `config.yaml`, `.env` files, participant files, and day segment files are the only ones that you will have to modify. All data is stored in `data/` and all scripts are stored in `src/`. For more information see RAPIDS' [File Structure](file-structure.md).
All data is saved in `data/`. The `data/external/` folder stores any data imported by the user, `data/raw/` stores sensor data as imported from your database, `data/interim/` has intermediate files necessary to compute behavioral features from raw data, and `data/processed/` has all the final files with the behavioral features per sensor and participant.
All the source code is saved in `src/`. The `src/data/` folder stores scripts to download, clean and pre-process sensor data, `src/features` has scripts to extract behavioral features organized in their respective subfolders , `src/models/` can host any script to create models or statistical analyses with the behavioral features you extract, and `src/visualization/` has scripts to create plots of the raw and processed data.
There are other important files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.).


@@ -9,11 +9,11 @@ This is a quick guide for creating and running a simple pipeline to extract miss
!!! info "Things to change on each configuration step"
1\. Setup your database connection credentials in `.env`. We assume your credentials group is called `MY_GROUP`.
2\. Set `America/New_York` as the timezone
2\. `America/New_York` should be the default timezone
3\. Create a participant file `p01.yaml` based on one of your participants and add `p01` to `[PIDS]` in `config.yaml`
4\. Set `[DAY_SEGMENTS][TYPE]` to `PERIODIC` and `FILE` to a file containing the following lines:
4\. `[DAY_SEGMENTS][TYPE]` should be the default `PERIODIC`. Change `[DAY_SEGMENTS][FILE]` with the path of a file containing the following lines:
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0


@@ -46,6 +46,7 @@ theme:
pages:
  - Home: 'index.md'
  - File Structure: file-structure.md
  - Setup:
      - Installation: 'setup/installation.md'
      - Initial Configuration: setup/configuration.md