Update CSV example links and feature introduction

pull/128/head
JulioV 2021-03-14 16:13:57 -04:00
parent 8583fa1db0
commit 5f355560de
6 changed files with 127 additions and 54 deletions

View File

@ -5,6 +5,8 @@ This [data stream](../../datastreams/data-streams-introduction) handles iOS and
!!! warning
The CSV files have to use `,` as the separator and `\` as the escape character (do not escape `"` with `""`), and any string values have to be wrapped in `"`.
See examples in the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/).
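Purely as an illustration (not part of the official docs), a CSV with that formatting could be produced with Python's standard `csv` module; the file name, columns, and values below are placeholders:
```python
import csv

# Made-up rows; the path and column names are placeholders
rows = [
    {"_id": 1, "device_id": "a748ee1a-1d0b-4ae9-9074-279a2b6ba524", "label": 'said "hi"'},
]

with open("calls_example.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["_id", "device_id", "label"],
        delimiter=",",                  # ',' as the separator
        quotechar='"',                  # wrap string values in "
        quoting=csv.QUOTE_NONNUMERIC,
        doublequote=False,              # do NOT escape " as ""
        escapechar="\\",                # escape " as \" instead
    )
    writer.writeheader()
    writer.writerows(rows)
```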
??? example "Example of a valid CSV file"
```csv
"_id","timestamp","device_id","activities","confidence","stationary","walking","running","automotive","cycling","unknown","label"

View File

@ -2,6 +2,8 @@
This is a description of the format RAPIDS needs to process data for the following PHONE sensors.
See examples in the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/).
??? info "PHONE_ACCELEROMETER"
| RAPIDS column | Description |

View File

@ -9,50 +9,36 @@ Every device sensor has a corresponding config section in `config.yaml`, these s
- In short, to extract features offered by a provider, you need to set its `[COMPUTE]` flag to `TRUE`, configure any of its parameters, and [execute](../../setup/execution) RAPIDS.
!!! example "Config section example for `PHONE_ACCELEROMETER`"
### Explaining the config.yaml sensor sections with an example
```yaml
# 1) Config section
PHONE_ACCELEROMETER:
# 2) Parameters for PHONE_ACCELEROMETER
CONTAINER: accelerometer
Each sensor section follows the same structure. Click on the numbered markers to know more.
# 3) Providers for PHONE_ACCELEROMETER
PROVIDERS:
# 4) RAPIDS provider
RAPIDS:
# 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER
COMPUTE: False
# 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
# 5) PANDA provider
PANDA:
# 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER
COMPUTE: False
VALID_SENSED_MINUTES: False
# 5.2) Features of PANDA provider of PHONE_ACCELEROMETER
FEATURES:
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
SRC_LANGUAGE: "python"
```
``` { .yaml .annotate }
PHONE_ACCELEROMETER: # (1)
## Sensor Parameters
Each sensor configuration section has a "parameters" subsection (see `#2` in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The `CONTAINER` parameter exists for every sensor, but some sensors will have extra parameters like [`[PHONE_LOCATIONS]`](../phone-locations/). We explain these parameters in a table at the top of each sensor documentation page.
CONTAINER: accelerometer # (2)
## Sensor Providers
Each sensor configuration section can have zero, one or more behavioral feature **providers** (see `#3` in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see `#4`) and PANDA (see `#5`).
PROVIDERS: # (3)
RAPIDS:
COMPUTE: False # (4)
FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
### Provider Parameters
Each provider has parameters that affect the computation of the behavioral features it offers (see `#4.1` or `#5.1` in the example). These parameters will include at least a `[COMPUTE]` flag that you switch to `True` to extract a provider's behavioral features.
SRC_FOLDER: "rapids"
SRC_LANGUAGE: "python"
PANDA:
COMPUTE: False
VALID_SENSED_MINUTES: False
FEATURES: # (5)
exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
We explain every provider's parameter in a table under the `Parameters description` heading on each provider documentation page.
SRC_FOLDER: "panda"
SRC_LANGUAGE: "python" # (6)
```
### Provider Features
Each provider offers a set of behavioral features (see `#4.2` or `#5.2` in the example). For some providers these features are grouped in an array (like those for `RAPIDS` provider in `#4.2`) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for `PANDAS` provider in `#5.2`). In either case, you can delete the features you are not interested in and they will not be included in the sensor's output feature file.
--8<--- "docs/snippets/feature_introduction_example.md"
We explain each behavioral feature in a table under the `Features description` heading on each provider documentation page.
These are descriptions of each marker for accessibility:
--8<--- "docs/snippets/feature_introduction_example.md"

View File

@ -0,0 +1,41 @@
1. **Sensor section**
Each sensor (accelerometer, screen, etc.) of every supported device (smartphone, Fitbit, etc.) has a section in the `config.yaml` with `parameters` and feature `PROVIDERS`.
2. **Sensor Parameters.**
Each sensor section has one or more parameters. These are parameters that affect different aspects of how the raw data is pulled and processed.
The `CONTAINER` parameter exists for every sensor, but some sensors, like [`[PHONE_LOCATIONS]`](../phone-locations/), have extra parameters.
We explain these parameters in a table at the top of each sensor documentation page.
3. **Sensor Providers**
Each object in this list represents a feature `PROVIDER`. Each sensor can have zero, one, or more providers.
A `PROVIDER` is a script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by other researchers, in which case they are named after their first author, like [[PHONE_LOCATIONS][DORYAB]](../../features/phone-locations/#doryab-provider).
In this example, there are two accelerometer feature providers `RAPIDS` and `PANDA`.
4. **`PROVIDER` Parameters**
Each `PROVIDER` has parameters that affect the computation of the behavioral features it offers.
These parameters include at least a `[COMPUTE]` flag that you switch to `True` to extract a provider's behavioral features.
We explain every provider's parameter in a table under the `Parameters description` heading on each provider documentation page.
5. **`PROVIDER` Features**
Each `PROVIDER` offers a set of behavioral features.
These features are grouped in an array for some providers, like those for the `RAPIDS` provider. For others, they are grouped in a collection of arrays depending on their meaning and purpose, like those for the `PANDA` provider.
In either case, you can delete the features you are not interested in, and they will not be included in the sensor's output feature file.
We explain each behavioral feature in a table under the `Features description` heading on each provider documentation page.
6. **`PROVIDER` script**
Each `PROVIDER` has a `SRC_FOLDER` and `SRC_LANGUAGE` that point to the script implementing the features of this `PROVIDER`.
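Purely as a hypothetical illustration (the function name, arguments, and raw column names below are made up; the exact interface RAPIDS expects from a provider script is defined in its developer documentation), a Python provider might be organized roughly like this:
```python
import pandas as pd

# Hypothetical provider script; the function name, arguments, and raw
# column names (x, y, z) are made up for illustration only.
def example_features(sensor_data_file: str, requested_features: list) -> pd.DataFrame:
    data = pd.read_csv(sensor_data_file)
    # Assume the raw data has x, y, z axis columns; derive the magnitude
    magnitude = (data[["x", "y", "z"]] ** 2).sum(axis=1) ** 0.5

    features = pd.DataFrame(index=[0])
    if "maxmagnitude" in requested_features:
        features["maxmagnitude"] = magnitude.max()
    if "avgmagnitude" in requested_features:
        features["avgmagnitude"] = magnitude.mean()
    return features
```
RAPIDS wires a provider script into its pipeline through the `SRC_FOLDER` and `SRC_LANGUAGE` values above; the snippet only illustrates the general idea of turning raw rows into feature values.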

View File

@ -27,7 +27,7 @@ Our example is based on a hypothetical study that recruited 2 participants that
The goal of this workflow is to find out whether we can predict a participant's daily symptom burden score. We framed this question as a binary classification problem with two classes, high and low symptom burden, based on scores above or below each participant's average. We also want to compare the performance of individual (personalized) models against a population model.
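For instance (a sketch with made-up scores, not code from the workflow), the high/low label can be derived from each participant's own average like this:
```python
import pandas as pd

# Made-up daily scores for one hypothetical participant
scores = pd.DataFrame({"local_date": ["2020-03-01", "2020-03-02", "2020-03-03"],
                       "symptom_burden": [3, 9, 6]})

# High (1) vs. low (0) burden relative to this participant's own average
scores["label"] = (scores["symptom_burden"] > scores["symptom_burden"].mean()).astype(int)
print(scores)
```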
In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share files with [test data](https://osf.io/wbg23/) in an Open Science Framework repository.
<figure>
<img src="../../img/analysis_workflow.png" max-width="100%" />
@ -37,7 +37,7 @@ In total, our example workflow has nine steps that are in charge of sensor data
## Configure and run the analysis workflow example
1. [Install](../../setup/installation) RAPIDS
2. Unzip the CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/) into `data/external/example_workflow/` so they match `data/external/example_workflow/*.csv` (an optional check that the files are in place is sketched after these steps).
3. Create the participant files for this example by running:
```bash
./rapids -j1 create_example_participant_files
@ -47,6 +47,8 @@ In total, our example workflow has nine steps that are in charge of sensor data
./rapids -j1 --profile example_profile
```
Note that you will see a lot of warning messages; you can ignore them since they are caused by running the ML algorithms on a small, fake dataset.
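As an optional sanity check (a sketch assuming the folder layout from step 2), you can confirm the example CSV files were unzipped to the expected location before running the workflow:
```python
from pathlib import Path

# Check that the CSVs from rapids_example_csv.zip ended up where step 2 expects them
csv_files = sorted(Path("data/external/example_workflow").glob("*.csv"))
if not csv_files:
    raise SystemExit("No CSV files found; re-check where you unzipped rapids_example_csv.zip")
print(f"Found {len(csv_files)} CSV files, e.g.,", csv_files[0].name)
```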
## Modules of our analysis workflow example
??? info "1. Feature extraction"

View File

@ -7,14 +7,14 @@ This is a quick guide for creating and running a simple pipeline to extract miss
3. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
2. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) step (we provide an example of what the relevant sections in your `config.yaml` will look like after you are done)
??? info "Required configuration changes"
??? info "Required configuration changes (*click to expand*)"
1. **Supported [data streams](../../setup/configuration#supported-data-streams).**
Based on the docs, we decided to use the `aware_csv` data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.
3. **Create your [participants file](../../setup/configuration#participant-files).**
Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml` in `data/external/participant_files`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration#automatic-creation-of-participant-files).
1. Add `p01` to `[PIDS]` in `config.yaml`
@ -65,25 +65,30 @@ This is a quick guide for creating and running a simple pipeline to extract miss
1. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.
??? example "Example of the `config.yaml` sections after the changes outlined above"
Highlighted lines are related to the configuration steps above.
``` yaml hl_lines="1 4 6 12 16 27 30"
PIDS: [p01]
!!! example "Example of the `config.yaml` sections after the changes outlined above"
TIMEZONE:
TYPE: SINGLE
This will be your `config.yaml` after following the instructions above. Click on the numbered markers to know more.
``` { .yaml .annotate }
PIDS: [p01] # (1)
TIMEZONE:
TYPE: SINGLE # (2)
SINGLE:
TZCODE: America/New_York
# ... other irrelevant sections
TIME_SEGMENTS: &time_segments
TYPE: PERIODIC
FILE: "data/external/timesegments_periodic.csv"
TYPE: PERIODIC # (3)
FILE: "data/external/timesegments_periodic.csv" # (4)
INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
PHONE_DATA_STREAMS:
USE: aware_csv
USE: aware_csv # (5)
aware_csv:
FOLDER: data/external/aware_csv # (6)
# ... other irrelevant sections
@ -94,13 +99,48 @@ This is a quick guide for creating and running a simple pipeline to extract miss
# Communication call features config, TYPES and FEATURES keys need to match
PHONE_CALLS:
    CONTAINER: calls.csv # (7)
    PROVIDERS:
        RAPIDS:
            COMPUTE: True # (8)
            CALL_TYPES: ...
```
1. We added `p01` to PIDS after creating the participant file:
```bash
data/external/participant_files/p01.yaml
```
With the following content:
```yaml
PHONE:
    DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
    PLATFORMS: [android] # or ios
    LABEL: MyTestP01 # any string
    START_DATE: 2020-01-01 # this can also be empty
    END_DATE: 2021-01-01 # this can also be empty
```
2. We use the default `SINGLE` time zone.
3. We use the default `PERIODIC` time segment `[TYPE]`.
4. We created this time segments file with these lines:
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,01:00:00,5H 59M 59S,every_day,0
```
5. We set `[USE]` to `aware_csv` to tell RAPIDS to process sensor data collected with the AWARE Framework and stored in CSV files.
6. We used the default `[FOLDER]` for `aware_csv` since we already stored our test `calls.csv` file there.
7. We changed `[CONTAINER]` to `calls.csv` to process our test call data.
8. We flipped `[COMPUTE]` to `True` to extract call behavioral features using the `RAPIDS` feature provider.
3. Run RAPIDS
```bash
./rapids -j1