Update CSV example links and feature introduction

2021-03-14 16:13:57 -04:00 · 2021-03-14 16:13:57 -04:00 · 5f355560de
parent 8583fa1db0
commit 5f355560de
6 changed files with 127 additions and 54 deletions
--- a/docs/datastreams/aware-csv.md
+++ b/docs/datastreams/aware-csv.md
@ -5,6 +5,8 @@ This [data stream](../../datastreams/data-streams-introduction) handles iOS and
 !!! warning
    The CSV files have to use `,` as separator, `\` as escape character (do not escape `"` with `""`), and wrap any string columns with `"`.

+    See the example CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/).
+
    ??? example "Example of a valid CSV file"
        ```csv
        "_id","timestamp","device_id","activities","confidence","stationary","walking","running","automotive","cycling","unknown","label"
--- a/docs/datastreams/mandatory-phone-format.md
+++ b/docs/datastreams/mandatory-phone-format.md
@ -2,6 +2,8 @@

 This is a description of the format RAPIDS needs to process data for the following PHONE sensors.

+See the example CSV files inside [rapids_example_csv.zip](https://osf.io/wbg23/).
+
 ??? info "PHONE_ACCELEROMETER"

    | RAPIDS column   | Description                                                  |
--- a/docs/features/feature-introduction.md
+++ b/docs/features/feature-introduction.md
@ -9,50 +9,36 @@ Every device sensor has a corresponding config section in `config.yaml`, these s
    - In short, to extract features offered by a provider, you need to set its `[COMPUTE]` flag to `TRUE`, configure any of its parameters, and [execute](../../setup/execution) RAPIDS.


-!!! example "Config section example for `PHONE_ACCELEROMETER`"
+### Explaining the config.yaml sensor sections with an example

-    ```yaml
-    # 1) Config section
-    PHONE_ACCELEROMETER:
-        # 2) Parameters for PHONE_ACCELEROMETER
-        CONTAINER: accelerometer
+Each sensor section follows the same structure. Click on the numbered markers to learn more.

-        # 3) Providers for PHONE_ACCELEROMETER
-        PROVIDERS:
-            # 4) RAPIDS provider
+``` { .yaml .annotate }
+PHONE_ACCELEROMETER: # (1)
+
+    CONTAINER: accelerometer # (2)
+
+    PROVIDERS: # (3)
        RAPIDS:
-                # 4.1) Parameters of RAPIDS provider of PHONE_ACCELEROMETER
-                COMPUTE: False
-                # 4.2) Features of RAPIDS provider of PHONE_ACCELEROMETER
+            COMPUTE: False # (4)
            FEATURES: ["maxmagnitude", "minmagnitude", "avgmagnitude", "medianmagnitude", "stdmagnitude"]
-                SRC_FOLDER: "rapids" # inside src/features/phone_accelerometer
+
+            SRC_FOLDER: "rapids"
            SRC_LANGUAGE: "python"
        
-            # 5) PANDA provider
        PANDA:
-                # 5.1) Parameters of PANDA provider of PHONE_ACCELEROMETER
            COMPUTE: False
            VALID_SENSED_MINUTES: False
-                # 5.2) Features of PANDA provider of PHONE_ACCELEROMETER
-                FEATURES:
+            FEATURES: # (5)
                exertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
                nonexertional_activity_episode: ["sumduration", "maxduration", "minduration", "avgduration", "medianduration", "stdduration"]
-                SRC_FOLDER: "panda" # inside src/features/phone_accelerometer
-                SRC_LANGUAGE: "python"
-    ```

-## Sensor Parameters
-Each sensor configuration section has a "parameters" subsection (see `#2` in the example). These are parameters that affect different aspects of how the raw data is downloaded, and processed. The `CONTAINER` parameter exists for every sensor, but some sensors will have extra parameters like [`[PHONE_LOCATIONS]`](../phone-locations/). We explain these parameters in a table at the top of each sensor documentation page.
+            SRC_FOLDER: "panda"
+            SRC_LANGUAGE: "python" # (6)
+```

-## Sensor Providers
-Each sensor configuration section can have zero, one or more behavioral feature **providers** (see `#3` in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. In this example, accelerometer has two providers: RAPIDS (see `#4`) and PANDA (see `#5`).
+--8<--- "docs/snippets/feature_introduction_example.md"

-### Provider Parameters
-Each provider has parameters that affect the computation of the behavioral features it offers (see `#4.1` or `#5.1` in the example). These parameters will include at least a `[COMPUTE]` flag that you switch to `True` to extract a provider's behavioral features. 
+These are descriptions of each marker for accessibility:

-We explain every provider's parameter in a table under the `Parameters description` heading on each provider documentation page.
-
-### Provider Features
-Each provider offers a set of behavioral features (see `#4.2` or `#5.2` in the example). For some providers these features are grouped in an array (like those for `RAPIDS` provider in `#4.2`) but for others they are grouped in a collection of arrays depending on the meaning and purpose of those features (like those for `PANDAS` provider in `#5.2`). In either case, you can delete the features you are not interested in and they will not be included in the sensor's output feature file. 
-
-We explain each behavioral feature in a table under the `Features description` heading on each provider documentation page.
+--8<--- "docs/snippets/feature_introduction_example.md"
--- a/docs/snippets/feature_introduction_example.md
+++ b/docs/snippets/feature_introduction_example.md
@ -0,0 +1,41 @@
+1. **Sensor section**
+
+    Each sensor (accelerometer, screen, etc.) of every supported device (smartphone, Fitbit, etc.) has a section in the `config.yaml` with `parameters` and feature `PROVIDERS`.
+
+2. **Sensor Parameters.** 
+    
+    Each sensor section has one or more parameters that affect different aspects of how the raw data is pulled and processed.
+    
+    The `CONTAINER` parameter exists for every sensor, but some sensors will have extra parameters like [`[PHONE_LOCATIONS]`](../phone-locations/).
+    
+    We explain these parameters in a table at the top of each sensor documentation page.
+
+3. **Sensor Providers**
+
+    Each object in this list represents a feature `PROVIDER`. Each sensor can have zero, one, or more providers.
+    
+    A `PROVIDER` is a script that creates behavioral features for a specific sensor. Providers are created by the core RAPIDS team or by the community; community providers are named after their first author, like [[PHONE_LOCATIONS][DORYAB]](../../features/phone-locations/#doryab-provider).
+
+    In this example, there are two accelerometer feature providers: `RAPIDS` and `PANDA`.
+
+4. **`PROVIDER` Parameters**
+    
+    Each `PROVIDER` has parameters that affect the computation of the behavioral features it offers.
+    
+    These parameters include at least a `[COMPUTE]` flag that you switch to `True` to extract a provider's behavioral features. 
+
+    We explain each provider's parameters in a table under the `Parameters description` heading on each provider documentation page.
+
+5. **`PROVIDER` Features**
+
+    Each `PROVIDER` offers a set of behavioral features.
+    
+    For some providers, these features are grouped in a single array, like those of the `RAPIDS` provider. For others, they are grouped in a collection of arrays, like those of the `PANDA` provider.
+    
+    In either case, you can delete the features you are not interested in, and they will not be included in the sensor's output feature file. 
+
+    We explain each behavioral feature in a table under the `Features description` heading on each provider documentation page.
+
+6. **`PROVIDER` script**
+
+    Each `PROVIDER` has a `SRC_FOLDER` and `SRC_LANGUAGE` that point to the script implementing the features of this `PROVIDER`.
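
The structure described by the six markers can also be inspected programmatically. The following is a minimal sketch, not RAPIDS code: it assumes the `PHONE_ACCELEROMETER` section has already been loaded into a plain Python dict (here it is written out by hand rather than parsed from `config.yaml`), and the helper name `summarize_providers` is hypothetical.

```python
# Hypothetical helper, not part of RAPIDS: summarize the providers of one sensor section.
# The dict mirrors the PHONE_ACCELEROMETER example; it is typed in by hand for illustration.
DURATION_FEATURES = ["sumduration", "maxduration", "minduration",
                     "avgduration", "medianduration", "stdduration"]

sensor_section = {
    "CONTAINER": "accelerometer",                      # marker (2)
    "PROVIDERS": {                                     # marker (3)
        "RAPIDS": {
            "COMPUTE": False,                          # marker (4)
            "FEATURES": ["maxmagnitude", "minmagnitude", "avgmagnitude",
                         "medianmagnitude", "stdmagnitude"],
            "SRC_FOLDER": "rapids",
            "SRC_LANGUAGE": "python",
        },
        "PANDA": {
            "COMPUTE": False,
            "VALID_SENSED_MINUTES": False,
            "FEATURES": {                              # marker (5)
                "exertional_activity_episode": list(DURATION_FEATURES),
                "nonexertional_activity_episode": list(DURATION_FEATURES),
            },
            "SRC_FOLDER": "panda",
            "SRC_LANGUAGE": "python",                  # marker (6)
        },
    },
}


def summarize_providers(section):
    """Return, per provider, its COMPUTE flag, feature count, and script settings."""
    summary = {}
    for name, provider in section.get("PROVIDERS", {}).items():
        features = provider.get("FEATURES", [])
        if isinstance(features, dict):
            # Features grouped in a collection of arrays (as in PANDA).
            feature_count = sum(len(group) for group in features.values())
        else:
            # Features grouped in a single array (as in RAPIDS).
            feature_count = len(features)
        summary[name] = {
            "compute": bool(provider.get("COMPUTE", False)),
            "feature_count": feature_count,
            "src_folder": provider.get("SRC_FOLDER"),
            "src_language": provider.get("SRC_LANGUAGE"),
        }
    return summary


if __name__ == "__main__":
    for provider_name, info in summarize_providers(sensor_section).items():
        print(provider_name, info)
```

Handling both shapes of `FEATURES` in one place reflects marker 5: a provider may list its features as a single array or as a collection of arrays.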
--- a/docs/workflow-examples/analysis.md
+++ b/docs/workflow-examples/analysis.md
@ -27,7 +27,7 @@ Our example is based on a hypothetical study that recruited 2 participants that

 The goal of this workflow is to find out if we can predict the daily symptom burden score of a participant. Thus, we framed this question as a binary classification problem with two classes, high and low symptom burden based on the scores above and below average of each participant. We also want to compare the performance of individual (personalized) models vs a population model. 

-In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share a database with [test data](https://osf.io/skqfv/files/) in an Open Science Framework repository. 
+In total, our example workflow has nine steps that are in charge of sensor data preprocessing, feature extraction, feature cleaning, machine learning model training and model evaluation (see figure below). We ship this workflow with RAPIDS and share files with [test data](https://osf.io/wbg23/) in an Open Science Framework repository. 

 <figure>
  <img src="../../img/analysis_workflow.png" max-width="100%" />
@ -37,7 +37,7 @@ In total, our example workflow has nine steps that are in charge of sensor data

 ## Configure and run the analysis workflow example
 1.	[Install](../../setup/installation) RAPIDS
-2.	*Skip this step if you are using RAPIDS docker container*. Unzip the [test database](https://osf.io/skqfv/files/) as `example_workflow` folder and move it to `data/external/` directory.
+2.	Unzip [rapids_example_csv.zip](https://osf.io/wbg23/) and place its CSV files in `data/external/example_workflow/` (as `data/external/example_workflow/*.csv`).
 3.	Create the participant files for this example by running:
    ```bash
    ./rapids -j1 create_example_participant_files
@ -47,6 +47,8 @@ In total, our example workflow has nine steps that are in charge of sensor data
    ./rapids -j1 --profile example_profile
    ```

+Note that you will see many warning messages. You can ignore them; they appear because we run ML algorithms on a small fake dataset.
+
 ## Modules of our analysis workflow example

 ??? info "1. Feature extraction"
--- a/docs/workflow-examples/minimal.md
+++ b/docs/workflow-examples/minimal.md
@ -7,14 +7,14 @@ This is a quick guide for creating and running a simple pipeline to extract miss
 3. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
 2. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) step (we provide an example of what the relevant sections in your `config.yml` will look like after you are done)
    
-    ??? info "Required configuration changes"
+    ??? info "Required configuration changes (*click to expand*)"
        1. **Supported [data streams](../../setup/configuration#supported-data-streams).** 
            
            Based on the docs, we decided to use the `aware_csv` data stream because we are processing aware data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.

        3. **Create your [participants file](../../setup/configuration#participant-files).**
        
-            Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration##automatic-creation-of-participant-files)
+            Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml` in `data/external/participant_files`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration##automatic-creation-of-participant-files)
            
            1. Add `p01` to `[PIDS]` in `config.yaml`

@ -65,25 +65,30 @@ This is a quick guide for creating and running a simple pipeline to extract miss
            1. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.


-    ??? example "Example of the `config.yaml` sections after the changes outlined above"
-        Highlighted lines are related to the configuration steps above.
-        ``` yaml hl_lines="1 4 6 12 16 27 30"
-        PIDS: [p01]
+    !!! example "Example of the `config.yaml` sections after the changes outlined above"
+
+        This will be your `config.yaml` after following the instructions above. Click on the numbered markers to learn more.
+
+        ``` { .yaml .annotate } 
+        PIDS: [p01] # (1)
        
        TIMEZONE:
-            TYPE: SINGLE
+            TYPE: SINGLE # (2)
            SINGLE:
                TZCODE: America/New_York

        # ... other irrelevant sections

        TIME_SEGMENTS: &time_segments
-            TYPE: PERIODIC
-            FILE: "data/external/timesegments_periodic.csv"
+            TYPE: PERIODIC # (3)
+            FILE: "data/external/timesegments_periodic.csv" # (4)
            INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE

        PHONE_DATA_STREAMS:
-            USE: aware_csv
+            USE: aware_csv # (5)
+
+            aware_csv:
+                FOLDER: data/external/aware_csv # (6)

        # ... other irrelevant sections

@ -94,13 +99,48 @@ This is a quick guide for creating and running a simple pipeline to extract miss

        # Communication call features config, TYPES and FEATURES keys need to match
        PHONE_CALLS:
-            CONTAINER: calls.csv 
+            CONTAINER: calls.csv  # (7) 
            PROVIDERS:
                RAPIDS:
-                    COMPUTE: True 
+                    COMPUTE: True # (8)
                    CALL_TYPES: ...
        ```

+        1. We added `p01` to `[PIDS]` after creating the participant file:
+            ```bash
+            data/external/participant_files/p01.yaml
+            ```
+
+            With the following content:
+            ```yaml
+            PHONE:
+                DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
+                PLATFORMS: [android] # or ios
+                LABEL: MyTestP01 # any string
+                START_DATE: 2020-01-01 # this can also be empty
+                END_DATE: 2021-01-01 # this can also be empty
+            ```
+
+        2. We use the default `SINGLE` time zone.
+
+        3. We use the default `PERIODIC` time segment `[TYPE]`.
+
+        4. We created this time segments file with these lines:
+
+            ```csv
+            label,start_time,length,repeats_on,repeats_value
+            daily,00:00:00,23H 59M 59S,every_day,0
+            night,00:00:00,5H 59M 59S,every_day,0
+            ```
+
+        5. We set `[USE]` to `aware_csv` to tell RAPIDS to process sensor data collected with the AWARE Framework and stored in CSV files.
+
+        6. We used the default `[FOLDER]` for `aware_csv` since we already stored our test `calls.csv` file there.
+
+        7. We changed `[CONTAINER]` to `calls.csv` to process our test call data.
+
+        8. We flipped `[COMPUTE]` to `True` to extract call behavioral features using the `RAPIDS` feature provider.
+
 3. Run RAPIDS
    ```bash
    ./rapids -j1