Update minimal workflow

2021-03-11 15:22:23 -05:00 · 2021-03-11 15:22:23 -05:00 · 2e030b377d
parent 1b8453bec4
commit 2e030b377d
1 changed files with 53 additions and 47 deletions
--- a/docs/workflow-examples/minimal.md
+++ b/docs/workflow-examples/minimal.md
@@ -3,62 +3,75 @@ Minimal Working Example

 This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming `call` features for `daily` (`00:00:00` to `23:59:59`) and `night` (`00:00:00` to `05:59:59`) epochs of every day of data of one participant monitored on the US East coast with an Android smartphone.

-!!! hint
-    If you don't have `call` data that you can use to try this example you can restore this [CSV file](../img/calls.csv) as a table in a MySQL database.
-
-
 1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
+3. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
 2. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) step (we provide an example of what the relevant sections in your `config.yml` will look like after you are done)
    
    ??? info "Required configuration changes"
-        1. **Add your [database credentials](../../setup/configuration#database-credentials).** 
+        1. **Supported [data streams](../../setup/configuration#supported-data-streams).** 
            
-            Setup your database connection credentials in `.env`, we assume your credentials group in the `.env` file is called `MY_GROUP`.
+            We will use the `aware_csv` data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step.

-        2. **Choose the [timezone of your study](../../setup/configuration#timezone-of-your-study).** 
+        3. **Create your [participants file](../../setup/configuration#participant-files).**
        
-            Since this example is processing data collected on the US East cost, `America/New_York` should be the configured timezone, change this according to your data.
+            Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration#automatic-creation-of-participant-files).
+            
+            1. Add `p01` to `[PIDS]` in `config.yaml`

-        3. **Create your [participants files](../../setup/configuration#participant-files).**
-        
-            Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with an smartphone. You also need to add `p01` to `[PIDS]` in `config.yaml`. The following would be the content of your `p01.yaml` participant file:
-            ```yaml
-            PHONE:
-                DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
-                PLATFORMS: [android] # or ios
-                LABEL: MyTestP01 # any string
-                START_DATE: 2020-01-01 # this can also be empty
-                END_DATE: 2021-01-01 # this can also be empty
-            ```
+            1. Create a file in `data/external/participant_files/p01.yaml` with the following content:
+
+                ```yaml
+                PHONE:
+                    DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
+                    PLATFORMS: [android] # or ios
+                    LABEL: MyTestP01 # any string
+                    START_DATE: 2020-01-01 # this can also be empty
+                    END_DATE: 2021-01-01 # this can also be empty
+                ```
        
        4. **Select what [time segments](../../setup/configuration#time-segments) you want to extract features on.** 
        
-            `[TIME_SEGMENTS][TYPE]` should be the default `PERIODIC`. Change `[TIME_SEGMENTS][FILE]` with the path (for example `data/external/timesegments_periodic.csv`) of a file containing the following lines:
-             ```csv
-             label,start_time,length,repeats_on,repeats_value
-             daily,00:00:00,23H 59M 59S,every_day,0
-             night,00:00:00,5H 59M 59S,every_day,0
-             ```
+            1. Set `[TIME_SEGMENTS][FILE]` to `data/external/timesegments_periodic.csv` 

-         5. **Modify your [device data source configuration](../../setup/configuration#device-data-source-configuration)**
+            1. Create a file in `data/external/timesegments_periodic.csv` with the following content:
            
-            In this example we do not need to modify this section because we are using smartphone data collected with AWARE stored on a MySQL database.
+                ```csv
+                label,start_time,length,repeats_on,repeats_value
+                daily,00:00:00,23H 59M 59S,every_day,0
+                night,00:00:00,5H 59M 59S,every_day,0
+                ```
+        
+        2. **Choose the [timezone of your study](../../setup/configuration#timezone-of-your-study).** 
+        
+            We will use the default time zone settings since this example is processing data collected on the US East Coast (`America/New_York`).
+
+            ```yaml
+            TIMEZONE: 
+                TYPE: SINGLE
+                SINGLE:
+                    TZCODE: America/New_York
+            ```
+
+         5. **Modify your [device data stream configuration](../../setup/configuration#data-stream-configuration)**
+            
+            Set `[PHONE_DATA_STREAMS][USE]` to `aware_csv`. 

         6. **Select what [sensors and features](../../setup/configuration#sensor-and-features-to-process) you want to process.** 
         
-            Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.
+            1. Set `[PHONE_CALLS][CONTAINER]` to `calls.csv` in the `config.yaml` file.
+
+            1. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.


    ??? example "Example of the `config.yaml` sections after the changes outlined above"
        Highlighted lines are related to the configuration steps above.
-        ``` yaml hl_lines="1 4 7 12 13 38"
+        ``` yaml hl_lines="1 4 6 12 16 27 30"
        PIDS: [p01]

-        TIMEZONE: &timezone
-        America/New_York
-
-        DATABASE_GROUP: &database_group
-        MY_GROUP
+        TIMEZONE: 
+            TYPE: SINGLE
+            SINGLE:
+                TZCODE: America/New_York

        # ... other irrelevant sections

@@ -67,17 +80,10 @@ This is a quick guide for creating and running a simple pipeline to extract miss
            FILE: "data/external/timesegments_periodic.csv" # make sure the three lines specified above are in the file
            INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE

-        # No need to change this if you collected AWARE data on a database and your credentials are grouped under `MY_GROUP` in `.env`
-        DEVICE_DATA:
-            PHONE:
-                SOURCE: 
-                    TYPE: DATABASE
-                    DATABASE_GROUP: *database_group
-                    DEVICE_ID_COLUMN: device_id # column name
-                TIMEZONE: 
-                    TYPE: SINGLE # SINGLE or MULTIPLE
-                    VALUE: *timezone 
+        PHONE_DATA_STREAMS:
+            USE: aware_csv

+        # ... other irrelevant sections

        ############## PHONE ###########################################################
        ################################################################################
@@ -86,10 +92,10 @@ This is a quick guide for creating and running a simple pipeline to extract miss

        # Communication call features config, TYPES and FEATURES keys need to match
        PHONE_CALLS:
-            TABLE: calls # change if your calls table has a different name
+            CONTAINER: calls.csv 
            PROVIDERS:
                RAPIDS:
-                    COMPUTE: True # set this to True!
+                    COMPUTE: True 
                    CALL_TYPES: ...
        ```

@@ -99,7 +105,7 @@ This is a quick guide for creating and running a simple pipeline to extract miss
    ```
 4. The call features for daily and night time segments will be in 
   ```
-   /data/processed/features/p01/phone_calls.csv
+   data/processed/features/all_participants/all_sensor_features.csv
   ```
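
Note on the updated guide: the two input files it asks the reader to create by hand could also be written with a short script. The following is only a minimal sketch, not part of RAPIDS. The function name `write_example_inputs` and its `root` parameter are hypothetical. The file paths and contents are copied from the added lines of this diff.

```python
from pathlib import Path

# File contents copied verbatim from the guide in this diff.
PARTICIPANT_YAML = """\
PHONE:
    DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
    PLATFORMS: [android] # or ios
    LABEL: MyTestP01 # any string
    START_DATE: 2020-01-01 # this can also be empty
    END_DATE: 2021-01-01 # this can also be empty
"""

TIME_SEGMENTS_CSV = """\
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
"""


def write_example_inputs(root="."):
    """Hypothetical helper: write the participant and time segment files under `root`.

    `root` is assumed to be the folder that contains `data/`; returns the written paths.
    """
    external = Path(root) / "data" / "external"
    targets = {
        external / "participant_files" / "p01.yaml": PARTICIPANT_YAML,
        external / "timesegments_periodic.csv": TIME_SEGMENTS_CSV,
    }
    for path, content in targets.items():
        # Create missing parent folders before writing each file.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
    return sorted(targets)
```

Calling `write_example_inputs()` from the folder that holds `data/` would create both files. `calls.csv` still has to be downloaded separately and `config.yaml` edited as the guide describes.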