Minimal Working Example
=======================
This is a quick guide for creating and running a simple pipeline that extracts missed, outgoing, and incoming `call` features for the `daily` (24 hours, `00:00:00` to `23:59:59`) and `night` (`00:00:00` to `05:59:59`) time segments of every day of data from one participant who was monitored on the US East Coast with an Android smartphone.
1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
2. Download this [CSV file](../img/calls.csv) and save it as `data/external/aware_csv/calls.csv`
3. Make the changes listed below for the corresponding [Configuration](../../setup/configuration) steps (we provide an example of what the relevant sections in your `config.yaml` will look like after you are done)
??? info "Required configuration changes"
1. **Supported [data streams](../../setup/configuration#supported-data-streams).**
Based on the docs, we decided to use the `aware_csv` data stream because we are processing AWARE data saved in a CSV file. We will use this label in a later step; there's no need to type it or save it anywhere yet.
2. **Create your [participants file](../../setup/configuration#participant-files).**
Since we are processing data from a single participant, you only need to create a single participant file called `p01.yaml`. This participant file only has a `PHONE` section because this hypothetical participant was only monitored with a smartphone. Note that for a real analysis, you can do this [automatically with a CSV file](../../setup/configuration#automatic-creation-of-participant-files).
1. Add `p01` to `[PIDS]` in `config.yaml`
1. Create a file in `data/external/participant_files/p01.yaml` with the following content:
```yaml
PHONE:
  DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524] # the participant's AWARE device id
  PLATFORMS: [android] # or ios
  LABEL: MyTestP01 # any string
  START_DATE: 2020-01-01 # this can also be empty
  END_DATE: 2021-01-01 # this can also be empty
```
3. **Select what [time segments](../../setup/configuration#time-segments) you want to extract features on.**
1. Set `[TIME_SEGMENTS][FILE]` to `data/external/timesegments_periodic.csv`
1. Create a file at `data/external/timesegments_periodic.csv` with the following content:
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
```
4. **Choose the [timezone of your study](../../setup/configuration#timezone-of-your-study).**
We will use the default time zone settings since this example processes data collected on the US East Coast (`America/New_York`).
```yaml
TIMEZONE:
  TYPE: SINGLE
  SINGLE:
    TZCODE: America/New_York
```
5. **Modify your [device data stream configuration](../../setup/configuration#data-stream-configuration).**
1. Set `[PHONE_DATA_STREAMS][USE]` to `aware_csv`.
2. We will use the default value for `[PHONE_DATA_STREAMS][aware_csv][FOLDER]` since we already stored the test calls CSV file there.
6. **Select what [sensors and features](../../setup/configuration#sensor-and-features-to-process) you want to process.**
1. Set `[PHONE_CALLS][CONTAINER]` to `calls.csv` in the `config.yaml` file.
1. Set `[PHONE_CALLS][PROVIDERS][RAPIDS][COMPUTE]` to `True` in the `config.yaml` file.
??? example "Example of the `config.yaml` sections after the changes outlined above"
Highlighted lines are related to the configuration steps above.
``` yaml hl_lines="1 4 6 12 16 27 30"
PIDS: [p01]

TIMEZONE:
  TYPE: SINGLE
  SINGLE:
    TZCODE: America/New_York

# ... other irrelevant sections

TIME_SEGMENTS: &time_segments
  TYPE: PERIODIC
  FILE: "data/external/timesegments_periodic.csv"
  INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE

PHONE_DATA_STREAMS:
  USE: aware_csv

# ... other irrelevant sections

############## PHONE ###########################################################
################################################################################

# ... other irrelevant sections

# Communication call features config, TYPES and FEATURES keys need to match
PHONE_CALLS:
  CONTAINER: calls.csv
  PROVIDERS:
    RAPIDS:
      COMPUTE: True
      CALL_TYPES: ...
```
4. Run RAPIDS
```bash
./rapids -j1
```
5. The call features for the `daily` and `night` time segments will be in
```
data/processed/features/all_participants/all_sensor_features.csv
```
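
If the output file does not appear, or the `night` rows cover an unexpected window, a small sanity check of the inputs may help. The sketch below is only an illustration and is not part of RAPIDS: it checks that the three input files named in the steps above exist, and it recomputes the end time of each periodic segment from `start_time` and `length`. The helper names (`missing_inputs`, `parse_length`, `segment_windows`) are hypothetical, and the length parser assumes only the `H`, `M`, and `S` units that appear in this example.

```python
import csv
import io
import re
from datetime import datetime, timedelta
from pathlib import Path

# Input paths named in the steps above, relative to the RAPIDS root folder.
EXPECTED_INPUTS = [
    "data/external/aware_csv/calls.csv",
    "data/external/participant_files/p01.yaml",
    "data/external/timesegments_periodic.csv",
]


def missing_inputs(root="."):
    """Return the expected input files that do not exist under `root`."""
    return [p for p in EXPECTED_INPUTS if not (Path(root) / p).is_file()]


def parse_length(text):
    """Convert a length such as '23H 59M 59S' into a timedelta.

    Hypothetical helper: it only understands the H, M and S units
    used in this example, not RAPIDS' full length syntax.
    """
    units = {"H": "hours", "M": "minutes", "S": "seconds"}
    parts = re.findall(r"(\d+)\s*([HMS])", text.upper())
    if not parts:
        raise ValueError(f"unrecognised length: {text!r}")
    return timedelta(**{units[unit]: int(value) for value, unit in parts})


def segment_windows(csv_text):
    """Map each segment label to its (start, end) clock times."""
    windows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        start = datetime.strptime(row["start_time"], "%H:%M:%S")
        end = start + parse_length(row["length"])
        windows[row["label"]] = (
            start.strftime("%H:%M:%S"),
            end.strftime("%H:%M:%S"),
        )
    return windows


# Same content as data/external/timesegments_periodic.csv in this example.
SEGMENTS = """label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
"""

if __name__ == "__main__":
    print("missing inputs:", missing_inputs("."))
    for label, (start, end) in segment_windows(SEGMENTS).items():
        print(label, start, "to", end)
```

Run it from the RAPIDS root folder. An empty `missing inputs` list and windows of `00:00:00` to `23:59:59` for `daily` and `00:00:00` to `05:59:59` for `night` match the setup described in this guide.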