# Configuration

You need to follow these steps to configure your RAPIDS deployment before you can extract behavioral features:

1. Verify RAPIDS can process your [data streams](#supported-data-streams)
2. Choose the [timezone of your study](#timezone-of-your-study)
3. Create your [participants files](#participant-files)
4. Select what [time segments](#time-segments) you want to extract features on
5. Configure your [data streams](#data-stream-configuration)
6. Select what [sensors and features](#sensor-and-features-to-process) you want to process

When you are done with this configuration, go to [executing RAPIDS](../execution).

!!! hint
Every time you see `config["KEY"]` or `[KEY]` in these docs we are referring to the corresponding key in the `config.yaml` file.
---
## Supported data streams
A data stream refers to sensor data collected using a specific type of **device** with a specific **format** and stored in a specific **container**. For example, the `aware_mysql` data stream handles smartphone data (**device**) collected with the [AWARE Framework](https://awareframework.com/) (**format**) stored in a MySQL database (**container**).

Check the table in [introduction to data streams](../../datastreams/data-streams-introduction) to know what data streams we support. If your data stream is supported, continue to the next configuration section. If you want to implement a new data stream, follow this tutorial to [add support for new data streams](../../datastreams/add-new-data-streams). If you have read the tutorial but still have questions, get in touch by email or in Slack.

## Timezone of your study
### Single timezone
If your study only happened in a single time zone or you want to ignore short trips of your participants to different time zones, select the appropriate code from this [list](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) and change the following config key. Double-check the timezone code you pick; for example, US Eastern Time is `America/New_York`, not `EST`.
``` yaml
TIMEZONE:
  TYPE: SINGLE
  TZCODE: America/New_York
```
### Multiple timezones
If you have the timestamps when participants' devices changed to a new time zone, follow these instructions
TODO more info
``` yaml
TIMEZONE:
  TYPE: MULTIPLE
  TZCODE: America/New_York
  MULTIPLE_TZCODES_FILE: path_to/csv.file
```
---
## Participant files
Participant files link together multiple devices (smartphones and wearables) to specific participants and identify them throughout RAPIDS. You can create these files manually or [automatically](#automatic-creation-of-participant-files). Participant files are stored in `data/external/participant_files/pxx.yaml` and follow a unified [structure](#structure-of-participants-files).

??? important "Remember to modify the `config.yaml` file with your PIDS"
The list `PIDS` in `config.yaml` needs to have the participant file names of the people you want to process. For example, if you created `p01.yaml`, `p02.yaml` and `p03.yaml` files in `data/external/participant_files/`, then `PIDS` should be:
```yaml
PIDS: [p01, p02, p03]
```
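
If you have many participant files, typing `PIDS` by hand is error-prone. The following is a minimal sketch, not part of RAPIDS, that builds the `PIDS` line from the file names in a folder; `list_pids` and `format_pids_line` are hypothetical helper names.

```python
from pathlib import Path


def list_pids(folder):
    """Return participant ids (file names without .yaml), sorted alphabetically."""
    return sorted(path.stem for path in Path(folder).glob("*.yaml"))


def format_pids_line(pids):
    """Format the ids the way the PIDS key appears in config.yaml."""
    return "PIDS: [" + ", ".join(pids) + "]"


# Paste the printed line into config.yaml
print(format_pids_line(list_pids("data/external/participant_files")))
```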

??? info "Optional: Migrating participants files with the old format"
If you were using the pre-release version of RAPIDS with participant files in plain text (as opposed to yaml), you can run the following command and your old files will be converted into yaml files stored in `data/external/participant_files/`:
```bash
python tools/update_format_participant_files.py
```
### Structure of participants files
??? example "Example of the structure of a participant file"
In this example, the participant used an Android phone, an iOS phone, a Fitbit device, and an Empatica device throughout the study between Apr 23rd 2020 and Oct 28th 2020.

If your participants didn't use a `[PHONE]`, `[FITBIT]` or `[EMPATICA]` device, it is not necessary to include that section in their participant file. In other words, you can analyse data from one or more devices per participant.

```yaml
PHONE:
  DEVICE_IDS: [a748ee1a-1d0b-4ae9-9074-279a2b6ba524, dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43]
  PLATFORMS: [android,ios]
  LABEL: test01
  START_DATE: 2020-04-23
  END_DATE: 2020-10-28
FITBIT:
  DEVICE_IDS: [fitbit1]
  LABEL: test01
  START_DATE: 2020-04-23
  END_DATE: 2020-10-28
EMPATICA: # Empatica doesn't have a device_id because the devices produce zip files per participant
  LABEL: test01
  START_DATE: 2020-04-23
  END_DATE: 2020-10-28
```

=== "[PHONE]"

| Key | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each smartphone. You can have more than one for when participants changed phones in the middle of the study; in this case, data from all their devices will be joined and relabeled with the last `device_id` on this list. |
| `[PLATFORMS]` | An array that specifies the OS of each smartphone in `[DEVICE_IDS]`; use a combination of `android` or `ios` (we support participants that changed platforms in the middle of your study!). You can set `[PLATFORMS]: [infer]` and RAPIDS will infer them automatically (each phone data stream infers this differently, e.g. `aware_mysql` uses the `aware_device` table). |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD`. Only data collected *after* this date will be included in the analysis. |
| `[END_DATE]` | A string with format `YYYY-MM-DD`. Only data collected *before* this date will be included in the analysis. |

=== "[FITBIT]"

| Key | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| `[DEVICE_IDS]` | An array of the strings that uniquely identify each Fitbit. You can have more than one in case the participant changed devices in the middle of the study; in this case, data from all devices will be joined and relabeled with the last `device_id` on this list. |
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD`. Only data collected *after* this date will be included in the analysis. |
| `[END_DATE]` | A string with format `YYYY-MM-DD`. Only data collected *before* this date will be included in the analysis. |

=== "[EMPATICA]"

| Key | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| `[LABEL]` | A string that is used in reports and visualizations. |
| `[START_DATE]` | A string with format `YYYY-MM-DD`. Only data collected *after* this date will be included in the analysis. |
| `[END_DATE]` | A string with format `YYYY-MM-DD`. Only data collected *before* this date will be included in the analysis. |
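
The rules in the tables above can be checked before running RAPIDS. This is only a sketch under assumptions: it takes a participant file that has already been loaded into a dictionary (for example with a YAML parser), it is not RAPIDS' own validation, and `check_participant` is a hypothetical name.

```python
from datetime import datetime

VALID_PLATFORMS = {"android", "ios", "infer"}


def check_participant(participant):
    """Return a list of problems found in a participant dictionary (empty if none)."""
    problems = []
    for device, section in participant.items():
        # Dates must follow YYYY-MM-DD and START_DATE cannot be after END_DATE
        try:
            start = datetime.strptime(str(section["START_DATE"]), "%Y-%m-%d")
            end = datetime.strptime(str(section["END_DATE"]), "%Y-%m-%d")
            if start > end:
                problems.append(f"{device}: START_DATE is after END_DATE")
        except (KeyError, ValueError):
            problems.append(f"{device}: START_DATE and END_DATE must be YYYY-MM-DD")
        if device == "PHONE":
            ids = section.get("DEVICE_IDS", [])
            platforms = section.get("PLATFORMS", [])
            if not set(platforms) <= VALID_PLATFORMS:
                problems.append("PHONE: PLATFORMS must be android, ios or infer")
            # One platform per device id, unless platforms are inferred
            elif platforms != ["infer"] and len(platforms) != len(ids):
                problems.append("PHONE: PLATFORMS and DEVICE_IDS differ in length")
    return problems
```

An empty list means the sketch found nothing to complain about; it does not guarantee that the device ids exist in your data.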
### Automatic creation of participant files
You can use a CSV file with a row per participant to automatically create participant files.
??? "`AWARE_DEVICE_TABLE` was deprecated"
In previous versions of RAPIDS, you could create participant files automatically using the `aware_device` table. We deprecated this option but you can still achieve the same results if you export the output of the following SQL query as a CSV file and follow the instructions below:
```sql
SELECT device_id, device_id as fitbit_id, CONCAT("p", _id) as pid, if(brand = "iPhone", "ios", "android") as platform, CONCAT("p", _id) as label, DATE_FORMAT(FROM_UNIXTIME((timestamp/1000)- 86400), "%Y-%m-%d") as start_date, CURRENT_DATE as end_date from aware_device order by _id;
```
In your `config.yaml`:
1. Set `CSV_FILE_PATH` to a CSV file path that complies with the specs described below
2. Set the devices (`PHONE`, `FITBIT`, `EMPATICA`) `[ADD]` flag to `TRUE` depending on what devices you used in your study.
3. Set `[DEVICE_ID_COLUMN]` to the name of the column in your CSV file that uniquely identifies each device (only for `PHONE` and `FITBIT`).
```yaml
CREATE_PARTICIPANT_FILES:
  CSV_FILE_PATH: "your_path/to_your.csv"
  PHONE_SECTION:
    ADD: TRUE # or FALSE
    DEVICE_ID_COLUMN: device_id # column name
    IGNORED_DEVICE_IDS: []
  FITBIT_SECTION:
    ADD: FALSE # or TRUE
    DEVICE_ID_COLUMN: fitbit_id # column name
    IGNORED_DEVICE_IDS: []
  EMPATICA_SECTION: # Empatica doesn't have a device_id column because the devices produce zip files per participant
    ADD: FALSE # or TRUE
```

Your CSV file (`[CSV_FILE_PATH]`) should have the following columns (headers) but the values within each column can be empty:

| Column | Description |
|------------------|-----------------------------------------------------------------------------------------------------------|
| phone device id | The name of this column has to match `[PHONE_SECTION][DEVICE_ID_COLUMN]`. Separate multiple ids with `;` |
| fitbit device id | The name of this column has to match `[FITBIT_SECTION][DEVICE_ID_COLUMN]`. Separate multiple ids with `;` |
| pid | Unique identifiers with the format pXXX (your participant files will be named with this string) |
| platform | Use `android`, `ios` or `infer` as explained above, separate values with `;` |
| label | A human readable string that is used in reports and visualizations. |
| start_date | A string with format `YYYY-MM-DD`. |
| end_date | A string with format `YYYY-MM-DD`. |

!!! example
We added white spaces to this example to make it easy to read but you don't have to.
```csv
device_id ,fitbit_id ,pid ,label ,platform ,start_date ,end_date
a748ee1a-1d0b-4ae9-9074-279a2b6ba524;dsadas-2324-fgsf-sdwr-gdfgs4rfsdf43 ,fitbit1 ,p01 ,julio ,android;ios ,2020-01-01 ,2021-01-01
4c4cf7a1-0340-44bc-be0f-d5053bf7390c ,fitbit2 ,p02 ,meng ,ios ,2021-01-01 ,2022-01-01
```
Then run
```bash
snakemake -j1 create_participants_files
```
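
To make the mapping between a CSV row and a participant file concrete, here is a hedged sketch of the conversion. It is not the code behind `create_participants_files`: `participant_from_row` is a hypothetical helper, it uses the column names of the example above, and it adds a section whenever the device id value is non-empty instead of reading the `[ADD]` flags.

```python
import csv
import io


def participant_from_row(row, phone_column="device_id", fitbit_column="fitbit_id"):
    """Build the sections of a participant file from one CSV row (a dict)."""
    # Headers and values may be padded with spaces for readability
    row = {key.strip(): value.strip() for key, value in row.items()}
    common = {
        "LABEL": row["label"],
        "START_DATE": row["start_date"],
        "END_DATE": row["end_date"],
    }
    participant = {}
    if row[phone_column]:
        participant["PHONE"] = {
            "DEVICE_IDS": row[phone_column].split(";"),  # multiple ids use ;
            "PLATFORMS": row["platform"].split(";"),
            **common,
        }
    if row[fitbit_column]:
        participant["FITBIT"] = {"DEVICE_IDS": row[fitbit_column].split(";"), **common}
    return row["pid"], participant


sample = (
    "device_id ,fitbit_id ,pid ,label ,platform ,start_date ,end_date\n"
    "4c4cf7a1-0340-44bc-be0f-d5053bf7390c ,fitbit2 ,p02 ,meng ,ios ,2021-01-01 ,2022-01-01\n"
)
for row in csv.DictReader(io.StringIO(sample)):
    print(participant_from_row(row))
```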
---
## Time Segments

Time segments (or epochs) are the time windows on which you want to extract behavioral features. For example, you might want to process data on every day, every morning, or only during weekends. RAPIDS offers three categories of time segments that are flexible enough to cover most use cases: **frequency** (short time windows every day), **periodic** (arbitrary time windows on any day), and **event** (arbitrary time windows around events of interest). See also our [examples](#segment-examples).
=== "Frequency Segments"
These segments are computed on every day and all have the same duration (for example 30 minutes). Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
  TYPE: FREQUENCY
  FILE: "data/external/your_frequency_segments.csv"
  INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE
```
The file pointed to by `[TIME_SEGMENTS][FILE]` should have the following format and can only have 1 row.

| Column | Description |
|--------|----------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments |
| length | An integer representing the duration of your time segments in minutes |

!!! example
```csv
label,length
thirtyminutes,30
```
This configuration will compute 48 time segments for every day when any data from any participant was sensed. For example:
```csv
start_time,length,label
00:00,30,thirtyminutes0000
00:30,30,thirtyminutes0001
01:00,30,thirtyminutes0002
01:30,30,thirtyminutes0003
...
```
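
The expansion from one `label,length` row to the daily segments can be sketched as follows. This is an illustration rather than RAPIDS' code; `frequency_segments` is a hypothetical name and the sketch assumes `length` divides the 1440 minutes of a day evenly.

```python
def frequency_segments(label, length):
    """List (start_time, length, label) rows for one day of frequency segments."""
    segments = []
    for index, start in enumerate(range(0, 24 * 60, length)):
        start_time = f"{start // 60:02d}:{start % 60:02d}"
        # Each segment name is the label followed by a zero-padded counter
        segments.append((start_time, length, f"{label}{index:04d}"))
    return segments


for row in frequency_segments("thirtyminutes", 30)[:4]:
    print(row)
```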
=== "Periodic Segments"
These segments can be computed every day, or on specific days of the week, month, quarter, and year. Their minimum duration is 1 minute but they can be as long as you want. Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
  TYPE: PERIODIC
  FILE: "data/external/your_periodic_segments.csv"
  INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
```
If `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is set to `TRUE`, RAPIDS will consider instances of your segments far enough back in the past to include the first row of data of each participant. For example, if the first row of data from a participant happened on Saturday March 7th 2020 and the requested segment duration is 7 days starting on every Sunday, the first segment to be considered would start on Sunday March 1st if `[INCLUDE_PAST_PERIODIC_SEGMENTS]` is `TRUE` or on Sunday March 8th if `FALSE`.
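
The example in the previous paragraph can be reproduced with a few lines of date arithmetic. This sketch only covers weekly segments that start at midnight; `first_weekly_start` is a hypothetical helper and not how RAPIDS computes segments internally.

```python
from datetime import date, timedelta


def first_weekly_start(first_data_day, wday, include_past):
    """First start date of a weekly segment that repeats on weekday `wday` (Monday is 1)."""
    # Days elapsed since the most recent `wday` (0 if first_data_day is that weekday)
    days_since = (first_data_day.isoweekday() - wday) % 7
    previous_start = first_data_day - timedelta(days=days_since)
    if include_past or days_since == 0:
        return previous_start
    return previous_start + timedelta(days=7)


first_row = date(2020, 3, 7)  # a Saturday
print(first_weekly_start(first_row, 7, include_past=True))
print(first_weekly_start(first_row, 7, include_past=False))
```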

The file pointed to by `[TIME_SEGMENTS][FILE]` should have the following format and can have multiple rows.

| Column | Description |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments. It has to be **unique** between rows |
| start_time | A string with format `HH:MM:SS` representing the starting time of this segment on any day |
| length | A string representing the length of this segment. It can have one or more of the following strings **`XXD XXH XXM XXS`** to represent days, hours, minutes and seconds. For example `7D 23H 59M 59S` |
| repeats_on | One of the following options: `every_day`, `wday`, `qday`, `mday`, and `yday`. The last four represent a week, quarter, month and year day |
| repeats_value | An integer complementing `repeats_on`. If you set `repeats_on` to `every_day` set this to `0`, otherwise `1-7` represent a `wday` starting from Mondays, `1-31` represent a `mday`, `1-91` represent a `qday`, and `1-366` represent a `yday` |
!!! example
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
morning,06:00:00,5H 59M 59S,every_day,0
afternoon,12:00:00,5H 59M 59S,every_day,0
evening,18:00:00,5H 59M 59S,every_day,0
night,00:00:00,5H 59M 59S,every_day,0
```
This configuration will create five segment instances (`daily`, `morning`, `afternoon`, `evening`, `night`) on any given day (`repeats_on` is `every_day` and `repeats_value` is `0`). The `daily` segment will start at midnight and last `23:59:59`; the other four segments will start at 6am, 12pm, 6pm, and 12am respectively and last `05:59:59`.
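
If you want to double-check where a segment ends, the `length` strings can be converted into durations. The sketch below is an assumption about how such strings can be parsed, not RAPIDS' parser; `parse_length` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

UNITS = {"D": "days", "H": "hours", "M": "minutes", "S": "seconds"}


def parse_length(text):
    """Convert a string such as '7D 23H 59M 59S' into a timedelta."""
    parts = re.findall(r"(\d+)([DHMS])", text)
    return timedelta(**{UNITS[unit]: int(value) for value, unit in parts})


# The morning segment starts at 06:00:00 and lasts 5H 59M 59S
morning_start = datetime(2020, 1, 1, 6, 0, 0)
morning_end = morning_start + parse_length("5H 59M 59S")
print(morning_end.time())
```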
=== "Event segments"
These segments can be computed before or after an event of interest (defined as any UNIX timestamp). Their minimum duration is 1 minute but they can be as long as you want. The start of each segment can be shifted backwards or forwards from the specified timestamp. Set the following keys in your `config.yaml`.
```yaml
TIME_SEGMENTS: &time_segments
  TYPE: EVENT
  FILE: "data/external/your_event_segments.csv"
  INCLUDE_PAST_PERIODIC_SEGMENTS: FALSE # or TRUE
```
The file pointed to by `[TIME_SEGMENTS][FILE]` should have the following format and can have multiple rows.

| Column | Description |
|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| label | A string that is used as a prefix in the name of your time segments. If labels are unique, every segment is independent; if two or more segments have the same label, their data will be grouped when computing auxiliary data for features like the `most frequent contact` for calls (the most frequent contact will be computed across all these segments). There cannot be two *overlapping* event segments with the same label (RAPIDS will throw an error) |
| event_timestamp | A UNIX timestamp that represents the moment an event of interest happened (clinical relapse, survey, readmission, etc.). The corresponding time segment will be computed around this moment using `length`, `shift`, and `shift_direction` |
| length | A string representing the length of this segment. It can have one or more of the following keys `XXD XXH XXM XXS` to represent a number of days, hours, minutes, and seconds. For example `7D 23H 59M 59S` |
| shift | A string representing the time shift from `event_timestamp`. It can have one or more of the following keys `XXD XXH XXM XXS` to represent a number of days, hours, minutes and seconds. For example `7D 23H 59M 59S`. Use this value to change the start of a segment with respect to its `event_timestamp`. For example, set this variable to `1H` to create a segment that starts 1 hour from an event of interest (`shift_direction` determines if it's before or after). |
| shift_direction | An integer representing whether the `shift` is before (`-1`) or after (`1`) an `event_timestamp` |
| device_id | The device id (smartphone or Fitbit) to which this segment belongs. You have to create a line in this event segment file for each event of a participant that you want to analyse. If you have participants with multiple device ids you can choose any of them |

!!! example
```csv
label,event_timestamp,length,shift,shift_direction,device_id
stress1,1587661220000,1H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress2,1587747620000,4H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress3,1587906020000,3H,5M,1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress4,1584291600000,7H,4H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
stress5,1588172420000,9H,5M,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587661220000,1H,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587747620000,1D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
mood,1587906020000,7D,0,0,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
```
This example will create eight segments for a single participant (`a748ee1a...`): five independent `stressX` segments with various lengths (1, 4, 3, 7, and 9 hours) and three `mood` segments. Segments `stress1` and `stress3` are shifted forwards by 5 minutes, `stress5` is shifted backwards by 5 minutes, and `stress2` and `stress4` are shifted backwards by 4 hours (that is, if the `stress4` event happened on March 15th at 1pm US Eastern Time (`1584291600000`), the time segment will start on that day at 9am and end at 4pm).

The three `mood` segments are 1 hour, 1 day and 7 days long and have no shift. In addition, these `mood` segments are grouped together, meaning that although RAPIDS will compute features on each one of them, some necessary information to compute a few of such features will be extracted from all three segments, for example the phone contact that called a participant the most or the location clusters visited by a participant.
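
The way `length`, `shift`, and `shift_direction` combine can be sketched in a few lines. This is an illustration under assumptions, not RAPIDS' implementation: `event_window` and `parse_duration` are hypothetical names, and timestamps are treated as milliseconds because the example values have 13 digits.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"D": "days", "H": "hours", "M": "minutes", "S": "seconds"}


def parse_duration(text):
    """Convert '4H' or '1D 30M' into a timedelta ('0' becomes a zero duration)."""
    parts = re.findall(r"(\d+)([DHMS])", text)
    return timedelta(**{UNITS[unit]: int(value) for value, unit in parts})


def event_window(event_timestamp, length, shift, shift_direction, tz=timezone.utc):
    """Return the (start, end) datetimes of an event segment."""
    event = datetime.fromtimestamp(event_timestamp / 1000, tz)  # milliseconds to seconds
    start = event + shift_direction * parse_duration(shift)
    return start, start + parse_duration(length)


# stress4: a 7-hour segment shifted 4 hours before the event.
# A fixed UTC-4 offset stands in for US Eastern Time on that date.
eastern = timezone(timedelta(hours=-4))
start, end = event_window(1584291600000, "7H", "4H", -1, eastern)
print(start.isoformat(), end.isoformat())
```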
### Segment Examples
=== "5-minutes"
Use the following `Frequency` segment file to create 288 (12 * 24) 5-minute segments starting from midnight of every day in your study
```csv
label,length
fiveminutes,5
```
=== "Daily"
Use the following `Periodic` segment file to create daily segments starting from midnight of every day in your study
```csv
label,start_time,length,repeats_on,repeats_value
daily,00:00:00,23H 59M 59S,every_day,0
```
=== "Morning"
Use the following `Periodic` segment file to create morning segments starting at 06:00:00 and ending at 11:59:59 of every day in your study
```csv
label,start_time,length,repeats_on,repeats_value
morning,06:00:00,5H 59M 59S,every_day,0
```
=== "Overnight"
Use the following `Periodic` segment file to create overnight segments starting at 20:00:00 and ending at 07:59:59 (next day) of every day in your study
```csv
label,start_time,length,repeats_on,repeats_value
overnight,20:00:00,11H 59M 59S,every_day,0
```
=== "Weekly"
Use the following `Periodic` segment file to create **non-overlapping** weekly segments starting at midnight of every **Monday** in your study
```csv
label,start_time,length,repeats_on,repeats_value
weekly,00:00:00,6D 23H 59M 59S,wday,1
```
Use the following `Periodic` segment file to create **overlapping** weekly segments starting at midnight of **every day** in your study
```csv
label,start_time,length,repeats_on,repeats_value
weekly,00:00:00,6D 23H 59M 59S,every_day,0
```
=== "Week-ends"
Use the following `Periodic` segment file to create week-end segments starting at midnight of every **Saturday** in your study
```csv
label,start_time,length,repeats_on,repeats_value
weekend,00:00:00,1D 23H 59M 59S,wday,6
```
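
As a quick check of the `wday` numbering (Monday is `1`, so Saturday is `6`), this sketch computes the window of the `weekend` segment for a given day. `weekend_segment` is a hypothetical helper used only for illustration.

```python
from datetime import date, datetime, timedelta


def weekend_segment(day):
    """Return (start, end) of the weekend segment if `day` is a Saturday, else None."""
    if day.isoweekday() != 6:  # repeats_on = wday, repeats_value = 6
        return None
    start = datetime(day.year, day.month, day.day)  # start_time 00:00:00
    return start, start + timedelta(days=1, hours=23, minutes=59, seconds=59)


print(weekend_segment(date(2020, 3, 7)))  # a Saturday
print(weekend_segment(date(2020, 3, 9)))  # a Monday
```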
=== "Around surveys"
Use the following `Event` segment file to create two 2-hour segments that start 1 hour before surveys answered by 3 participants
```csv
label,event_timestamp,length,shift,shift_direction,device_id
survey1,1587661220000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey2,1587747620000,2H,1H,-1,a748ee1a-1d0b-4ae9-9074-279a2b6ba524
survey1,1587906020000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr
survey2,1584291600000,2H,1H,-1,rqtertsd-43ff-34fr-3eeg-efe4fergregr
survey1,1588172420000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3
survey2,1584291600000,2H,1H,-1,klj34oi2-8frk-2343-21kk-324ljklewlr3
```
---
## Data Stream Configuration

Modify the following keys in your `config.yaml` depending on the [data stream](../../datastreams/data-streams-introduction) you want to process.

=== "Phone"

Set `[PHONE_DATA_STREAMS][TYPE]` to the smartphone data stream you want to process (e.g. `aware_mysql`) and configure its parameters (e.g. `[DATABASE_GROUP]`). Ignore the parameters of streams you are not using (e.g. `[FOLDER]` of `aware_csv`).

```yaml
PHONE_DATA_STREAMS:
  TYPE: aware_mysql
  aware_mysql:
    DATABASE_GROUP: MY_GROUP
  aware_csv:
    FOLDER: data/external/aware_csv
```
=== "aware_mysql"
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[DATABASE_GROUP]` | A database credentials group. Read the instructions below to set it up |
??? info "Setting up a DATABASE_GROUP and its connection credentials"
1. If you haven't done so, create an empty file called `#!bash .env` in your RAPIDS root directory: `./.env`
2. Add the following lines to `./.env` and replace your database-specific credentials (user, password, host, and database):
1. Note that the label `[MY_GROUP]` is arbitrary but it has to match `[PHONE_DATA_STREAMS][aware_mysql] [DATABASE_GROUP]`
``` yaml
[MY_GROUP]
user=MY_USER
password=MY_PASSWORD
host=MY_HOST
port=3306
database=MY_DATABASE
```
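
Because the credentials file uses an INI-like layout, you can verify that a group is readable before running RAPIDS. This sketch uses Python's standard `configparser`; it is only a sanity check, RAPIDS may read the file differently, and `read_credentials` is a hypothetical name.

```python
import configparser

ENV_TEXT = """
[MY_GROUP]
user=MY_USER
password=MY_PASSWORD
host=MY_HOST
port=3306
database=MY_DATABASE
"""


def read_credentials(text, group):
    """Return the credentials of one group as a dict, or raise KeyError if it is missing."""
    # interpolation=None keeps passwords that contain % untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(group):
        raise KeyError(f"No [{group}] section found")
    return dict(parser[group])


credentials = read_credentials(ENV_TEXT, "MY_GROUP")
print(credentials["host"], credentials["port"])
```

To check your real file, pass the contents of `./.env` instead of `ENV_TEXT` and use the group name you set in `[DATABASE_GROUP]`.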
??? hint "Connecting to localhost (host machine) from inside our docker container"
If you are using RAPIDS' docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using `host.docker.internal` instead of `127.0.0.1` or `localhost`. In a Linux host you need to run our docker container using `docker run --network="host" -d moshiresearch/rapids:latest` and then `127.0.0.1` will point to your host machine.
---
=== "aware_csv"
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[FOLDER]` | Folder where you have to place a CSV file **per** phone sensor. Each file has to contain all the data from every participant you want to process. |

=== "Fitbit"

Set `[FITBIT_DATA_STREAMS][TYPE]` to the Fitbit data stream you want to process (e.g. `fitbitjson_mysql`) and configure its parameters (e.g. `[DATABASE_GROUP]`). Ignore the parameters of streams you are not using (e.g. `[FOLDER]` of `fitbitjson_csv`).

```yaml
FITBIT_DATA_STREAMS:
  TYPE: fitbitjson_mysql
  fitbitjson_mysql:
    DATABASE_GROUP: MY_GROUP
    COLUMN_MAPPINGS_READY: False
  fitbitjson_csv:
    FOLDER: data/external/fitbit_csv
    COLUMN_MAPPINGS_READY: False
  fitbitparsed_mysql:
    DATABASE_GROUP: MY_GROUP
    COLUMN_MAPPINGS_READY: False
  fitbitparsed_csv:
    FOLDER: data/external/fitbit_csv
    COLUMN_MAPPINGS_READY: False
```
=== "fitbitjson_mysql"
This data stream processes Fitbit data inside a JSON column as obtained from the Fitbit API and stored in a MySQL database.
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[DATABASE_GROUP]` | A database credentials group. Read the instructions below to set it up |
| `[COLUMN_MAPPINGS_READY]` | Set this to `True` after you have modified this stream's `format.yaml` column mappings to match your raw data column names: [`fitbitjson_mysql`](../../datastreams/fitbitjson-mysql#format) |
??? info "Setting up a DATABASE_GROUP and its connection credentials"
1. If you haven't done so, create an empty file called `#!bash .env` in your RAPIDS root directory: `./.env`
2. Add the following lines to `./.env` and replace your database-specific credentials (user, password, host, and database):
1. Note that the label `[MY_GROUP]` is arbitrary but it has to match `[FITBIT_DATA_STREAMS][fitbitjson_mysql] [DATABASE_GROUP]`
``` yaml
[MY_GROUP]
user=MY_USER
password=MY_PASSWORD
host=MY_HOST
port=3306
database=MY_DATABASE
```
??? hint "Connecting to localhost (host machine) from inside our docker container"
If you are using RAPIDS' docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using `host.docker.internal` instead of `127.0.0.1` or `localhost`. In a Linux host you need to run our docker container using `docker run --network="host" -d moshiresearch/rapids:latest` and then `127.0.0.1` will point to your host machine.
---
=== "fitbitjson_csv"
This data stream processes Fitbit data inside a JSON column as obtained from the Fitbit API and stored in a CSV file.
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[FOLDER]` | Folder where you have to place a CSV file **per** Fitbit sensor. Each file has to contain all the data from every participant you want to process. |
| `[COLUMN_MAPPINGS_READY]` | Set this to `True` after you have modified this stream's `format.yaml` column mappings to match your raw data column names: [`fitbitjson_csv`](../../datastreams/fitbitjson-csv#format) |
=== "fitbitparsed_mysql"
This data stream processes Fitbit data stored in multiple columns after being parsed from the JSON column returned by the Fitbit API and stored in a MySQL database.
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[DATABASE_GROUP]` | A database credentials group. Read the instructions below to set it up |
| `[COLUMN_MAPPINGS_READY]` | Set this to `True` after you have modified this stream's `format.yaml` column mappings to match your raw data column names: [`fitbitparsed_mysql`](../../datastreams/fitbitparsed-mysql#format) |
??? info "Setting up a DATABASE_GROUP and its connection credentials"
1. If you haven't done so, create an empty file called `#!bash .env` in your RAPIDS root directory: `./.env`
2. Add the following lines to `./.env` and replace your database-specific credentials (user, password, host, and database):
1. Note that the label `[MY_GROUP]` is arbitrary but it has to match `[FITBIT_DATA_STREAMS][fitbitparsed_mysql] [DATABASE_GROUP]`
``` yaml
[MY_GROUP]
user=MY_USER
password=MY_PASSWORD
host=MY_HOST
port=3306
database=MY_DATABASE
```
??? hint "Connecting to localhost (host machine) from inside our docker container"
If you are using RAPIDS' docker container and Docker-for-mac or Docker-for-Windows 18.03+, you can connect to a MySQL database in your host machine using `host.docker.internal` instead of `127.0.0.1` or `localhost`. In a Linux host you need to run our docker container using `docker run --network="host" -d moshiresearch/rapids:latest` and then `127.0.0.1` will point to your host machine.
---
=== "fitbitparsed_csv"
This data stream processes Fitbit data stored in multiple columns (plain text) after being parsed from the JSON column returned by the Fitbit API and stored in a CSV file.

| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[FOLDER]` | Folder where you have to place a CSV file **per** Fitbit sensor. Each file has to contain all the data from every participant you want to process. |
| `[COLUMN_MAPPINGS_READY]` | Set this to `True` after you have modified this stream's `format.yaml` column mappings to match your raw data column names: [`fitbitparsed_csv`](../../datastreams/fitbitparsed-csv#format) |
=== "Empatica"
The relevant `config.yaml` section looks like this by default:
```yaml
SOURCE:
  TYPE: ZIP_FILE
  FOLDER: data/external/empatica
TIMEZONE:
  TYPE: SINGLE # Empatica devices don't support time zones so we read this data in the timezone indicated by VALUE
  VALUE: *timezone
```
**Parameters for `[EMPATICA_DATA_CONFIGURATION]`**
| Key | Description |
|---------------------|----------------------------------------------------------------------------------------------------------------------------|
| `[SOURCE] [TYPE]` | Only `ZIP_FILE` is supported (Empatica devices save sensor data in CSV files that are zipped together).|
| `[SOURCE] [FOLDER]` | The relative path to a folder containing one folder per participant. The name of a participant folder should match their pid in `config[PIDS]`, for example `p01`. Each participant folder can have one or more zip files with any name; in other words, the sensor data contained in those zip files belongs to a single participant. The zip files are [automatically](https://support.empatica.com/hc/en-us/articles/201608896-Data-export-and-formatting-from-E4-connect-) generated by Empatica and have a CSV file per sensor (`ACC`, `HR`, `TEMP`, `EDA`, `BVP`, `TAGS`). All CSV files of the same type contained in one or more zip files are uncompressed, parsed, sorted by timestamp, and joined together.|
| `[TIMEZONE] [TYPE]` | Only `SINGLE` is supported for now |
| `[TIMEZONE] [VALUE]` | `*timezone` points to the value defined before in [Timezone of your study](#timezone-of-your-study) |
??? example "Example of an EMPATICA FOLDER"
In the file tree below, we want to process the data of three participants: `p01`, `p02`, and `p03`. `p01` has two zip files, `p02` has only one zip file, and `p03` has three zip files. Each zip file has one CSV file per sensor; RAPIDS joins the files of the same sensor together and processes them. These zip files are generated by Empatica.
```bash
data/ # this folder exists in the root RAPIDS folder
  external/
    empatica/
      p01/
        file1.zip
        file2.zip
      p02/
        aaaa.zip
      p03/
        t1.zip
        t2.zip
        t3.zip
```
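
Before running RAPIDS you may want to confirm that every participant folder exists and contains zip files. The following sketch lists them per participant; `zips_per_participant` is a hypothetical helper and not part of RAPIDS.

```python
from pathlib import Path


def zips_per_participant(folder, pids):
    """Map each pid to the sorted names of the zip files in its folder."""
    found = {}
    for pid in pids:
        participant_dir = Path(folder) / pid
        # A missing or empty folder yields an empty list
        found[pid] = sorted(path.name for path in participant_dir.glob("*.zip"))
    return found


for pid, files in zips_per_participant("data/external/empatica", ["p01", "p02", "p03"]).items():
    print(pid, files if files else "no zip files found")
```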
---
## Sensor and Features to Process

Finally, you need to modify the `config.yaml` section of the sensors you want to extract behavioral features from. All sensors follow the same naming convention (`DEVICE_SENSOR`) and parameter structure, which we explain in the [Behavioral Features Introduction](../../features/feature-introduction/).

!!! done
Head over to [Execution](../execution/) to learn how to execute RAPIDS.