Links and CI updates to work with mike versioning

pull/103/head
JulioV 2020-11-09 13:34:02 -05:00
parent 7facbe09fe
commit f9ad0c237d
13 changed files with 33 additions and 31 deletions

@ -12,6 +12,7 @@ jobs:
with:
python-version: 3.x
- run: pip install git+https://${GH_TOKEN}@github.com/carissalow/mkdocs-material-insiders.git
- run: mkdocs gh-deploy --force
- run: pip install mike
- run: mike deploy --push --update-aliases 0.1 latest
env:
GH_TOKEN: ${{ secrets.GH_TOKEN }}

.gitignore

@ -110,4 +110,5 @@ reports/
sn_profile_*/
!sn_profile_rapids
settings.dcf
tests/fakedata_generation/
tests/fakedata_generation/
site/

@ -1,7 +1,7 @@
# Add New Features
!!! hint
We recommend reading the [Behavioral Features Introduction](/features/feature-introduction) before reading this page
We recommend reading the [Behavioral Features Introduction](../feature-introduction/) before reading this page
!!! hint
You won't have to deal with time zones, dates, times, data cleaning or preprocessing. The data that RAPIDS pipes to your feature extraction code is ready to process.
@ -116,9 +116,9 @@ The code to extract your behavioral features should be implemented in your provi
acc_data = filter_data_by_segment(acc_data, day_segment)
```
You should use the `filter_data_by_segment()` function to process and group those rows that belong to each of the [day segments RAPIDS could be configured with](/setup/configuration/#day-segments).
You should use the `filter_data_by_segment()` function to process and group those rows that belong to each of the [day segments RAPIDS could be configured with](../../setup/configuration/#day-segments).
Let's understand the `filter_data_by_segment()` function with an example. A RAPIDS user can extract features on any arbitrary [day segment](/setup/configuration/#day-segments). A day segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for `p01`. The labels are arbritrary and the instances depend on the days a participant was monitored for:
Let's understand the `filter_data_by_segment()` function with an example. A RAPIDS user can extract features on any arbitrary [day segment](../../setup/configuration/#day-segments). A day segment is a period of time that has a label and one or more instances. For example, the user (or you) could have requested features on a daily, weekly, and week-end basis for `p01`. The labels are arbritrary and the instances depend on the days a participant was monitored for:
- the daily segment could be named `my_days` and if `p01` was monitored for 14 days, it would have 14 instances
- the weekly segment could be named `my_weeks` and if `p01` was monitored for 14 days, it would have 2 instances.
@ -182,4 +182,4 @@ The code to extract your behavioral features should be implemented in your provi
## New Features for Non-Existing Sensors
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [contact us](/team) or request it on [Slack](/) and we can add the necessary code so you can follow the instructions above.
If you want to add features for a device or a sensor that we do not support at the moment (those that do not appear in the `"Existing Sensors"` list above), [contact us](../../team) or request it on [Slack](http://awareframework.com:3000/) and we can add the necessary code so you can follow the instructions above.
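
Every link change in this commit applies the same rule. mike publishes each documentation version in its own subdirectory (here `0.1`, aliased as `latest`), so a root-absolute link such as `/setup/configuration/` would point outside the version being read. The commit therefore rewrites such links relative to the page that contains them. The function below is a minimal sketch of that rule, not code from this commit. `to_relative_link` is a hypothetical helper, and it assumes MkDocs directory-style URLs, where a page like `features/add-new-features.md` is served at `features/add-new-features/`.

```python
import posixpath


def to_relative_link(page_url: str, target: str) -> str:
    """Rewrite a root-absolute docs link so it is relative to page_url.

    Hypothetical helper. Assumes directory-style URLs: the page at
    "features/add-new-features/" behaves like the directory
    "/features/add-new-features", and relative links resolve from there.
    """
    if not target.startswith("/"):
        # Already relative, an in-page anchor, or an external URL: keep it.
        return target

    # Split off the "#anchor" part so it is not treated as a path segment.
    path, sep, fragment = target.partition("#")

    # Directory that relative links on this page resolve against.
    page_dir = "/" + page_url.strip("/")

    # Purely lexical: both arguments are absolute POSIX paths.
    rel = posixpath.relpath(path, page_dir)

    # relpath drops trailing slashes; keep the original directory-style form.
    if path.endswith("/") and not rel.endswith("/"):
        rel += "/"

    return rel + sep + fragment
```

For example, `to_relative_link("features/add-new-features/", "/setup/configuration/#day-segments")` returns `../../setup/configuration/#day-segments`, the same link the commit writes by hand. `posixpath` is used instead of `os.path` so the result always uses forward slashes, regardless of the operating system that builds the docs.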

@ -38,7 +38,7 @@ Every phone or Fitbit sensor has a corresponding config section in `config.yaml`
```
## Sensor Parameters
Each sensor configuration section has a `Parameters` subsection (see `#2` in the example). These are parameters that affect different aspects of how the raw data is download, and processed. The `TABLE` parameter exists for every sensor, but some sensors will have extra para meters like [`[PHONE_LOCATIONS]`](/features/phone-locations/). We explain these parameters in a table at the top of each sensor documentation page.
Each sensor configuration section has a `Parameters` subsection (see `#2` in the example). These are parameters that affect different aspects of how the raw data is download, and processed. The `TABLE` parameter exists for every sensor, but some sensors will have extra para meters like [`[PHONE_LOCATIONS]`](../phone-locations/). We explain these parameters in a table at the top of each sensor documentation page.
## Sensor Providers
Each sensor configuration section can have zero, one or more behavioral feature **providers** (see `#2` in the example). A provider is a script created by the core RAPIDS team or other researchers that extracts behavioral features for that sensor. For this accelerometer example we have two providers RAPIDS (see `#4`) and PANDA (see `#5`).

@ -44,7 +44,7 @@ Features description for `[PHONE_ACCELEROMETER][PROVIDERS][RAPIDS]`:
## PANDA provider
These features are based on the work by [Panda et al](/citation#panda-accelerometer).
These features are based on the work by [Panda et al](../../citation#panda-accelerometer).
!!! info "Available day segments and platforms"
- Available for all day segments
@ -80,4 +80,4 @@ Features description for `[PHONE_ACCELEROMETER][PROVIDERS][PANDA]`:
!!! note "Assumptions/Observations"
1. Analyzing accelerometer data is a memory intensive task. If RAPIDS crashes is likely because the accelerometer dataset for a participant is to big to fit in memory. We are considering different alternatives to overcome this problem.
2. See [Panda et al](/citation#panda-accelerometer) for a definition of exertional and non-exertional activity episodes
2. See [Panda et al](../../citation#panda-accelerometer) for a definition of exertional and non-exertional activity episodes

@ -61,4 +61,4 @@ Features description for `[PHONE_ACTIVITY_RECOGNITION][PROVIDERS][RAPIDS]`:
|`stationary`| `still` | `3`
|`unknown`| `unknown` | `4`
2. In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables, RAPIDS automatically infers what platform each participant belongs to based on their [participant file](/setup/configuration/#participant-files).
2. In AWARE, Activity Recognition data for Android and iOS are stored in two different database tables, RAPIDS automatically infers what platform each participant belongs to based on their [participant file](../../setup/configuration/#participant-files).

@ -6,13 +6,13 @@ Sensor parameters description for `[PHONE_APPLICATIONS_FOREGROUND]` (these param
|----------------|-----------------------------------------------------------------------------------------------------------------------------------
|`[TABLE]`| Database table where the applications foreground data is stored
|`[APPLICATION_CATEGORIES][CATALOGUE_SOURCE]` | `FILE` or `GOOGLE`. If `FILE`, app categories (genres) are read from `[CATALOGUE_FILE]`. If `[GOOGLE]`, app categories (genres) are scrapped from the Play Store
|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](/citation#stachl-application-foreground) in `data/external/stachl_application_genre_catalogue.csv`
|`[APPLICATION_CATEGORIES][CATALOGUE_FILE]` | CSV file with a `package_name` and `genre` column. By default we provide the catalogue created by [Stachl et al](../../citation#stachl-applications-foreground) in `data/external/stachl_application_genre_catalogue.csv`
|`[APPLICATION_CATEGORIES][UPDATE_CATALOGUE_FILE]` | if `[CATALOGUE_SOURCE]` is equal to `FILE`, this flag signals whether or not to update `[CATALOGUE_FILE]`, if `[CATALOGUE_SOURCE]` is equal to `GOOGLE` all scraped genres will be saved to `[CATALOGUE_FILE]`
|`[APPLICATION_CATEGORIES][SCRAPE_MISSING_CATEGORIES]` | This flag signals whether or not to scrape categories (genres) missing from the `[CATALOGUE_FILE]`. If `[CATALOGUE_SOURCE]` is equal to `GOOGLE`, all genres are scraped anyway (this flag is ignored)
## RAPIDS provider
The app category (genre) catalogue used in these features was originally created by [Stachl et al](/citation#stachl-application-foreground).
The app category (genre) catalogue used in these features was originally created by [Stachl et al](../../citation#stachl-applications-foreground).
!!! info "Available day segments and platforms"
- Available for all day segments

@ -11,13 +11,13 @@ Sensor parameters description for `[PHONE_LOCATIONS]`:
!!! note "Assumptions/Observations"
**Types of location data to use**
AWARE Android and iOS clients can collect location coordinates through the phone\'s GPS, the network cellular towers around the phone or Google\'s fused location API. If you want to use only the GPS provider set `[LOCATIONS_TO_USE]` to `GPS`, if you want to use all providers (not recommended due to the difference in accuracy) set `[LOCATIONS_TO_USE]` to `ALL`, if your AWARE client was configured to use fused location only or want to focus only on this provider, set `[LOCATIONS_TO_USE]` to `RESAMPLE_FUSED`. `RESAMPLE_FUSED` takes the original fused location coordinates and replicates each pair forward in time as long as the phone was sensing data as indicated by [`PHONE_VALID_SENSED_BINS`](/features/phone-data-quality/#phone-valid-sensed-bins), this is done because Google\'s API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one.
AWARE Android and iOS clients can collect location coordinates through the phone\'s GPS, the network cellular towers around the phone or Google\'s fused location API. If you want to use only the GPS provider set `[LOCATIONS_TO_USE]` to `GPS`, if you want to use all providers (not recommended due to the difference in accuracy) set `[LOCATIONS_TO_USE]` to `ALL`, if your AWARE client was configured to use fused location only or want to focus only on this provider, set `[LOCATIONS_TO_USE]` to `RESAMPLE_FUSED`. `RESAMPLE_FUSED` takes the original fused location coordinates and replicates each pair forward in time as long as the phone was sensing data as indicated by [`PHONE_VALID_SENSED_BINS`](../phone-data-quality/#phone-valid-sensed-bins), this is done because Google\'s API only logs a new location coordinate pair when it is sufficiently different in time or space from the previous one.
There are two parameters associated with resampling fused location. `FUSED_RESAMPLED_CONSECUTIVE_THRESHOLD` (in minutes, default 30) controls the maximum gap between any two coordinate pairs to replicate the last known pair (for example, participant A\'s phone did not collect data between 10.30am and 10:50am and between 11:05am and 11:40am, the last known coordinate pair will be replicated during the first period but not the second, in other words, we assume that we cannot longer guarantee the participant stayed at the last known location if the phone did not sense data for more than 30 minutes). `FUSED_RESAMPLED_TIME_SINCE_VALID_LOCATION` (in minutes, default 720 or 12 hours) stops the last known fused location from being replicated longer that this threshold even if the phone was sensing data continuously (for example, participant A went home at 9pm and their phone was sensing data without gaps until 11am the next morning, the last known location will only be replicated until 9am). If you have suggestions to modify or improve this resampling, let us know.
## BARNETT provider
These features are based on the original open-source implementation by [Barnett et al](/citation#barnett-locations) and some features created by [Canzian et al](/citation#barnett-locations).
These features are based on the original open-source implementation by [Barnett et al](../../citation#barnett-locations) and some features created by [Canzian et al](../../citation#barnett-locations).
!!! info "Available day segments and platforms"
@ -41,7 +41,7 @@ Parameters description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:
|`[COMPUTE]`| Set to `True` to extract `PHONE_LOCATIONS` features from the `BARNETT` provider|
|`[FEATURES]` | Features to be computed, see table below
|`[ACCURACY_LIMIT]` | An integer in meters, any location rows with an accuracy higher than this will be dropped. This number means there's a 68% probability the true location is within this radius
|`[TIMEZONE]` | Timezone where the location data was collected. By default points to the one defined in the [Initial configuration](/setup/configuration#timezone-of-your-study)
|`[TIMEZONE]` | Timezone where the location data was collected. By default points to the one defined in the [Initial configuration](../../setup/configuration#timezone-of-your-study)
|`[MINUTES_DATA_USED]` | Set to `True` to include an extra column in the final location feature file containing the number of minutes used to compute the features on each day segment. Use this for quality control purposes, the more data minutes exist for a period, the more reliable its features should be. For fused location, a single minute can contain more than one coordinate pair if the participant is moving fast enough.
@ -67,17 +67,17 @@ Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]` adapted from [B
!!! note "Assumptions/Observations"
**Barnett\'s et al features**
These features are based on a Pause-Flight model. A pause is defined as a mobiity trace (location pings) within a certain duration and distance (by default 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See [Barnett et al](/citation#barnett-locations) for more information. In RAPIDS we only expose two parameters for these features (timezone and accuracy limit). You can change other parameters in `src/features/phone_locations/barnett/library/MobilityFeatures.R`.
These features are based on a Pause-Flight model. A pause is defined as a mobiity trace (location pings) within a certain duration and distance (by default 300 seconds and 60 meters). A flight is any mobility trace between two pauses. Data is resampled and imputed before the features are computed. See [Barnett et al](../../citation#barnett-locations) for more information. In RAPIDS we only expose two parameters for these features (timezone and accuracy limit). You can change other parameters in `src/features/phone_locations/barnett/library/MobilityFeatures.R`.
**Significant Locations**
Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) will count as a visit to that significant location. This description was adapted from the Supplementary Materials of [Barnett et al](/citation#barnett-locations).
Significant locations are determined using K-means clustering on pauses longer than 10 minutes. The number of clusters (K) is increased until no two clusters are within 400 meters from each other. After this, pauses within a certain range of a cluster (200 meters by default) will count as a visit to that significant location. This description was adapted from the Supplementary Materials of [Barnett et al](../../citation#barnett-locations).
**The Circadian Calculation**
For a detailed description of how this is calculated, see [Canzian et al](/citation#barnett-locations).
For a detailed description of how this is calculated, see [Canzian et al](../../citation#barnett-locations).
## DORYAB provider
These features are based on the original implementation by [Doryab et al.](/citation#doryab-locations).
These features are based on the original implementation by [Doryab et al.](../../citation#doryab-locations).
!!! info "Available day segments and platforms"
@ -117,7 +117,7 @@ Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:
|totaldistance |meters |Total distance travelled in a day segment using the haversine formula.
|averagespeed |km/hr |Average speed in a day segment considering only the instances labeled as Moving.
|varspeed |km/hr |Speed variance in a day segment considering only the instances labeled as Moving.
|circadianmovement |- | \"It encodes the extent to which a person's location patterns follow a 24-hour circadian cycle.\" [Doryab et al.](/citation#doryab-locations).
|circadianmovement |- | \"It encodes the extent to which a person's location patterns follow a 24-hour circadian cycle.\" [Doryab et al.](../../citation#doryab-locations).
|numberofsignificantplaces |places |Number of significant locations visited. It is calculated using the DBSCAN clustering algorithm which takes in EPS and MIN_SAMPLES as parameters to identify clusters. Each cluster is a significant place.
|numberlocationtransitions |transitions |Number of movements between any two clusters in a day segment.
|radiusgyration |meters |Quantifies the area covered by a participant
@ -139,4 +139,4 @@ Features description for `[PHONE_LOCATIONS][PROVIDERS][BARNETT]`:
Significant locations are determined using DBSCAN clustering on locations that a patient visit over the course of the period of data collection.
**The Circadian Calculation**
For a detailed description of how this is calculated, see [Canzian et al](/citation#doryab-locations).
For a detailed description of how this is calculated, see [Canzian et al](../../citation#doryab-locations).

@ -1,18 +1,18 @@
# File Structure
!!! tip
Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to [Installation](/setup/installation/) and then to [Initial Configuration](/setup/configuration/)
Read this page if you want to learn more about how RAPIDS is structured. If you want to start using it go to [Installation](../setup/installation/) and then to [Initial Configuration](../setup/configuration/)
All paths mentioned in this page are relative to RAPIDS' root folder.
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the [`.env` file](/setup/configuration/#database-credentials), [participants files](/setup/configuration/#participant-files), [day segment files](/setup/configuration/#day-segments), and the `config.yaml` file. The `config.yaml` file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
If you want to extract the behavioral features that RAPIDS offers, you will only have to create or modify the [`.env` file](../setup/configuration/#database-credentials), [participants files](../setup/configuration/#participant-files), [day segment files](../setup/configuration/#day-segments), and the `config.yaml` file. The `config.yaml` file is the heart of RAPIDS and includes parameters to manage participants, data sources, sensor data, visualizations and more.
All data is saved in `data/`. The `data/external/` folder stores any data imported or created by the user, `data/raw/` stores sensor data as imported from your database, `data/interim/` has intermediate files necessary to compute behavioral features from raw data, and `data/processed/` has all the final files with the behavioral features in folders per participant and sensor.
All the source code is saved in `src/`. The `src/data/` folder stores scripts to download, clean and pre-process sensor data, `src/features` has scripts to extract behavioral features organized in their respective subfolders , `src/models/` can host any script to create models or statistical analyses with the behavioral features you extract, and `src/visualization/` has scripts to create plots of the raw and processed data.
There are other important files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.). In the figure below, we represent the interactions between users and files. After a user modifies `config.yaml` and `.env` the `Snakefile` file will decide what Snakemake rules have to be executed to produce the required output files (behavioral features) and what scripts are in charge of producing such files. In addition, users can add or modifiy files in the `data` folder (for example to configure the [participants files](/setup/configuration/#participant-files) or the [day segment files](/setup/configuration/#day-segments)).
There are other important files and folders but only relevant if you are interested in extending RAPIDS (e.g. virtual env files, docs, tests, Dockerfile, the Snakefile, etc.). In the figure below, we represent the interactions between users and files. After a user modifies `config.yaml` and `.env` the `Snakefile` file will decide what Snakemake rules have to be executed to produce the required output files (behavioral features) and what scripts are in charge of producing such files. In addition, users can add or modifiy files in the `data` folder (for example to configure the [participants files](../setup/configuration/#participant-files) or the [day segment files](../setup/configuration/#day-segments)).
<figure>
<img src="/img/files.png" width="600" />

@ -2,7 +2,7 @@
Reproducible Analysis Pipeline for Data Streams (RAPIDS) allows you to process smartphone and wearable data to extract **behavioral features** (a.k.a. digital biomarkers/phenotypes).
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment we support smartphone data collected with [AWARE](awareframework.com/) and wearable data from Fitbit devices.
RAPIDS is open source, documented, modular, tested, and reproducible. At the moment we support smartphone data collected with [AWARE](https://awareframework.com/) and wearable data from Fitbit devices.
:material-slack: Questions or feedback can be posted on \#rapids in AWARE Framework\'s [slack](http://awareframework.com:3000/).
@ -10,7 +10,7 @@ RAPIDS is open source, documented, modular, tested, and reproducible. At the mom
:fontawesome-solid-tasks: Join our discussions on our algorithms and assumptions for feature [processing](https://github.com/carissalow/rapids/issues?q=is%3Aissue+is%3Aopen+label%3Adiscussion).
:fontawesome-solid-play: Ready to start? Go to [Installation](/setup/installation/) and then to [Initial Configuration](/setup/configuration/)
:fontawesome-solid-play: Ready to start? Go to [Installation](setup/installation/) and then to [Initial Configuration](setup/configuration/)
## How does it work?

@ -10,7 +10,7 @@ You need to follow these steps to configure your RAPIDS deployment before you ca
5. Modify your [device data source configuration](#device-data-source-configuration)
6. Select what [sensors and features](#sensor-and-features-to-process) you want to process
When you are done with this initial configuration, go to [executing RAPIDS](/setup/execution).
When you are done with this initial configuration, go to [executing RAPIDS](setup/execution).
!!! hint
Every time you see `config["KEY"]` or `[KEY]` in these docs we are referring to the corresponding key in the `config.yaml` file.

@ -1,6 +1,6 @@
# Execution
After you have [installed](/setup/installation) and [configured](/setup/configuration) RAPIDS, use the following command to execute it.
After you have [installed](../installation) and [configured](../configuration) RAPIDS, use the following command to execute it.
```bash
./rapids -j1
@ -33,4 +33,4 @@ After you have [installed](/setup/installation) and [configured](/setup/configur
```
!!! done "Ready to extract behavioral features"
If you are ready to extract features head over to the [Behavioral Features Introduction](/features/feature-introduction/)
If you are ready to extract features head over to the [Behavioral Features Introduction](../../features/feature-introduction/)

@ -3,8 +3,8 @@ Minimal Working Example
This is a quick guide for creating and running a simple pipeline to extract missing, outgoing, and incoming call features for `daily` and `night` epochs of one participant monitored on the US East coast.
1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](/setup/installation))
2. For the [Initial Configuration](/setup/configuration) steps do the following and use the example as a guide:
1. Install RAPIDS and make sure your `conda` environment is active (see [Installation](../../setup/installation))
2. For the [Initial Configuration](../../setup/configuration) steps do the following and use the example as a guide:
!!! info "Things to change on each configuration step"
1\. Setup your database connection credentials in `.env`. We assume your credentials group is called `MY_GROUP`.