Add visualization docs

pull/103/head
JulioV 2020-12-04 16:19:25 -05:00
parent a2a532ed81
commit fdd61521f8
13 changed files with 1661 additions and 0 deletions

@ -0,0 +1,64 @@
# Data Quality Visualizations
We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (t01 and t02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night).
!!! note
    [Time segments](../../setup/configuration#time-segments) (e.g. `daily` or `morning`) can have multiple instances (day 1, day 2, or morning 1, morning 2, and so on).
## 1. Histograms of phone data yield
RAPIDS provides two histograms that show the number of time segment instances with a given ratio of valid [yielded minutes and hours](../../features/phone-data-yield/#rapids-provider): one histogram for minutes and one for hours. A valid yielded minute has at least one row of data from any smartphone sensor, and a valid yielded hour contains at least M valid minutes (M is a configurable threshold; see the linked provider documentation).
These plots can be used as a rough indication of the smartphone monitoring coverage during a study, aggregated across all participants. For example, the figure below shows a valid yielded minutes histogram for daily segments. We can infer that the monitoring coverage was very good, since almost all segments contain between 90% and 100% of the expected sensed minutes.
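
The definitions of valid yielded minutes and hours above can be illustrated with a minimal sketch. This is not RAPIDS' implementation; the function name `yield_ratios`, its parameters, and the default of 30 minutes for M are assumptions made only for this example.

```python
from datetime import datetime, timedelta


def yield_ratios(timestamps, segment_start, segment_minutes, min_valid_minutes=30):
    """Return (valid minute ratio, valid hour ratio) for one time segment instance.

    timestamps: datetimes of rows logged by any phone sensor.
    min_valid_minutes: hypothetical value of M (valid minutes needed per valid hour).
    """
    # A minute is valid if at least one row from any sensor falls inside it.
    valid_minutes = set()
    for ts in timestamps:
        offset = int((ts - segment_start).total_seconds() // 60)
        if 0 <= offset < segment_minutes:
            valid_minutes.add(offset)

    # Count the valid minutes that fall inside each hour of the segment.
    minutes_per_hour = {}
    for minute in valid_minutes:
        hour = minute // 60
        minutes_per_hour[hour] = minutes_per_hour.get(hour, 0) + 1

    # An hour is valid if it contains at least M valid minutes.
    valid_hours = sum(1 for n in minutes_per_hour.values() if n >= min_valid_minutes)

    minute_ratio = len(valid_minutes) / segment_minutes
    hour_ratio = valid_hours / (segment_minutes / 60)
    return minute_ratio, hour_ratio


# Made-up data: a two-hour segment with 45 sensed minutes in the first hour
# and two rows inside a single minute of the second hour.
start = datetime(2020, 4, 23, 0, 0)
rows = [start + timedelta(minutes=m) for m in range(45)]
rows += [start + timedelta(minutes=60, seconds=s) for s in (5, 40)]
minute_ratio, hour_ratio = yield_ratios(rows, start, segment_minutes=120)
```

Each time segment instance contributes one such ratio to the histogram, so a study with good coverage piles up in the rightmost bins.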
!!! example
    Click [here](../../img/h-data-yield.html) to see an example of these interactive visualizations in HTML format.
<figure>
<img src="../../img/h-data-yield.png" max-width="100%" />
<figcaption>Histogram of the data yielded minute ratio for a single participant during five time segments (daily, morning, afternoon, evening, and night)</figcaption>
</figure>
## 2. Heatmaps of overall data yield
These heatmaps are a breakdown per time segment and per participant of [Visualization 1](#1-histograms-of-phone-data-yield). Rows represent participants, columns represent time segment instances, and the cell color represents the valid yielded minute or hour ratio for a participant during a time segment instance.
As different participants might join a study on different dates, and time segments can be of any length and start on any day, the x-axis is labelled with the time delta between the start of each time segment instance and the start of the first instance. These plots provide a quick overview of the monitoring coverage per person and per time segment.
The figure below shows the heatmap of the valid yielded minute ratio for participants t01 and t02 on daily segments. As we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicates that both phones sensed data without interruptions on most days (except for the first and last ones).
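
The time-delta labelling of the x-axis can be sketched as follows. This is an illustration only; `segment_offsets` is a hypothetical helper and the start dates are made up.

```python
from datetime import datetime, timedelta


def segment_offsets(segment_starts):
    """Return each instance's start as a time delta from the first instance."""
    ordered = sorted(segment_starts)
    first = ordered[0]
    return [start - first for start in ordered]


# Two participants who joined on different (made-up) dates.
t01_starts = [datetime(2020, 4, 23), datetime(2020, 4, 24), datetime(2020, 4, 25)]
t02_starts = [datetime(2020, 5, 10), datetime(2020, 5, 11), datetime(2020, 5, 12)]

t01_offsets = segment_offsets(t01_starts)
t02_offsets = segment_offsets(t02_starts)
```

Because both lists of offsets start at zero, the two participants share the same columns even though their calendar dates never overlap.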
!!! example
    Click [here](../../img/hm-data-yield-participants.html) to see an example of these interactive visualizations in HTML format.
<figure>
<img src="../../img/hm-data-yield-participants.png" max-width="100%" />
<figcaption>Overall compliance heatmap for all participants</figcaption>
</figure>
## 3. Heatmap of recorded phone sensors
In these heatmaps, rows represent time segment instances, columns represent minutes since the start of a time segment instance, and the cell color shows the number of phone sensors that logged at least one row of data during each 1-minute window.
RAPIDS creates one plot per participant and per time segment. These plots can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes).
The figure below shows this heatmap for phone sensors collected by participant t01 in daily time segments from Apr 23rd 2020 to May 4th 2020. We can infer that, for most of the monitoring time, the participant's phone logged data from at least 8 sensors each minute.
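
The value behind each cell, the number of distinct sensors that logged data in a given minute, can be sketched like this. The function `sensors_per_minute` and the sample rows are hypothetical and are not taken from RAPIDS.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sensors_per_minute(rows, segment_start):
    """Count distinct sensors per minute since the start of a segment instance.

    rows: iterable of (sensor_name, timestamp) pairs.
    """
    sensors_by_minute = defaultdict(set)
    for sensor, ts in rows:
        minute = int((ts - segment_start).total_seconds() // 60)
        if minute >= 0:
            # A sensor counts once per minute, however many rows it logged.
            sensors_by_minute[minute].add(sensor)
    return {minute: len(sensors) for minute, sensors in sensors_by_minute.items()}


start = datetime(2020, 4, 23, 0, 0)
rows = [
    ("screen", start + timedelta(seconds=10)),
    ("screen", start + timedelta(seconds=30)),
    ("locations", start + timedelta(seconds=45)),
    ("locations", start + timedelta(minutes=2, seconds=5)),
]
counts = sensors_per_minute(rows, start)
```

Minutes that do not appear in the result had no data from any sensor. A sensor sampled every 2 minutes would therefore show up in every other column.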
!!! example
    Click [here](../../img/hm-phone-sensors.html) to see an example of these interactive visualizations in HTML format.
<figure>
<img src="../../img/hm-phone-sensors.png" max-width="100%" />
<figcaption>Heatmap of the recorded phone sensors per minute and per time segment of a single participant</figcaption>
</figure>
## 4. Heatmap of sensor row count
These heatmaps are a per-sensor breakdown of [Visualization 1](#1-histograms-of-phone-data-yield) and [Visualization 2](#2-heatmaps-of-overall-data-yield). Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant (bottom) row of the screenshot in Visualization 2.
In these heatmaps, rows represent phone or Fitbit sensors, columns represent time segment instances, and the cell color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant, and they can be used to judge missing data on a per-participant and per-sensor basis.
The figure below shows data for 16 phone sensors (including data yield) of t01's daily segments (only half of the sensor names and dates are visible in the screenshot, but all can be accessed in the interactive plot). From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams: there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
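
The text does not say how the row counts are normalized to the 0 to 1 range. The sketch below assumes each sensor's counts are divided by that sensor's maximum across time segment instances; this is one plausible choice, not necessarily what RAPIDS does, and `normalize_row_counts` is a hypothetical name.

```python
def normalize_row_counts(row_counts):
    """Scale each sensor's row counts to the 0-1 range.

    row_counts: dict mapping sensor name to a list with one row count
    per time segment instance.
    Assumption: divide by the per-sensor maximum, so 0 still means "no rows".
    """
    normalized = {}
    for sensor, counts in row_counts.items():
        peak = max(counts, default=0)
        # A sensor with no rows at all stays at 0 instead of dividing by zero.
        normalized[sensor] = [count / peak if peak else 0.0 for count in counts]
    return normalized


# Made-up counts for three daily segment instances.
raw = {"screen": [40, 10, 0], "messages": [0, 0, 0]}
scaled = normalize_row_counts(raw)
```

Scaling per sensor keeps a chatty sensor such as screen from washing out a sparse one such as messages, at the cost of hiding absolute volumes.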
!!! example
    Click [here](../../img/hm-sensor_rows.html) to see an example of these interactive visualizations in HTML format.
<figure>
<img src="../../img/hm-sensor_rows.png" max-width="100%" />
<figcaption>Heatmap of the sensor row count per time segment of a single participant</figcaption>
</figure>

@ -0,0 +1,14 @@
# Feature Visualizations
## 1. Heatmap Correlation Matrix
Columns and rows are the behavioral features computed in RAPIDS. The cell color represents the correlation coefficient for each pair of features, computed across all days of data from all participants.
The user can specify a minimum number of observations ([time segment](../../setup/configuration#time-segments) instances) required to compute the correlation between two features using the `MIN_ROWS_RATIO` parameter (0.5 by default) and the correlation method (Pearson, Spearman or Kendall) with the `CORR_METHOD` parameter. In addition, this plot can be configured to only display correlation coefficients above a threshold using the `CORR_THRESHOLD` parameter (0.1 by default).
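
To make the roles of `MIN_ROWS_RATIO` and `CORR_THRESHOLD` concrete, here is a minimal sketch for a single pair of features using Pearson correlation. How RAPIDS applies these parameters internally is not described here, so two details are assumptions: the minimum number of observations is taken as the ratio times the total number of rows, and a coefficient is hidden when its absolute value does not exceed the threshold. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def filtered_correlation(a, b, min_rows_ratio=0.5, corr_threshold=0.1):
    """Correlate two feature columns; None marks a cell that would stay empty.

    a, b: equal-length lists with one value per time segment instance,
    using None for missing observations.
    """
    # Keep only the instances where both features were observed.
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    # Assumption: the required number of observations is ratio * total rows.
    min_rows = max(int(min_rows_ratio * len(a)), 2)
    if len(pairs) < min_rows:
        return None
    r = pearson([x for x, _ in pairs], [y for _, y in pairs])
    # Assumption: weak correlations in either direction are hidden.
    if r is None or abs(r) <= corr_threshold:
        return None
    return r


strong = filtered_correlation([1, 2, 3, 4], [2, 4, 6, 8])
too_few = filtered_correlation([1, 2, 3, 4, 5, 6], [1, None, None, None, None, 2])
weak = filtered_correlation([1, 2, 3, 4], [1, -1, -1, 1])
```

Here `strong` is a perfect positive correlation, `too_few` is dropped because only two of six rows overlap, and `weak` is dropped because its coefficient is zero.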
!!! example
    Click [here](../../img/hm-feature-correlations.html) to see an example of these interactive visualizations in HTML format.
<figure>
<img src="../../img/hm-feature-correlations.png" max-width="100%" />
<figcaption>Correlation matrix heatmap for all the features of all participants</figcaption>
</figure>

@ -101,6 +101,9 @@ nav:
- Fitbit Steps Summary: features/fitbit-steps-summary.md
- Fitbit Steps Intraday: features/fitbit-steps-intraday.md
- Add New Features: features/add-new-features.md
- Visualizations:
- Data Quality: visualizations/data-quality-visualizations.md
- Features: visualizations/feature-visualizations.md
- Developers:
- Remote Support: developers/remote-support.md
- Virtual Environments: developers/virtual-environments.md