diff --git a/dev/img/hm-data-yield-participants-absolute-time.html b/dev/img/hm-data-yield-participants-absolute-time.html index e902c07c..527b5e69 100644 --- a/dev/img/hm-data-yield-participants-absolute-time.html +++ b/dev/img/hm-data-yield-participants-absolute-time.html @@ -1,11 +1,11 @@
These heatmaps are a break down per time segment and per participant of Visualization 1. Heatmap’s rows represent participants, columns represent time segment instances and the cells’ color represent the valid yielded minute or hour ratio for a participant during a time segment instance.
-As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis can be labelled with the absolute time of the start of each time segment instance or the time delta between the start of each time segment instance minus the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment.
+As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis can be labelled with the absolute time of each time segment instance or the time delta between each time segment instance and the start of the first instance for each participant. These plots provide a quick study overview of the monitoring coverage per person and per time segment.
The figure below shows the heatmap of the valid yielded minute ratio for participants example01 and example02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicate both phones sensed data without interruptions for most days (except for the first and last ones).
These heatmaps are a per-sensor breakdown of Visualization 1 and Visualization 2. Note that the second row (ratio of valid yielded minutes) of this heatmap matches the respective participant (bottom) row the screenshot in Visualization 2.
-In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell’s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis.
+In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell’s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. A grey cell represents missing data in that time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis.
The figure below shows data for 14 phone sensors (including data yield) of example01’s daily segments. From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
Example
diff --git a/dev/workflow-examples/analysis/index.html b/dev/workflow-examples/analysis/index.html index 04299980..209fef76 100644 --- a/dev/workflow-examples/analysis/index.html +++ b/dev/workflow-examples/analysis/index.html @@ -1899,7 +1899,7 @@At this point the user can use the five plots RAPIDS provides (or implement new ones) to explore and understand the quality of the raw data and extracted features and decide what sensors, days, or participants to include and exclude. Refer to rules/reports.smk
to find the rules that generate these plots.
In this stage we perform four steps to clean our sensor feature file. First, we discard days with a data yield hour ratio less than or equal to 0.75, i.e. we include days with at least 18 hours of data. Second, we drop columns (features) with more than 30% of missing rows. Third, we drop columns with zero variance. Fourth, we drop rows (days) with more than 30% of missing columns (features). In this cleaning stage several parameters are created and exposed in example_profile/example_config.yaml
.
After this step, we kept 158 features over 11 days for the individual model of p01, 101 features over 12 days for the individual model of p02 and 106 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stops researchers from collecting the same number of sensors than in Android phones.
+After this step, we kept 161 features over 11 days for the individual model of p01, 101 features over 12 days for the individual model of p02 and 109 features over 20 days for the population model. Note that the difference in the number of features between p01 and p02 is mostly due to iOS restrictions that stops researchers from collecting the same number of sensors than in Android phones.
Feature cleaning for the individual models is done in the clean_sensor_features_for_individual_participants
rule and for the population model in the clean_sensor_features_for_all_participants
rule in rules/models.smk
.
In this step we merge the cleaned features and target labels for our individual models in the merge_features_and_targets_for_individual_model
rule in rules/models.smk
. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the merge_features_and_targets_for_population_model
rule in rules/models.smk
. These two merged files are the input for our individual and population models.