Update visualizations docs & add time flag for heatmap of overall data yield
|
@ -522,6 +522,7 @@ HISTOGRAM_PHONE_DATA_YIELD:
|
||||||
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield
|
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield
|
||||||
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
|
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
|
||||||
PLOT: False
|
PLOT: False
|
||||||
|
TIME: RELATIVE_TIME # ABSOLUTE_TIME or RELATIVE_TIME
|
||||||
|
|
||||||
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors
|
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors
|
||||||
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
|
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
|
||||||
|
@ -530,7 +531,7 @@ HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
|
||||||
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count
|
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count
|
||||||
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
|
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
|
||||||
PLOT: False
|
PLOT: False
|
||||||
SENSORS: [PHONE_ACCELEROMETER, PHONE_ACTIVITY_RECOGNITION, PHONE_APPLICATIONS_FOREGROUND, PHONE_BATTERY, PHONE_BLUETOOTH, PHONE_CALLS, PHONE_CONVERSATION, PHONE_LIGHT, PHONE_LOCATIONS, PHONE_MESSAGES, PHONE_SCREEN, PHONE_WIFI_CONNECTED, PHONE_WIFI_VISIBLE]
|
SENSORS: []
|
||||||
|
|
||||||
# Features ------
|
# Features ------
|
||||||
|
|
||||||
|
|
Before Width: | Height: | Size: 37 KiB After Width: | Height: | Size: 110 KiB |
After Width: | Height: | Size: 180 KiB |
After Width: | Height: | Size: 190 KiB |
Before Width: | Height: | Size: 46 KiB |
Before Width: | Height: | Size: 58 KiB After Width: | Height: | Size: 273 KiB |
After Width: | Height: | Size: 184 KiB |
Before Width: | Height: | Size: 53 KiB |
|
@ -1,5 +1,5 @@
|
||||||
# Data Quality Visualizations
|
# Data Quality Visualizations
|
||||||
We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (t01 and t02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night).
|
We showcase these visualizations with a test study that collected 14 days of smartphone and Fitbit data from two participants (example01 and example02) and extracted behavioral features within five time segments (daily, morning, afternoon, evening, and night).
|
||||||
|
|
||||||
!!! note
|
!!! note
|
||||||
[Time segments](../../setup/configuration#time-segments) (e.g. `daily`, `morning`, etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.)
|
[Time segments](../../setup/configuration#time-segments) (e.g. `daily`, `morning`, etc.) can have multiple instances (day 1, day 2, or morning 1, morning 2, etc.)
|
||||||
|
@ -14,21 +14,33 @@ These plots can be used as a rough indication of the smartphone monitoring cover
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
<img src="../../img/h-data-yield.png" max-width="100%" />
|
<img src="../../img/h-data-yield.png" max-width="100%" />
|
||||||
<figcaption>Histogram of the data yielded minute ratio for a single participant during five time segments (daily, afternoon, evening, and night)</figcaption>
|
<figcaption>Histogram of the data yielded minute ratio for a single participant during five time segments (daily, morning, afternoon, evening, and night)</figcaption>
|
||||||
</figure>
|
</figure>
|
||||||
|
|
||||||
## 2. Heatmaps of overall data yield
|
## 2. Heatmaps of overall data yield
|
||||||
These heatmaps are a break down per time segment and per participant of [Visualization 1](#1-histograms-of-phone-data-yield). Heatmap's rows represent participants, columns represent time segment instances and the cells’ color represent the valid yielded minute or hour ratio for a participant during a time segment instance.
|
These heatmaps are a break down per time segment and per participant of [Visualization 1](#1-histograms-of-phone-data-yield). Heatmap's rows represent participants, columns represent time segment instances and the cells’ color represent the valid yielded minute or hour ratio for a participant during a time segment instance.
|
||||||
|
|
||||||
As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis is labelled with the time delta between the start of each time segment instance minus the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment.
|
As different participants might join a study on different dates and time segments can be of any length and start on any day, the x-axis can be labelled with the absolute time of the start of each time segment instance or the time delta between the start of each time segment instance minus the start of the first instance. These plots provide a quick study overview of the monitoring coverage per person and per time segment.
|
||||||
|
|
||||||
The figure below shows the heatmap of the valid yielded minute ratio for participants t01 and t02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicate both phones sensed data without interruptions for most days (except for the first and last ones).
|
The figure below shows the heatmap of the valid yielded minute ratio for participants example01 and example02 on daily segments and, as we inferred from the previous histogram, the lighter (yellow) color on most time segment instances (cells) indicate both phones sensed data without interruptions for most days (except for the first and last ones).
|
||||||
|
|
||||||
|
=== "[ABSOLUTE_TIME]"
|
||||||
|
|
||||||
!!! example
|
!!! example
|
||||||
Click [here](../../img/hm-data-yield-participants.html) to see an example of these interactive visualizations in HTML format
|
Click [here](../../img/hm-data-yield-participants-absolute-time.html) to see an example of these interactive visualizations in HTML format
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
<img src="../../img/hm-data-yield-participants.png" max-width="100%" />
|
<img src="../../img/hm-data-yield-participants-absolute-time.png" max-width="100%" />
|
||||||
|
<figcaption>Overall compliance heatmap for all participants</figcaption>
|
||||||
|
</figure>
|
||||||
|
|
||||||
|
=== "[RELATIVE_TIME]"
|
||||||
|
|
||||||
|
!!! example
|
||||||
|
Click [here](../../img/hm-data-yield-participants-relative-time.html) to see an example of these interactive visualizations in HTML format
|
||||||
|
|
||||||
|
<figure>
|
||||||
|
<img src="../../img/hm-data-yield-participants-relative-time.png" max-width="100%" />
|
||||||
<figcaption>Overall compliance heatmap for all participants</figcaption>
|
<figcaption>Overall compliance heatmap for all participants</figcaption>
|
||||||
</figure>
|
</figure>
|
||||||
|
|
||||||
|
@ -38,7 +50,7 @@ In these heatmaps rows represent time segment instances, columns represent minut
|
||||||
|
|
||||||
RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes).
|
RAPIDS creates a plot per participant and per time segment and can be used as a rough indication of whether time-based sensors were following their sensing schedule (e.g. if location was being sensed every 2 minutes).
|
||||||
|
|
||||||
The figure below shows this heatmap for phone sensors collected by participant t01 in daily time segments from Apr 23rd 2020 to May 4th 2020. We can infer that for most of the monitoring time, the participant’s phone logged data from at least 8 sensors each minute.
|
The figure below shows this heatmap for phone sensors collected by participant example01 in daily time segments from Apr 23rd 2020 to May 4th 2020. We can infer that for most of the monitoring time, the participant’s phone logged data from at least 7 sensors each minute.
|
||||||
|
|
||||||
!!! example
|
!!! example
|
||||||
Click [here](../../img/hm-phone-sensors.html) to see an example of these interactive visualizations in HTML format
|
Click [here](../../img/hm-phone-sensors.html) to see an example of these interactive visualizations in HTML format
|
||||||
|
@ -53,12 +65,12 @@ These heatmaps are a per-sensor breakdown of [Visualization 1](#1-histograms-of-
|
||||||
|
|
||||||
In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell’s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis.
|
In these heatmaps rows represent phone or Fitbit sensors, columns represent time segment instances and cell’s color shows the normalized (0 to 1) row count of each sensor within a time segment instance. RAPIDS creates one heatmap per participant and they can be used to judge missing data on a per participant and per sensor basis.
|
||||||
|
|
||||||
The figure below shows data for 16 phone sensors (including data yield) of t01’s daily segments (only half of the sensor names and dates are visible in the screenshot but all can be accessed in the interactive plot). From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
|
The figure below shows data for 14 phone sensors (including data yield) of example01’s daily segments. From the top two rows, we can see that the phone was sensing data for most of the monitoring period (as suggested by Figure 3 and Figure 4). We can also infer how phone usage influenced the different sensor streams; there are peaks of screen events during the first day (Apr 23rd), peaks of location coordinates on Apr 26th and Apr 30th, and no sent or received SMS except for Apr 23rd, Apr 29th and Apr 30th (unlabeled row between screen and locations).
|
||||||
|
|
||||||
!!! example
|
!!! example
|
||||||
Click [here](../../img/hm-sensor_rows.html) to see an example of these interactive visualizations in HTML format
|
Click [here](../../img/hm-sensor-rows.html) to see an example of these interactive visualizations in HTML format
|
||||||
|
|
||||||
<figure>
|
<figure>
|
||||||
<img src="../../img/hm-sensor_rows.png" max-width="100%" />
|
<img src="../../img/hm-sensor-rows.png" max-width="100%" />
|
||||||
<figcaption>Heatmap of the sensor row count per time segment of a single participant</figcaption>
|
<figcaption>Heatmap of the sensor row count per time segment of a single participant</figcaption>
|
||||||
</figure>
|
</figure>
|
||||||
|
|
|
@ -523,6 +523,7 @@ HISTOGRAM_PHONE_DATA_YIELD:
|
||||||
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield
|
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#2-heatmaps-of-overall-data-yield
|
||||||
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
|
HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT:
|
||||||
PLOT: True
|
PLOT: True
|
||||||
|
TIME: RELATIVE_TIME # ABSOLUTE_TIME or RELATIVE_TIME
|
||||||
|
|
||||||
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors
|
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#3-heatmap-of-recorded-phone-sensors
|
||||||
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
|
HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
|
||||||
|
@ -530,14 +531,14 @@ HEATMAP_SENSORS_PER_MINUTE_PER_TIME_SEGMENT:
|
||||||
|
|
||||||
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count
|
# See https://www.rapids.science/latest/visualizations/data-quality-visualizations/#4-heatmap-of-sensor-row-count
|
||||||
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
|
HEATMAP_SENSOR_ROW_COUNT_PER_TIME_SEGMENT:
|
||||||
PLOT: False
|
PLOT: True
|
||||||
SENSORS: [PHONE_ACTIVITY_RECOGNITION, PHONE_APPLICATIONS_FOREGROUND, PHONE_BATTERY, PHONE_BLUETOOTH, PHONE_CALLS, PHONE_CONVERSATION, PHONE_LIGHT, PHONE_LOCATIONS, PHONE_MESSAGES, PHONE_SCREEN, PHONE_WIFI_CONNECTED, PHONE_WIFI_VISIBLE]
|
SENSORS: [PHONE_ACTIVITY_RECOGNITION, PHONE_APPLICATIONS_FOREGROUND, PHONE_BATTERY, PHONE_BLUETOOTH, PHONE_CALLS, PHONE_CONVERSATION, PHONE_LIGHT, PHONE_LOCATIONS, PHONE_MESSAGES, PHONE_SCREEN, PHONE_WIFI_CONNECTED, PHONE_WIFI_VISIBLE]
|
||||||
|
|
||||||
# Features ------
|
# Features ------
|
||||||
|
|
||||||
# See https://www.rapids.science/latest/visualizations/feature-visualizations/#1-heatmap-correlation-matrix
|
# See https://www.rapids.science/latest/visualizations/feature-visualizations/#1-heatmap-correlation-matrix
|
||||||
HEATMAP_FEATURE_CORRELATION_MATRIX:
|
HEATMAP_FEATURE_CORRELATION_MATRIX:
|
||||||
PLOT: False
|
PLOT: True
|
||||||
MIN_ROWS_RATIO: 0.5
|
MIN_ROWS_RATIO: 0.5
|
||||||
CORR_THRESHOLD: 0.1
|
CORR_THRESHOLD: 0.1
|
||||||
CORR_METHOD: "pearson" # choose from {"pearson", "kendall", "spearman"}
|
CORR_METHOD: "pearson" # choose from {"pearson", "kendall", "spearman"}
|
||||||
|
|
|
@ -52,6 +52,8 @@ rule heatmap_phone_data_yield_per_participant_per_time_segment:
|
||||||
phone_data_yield = expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]),
|
phone_data_yield = expand("data/processed/features/{pid}/phone_data_yield.csv", pid=config["PIDS"]),
|
||||||
participant_file = expand("data/external/participant_files/{pid}.yaml", pid=config["PIDS"]),
|
participant_file = expand("data/external/participant_files/{pid}.yaml", pid=config["PIDS"]),
|
||||||
time_segments_labels = expand("data/interim/time_segments/{pid}_time_segments_labels.csv", pid=config["PIDS"])
|
time_segments_labels = expand("data/interim/time_segments/{pid}_time_segments_labels.csv", pid=config["PIDS"])
|
||||||
|
params:
|
||||||
|
time = config["HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT"]["TIME"]
|
||||||
output:
|
output:
|
||||||
"reports/data_exploration/heatmap_phone_data_yield_per_participant_per_time_segment.html"
|
"reports/data_exploration/heatmap_phone_data_yield_per_participant_per_time_segment.html"
|
||||||
script:
|
script:
|
||||||
|
|
|
@ -7,20 +7,23 @@ import yaml
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
def getPhoneDataYieldHeatmap(data_for_plot, y_axis_labels, time_segment, type, html_file):
|
def getPhoneDataYieldHeatmap(data_for_plot, y_axis_labels, time_segment, type, time, html_file):
|
||||||
|
|
||||||
fig = go.Figure(data=go.Heatmap(z=data_for_plot.values.tolist(),
|
fig = go.Figure(data=go.Heatmap(z=data_for_plot.values.tolist(),
|
||||||
x=data_for_plot.columns.tolist(),
|
x=data_for_plot.columns.tolist(),
|
||||||
y=y_axis_labels,
|
y=y_axis_labels,
|
||||||
hovertext=data_for_plot.values.tolist(),
|
hovertext=data_for_plot.values.tolist(),
|
||||||
hovertemplate="Time since first segment: %{x}<br>Participant: %{y}<br>Ratiovalidyielded" + type + ": %{z}<extra></extra>",
|
hovertemplate="Time since first segment: %{x}<br>Participant: %{y}<br>Ratiovalidyielded" + type + ": %{z}<extra></extra>" if time == "RELATIVE_TIME" else "Time: %{x}<br>Participant: %{y}<br>Ratiovalidyielded" + type + ": %{z}<extra></extra>",
|
||||||
zmin=0, zmax=1,
|
zmin=0, zmax=1,
|
||||||
colorscale="Viridis"))
|
colorscale="Viridis"))
|
||||||
|
|
||||||
|
if time == "RELATIVE_TIME":
|
||||||
fig.update_layout(title="Heatmap of valid yielded " + type + " ratio for " + time_segment + " segments.<br>y-axis shows participant information (format: pid.label).<br>x-axis shows the time since their first segment.<br>z-axis (color) shows valid yielded " + type + " ratio during a segment instance.")
|
fig.update_layout(title="Heatmap of valid yielded " + type + " ratio for " + time_segment + " segments.<br>y-axis shows participant information (format: pid.label).<br>x-axis shows the time since their first segment.<br>z-axis (color) shows valid yielded " + type + " ratio during a segment instance.")
|
||||||
|
else:
|
||||||
|
fig.update_layout(title="Heatmap of valid yielded " + type + " ratio for " + time_segment + " segments.<br>y-axis shows participant information (format: pid.label).<br>x-axis shows the time.<br>z-axis (color) shows valid yielded " + type + " ratio during a segment instance.")
|
||||||
|
|
||||||
fig["layout"]["xaxis"].update(side="bottom")
|
fig["layout"]["xaxis"].update(side="bottom")
|
||||||
fig["layout"].update(xaxis_title="Time Since First Segment")
|
fig["layout"].update(xaxis_title="Time Since First Segment" if time == "RELATIVE_TIME" else "Time")
|
||||||
fig["layout"].update(margin=dict(t=160))
|
fig["layout"].update(margin=dict(t=160))
|
||||||
|
|
||||||
html_file.write(fig.to_html(full_html=False, include_plotlyjs="cdn"))
|
html_file.write(fig.to_html(full_html=False, include_plotlyjs="cdn"))
|
||||||
|
@ -30,7 +33,7 @@ def getPhoneDataYieldHeatmap(data_for_plot, y_axis_labels, time_segment, type, h
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
time = snakemake.params["time"]
|
||||||
y_axis_labels, phone_data_yield_minutes, phone_data_yield_hours = [], {}, {}
|
y_axis_labels, phone_data_yield_minutes, phone_data_yield_hours = [], {}, {}
|
||||||
for phone_data_yield_path, participant_file_path, time_segments_path in zip(snakemake.input["phone_data_yield"], snakemake.input["participant_file"], snakemake.input["time_segments_labels"]):
|
for phone_data_yield_path, participant_file_path, time_segments_path in zip(snakemake.input["phone_data_yield"], snakemake.input["participant_file"], snakemake.input["time_segments_labels"]):
|
||||||
|
|
||||||
|
@ -58,8 +61,13 @@ for phone_data_yield_path, participant_file_path, time_segments_path in zip(snak
|
||||||
|
|
||||||
if not phone_data_yield_per_segment.empty:
|
if not phone_data_yield_per_segment.empty:
|
||||||
|
|
||||||
|
if time == "RELATIVE_TIME":
|
||||||
# set number of minutes after the first start date time of local segments as x_axis_label
|
# set number of minutes after the first start date time of local segments as x_axis_label
|
||||||
phone_data_yield_per_segment.index = phone_data_yield_per_segment.index - phone_data_yield_per_segment.index.min()
|
phone_data_yield_per_segment.index = phone_data_yield_per_segment.index - phone_data_yield_per_segment.index.min()
|
||||||
|
elif time == "ABSOLUTE_TIME":
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
raise ValueError("[HEATMAP_PHONE_DATA_YIELD_PER_PARTICIPANT_PER_TIME_SEGMENT][TIME] can only be RELATIVE_TIME or ABSOLUTE_TIME")
|
||||||
|
|
||||||
phone_data_yield_minutes_per_segment = phone_data_yield_per_segment[["phone_data_yield_rapids_ratiovalidyieldedminutes"]].rename(columns={"phone_data_yield_rapids_ratiovalidyieldedminutes": y_axis_label})
|
phone_data_yield_minutes_per_segment = phone_data_yield_per_segment[["phone_data_yield_rapids_ratiovalidyieldedminutes"]].rename(columns={"phone_data_yield_rapids_ratiovalidyieldedminutes": y_axis_label})
|
||||||
phone_data_yield_hours_per_segment = phone_data_yield_per_segment[["phone_data_yield_rapids_ratiovalidyieldedhours"]].rename(columns={"phone_data_yield_rapids_ratiovalidyieldedhours": y_axis_label})
|
phone_data_yield_hours_per_segment = phone_data_yield_per_segment[["phone_data_yield_rapids_ratiovalidyieldedhours"]].rename(columns={"phone_data_yield_rapids_ratiovalidyieldedhours": y_axis_label})
|
||||||
|
@ -79,7 +87,7 @@ for time_segment in phone_data_yield_minutes.keys():
|
||||||
minutes_data_for_plot = phone_data_yield_minutes[time_segment].transpose().reindex(pd.Index(y_axis_labels)).round(3)
|
minutes_data_for_plot = phone_data_yield_minutes[time_segment].transpose().reindex(pd.Index(y_axis_labels)).round(3)
|
||||||
hours_data_for_plot = phone_data_yield_hours[time_segment].transpose().reindex(pd.Index(y_axis_labels)).round(3)
|
hours_data_for_plot = phone_data_yield_hours[time_segment].transpose().reindex(pd.Index(y_axis_labels)).round(3)
|
||||||
|
|
||||||
getPhoneDataYieldHeatmap(minutes_data_for_plot, y_axis_labels, time_segment, "minutes", html_file)
|
getPhoneDataYieldHeatmap(minutes_data_for_plot, y_axis_labels, time_segment, "minutes", time, html_file)
|
||||||
getPhoneDataYieldHeatmap(hours_data_for_plot, y_axis_labels, time_segment, "hours", html_file)
|
getPhoneDataYieldHeatmap(hours_data_for_plot, y_axis_labels, time_segment, "hours", time, html_file)
|
||||||
|
|
||||||
html_file.close()
|
html_file.close()
|
||||||
|
|