Refactor sms feature: replace "metrics" with "features"

Co-authored-by: Meng Li <AnnieLM1996@gmail.com>
pull/95/head
Mingze Cao 2020-04-08 15:02:58 -05:00
parent 97c83caebc
commit c2fbc3ad7a
4 changed files with 34 additions and 34 deletions

@@ -34,10 +34,10 @@ DOWNLOAD_DATASET:
 READABLE_DATETIME:
     FIXED_TIMEZONE: *timezone
-# Communication SMS features config, TYPES and METRICS keys need to match
+# Communication SMS features config, TYPES and FEATURES keys need to match
 SMS:
     TYPES : [received, sent]
-    METRICS:
+    FEATURES:
         received: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]
         sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]
     DAY_SEGMENTS: *day_segments

@@ -1,6 +1,6 @@
-.. _rapids_metrics:
+.. _rapids_features:
-RAPIDS Metrics
+RAPIDS Features
 ===============
 Global Parameters
@@ -24,7 +24,7 @@ Global Parameters
 .. _day-segments:
-- ``DAY_SEGMENTS`` - The list of day epochs that metrics can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternnon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per sensor basis. See DAY_SEGMENTS_ in ``config`` file.
+- ``DAY_SEGMENTS`` - The list of day epochs that features can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternoon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per-sensor basis. See DAY_SEGMENTS_ in the ``config`` file.
 .. _timezone:
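The segment boundaries listed for ``DAY_SEGMENTS`` can be read as a simple hour-to-segment lookup. The following Python snippet is a minimal sketch and is not RAPIDS code. The helper names ``day_segment_for_hour`` and ``segments_for_hour`` are hypothetical. It also assumes half-open intervals, so 6am counts as ``morning`` rather than ``night``; the documentation does not spell out this boundary rule.

```python
# Hypothetical helpers, not part of RAPIDS: map a local hour (0-23) to the
# day segments described for DAY_SEGMENTS. Assumes half-open intervals,
# e.g. morning covers hours 6..11 (6am up to, but excluding, 12pm).
SEGMENT_BOUNDS = {
    "night": (0, 6),        # 12am-6am
    "morning": (6, 12),     # 6am-12pm
    "afternoon": (12, 18),  # 12pm-6pm
    "evening": (18, 24),    # 6pm-12am
}


def day_segment_for_hour(hour: int) -> str:
    """Return the name of the non-daily segment that contains `hour`."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    for name, (start, end) in SEGMENT_BOUNDS.items():
        if start <= hour < end:
            return name
    raise AssertionError("unreachable: the segments cover hours 0..23")


def segments_for_hour(hour: int) -> list:
    """Every hour also belongs to the ``daily`` segment."""
    return ["daily", day_segment_for_hour(hour)]


if __name__ == "__main__":
    for h in (0, 6, 12, 18, 23):
        print(h, segments_for_hour(h))
```

Under this reading, one row can count toward both ``daily`` and exactly one of the four shorter segments.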
@@ -53,11 +53,11 @@ Global Parameters
 Contains three attributes: ``BIN_SIZE``, ``MIN_VALID_HOURS``, ``MIN_BINS_PER_HOUR``.
-On any given day, Aware could have sensed data only for a few minutes or for 24 hours. Daily estimates of metrics should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement compared to 10 calls on a day when 23 hours of data were recorded.
+On any given day, Aware could have sensed data for only a few minutes or for the full 24 hours. Daily estimates of features should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement than 10 calls on a day when 23 hours of data were recorded).
 Therefore, we define a valid hour as one that contains at least a certain number of valid bins. In turn, a valid bin is one that contains at least one row of data from any sensor logged within that period. We divide an hour into N bins of size ``BIN_SIZE`` (in minutes) and mark an hour as valid if it contains at least ``MIN_BINS_PER_HOUR`` valid bins (out of the 60/``BIN_SIZE`` bins that can be captured in an hour). Days with fewer than ``MIN_VALID_HOURS`` valid sensed hours will be excluded from the output of this file. See PHONE_VALID_SENSED_DAYS_ in ``config.yaml``.
-In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of metrics that are ratios over time periods like ``episodepersensedminutes`` of :ref:`Screen<screen-sensor-doc>`.
+In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of features that are ratios over time periods, like ``episodepersensedminutes`` of :ref:`Screen<screen-sensor-doc>`.
 .. _individual-sensor-settings:
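The valid-bin and valid-hour rules above take only a few lines to express. This is a minimal Python sketch under stated assumptions and is not the RAPIDS implementation. The input format is hypothetical: a list of minute-of-day values (0-1439), one per sensed row from any sensor on a single day. The sketch also treats ``MIN_BINS_PER_HOUR`` and ``MIN_VALID_HOURS`` as inclusive lower bounds.

```python
from collections import defaultdict


def valid_hours(minutes, bin_size, min_bins_per_hour):
    """Count the valid hours in one day.

    `minutes` holds the minute-of-day of each sensed row (hypothetical input).
    A bin is valid if it contains at least one row. An hour is valid if it has
    at least `min_bins_per_hour` valid bins out of 60 // bin_size possible bins.
    """
    if 60 % bin_size != 0:
        raise ValueError("bin_size must divide 60 evenly")
    bins_seen = defaultdict(set)  # hour -> indices of bins with data
    for m in minutes:
        hour, minute_in_hour = divmod(m, 60)
        bins_seen[hour].add(minute_in_hour // bin_size)
    return sum(1 for bins in bins_seen.values() if len(bins) >= min_bins_per_hour)


def is_valid_day(minutes, bin_size, min_bins_per_hour, min_valid_hours):
    """Keep a day only if it has at least `min_valid_hours` valid hours."""
    return valid_hours(minutes, bin_size, min_bins_per_hour) >= min_valid_hours


if __name__ == "__main__":
    # BIN_SIZE = 10 gives 6 bins per hour. Hour 0 has all 6 bins filled,
    # while hour 1 has data in only one bin.
    sensed = [0, 10, 20, 30, 40, 50, 60, 61]
    print(valid_hours(sensed, bin_size=10, min_bins_per_hour=6))
```

In this sketch, many rows inside one bin still count as a single valid bin. That is why hour 1 in the example fails even though it has two rows.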
@@ -88,7 +88,7 @@ See `SMS Config Code`_
 .. - Apply readable datetime to SMS dataset: ``expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SENSORS"]),``
-- Extract SMS metrics:
+- Extract SMS features:
 | ``expand("data/processed/{pid}/sms_{sms_type}_{day_segment}.csv",``
 | ``pid=config["PIDS"],``
@@ -105,9 +105,9 @@ See `SMS Config Code`_
 - **Script:** ``src/data/readable_datetime.R`` - See the readable_datetime.R_ script.
-- **Rule:** ``rules/features.snakefile/sms_metrics`` - See the sms_metric_ rule.
-- **Script:** ``src/features/sms_metrics.R`` - See the sms_metrics.R_ script.
+- **Rule:** ``rules/features.snakefile/sms_features`` - See the sms_feature_ rule.
+- **Script:** ``src/features/sms_features.R`` - See the sms_features.R_ script.
 .. _sms-parameters:
@@ -119,14 +119,14 @@ Name Description
 ============ ===================
 sms_type     The particular ``sms_type`` that will be analyzed. The options for this parameter are ``received`` or ``sent``.
 day_segment  The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
-metrics      The different measures that can be retrieved from the dataset. These metrics are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Metrics <sms-available-metrics>` Table below
+features     The different measures that can be retrieved from the dataset. These features are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Features <sms-available-features>` Table below
 ============ ===================
-.. _sms-available-metrics:
-**Available SMS Metrics**
-The following table shows a list of the available metrics for both ``sent`` and ``received`` SMS.
+.. _sms-available-features:
+**Available SMS Features**
+The following table lists the available features for both ``sent`` and ``received`` SMS.
 ========================= ========= =============
 Name                      Units     Description
@@ -140,14 +140,14 @@ countmostfrequentcontact SMS The count of the number of sms message
 **Assumptions/Observations:**
-#. ``TYPES`` and ``METRICS`` keys need to match. From example::
+#. ``TYPES`` and ``FEATURES`` keys need to match. For example::
         SMS:
             TYPES: [sent]
-            METRICS:
+            FEATURES:
                 sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]
-In the above config setting code the ``TYPE`` ``sent`` matches the ``METRICS`` key ``sent``.
+In the config above, the ``TYPES`` entry ``sent`` matches the ``FEATURES`` key ``sent``.
 .. _call-sensor-doc:
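You can check the ``TYPES``/``FEATURES`` matching rule before running the pipeline. The following is a minimal Python sketch. It assumes the ``SMS`` section of ``config.yaml`` has already been loaded into a plain dictionary, for example with a YAML loader. The function name ``check_sms_config`` is hypothetical and is not part of RAPIDS. The allowed types, ``received`` and ``sent``, are taken from the parameter table above.

```python
# Hypothetical pre-flight check, not part of RAPIDS.
ALLOWED_SMS_TYPES = {"received", "sent"}


def check_sms_config(sms_config: dict) -> None:
    """Raise ValueError if the SMS TYPES and FEATURES keys do not match.

    The sms_features rule looks up config["SMS"]["FEATURES"][sms_type],
    so every entry in TYPES needs a FEATURES key with the same name.
    """
    types = set(sms_config.get("TYPES", []))
    feature_keys = set(sms_config.get("FEATURES", {}))
    unknown = types - ALLOWED_SMS_TYPES
    if unknown:
        raise ValueError(f"Unsupported SMS types: {sorted(unknown)}")
    if types != feature_keys:
        missing = sorted(types - feature_keys)
        extra = sorted(feature_keys - types)
        raise ValueError(
            "TYPES and FEATURES keys do not match "
            f"(no FEATURES for {missing}, FEATURES without a TYPE: {extra})"
        )


if __name__ == "__main__":
    # Mirrors the example config above; this passes silently.
    check_sms_config({"TYPES": ["sent"], "FEATURES": {"sent": ["count"]}})
```

This sketch treats extra ``FEATURES`` keys as errors, following the strict "keys need to match" wording. A more lenient check would reject only types that have no ``FEATURES`` entry.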
@@ -731,7 +731,7 @@ See `Light Config Code`_
 Name         Description
 ============ ===================
 day_segment  The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
-features     The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Metrics <light-available-features>` Table below
+features     The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Features <light-available-features>` Table below
 ============ ===================
 .. _light-available-features:
@@ -1102,15 +1102,15 @@ See `Fitbit: Steps Config Code`_
 Name                    Description
 ======================= ===================
 day_segment             The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
-metrics                 The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Metrics <fitbit-steps-available-metrics>` Table below
+features                The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Features <fitbit-steps-available-features>` Table below
 threshold_active_bout   The maximum number of steps per minute for a bout to be considered ``sedentary``. That is, if the step count per minute is greater than this value, the bout has a status of ``active``.
 ======================= ===================
-.. _fitbit-steps-available-metrics:
-**Available Fitbit: Steps Metrics**
-The following table shows a list of the available metrics for the Fitbit: Steps dataset.
+.. _fitbit-steps-available-features:
+**Available Fitbit: Steps Features**
+The following table lists the available features for the Fitbit: Steps dataset.
 ========================= ========= =============
 Name                      Units     Description
@@ -1141,8 +1141,8 @@ stddurationactivebout minutes Std duration active bout: The standard
 .. _`SMS Config Code`: https://github.com/carissalow/rapids/blob/f22d1834ee24ab3bcbf051bc3cc663903d822084/config.yaml#L38
 .. _AWARE: https://awareframework.com/what-is-aware/
 .. _`List of Timezones`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
-.. _sms_metric: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1
-.. _sms_metrics.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_metrics.R
+.. _sms_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1
+.. _sms_features.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_features.R
 .. _download_dataset: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L9
 .. _download_dataset.R: https://github.com/carissalow/rapids/blob/master/src/data/download_dataset.R
 .. _readable_datetime: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L21
@@ -1156,8 +1156,8 @@ stddurationactivebout minutes Std duration active bout: The standard
 .. _bluetooth_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L63
 .. _bluetooth_features.R: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/bluetooth_features.R
 .. _`Accelerometer Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L98
-.. _accelerometer_metrics: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124
-.. _accelerometer_metrics.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_metrics.py
+.. _accelerometer_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124
+.. _accelerometer_features.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_features.py
 .. _`Applications Foreground Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L102
 .. _`Application Genres Config`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L54
 .. _application_genres: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L81
@@ -1172,7 +1172,7 @@ stddurationactivebout minutes Std duration active bout: The standard
 .. _`Google Activity Recognition Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L80
 .. _google_activity_recognition_deltas: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L41
 .. _google_activity_recognition_deltas.R: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition_deltas.R
-.. _activity_metrics: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74
+.. _activity_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74
 .. _google_activity_recognition.py: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition.py
 .. _`Light Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L94
 .. _light_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L113

@@ -1,14 +1,14 @@
-rule sms_metrics:
+rule sms_features:
     input:
         "data/raw/{pid}/messages_with_datetime.csv"
     params:
         sms_type = "{sms_type}",
         day_segment = "{day_segment}",
-        metrics = lambda wildcards: config["SMS"]["METRICS"][wildcards.sms_type]
+        features = lambda wildcards: config["SMS"]["FEATURES"][wildcards.sms_type]
     output:
         "data/processed/{pid}/sms_{sms_type}_{day_segment}.csv"
     script:
-        "../src/features/sms_metrics.R"
+        "../src/features/sms_features.R"
 rule call_features:
     input:
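The following plain Python sketch shows how the rule's wildcards and its ``features`` lambda combine. It lists each output file and the feature list that job would receive, using ``itertools.product`` to imitate the default all-combinations behaviour of Snakemake's ``expand``. The participant IDs, the small config dictionary, and the helper name ``planned_sms_jobs`` are hypothetical. The sketch also assumes ``DAY_SEGMENTS`` resolves to a plain list.

```python
from itertools import product

# Hypothetical config mirroring the shape of the SMS section in config.yaml.
config = {
    "PIDS": ["p01", "p02"],
    "SMS": {
        "TYPES": ["received", "sent"],
        "FEATURES": {
            "received": ["count", "distinctcontacts"],
            "sent": ["count", "timefirstsms"],
        },
        "DAY_SEGMENTS": ["daily", "night"],
    },
}


def planned_sms_jobs(config):
    """Yield (output_path, features) pairs, one per sms_features job."""
    sms = config["SMS"]
    for pid, sms_type, day_segment in product(
        config["PIDS"], sms["TYPES"], sms["DAY_SEGMENTS"]
    ):
        path = f"data/processed/{pid}/sms_{sms_type}_{day_segment}.csv"
        # Same lookup the rule's params lambda performs:
        # config["SMS"]["FEATURES"][wildcards.sms_type]
        yield path, sms["FEATURES"][sms_type]


if __name__ == "__main__":
    for path, features in planned_sms_jobs(config):
        print(path, features)
```

Because the feature list depends only on ``sms_type``, every day segment for a given type receives the same list. A type missing from ``FEATURES`` would fail the lookup with a ``KeyError``.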

@@ -7,14 +7,14 @@ library(dplyr, warn.conflicts = FALSE)
 sms <- read.csv(snakemake@input[[1]])
 day_segment <- snakemake@params[["day_segment"]]
-metrics <- snakemake@params[["metrics"]]
+requested_features <- snakemake@params[["features"]]
 sms_type <- snakemake@params[["sms_type"]]
 features <- data.frame(local_date = character(), stringsAsFactors = FALSE)
 # Compute base SMS features
-features <- merge(features, base_sms_features(sms, sms_type, day_segment, metrics), by="local_date", all = TRUE)
-if(ncol(features) != length(metrics) + 1)
-  stop(paste0("The number of features in the output dataframe (=", ncol(features),") does not match the expected value (=", length(metrics)," + 1). Verify your SMS feature extraction functions"))
+features <- merge(features, base_sms_features(sms, sms_type, day_segment, requested_features), by="local_date", all = TRUE)
+if(ncol(features) != length(requested_features) + 1)
+  stop(paste0("The number of features in the output dataframe (=", ncol(features),") does not match the expected value (=", length(requested_features)," + 1). Verify your SMS feature extraction functions"))
 write.csv(features, snakemake@output[[1]], row.names = FALSE)
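The ``ncol(features) != length(requested_features) + 1`` guard expects one ``local_date`` column plus one column per requested feature. Below is a rough Python analogue of the outer merge and that check. It is an illustration, not a port of the R script: it uses only the standard library, represents data frames as hypothetical lists of row dictionaries, and its helper names ``outer_merge_on_date`` and ``check_feature_count`` are invented for this sketch.

```python
def outer_merge_on_date(left_rows, right_rows):
    """Outer-join two lists of row dicts on 'local_date'.

    Loosely mirrors R's merge(..., by = "local_date", all = TRUE):
    every date from either side is kept, and missing cells become None (NA in R).
    """
    columns = ["local_date"]
    merged = {}
    for rows in (left_rows, right_rows):
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
            merged.setdefault(row["local_date"], {}).update(row)
    table = [{col: merged[d].get(col) for col in columns} for d in sorted(merged)]
    return columns, table


def check_feature_count(columns, requested_features):
    """Raise if the output does not have local_date + one column per feature."""
    if len(columns) != len(requested_features) + 1:
        raise ValueError(
            f"The number of features in the output dataframe (={len(columns)}) "
            f"does not match the expected value (={len(requested_features)} + 1)"
        )


if __name__ == "__main__":
    base = [
        {"local_date": "2020-04-01", "count": 3, "distinctcontacts": 2},
        {"local_date": "2020-04-02", "count": 1, "distinctcontacts": 1},
    ]
    # Start from an empty frame, as the R script does.
    columns, table = outer_merge_on_date([], base)
    check_feature_count(columns, ["count", "distinctcontacts"])
    print(columns)
```

The ``+ 1`` exists only because ``local_date`` is carried along as the join key. If a feature function returned an extra or missing column, the check would fail.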