Refactor sms feature: replace "metrics" with "features"

Co-authored-by: Meng Li <AnnieLM1996@gmail.com>
pull/95/head
Mingze Cao 2020-04-08 15:02:58 -05:00
parent 97c83caebc
commit c2fbc3ad7a
4 changed files with 34 additions and 34 deletions


@@ -34,10 +34,10 @@ DOWNLOAD_DATASET:
READABLE_DATETIME:
    FIXED_TIMEZONE: *timezone

# Communication SMS features config, TYPES and METRICS keys need to match
# Communication SMS features config, TYPES and FEATURES keys need to match
SMS:
    TYPES : [received, sent]
    METRICS:
    FEATURES:
        received: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]
        sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]
    DAY_SEGMENTS: *day_segments


@@ -1,6 +1,6 @@
.. _rapids_metrics:
.. _rapids_features:
RAPIDS Metrics
RAPIDS Features
===============
Global Parameters
@@ -24,7 +24,7 @@ Global Parameters
.. _day-segments:
- ``DAY_SEGMENTS`` - The list of day epochs that metrics can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternoon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per-sensor basis. See DAY_SEGMENTS_ in the ``config`` file.
- ``DAY_SEGMENTS`` - The list of day epochs that features can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternoon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per-sensor basis. See DAY_SEGMENTS_ in the ``config`` file.
.. _timezone:
@@ -53,11 +53,11 @@ Global Parameters
Contains three attributes: ``BIN_SIZE``, ``MIN_VALID_HOURS``, ``MIN_BINS_PER_HOUR``.
On any given day, Aware could have sensed data only for a few minutes or for 24 hours. Daily estimates of metrics should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement than 10 calls logged on a day when 23 hours of data were recorded).
On any given day, Aware could have sensed data only for a few minutes or for 24 hours. Daily estimates of features should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement than 10 calls logged on a day when 23 hours of data were recorded).
Therefore, we define a valid hour as one that contains at least a certain number of valid bins. In turn, a valid bin is one that contains at least one row of data from any sensor logged within that period. We divide each hour into 60/``BIN_SIZE`` bins of ``BIN_SIZE`` minutes each and mark an hour as valid if it contains at least ``MIN_BINS_PER_HOUR`` valid bins. Days with fewer than ``MIN_VALID_HOURS`` valid sensed hours are excluded from the output of this file. See PHONE_VALID_SENSED_DAYS_ in ``config.yaml``.
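A minimal Python sketch of the valid-bin, valid-hour and valid-day logic described above. It assumes sensor rows have already been reduced to a list of local ``datetime`` timestamps and that ``BIN_SIZE`` divides 60. The helper name ``valid_sensed_days`` is hypothetical; this is an illustration, not RAPIDS' actual implementation.

```python
from collections import defaultdict
from datetime import datetime


def valid_sensed_days(timestamps, bin_size, min_bins_per_hour, min_valid_hours):
    """Return {date: valid_hours} for days with at least min_valid_hours valid hours.

    Hypothetical helper: `timestamps` holds one local datetime per row logged
    by any sensor; bin_size is in minutes and is assumed to divide 60.
    """
    # A bin is valid if at least one row was logged inside it.
    valid_bins = {(ts.date(), ts.hour, ts.minute // bin_size) for ts in timestamps}

    # Count valid bins per (date, hour); an hour has at most 60 // bin_size bins.
    bins_per_hour = defaultdict(int)
    for day, hour, _ in valid_bins:
        bins_per_hour[(day, hour)] += 1

    # An hour is valid if it holds at least min_bins_per_hour valid bins.
    valid_hours = defaultdict(int)
    for (day, _hour), n_bins in bins_per_hour.items():
        if n_bins >= min_bins_per_hour:
            valid_hours[day] += 1

    # Days below min_valid_hours are dropped from the output.
    return {day: hours for day, hours in valid_hours.items() if hours >= min_valid_hours}


# Rows every 5 minutes between 09:00 and 09:55 fill all six 10-minute bins of that hour.
rows = [datetime(2020, 4, 8, 9, minute) for minute in range(0, 60, 5)]
print(valid_sensed_days(rows, bin_size=10, min_bins_per_hour=6, min_valid_hours=1))
```

Raising ``min_bins_per_hour`` to 7 in the call above would return an empty dict, because a 10-minute bin size only allows six bins per hour.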
In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of metrics that are ratios over time periods like ``episodepersensedminutes`` of :ref:`Screen<screen-sensor-doc>`.
In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of features that are ratios over time periods like ``episodepersensedminutes`` of :ref:`Screen<screen-sensor-doc>`.
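To illustrate why sensed bins help with ratio features, the sketch below divides an episode count by the minutes the phone was actually sensing rather than by the nominal length of the day segment. It assumes that sensed minutes equal the number of valid bins times ``BIN_SIZE``. The helper name and that definition are assumptions, not necessarily how RAPIDS computes ``episodepersensedminutes``.

```python
import math


def episodes_per_sensed_minute(episode_count, sensed_bins, bin_size):
    """Hypothetical ratio feature: episodes divided by sensed minutes.

    sensed_bins holds one 0/1 flag per bin of the day segment
    (1 = at least one row of data was logged in that bin).
    """
    sensed_minutes = sum(sensed_bins) * bin_size
    if sensed_minutes == 0:
        return math.nan  # nothing was sensed, so the ratio is undefined
    return episode_count / sensed_minutes


# 12 episodes on a day where only 36 of 144 ten-minute bins were sensed:
# dividing by 360 sensed minutes instead of 1440 avoids underestimating the rate.
bins = [1] * 36 + [0] * 108
print(episodes_per_sensed_minute(12, bins, bin_size=10))
```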
.. _individual-sensor-settings:
@@ -88,7 +88,7 @@ See `SMS Config Code`_
.. - Apply readable datetime to SMS dataset: ``expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SENSORS"]),``
- Extract SMS metrics:
- Extract SMS features:
| ``expand("data/processed/{pid}/sms_{sms_type}_{day_segment}.csv",``
| ``pid=config["PIDS"],``
@@ -105,9 +105,9 @@ See `SMS Config Code`_
- **Script:** ``src/data/readable_datetime.R`` - See the readable_datetime.R_ script.
- **Rule:** ``rules/features.snakefile/sms_metrics`` - See the sms_metric_ rule.
- **Rule:** ``rules/features.snakefile/sms_features`` - See the sms_feature_ rule.
- **Script:** ``src/features/sms_metrics.R`` - See the sms_metrics.R_ script.
- **Script:** ``src/features/sms_features.R`` - See the sms_features.R_ script.
.. _sms-parameters:
@@ -119,14 +119,14 @@ Name         Description
============ ===================
sms_type     The particular ``sms_type`` that will be analyzed. The options for this parameter are ``received`` or ``sent``.
day_segment  The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics      The different measures that can be retrieved from the dataset. These metrics are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Metrics <sms-available-metrics>` Table below
features     The different measures that can be retrieved from the dataset. These features are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Features <sms-available-features>` Table below
============ ===================
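The ``day_segment`` options correspond to the hour ranges listed under ``DAY_SEGMENTS`` in Global Parameters. Below is a minimal sketch of assigning a row's local timestamp to segments. It assumes that each range includes its start hour and excludes its end hour, and that every row also counts toward ``daily``. The names are illustrative only.

```python
from datetime import datetime

# [start_hour, end_hour) ranges from the DAY_SEGMENTS description.
SEGMENT_HOURS = {
    "night": (0, 6),
    "morning": (6, 12),
    "afternoon": (12, 18),
    "evening": (18, 24),
}


def day_segments_for(ts):
    """Return every day segment a local timestamp falls into."""
    segments = ["daily"]  # assumption: all rows contribute to the daily segment
    for name, (start, end) in SEGMENT_HOURS.items():
        if start <= ts.hour < end:
            segments.append(name)
    return segments


print(day_segments_for(datetime(2020, 4, 8, 12, 0)))
```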
.. _sms-available-metrics:
.. _sms-available-features:
**Available SMS Metrics**
**Available SMS Features**
The following table shows a list of the available metrics for both ``sent`` and ``received`` SMS.
The following table shows a list of the available features for both ``sent`` and ``received`` SMS.
========================= ========= =============
Name                      Units     Description
@@ -140,14 +140,14 @@ countmostfrequentcontact  SMS       The count of the number of sms message
**Assumptions/Observations:**
#. ``TYPES`` and ``METRICS`` keys need to match. For example::
#. ``TYPES`` and ``FEATURES`` keys need to match. For example::

      SMS:
          TYPES: [sent]
          METRICS:
          FEATURES:
              sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]

   In the config above, the ``TYPES`` value ``sent`` matches the ``METRICS`` key ``sent``.
   In the config above, the ``TYPES`` value ``sent`` matches the ``FEATURES`` key ``sent``.
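Below is a sketch of a pre-flight check for this requirement. It assumes the ``SMS`` block of ``config.yaml`` has already been parsed into a Python dict. The ``sms_features`` rule looks up ``config["SMS"]["FEATURES"][wildcards.sms_type]`` for each requested type, so a type without a matching ``FEATURES`` key fails with a ``KeyError``. Extra ``FEATURES`` keys are simply unused. ``check_sms_config`` is a hypothetical helper, not part of RAPIDS.

```python
def check_sms_config(sms_config):
    """Validate that every entry in TYPES has a FEATURES key.

    Returns {sms_type: [feature, ...]} for the requested types.
    """
    features = sms_config["FEATURES"]
    missing = [sms_type for sms_type in sms_config["TYPES"] if sms_type not in features]
    if missing:
        raise ValueError(f"SMS TYPES without a matching FEATURES key: {missing}")
    return {sms_type: features[sms_type] for sms_type in sms_config["TYPES"]}


sms_config = {
    "TYPES": ["sent"],
    "FEATURES": {"sent": ["count", "distinctcontacts", "timefirstsms", "timelastsms", "countmostfrequentcontact"]},
}
print(check_sms_config(sms_config))
```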
.. _call-sensor-doc:
@@ -731,7 +731,7 @@ See `Light Config Code`_
Name         Description
============ ===================
day_segment  The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
features     The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Metrics <light-available-features>` Table below
features     The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Features <light-available-features>` Table below
============ ===================
.. _light-available-features:
@@ -1102,15 +1102,15 @@ See `Fitbit: Steps Config Code`_
Name                    Description
======================= ===================
day_segment             The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
metrics                 The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Metrics <fitbit-steps-available-metrics>` Table below
features                The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Features <fitbit-steps-available-features>` Table below
threshold_active_bout   The maximum number of steps per minute for a bout to be classified as ``sedentary``. If the step count per minute is greater than this value, the bout is classified as ``active``.
======================= ===================
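Below is a minimal sketch of how ``threshold_active_bout`` could split a minute-level step series into bouts. It assumes one step count per minute and defines a bout as a run of consecutive minutes with the same status. ``step_bouts`` is a hypothetical helper, and RAPIDS' own bout logic may differ.

```python
from itertools import groupby


def step_bouts(steps_per_minute, threshold_active_bout):
    """Return (status, duration_in_minutes) tuples for consecutive runs of minutes."""
    # Minutes strictly above the threshold are active; all others are sedentary.
    statuses = ["active" if steps > threshold_active_bout else "sedentary" for steps in steps_per_minute]
    # groupby merges adjacent minutes that share the same status into one bout.
    return [(status, len(list(run))) for status, run in groupby(statuses)]


print(step_bouts([0, 5, 20, 30, 2], threshold_active_bout=10))
```

A minute with exactly ``threshold_active_bout`` steps stays ``sedentary``, matching the "greater than" wording above.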
.. _fitbit-steps-available-metrics:
.. _fitbit-steps-available-features:
**Available Fitbit: Steps Metrics**
**Available Fitbit: Steps Features**
The following table shows a list of the available metrics for the Fitbit: Steps dataset.
The following table shows a list of the available features for the Fitbit: Steps dataset.
========================= ========= =============
Name                      Units     Description
@@ -1141,8 +1141,8 @@ stddurationactivebout     minutes   Std duration active bout: The standard
.. _`SMS Config Code`: https://github.com/carissalow/rapids/blob/f22d1834ee24ab3bcbf051bc3cc663903d822084/config.yaml#L38
.. _AWARE: https://awareframework.com/what-is-aware/
.. _`List of Timezones`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
.. _sms_metric: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1
.. _sms_metrics.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_metrics.R
.. _sms_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1
.. _sms_features.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_features.R
.. _download_dataset: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L9
.. _download_dataset.R: https://github.com/carissalow/rapids/blob/master/src/data/download_dataset.R
.. _readable_datetime: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L21
@@ -1156,8 +1156,8 @@ stddurationactivebout minutes Std duration active bout: The standard
.. _bluetooth_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L63
.. _bluetooth_features.R: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/bluetooth_features.R
.. _`Accelerometer Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L98
.. _accelerometer_metrics: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124
.. _accelerometer_metrics.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_metrics.py
.. _accelerometer_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124
.. _accelerometer_features.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_features.py
.. _`Applications Foreground Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L102
.. _`Application Genres Config`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L54
.. _application_genres: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L81
@@ -1172,7 +1172,7 @@ stddurationactivebout minutes Std duration active bout: The standard
.. _`Google Activity Recognition Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L80
.. _google_activity_recognition_deltas: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L41
.. _google_activity_recognition_deltas.R: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition_deltas.R
.. _activity_metrics: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74
.. _activity_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74
.. _google_activity_recognition.py: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition.py
.. _`Light Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L94
.. _light_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L113


@@ -1,14 +1,14 @@
rule sms_metrics:
rule sms_features:
    input:
        "data/raw/{pid}/messages_with_datetime.csv"
    params:
        sms_type = "{sms_type}",
        day_segment = "{day_segment}",
        metrics = lambda wildcards: config["SMS"]["METRICS"][wildcards.sms_type]
        features = lambda wildcards: config["SMS"]["FEATURES"][wildcards.sms_type]
    output:
        "data/processed/{pid}/sms_{sms_type}_{day_segment}.csv"
    script:
        "../src/features/sms_metrics.R"
        "../src/features/sms_features.R"

rule call_features:
    input:
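The ``features`` param is evaluated lazily: Snakemake calls the lambda with the wildcards parsed from the requested output path. The sketch below reproduces that lookup in plain Python. ``SimpleNamespace`` stands in for Snakemake's wildcards object, and a hand-written dict stands in for the parsed ``config.yaml``. The participant id ``p01`` is made up.

```python
from types import SimpleNamespace

SMS_FEATURES = ["count", "distinctcontacts", "timefirstsms", "timelastsms", "countmostfrequentcontact"]

# Stand-in for the parsed config.yaml (SMS section only).
config = {"SMS": {"TYPES": ["received", "sent"], "FEATURES": {"received": SMS_FEATURES, "sent": SMS_FEATURES}}}


def features_param(wildcards):
    # Same lookup as the `features` lambda in rule sms_features.
    return config["SMS"]["FEATURES"][wildcards.sms_type]


# Snakemake would derive these wildcards from the requested output file.
wildcards = SimpleNamespace(pid="p01", sms_type="sent", day_segment="daily")
output = "data/processed/{pid}/sms_{sms_type}_{day_segment}.csv".format(**vars(wildcards))
print(output, features_param(wildcards))
```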


@@ -7,14 +7,14 @@ library(dplyr, warn.conflicts = FALSE)
sms <- read.csv(snakemake@input[[1]])
day_segment <- snakemake@params[["day_segment"]]
metrics <- snakemake@params[["metrics"]]
requested_features <- snakemake@params[["features"]]
sms_type <- snakemake@params[["sms_type"]]

features <- data.frame(local_date = character(), stringsAsFactors = FALSE)

# Compute base SMS features
features <- merge(features, base_sms_features(sms, sms_type, day_segment, metrics), by="local_date", all = TRUE)
features <- merge(features, base_sms_features(sms, sms_type, day_segment, requested_features), by="local_date", all = TRUE)

if(ncol(features) != length(metrics) + 1)
    stop(paste0("The number of features in the output dataframe (=", ncol(features),") does not match the expected value (=", length(metrics)," + 1). Verify your SMS feature extraction functions"))
if(ncol(features) != length(requested_features) + 1)
    stop(paste0("The number of features in the output dataframe (=", ncol(features),") does not match the expected value (=", length(requested_features)," + 1). Verify your SMS feature extraction functions"))

write.csv(features, snakemake@output[[1]], row.names = FALSE)
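The final check in ``sms_features.R`` guards against extraction functions that add or drop columns. The output must have exactly one column per requested feature plus ``local_date``. Below is a Python rendering of the same check for readers outside R. It assumes the output columns are available as a list of names, and the column names in the usage line are made up.

```python
def check_feature_columns(columns, requested_features):
    """Raise if the output has a column count other than len(requested_features) + 1."""
    expected = len(requested_features) + 1  # one column per feature, plus local_date
    if len(columns) != expected:
        raise ValueError(
            f"The number of features in the output dataframe (={len(columns)}) does not match "
            f"the expected value (={len(requested_features)} + 1). "
            "Verify your SMS feature extraction functions"
        )


check_feature_columns(["local_date", "sms_sent_daily_count"], ["count"])  # passes silently
```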