diff --git a/config.yaml b/config.yaml index 543f8adc..bf0b9d10 100644 --- a/config.yaml +++ b/config.yaml @@ -34,10 +34,10 @@ DOWNLOAD_DATASET: READABLE_DATETIME: FIXED_TIMEZONE: *timezone -# Communication SMS features config, TYPES and METRICS keys need to match +# Communication SMS features config, TYPES and FEATURES keys need to match SMS: TYPES : [received, sent] - METRICS: + FEATURES: received: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact] sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact] DAY_SEGMENTS: *day_segments diff --git a/docs/features/extracted.rst b/docs/features/extracted.rst index e4bb8d65..760c264c 100644 --- a/docs/features/extracted.rst +++ b/docs/features/extracted.rst @@ -1,6 +1,6 @@ -.. _rapids_metrics: +.. _rapids_features: -RAPIDS Metrics +RAPIDS Features =============== Global Parameters @@ -24,7 +24,7 @@ Global Parameters .. _day-segments: -- ``DAY_SEGMENTS`` - The list of day epochs that metrics can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternnon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per sensor basis. See DAY_SEGMENTS_ in ``config`` file. +- ``DAY_SEGMENTS`` - The list of day epochs that features can be segmented into: ``daily``, ``morning`` (6am-12pm), ``afternoon`` (12pm-6pm), ``evening`` (6pm-12am) and ``night`` (12am-6am). This list can be modified globally or on a per sensor basis. See DAY_SEGMENTS_ in ``config`` file. .. _timezone: @@ -53,11 +53,11 @@ Global Parameters Contains three attributes: ``BIN_SIZE``, ``MIN_VALID_HOURS``, ``MIN_BINS_PER_HOUR``. - On any given day, Aware could have sensed data only for a few minutes or for 24 hours. 
Daily estimates of metrics should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement compared to 10 calls on a day when 23 hours of data were recorded. + On any given day, Aware could have sensed data only for a few minutes or for 24 hours. Daily estimates of features should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement compared to 10 calls on a day when 23 hours of data were recorded. Therefore, we define a valid hour as those that contain at least a certain number of valid bins. In turn, a valid bin are those that contain at least one row of data from any sensor logged within that period. We divide an hour into N bins of size ``BIN_SIZE`` (in minutes) and we mark an hour as valid if contains at least ``MIN_BINS_PER_HOUR`` of valid bins (out of the total possible number of bins that can be captured in an hour i.e. out of 60min/``BIN_SIZE`` bins). Days with valid sensed hours less than ``MIN_VALID_HOURS`` will be excluded form the output of this file. See PHONE_VALID_SENSED_DAYS_ in ``config.yaml``. - In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of metrics that are ratios over time periods like ``episodepersensedminutes`` of :ref:`Screen`. + In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins of all monitored days) to improve the estimation of features that are ratios over time periods like ``episodepersensedminutes`` of :ref:`Screen`. .. _individual-sensor-settings: @@ -88,7 +88,7 @@ See `SMS Config Code`_ .. 
- Apply readable datetime to SMS dataset: ``expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SENSORS"]),`` -- Extract SMS metrics: +- Extract SMS features: | ``expand("data/processed/{pid}/sms_{sms_type}_{day_segment}.csv".`` | ``pid=config["PIDS"],`` @@ -105,9 +105,9 @@ See `SMS Config Code`_ - **Script:** ``src/data/readable_datetime.R`` - See the readable_datetime.R_ script. -- **Rule:** ``rules/features.snakefile/sms_metrics`` - See the sms_metric_ rule. +- **Rule:** ``rules/features.snakefile/sms_features`` - See the sms_feature_ rule. - - **Script:** ``src/features/sms_metrics.R`` - See the sms_metrics.R_ script. + - **Script:** ``src/features/sms_features.R`` - See the sms_features.R_ script. .. _sms-parameters: @@ -119,14 +119,14 @@ Name Description ============ =================== sms_type The particular ``sms_type`` that will be analyzed. The options for this parameter are ``received`` or ``sent``. day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night`` -metrics The different measures that can be retrieved from the dataset. These metrics are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Metrics ` Table below +features The different measures that can be retrieved from the dataset. These features are available for both ``sent`` and ``received`` SMS messages. See :ref:`Available SMS Features ` Table below ============ =================== -.. _sms-available-metrics: +.. _sms-available-features: -**Available SMS Metrics** +**Available SMS Features** -The following table shows a list of the available metrics for both ``sent`` and ``received`` SMS. +The following table shows a list of the available features for both ``sent`` and ``received`` SMS. 
========================= ========= ============= Name Units Description @@ -140,14 +140,14 @@ countmostfrequentcontact SMS The count of the number of sms message **Assumptions/Observations:** - #. ``TYPES`` and ``METRICS`` keys need to match. From example:: + #. ``TYPES`` and ``FEATURES`` keys need to match. For example:: SMS: TYPES: [sent] - METRICS: + FEATURES: sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact] -In the above config setting code the ``TYPE`` ``sent`` matches the ``METRICS`` key ``sent``. +In the config above, the ``TYPES`` entry ``sent`` matches the ``FEATURES`` key ``sent``. .. _call-sensor-doc: @@ -731,7 +731,7 @@ See `Light Config Code`_ Name Description ============ =================== day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night`` -features The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Metrics ` Table below +features The different measures that can be retrieved from the Light dataset. See :ref:`Available Light Features ` Table below ============ =================== .. _light-available-features: @@ -1102,15 +1102,15 @@ See `Fitbit: Steps Config Code`_ Name Description ======================= =================== day_segment The particular ``day_segments`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night`` -metrics The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Metrics ` Table below +features The different measures that can be retrieved from the dataset. See :ref:`Available Fitbit: Steps Features ` Table below threshold_active_bout The maximum number of steps per minute necessary for a bout to be ``sedentary``. That is, if the step count per minute is greater than this value the bout has a status of ``active``. 
======================= =================== -.. _fitbit-steps-available-metrics: +.. _fitbit-steps-available-features: -**Available Fitbit: Steps Metrics** +**Available Fitbit: Steps Features** -The following table shows a list of the available metrics for the Fitbit: Steps dataset. +The following table shows a list of the available features for the Fitbit: Steps dataset. ========================= ========= ============= Name Units Description @@ -1141,8 +1141,8 @@ stddurationactivebout minutes Std duration active bout: The standard .. _`SMS Config Code`: https://github.com/carissalow/rapids/blob/f22d1834ee24ab3bcbf051bc3cc663903d822084/config.yaml#L38 .. _AWARE: https://awareframework.com/what-is-aware/ .. _`List of Timezones`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones -.. _sms_metric: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1 -.. _sms_metrics.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_metrics.R +.. _sms_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1 +.. _sms_features.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_features.R .. _download_dataset: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L9 .. _download_dataset.R: https://github.com/carissalow/rapids/blob/master/src/data/download_dataset.R .. _readable_datetime: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L21 @@ -1156,8 +1156,8 @@ stddurationactivebout minutes Std duration active bout: The standard .. _bluetooth_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L63 .. _bluetooth_features.R: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/bluetooth_features.R .. 
_`Accelerometer Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L98 -.. _accelerometer_metrics: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124 -.. _accelerometer_metrics.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_metrics.py +.. _accelerometer_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124 +.. _accelerometer_features.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_features.py .. _`Applications Foreground Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L102 .. _`Application Genres Config`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L54 .. _application_genres: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L81 @@ -1172,7 +1172,7 @@ stddurationactivebout minutes Std duration active bout: The standard .. _`Google Activity Recognition Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L80 .. _google_activity_recognition_deltas: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L41 .. _google_activity_recognition_deltas.R: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition_deltas.R -.. _activity_metrics: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74 +.. _activity_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74 .. 
_google_activity_recognition.py: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition.py .. _`Light Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L94 .. _light_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L113 diff --git a/rules/features.snakefile b/rules/features.snakefile index d04bd819..2e1c72c1 100644 --- a/rules/features.snakefile +++ b/rules/features.snakefile @@ -1,14 +1,14 @@ -rule sms_metrics: +rule sms_features: input: "data/raw/{pid}/messages_with_datetime.csv" params: sms_type = "{sms_type}", day_segment = "{day_segment}", - metrics = lambda wildcards: config["SMS"]["METRICS"][wildcards.sms_type] + features = lambda wildcards: config["SMS"]["FEATURES"][wildcards.sms_type] output: "data/processed/{pid}/sms_{sms_type}_{day_segment}.csv" script: - "../src/features/sms_metrics.R" + "../src/features/sms_features.R" rule call_features: input: diff --git a/src/features/sms_metrics.R b/src/features/sms_features.R similarity index 75% rename from src/features/sms_metrics.R rename to src/features/sms_features.R index 60473fd5..2e970a8e 100644 --- a/src/features/sms_metrics.R +++ b/src/features/sms_features.R @@ -7,14 +7,14 @@ library(dplyr, warn.conflicts = FALSE) sms <- read.csv(snakemake@input[[1]]) day_segment <- snakemake@params[["day_segment"]] -metrics <- snakemake@params[["metrics"]] +requested_features <- snakemake@params[["features"]] sms_type <- snakemake@params[["sms_type"]] features <- data.frame(local_date = character(), stringsAsFactors = FALSE) # Compute base SMS features -features <- merge(features, base_sms_features(sms, sms_type, day_segment, metrics), by="local_date", all = TRUE) +features <- merge(features, base_sms_features(sms, sms_type, day_segment, requested_features), by="local_date", all = TRUE) -if(ncol(features) != length(metrics) + 1) - stop(paste0("The number of 
features in the output dataframe (=", ncol(features),") does not match the expected value (=", length(metrics)," + 1). Verify your SMS feature extraction functions")) +if(ncol(features) != length(requested_features) + 1) + stop(paste0("The number of features in the output dataframe (=", ncol(features),") does not match the expected value (=", length(requested_features)," + 1). Verify your SMS feature extraction functions")) write.csv(features, snakemake@output[[1]], row.names = FALSE)
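
Note on the rename: the `sms_features` rule now reads `config["SMS"]["FEATURES"][wildcards.sms_type]`, so a config that still uses the old `METRICS` key, or lists a type in `TYPES` without a matching `FEATURES` entry, would fail at the key lookup. A minimal sketch of a pre-flight check follows. The helper name `check_types_match_features` and the plain-dict config are assumptions for illustration; nothing like it is part of this change set.

```python
def check_types_match_features(sensor_config):
    """Return a list of problems; an empty list means TYPES and FEATURES agree.

    Hypothetical helper, not part of RAPIDS. `sensor_config` is assumed to be
    the already-parsed mapping for one sensor, e.g. config["SMS"].
    """
    problems = []

    # A leftover METRICS key suggests the config predates the rename.
    if "METRICS" in sensor_config:
        problems.append("found old key METRICS; rename it to FEATURES")

    types = set(sensor_config.get("TYPES", []))
    feature_keys = set(sensor_config.get("FEATURES", {}))

    # Every type needs a FEATURES entry, otherwise the rule's lookup fails.
    for missing in sorted(types - feature_keys):
        problems.append(f"type '{missing}' has no FEATURES entry")

    # A FEATURES entry without a matching type would never be used.
    for extra in sorted(feature_keys - types):
        problems.append(f"FEATURES key '{extra}' is not listed in TYPES")

    return problems


# Mirrors the SMS block from config.yaml in this diff.
sms_features = ["count", "distinctcontacts", "timefirstsms",
                "timelastsms", "countmostfrequentcontact"]
sms_config = {
    "TYPES": ["received", "sent"],
    "FEATURES": {"received": sms_features, "sent": sms_features},
}

for problem in check_types_match_features(sms_config):
    print(problem)
```

With the config from this diff the loop prints nothing. A config that still carries `METRICS` would report both the stale key and the missing `FEATURES` entries.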