diff --git a/config.yaml b/config.yaml
index c91a0aea..ea1df375 100644
--- a/config.yaml
+++ b/config.yaml
@@ -54,6 +54,10 @@ PHONE_DATA_STREAMS:
   aware_csv:
     FOLDER: data/external/aware_csv
+
+  aware_influxdb:
+    DATABASE_GROUP: MY_GROUP
+
 # Sensors ------
 # https://www.rapids.science/latest/features/phone-accelerometer/
diff --git a/docs/change-log.md b/docs/change-log.md
index fa662385..11052c13 100644
--- a/docs/change-log.md
+++ b/docs/change-log.md
@@ -4,7 +4,7 @@
 - Add a new [Overview](../setup/overview/) page.
 - You can [extend](../datastreams/add-new-data-streams/) RAPIDS with your own [data streams](../datastreams/data-streams-introduction/). Data streams are data collected with other sensing apps besides AWARE (like Beiwe, mindLAMP), and stored in other data containers (databases, files) besides MySQL.
 - Support to analyze Empatica wearable data (thanks to Joe Kim and Brinnae Bent from the [DBDP](https://dbdp.org/))
-- Support to analyze AWARE data stored in [CSV files](../datastreams/aware-csv/) and InfluxDB databases (the latter thanks to Neil Singh)
+- Support to analyze AWARE data stored in [CSV files](../datastreams/aware-csv/) and [InfluxDB](../datastreams/aware-influxdb/) databases (the latter thanks to Neil Singh)
 - Support to analyze data collected over [multiple time zones](../setup/configuration/#multiple-timezones)
 - Support for [sleep intraday features](../features/fitbit-sleep-intraday/) from the core team and also from the community (thanks to Stephen Price)
 - Add RAPIDS new logo
diff --git a/docs/datastreams/aware-influxdb.md b/docs/datastreams/aware-influxdb.md
new file mode 100644
index 00000000..7fa1b4ae
--- /dev/null
+++ b/docs/datastreams/aware-influxdb.md
@@ -0,0 +1,18 @@
+# `aware_influxdb (beta)`
+
+!!! warning
+    This data stream is being released in beta while we test it thoroughly.
+
+This [data stream](../../datastreams/data-streams-introduction) handles iOS and Android sensor data collected with the [AWARE Framework](https://awareframework.com/) and stored in an InfluxDB database.
+
+## Container
+An InfluxDB database with a table per sensor, each containing the data for all participants.
+
+The script that connects to this container and downloads data from it is at:
+```bash
+src/data/streams/aware_influxdb/container.R
+```
+
+## Format
+
+--8<---- "docs/snippets/aware_format.md"
diff --git a/docs/datastreams/data-streams-introduction.md b/docs/datastreams/data-streams-introduction.md
index 05adc96e..10fb84a2 100644
--- a/docs/datastreams/data-streams-introduction.md
+++ b/docs/datastreams/data-streams-introduction.md
@@ -6,7 +6,7 @@ For example, the `aware_mysql` data stream handles smartphone data (**device**)
 If you want to process a data stream using RAPIDS, make sure that your data is stored in a supported **format** and **container** (see table below).
-If RAPIDS doesn't support your data stream yet (e.g. Beiwe data stored in PostgreSQL, or AWARE data stored in InfluxDB), you can always [implement a new data stream](../add-new-data-streams). If it's something you think other people might be interested on, we will be happy to include your new data stream in RAPIDS, so get in touch!.
+If RAPIDS doesn't support your data stream yet (e.g. Beiwe data stored in PostgreSQL, or AWARE data stored in SQLite), you can always [implement a new data stream](../add-new-data-streams). If it's something you think other people might be interested in, we will be happy to include your new data stream in RAPIDS, so get in touch!
 !!! hint
     Currently, you can add new data streams for smartphones, Fitbit, and Empatica devices. If you need RAPIDS to process data from **other devices**, like Oura Rings or Actigraph wearables, get in touch.
It is a more complicated process that could take a couple of days to implement for someone familiar with R or Python, but we would be happy to work on it together.
@@ -17,6 +17,7 @@ For reference, these are the data streams we currently support:
 |--|--|--|--|--|
 | `aware_mysql`| Phone | AWARE app | MySQL | [link](../aware-mysql)
 | `aware_csv`| Phone | AWARE app | CSV files | [link](../aware-csv)
+| `aware_influxdb` (beta)| Phone | AWARE app | InfluxDB | [link](../aware-influxdb)
 | `fitbitjson_mysql`| Fitbit | JSON (per [Fitbit's API](https://dev.fitbit.com/build/reference/web-api/)) | MySQL | [link](../fitbitjson-mysql)
 | `fitbitjson_csv`| Fitbit | JSON (per [Fitbit's API](https://dev.fitbit.com/build/reference/web-api/)) | CSV files | [link](../fitbitjson-csv)
 | `fitbitparsed_mysql`| Fitbit | Parsed (parsed API data) | MySQL | [link](../fitbitparsed-mysql)
diff --git a/mkdocs.yml b/mkdocs.yml
index 55b29a8e..387c2be9 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -84,6 +84,7 @@ nav:
         - Phone:
             - aware_mysql: datastreams/aware-mysql.md
             - aware_csv: datastreams/aware-csv.md
+            - aware_influxdb (beta): datastreams/aware-influxdb.md
             - Mandatory Phone Format: datastreams/mandatory-phone-format.md
         - Fitbit:
             - fitbitjson_mysql: datastreams/fitbitjson-mysql.md
diff --git a/src/data/streams/aware_influxdb/container.R b/src/data/streams/aware_influxdb/container.R
new file mode 100644
index 00000000..e844be99
--- /dev/null
+++ b/src/data/streams/aware_influxdb/container.R
@@ -0,0 +1,104 @@
+# If you need a new package, add it with renv::install(package) so your renv environment is updated
+library(influxdbr)
+library(tidyverse)
+library(yaml)
+
+#' @description
+#' Auxiliary function to parse the connection credentials from a specific group in ./credentials.yaml
+#' You can reuse most of this function if you are connecting to a DB or Web API.
+#' It's OK to delete this function if you don't need credentials, e.g., you are pulling data from a CSV.
+#' @param group the yaml key containing the credentials to connect to a database
+#' @return dbEngine a database engine (connection) ready to perform queries
+get_db_engine <- function(group){
+  # The working dir is always the RAPIDS root folder, so your credentials file is always ./credentials.yaml
+  credentials <- read_yaml("./credentials.yaml")
+  if(!group %in% names(credentials))
+    stop(paste("The credentials group",group, "does not exist in ./credentials.yaml. The only groups that exist in that file are:", paste(names(credentials), collapse = ",")))
+
+  # Build the connection from the values stored under the credentials group
+  conn_object <- influx_connection(host=credentials[[group]][["host"]],
+                                   user=credentials[[group]][["user"]],
+                                   pass=credentials[[group]][["password"]],
+                                   port= credentials[[group]][["port"]])
+
+  return(conn_object)
+}
+
+# This file gets executed for each PHONE_SENSOR of each participant
+# If you are connecting to a database, the file containing its credentials is available at "./credentials.yaml"
+# If you are reading a CSV file instead of a DB table, the @param sensor_container will contain the file path as set in config.yaml
+# You are not bound to databases or files, you can query a web API or whatever data source you need.
+
+#' @description
+#' RAPIDS allows users to use the keyword "infer" (previously "multiple") to automatically infer the mobile operating system a device was running.
+#' If you have a way to infer the OS of a device ID, implement this function. For example, for AWARE data we use the "aware_device" table.
+#'
+#' If you don't have a way to infer the OS, call stop("Error Message") so other users know they can't use "infer" or the inference failed,
+#' and they have to assign the OS manually in the participant file
+#'
+#' @param stream_parameters The PHONE_STREAM_PARAMETERS key in config.yaml. If you need specific parameters add them there.
+#' @param device A device ID string
+#' @return The OS the device ran, "android" or "ios"
+
+infer_device_os <- function(stream_parameters, device){
+  # config.yaml places DATABASE_GROUP directly under the aware_influxdb key
+  dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
+  # Re-read the YAML because the connection object does not carry the DB name
+  credentials <- read_yaml("./credentials.yaml")
+  message(paste0("Running the Influx query for: ", device))
+  # Execute the query
+  query_object <- influx_select(dbEngine,
+                                 db = credentials[[stream_parameters$DATABASE_GROUP]][["database"]],
+                                 field_keys="device_id,brand",
+                                 measurement="aware_device",
+                                 where= paste0("device_id = '",device,"'"),
+                                 return_xts = FALSE)
+
+  # Fetch the table from query_object and drop rows where every value is NA;
+  # influxdbr returns a single all-NA row when the query has no matches
+  os <- query_object[[1]] %>% filter_all(any_vars(!is.na(.))) %>% select(c('device_id','brand','time'))
+
+
+  if(nrow(os) > 0)
+    return(os %>% mutate(os = ifelse(brand == "iPhone", "ios", "android")) %>% pull(os))
+  else
+    stop(paste("We cannot infer the OS of the following device id because it does not exist in the aware_device table:", device))
+
+  return(os)
+}
+
+#' @description
+#' Gets the sensor data for a specific device id from a database table, file or whatever source you want to query
+#'
+#' @param stream_parameters The PHONE_STREAM_PARAMETERS key in config.yaml. If you need specific parameters add them there.
+#' @param device A device ID string
+#' @param sensor_container database table or file containing the sensor data for all participants.
This is the PHONE_SENSOR[CONTAINER] key in config.yaml
+#' @param columns the columns needed from this sensor (we recommend returning only these columns instead of every column in sensor_container)
+#' @return A dataframe with the sensor data for device
+
+pull_data <- function(stream_parameters, device, sensor, sensor_container, columns){
+  # config.yaml places DATABASE_GROUP directly under the aware_influxdb key
+  dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
+  # Re-read the YAML because the connection object does not carry the DB name
+  credentials <- read_yaml("./credentials.yaml")
+
+
+  # Let the user know what we are doing (collapse columns so a single message is printed)
+  message(paste0("Executing an Influx query for: ", device, " ", sensor, ". Extracting ", paste(columns, collapse = ", "), " from ", sensor_container))
+  # Execute the query
+  query_object <- influx_select(dbEngine,
+                                 db = credentials[[stream_parameters$DATABASE_GROUP]][["database"]],
+                                 field_keys=paste(columns, collapse = ","),
+                                 measurement=sensor_container,
+                                 where= paste0(columns$DEVICE_ID, " = '",device,"'"),
+                                 return_xts=FALSE)
+
+
+  # Fetch the table from query_object and drop rows where every value is NA;
+  # influxdbr returns a single all-NA row when the query has no matches
+  sensor_data <- query_object[[1]] %>% filter_all(any_vars(!is.na(.))) %>% select(c('time',columns))
+
+  if(nrow(sensor_data) == 0)
+    warning(paste0("The device '", device, "' did not have data in ", sensor_container))
+
+  return(sensor_data)
+}
+
diff --git a/src/data/streams/aware_influxdb/format.yaml b/src/data/streams/aware_influxdb/format.yaml
new file mode 100644
index 00000000..ee0bd0c4
--- /dev/null
+++ b/src/data/streams/aware_influxdb/format.yaml
@@ -0,0 +1,315 @@
+PHONE_ACCELEROMETER:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_VALUES_0: double_values_0
+      DOUBLE_VALUES_1: double_values_1
+      DOUBLE_VALUES_2: double_values_2
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_VALUES_0: double_values_0
+      DOUBLE_VALUES_1: double_values_1
+      DOUBLE_VALUES_2: double_values_2
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_ACTIVITY_RECOGNITION:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      ACTIVITY_NAME: activity_name
+      ACTIVITY_TYPE: activity_type
+      CONFIDENCE: confidence
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      ACTIVITY_NAME: FLAG_TO_MUTATE
+      ACTIVITY_TYPE: FLAG_TO_MUTATE
+      CONFIDENCE: FLAG_TO_MUTATE
+    MUTATION:
+      COLUMN_MAPPINGS:
+        ACTIVITIES: activities
+        CONFIDENCE: confidence
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+        - "src/data/streams/mutations/phone/aware/activity_recogniton_ios_unification.R"
+
+PHONE_APPLICATIONS_CRASHES:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      PACKAGE_NAME: package_name
+      APPLICATION_NAME: application_name
+      APPLICATION_VERSION: application_version
+      ERROR_SHORT: error_short
+      ERROR_LONG: error_long
+      ERROR_CONDITION: error_condition
+      IS_SYSTEM_APP: is_system_app
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_APPLICATIONS_FOREGROUND:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      PACKAGE_NAME: package_name
+      APPLICATION_NAME: application_name
+      IS_SYSTEM_APP: is_system_app
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_APPLICATIONS_NOTIFICATIONS:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      PACKAGE_NAME: package_name
+      APPLICATION_NAME: application_name
+      TEXT: text
+      SOUND: sound
+      VIBRATE: vibrate
+      DEFAULTS: defaults
+      FLAGS: flags
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r
scripts that mutate your raw data
+
+PHONE_BATTERY:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      BATTERY_STATUS: battery_status
+      BATTERY_LEVEL: battery_level
+      BATTERY_SCALE: battery_scale
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      BATTERY_STATUS: FLAG_TO_MUTATE
+      BATTERY_LEVEL: battery_level
+      BATTERY_SCALE: battery_scale
+    MUTATION:
+      COLUMN_MAPPINGS:
+        BATTERY_STATUS: battery_status
+      SCRIPTS:
+        - "src/data/streams/mutations/phone/aware/battery_ios_unification.R"
+
+PHONE_BLUETOOTH:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      BT_ADDRESS: bt_address
+      BT_NAME: bt_name
+      BT_RSSI: bt_rssi
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_CALLS:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      CALL_TYPE: call_type
+      CALL_DURATION: call_duration
+      TRACE: trace
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      CALL_TYPE: FLAG_TO_MUTATE
+      CALL_DURATION: call_duration
+      TRACE: trace
+    MUTATION:
+      COLUMN_MAPPINGS:
+        CALL_TYPE: call_type
+      SCRIPTS:
+        - "src/data/streams/mutations/phone/aware/calls_ios_unification.R"
+
+PHONE_CONVERSATION:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_ENERGY: double_energy
+      INFERENCE: inference
+      DOUBLE_CONVO_START: double_convo_start
+      DOUBLE_CONVO_END: double_convo_end
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_ENERGY: double_energy
+      INFERENCE: inference
+      DOUBLE_CONVO_START: double_convo_start
+      DOUBLE_CONVO_END:
double_convo_end
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+        - "src/data/streams/mutations/phone/aware/conversation_ios_timestamp.R"
+
+PHONE_KEYBOARD:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      PACKAGE_NAME: package_name
+      BEFORE_TEXT: before_text
+      CURRENT_TEXT: current_text
+      IS_PASSWORD: is_password
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_LIGHT:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_LIGHT_LUX: double_light_lux
+      ACCURACY: accuracy
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_LOCATIONS:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_LATITUDE: double_latitude
+      DOUBLE_LONGITUDE: double_longitude
+      DOUBLE_BEARING: double_bearing
+      DOUBLE_SPEED: double_speed
+      DOUBLE_ALTITUDE: double_altitude
+      PROVIDER: provider
+      ACCURACY: accuracy
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      DOUBLE_LATITUDE: double_latitude
+      DOUBLE_LONGITUDE: double_longitude
+      DOUBLE_BEARING: double_bearing
+      DOUBLE_SPEED: double_speed
+      DOUBLE_ALTITUDE: double_altitude
+      PROVIDER: provider
+      ACCURACY: accuracy
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_LOG:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      LOG_MESSAGE: log_message
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      LOG_MESSAGE: log_message
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_MESSAGES:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      MESSAGE_TYPE: message_type
+      TRACE: trace
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_SCREEN:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      SCREEN_STATUS: screen_status
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      SCREEN_STATUS: FLAG_TO_MUTATE
+    MUTATION:
+      COLUMN_MAPPINGS:
+        SCREEN_STATUS: screen_status
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+        - "src/data/streams/mutations/phone/aware/screen_ios_unification.R"
+
+PHONE_WIFI_CONNECTED:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      MAC_ADDRESS: mac_address
+      SSID: ssid
+      BSSID: bssid
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+  IOS:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      MAC_ADDRESS: mac_address
+      SSID: ssid
+      BSSID: bssid
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
+PHONE_WIFI_VISIBLE:
+  ANDROID:
+    RAPIDS_COLUMN_MAPPINGS:
+      TIMESTAMP: timestamp
+      DEVICE_ID: device_id
+      SSID: ssid
+      BSSID: bssid
+      SECURITY: security
+      FREQUENCY: frequency
+      RSSI: rssi
+    MUTATION:
+      COLUMN_MAPPINGS:
+      SCRIPTS: # List any python or r scripts that mutate your raw data
+
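For reference, `get_db_engine`, `infer_device_os`, and `pull_data` in `container.R` read five keys from the credentials group named by `DATABASE_GROUP`: `host`, `user`, `password`, `port`, and `database`. A minimal sketch of a matching `./credentials.yaml` entry might look like the following. It assumes the `MY_GROUP` group name from `config.yaml`; every value is a placeholder and not part of this change (8086 is only InfluxDB's default HTTP port).

```yaml
# ./credentials.yaml (hypothetical values, replace with your own)
MY_GROUP:
  host: localhost        # InfluxDB server hostname (placeholder)
  port: 8086             # InfluxDB default HTTP port; adjust if yours differs
  user: my_user          # placeholder user name
  password: my_password  # placeholder password
  database: my_aware_db  # placeholder database holding the AWARE measurements
```

The group name has to match the `DATABASE_GROUP` value set under `aware_influxdb` in `config.yaml`; otherwise `get_db_engine` stops with the "credentials group ... does not exist" error.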