Rename download_data to pull_data and add support for Python containers
parent 4b33ee43ba
commit 41711fcdb7
config.yaml
|
@@ -50,13 +50,13 @@ TIMEZONE:
|
||||||
########################################################################################################################
|
########################################################################################################################
|
||||||
|
|
||||||
# See https://www.rapids.science/latest/setup/configuration/#device-data-source-configuration
|
# See https://www.rapids.science/latest/setup/configuration/#device-data-source-configuration
|
||||||
PHONE_DATA_CONFIGURATION:
|
PHONE_DATA_STREAMS:
|
||||||
SOURCE:
|
USE: aware_mysql
|
||||||
TYPE: aware_mysql
|
|
||||||
DATABASE_GROUP: *database_group
|
# AVAILABLE:
|
||||||
TIMEZONE:
|
aware_mysql:
|
||||||
TYPE: SINGLE
|
DATABASE_GROUP: MY_GROUP
|
||||||
VALUE: *timezone
|
|
||||||
|
|
||||||
# Sensors ------
|
# Sensors ------
|
||||||
|
|
||||||
|
|
|
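A minimal Python sketch of how the new `PHONE_DATA_STREAMS` layout is consumed: the `pull_phone_data` rule passes on the parameters of whichever stream `[PHONE_DATA_STREAMS][USE]` names. The helper `active_stream_parameters` is hypothetical (the rule does this inline), and the config is written as a plain dict instead of being parsed from YAML.

```python
def active_stream_parameters(config, device="PHONE"):
    """Return the parameters of the stream selected by [<DEVICE>_DATA_STREAMS][USE].

    Hypothetical helper mirroring
    config["PHONE_DATA_STREAMS"][config["PHONE_DATA_STREAMS"]["USE"]] in the rule.
    """
    streams = config[device + "_DATA_STREAMS"]
    stream_name = streams["USE"]
    if stream_name not in streams:
        # USE points to a stream whose parameters were never declared
        raise KeyError("{}_DATA_STREAMS has no '{}' key; add its parameters in config.yaml"
                       .format(device, stream_name))
    return streams[stream_name]


# The new config.yaml layout from this commit, written as a Python dict
config = {
    "PHONE_DATA_STREAMS": {
        "USE": "aware_mysql",
        "aware_mysql": {"DATABASE_GROUP": "MY_GROUP"},
    }
}

print(active_stream_parameters(config))  # {'DATABASE_GROUP': 'MY_GROUP'}
```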
@@ -14,38 +14,51 @@ The most common cases when you would want to implement a new data stream are:
|
||||||
|
|
||||||
## Formats and Containers in RAPIDS
|
## Formats and Containers in RAPIDS
|
||||||
|
|
||||||
**CONTAINER**. The container of a data stream is queried using a `container.R` script. This script implements functions that will pull data from a database, file, etc.
|
**CONTAINER**. The container of a data stream is queried using a `container.[R|py]` script. This script implements functions that will pull data from a database, file, etc.
|
||||||
|
|
||||||
**FORMAT**. The format of a data stream is described using a `format.yaml` file. A format file describes the mapping between your stream's raw data and the data that RAPIDS needs.
|
**FORMAT**. The format of a data stream is described using a `format.yaml` file. A format file describes the mapping between your stream's raw data and the data that RAPIDS needs.
|
||||||
|
|
||||||
Both the `container.R` and the `format.yaml` are saved under `src/data/streams/[stream_name]` where `[stream_name]` can be
|
Both the `container.[R|py]` and the `format.yaml` are saved under `src/data/streams/[stream_name]` where `[stream_name]` can be
|
||||||
`aware_mysql` for example.
|
`aware_mysql` for example.
|
||||||
|
|
||||||
## Implement a Container
|
## Implement a Container
|
||||||
|
|
||||||
The `container.R` script of a data stream should be implemented in R. This script must have two functions if you are implementing a stream for phone data, or one function otherwise.
|
The `container` script of a data stream should be implemented in R (strongly recommended) or Python. This script must have two functions if you are implementing a stream for phone data, or one function otherwise. The script can contain any other auxiliary functions that your data stream might need.
|
||||||
|
|
||||||
=== "download_data"
|
First of all, add any parameters your script might need in `config.yaml` under `(device)_DATA_STREAMS`. These parameters will be available in the `stream_parameters` argument of the one or two functions you implement. For example, if you are adding support for `Beiwe` data stored in `PostgreSQL` and your container needs a set of credentials to connect to a database, your new data stream configuration would be:
|
||||||
|
|
||||||
|
```yaml hl_lines="7 8"
|
||||||
|
PHONE_DATA_STREAMS:
|
||||||
|
USE: aware_python
|
||||||
|
|
||||||
|
# AVAILABLE:
|
||||||
|
aware_mysql:
|
||||||
|
DATABASE_GROUP: MY_GROUP
|
||||||
|
beiwe_postgresql:
|
||||||
|
DATABASE_GROUP: MY_GROUP # users define this group (user, password, host, etc.) in credentials.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
Then implement one or both of the following functions:
|
||||||
|
|
||||||
|
=== "pull_data"
|
||||||
|
|
||||||
This function returns the data columns for a specific sensor and participant. It has the following parameters:
|
This function returns the data columns for a specific sensor and participant. It has the following parameters:
|
||||||
|
|
||||||
| Param | Description |
|
| Param | Description |
|
||||||
|--------------------|-------------------------------------------------------------------------------------------------------|
|
|--------------------|-------------------------------------------------------------------------------------------------------|
|
||||||
| data_configuration | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` |
|
| stream_parameters | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` |
|
||||||
| sensor_container | The value set by the user in any `[DEVICE_SENSOR][CONTAINER]` key of `config.yaml`. It can be a table, file path, or whatever data source you want to support that contains the **data from a single sensor for all participants**. For example, `[PHONE_ACCELEROMETER][CONTAINER]`|
|
| sensor_container | The value set by the user in any `[DEVICE_SENSOR][CONTAINER]` key of `config.yaml`. It can be a table, file path, or whatever data source you want to support that contains the **data from a single sensor for all participants**. For example, `[PHONE_ACCELEROMETER][CONTAINER]`|
|
||||||
| device | The device id that you need to get the data for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device is a uuid|
|
| device | The device id that you need to get the data for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device is a uuid|
|
||||||
| columns | A list of the columns that you need to get from `sensor_container`. You specify these columns in your stream's `format.yaml`|
|
| columns | A list of the columns that you need to get from `sensor_container`. You specify these columns in your stream's `format.yaml`|
|
||||||
|
|
||||||
|
|
||||||
!!! example
|
!!! example
|
||||||
This is the `download_data` function we implemented for `aware_mysql`. Note that we can `message`, `warn` or `stop` the user during execution.
|
This is the `pull_data` function we implemented for `aware_mysql`. Note that we can `message`, `warn` or `stop` the user during execution.
|
||||||
|
|
||||||
```r
|
```r
|
||||||
download_data <- function(data_configuration, device, sensor_container, columns){
|
pull_data <- function(stream_parameters, device, sensor_container, columns){
|
||||||
group <- data_configuration$SOURCE$DATABASE_GROUP
|
# get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
|
||||||
dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)
|
dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
|
||||||
|
|
||||||
|
|
||||||
query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
|
query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
|
||||||
# Letting the user know what we are doing
|
# Letting the user know what we are doing
|
||||||
message(paste0("Executing the following query to download data: ", query))
|
message(paste0("Executing the following query to download data: ", query))
|
||||||
|
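Because this commit lets a container be written in Python, here is a minimal sketch of a `container.py` `pull_data` with the same signature as the R version. It assumes a hypothetical stream whose only parameter, `DATABASE_FILE`, points to a SQLite file, and it returns a column-oriented dict; since RAPIDS expects a data frame back, a real container would probably wrap the result in `pandas.DataFrame(...)`, which reticulate converts to an R data frame. Unlike the R example, the device id is bound as a query parameter instead of being pasted into the SQL string.

```python
import sqlite3
from contextlib import closing


def pull_data(stream_parameters, device, sensor_container, columns):
    """Return the requested columns of sensor_container for one device id.

    Hypothetical stream: stream_parameters["DATABASE_FILE"] is the path of a SQLite database.
    """
    def quote(identifier):
        # Table and column names cannot be bound as parameters, so quote them as SQL identifiers
        return '"' + identifier.replace('"', '""') + '"'

    query = "SELECT {} FROM {} WHERE device_id = ?".format(
        ", ".join(quote(column) for column in columns), quote(sensor_container))
    # Letting the user know what we are doing (print stands in for R's message())
    print("Executing the following query to download data: " + query)
    with closing(sqlite3.connect(stream_parameters["DATABASE_FILE"])) as connection:
        rows = connection.execute(query, (device,)).fetchall()
    # Column-oriented result: {column name: list of values}
    return {column: [row[i] for row in rows] for i, column in enumerate(columns)}
```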
@@ -65,17 +78,17 @@ The `container.R` script of a data stream should be implemented in R. This scrip
|
||||||
!!! warning
|
!!! warning
|
||||||
This function is only necessary for phone data streams.
|
This function is only necessary for phone data streams.
|
||||||
|
|
||||||
RAPIDS allows users to use the keyword `infer` (previously `multiple`) to [automatically infer](../../setup/configuration/#structure-of-participants-files) the mobile operating system a device (phone) was running.
|
RAPIDS allows users to use the keyword `infer` (previously `multiple`) to [automatically infer](../../setup/configuration/#structure-of-participants-files) the mobile operating system a phone was running.
|
||||||
|
|
||||||
If you have a way to infer the OS of a device id, implement this function. For example, for AWARE data we use the `aware_device` table.
|
If you have a way to infer the OS of a device id, implement this function. For example, for AWARE data we use the `aware_device` table.
|
||||||
|
|
||||||
If you don't have a way to infer the OS, call `stop("Error Message")` so other users know that they can't use `infer` (or that the inference failed) and that they have to assign the OS manually in the participant file.
|
If you don't have a way to infer the OS, call `stop("Error Message")` so other users know that they can't use `infer` (or that the inference failed) and that they have to assign the OS manually in the participant file.
|
||||||
|
|
||||||
This function returns the operating system (`android` or `ios`) for a specific device. It has the following parameters:
|
This function returns the operating system (`android` or `ios`) for a specific phone device id. It has the following parameters:
|
||||||
|
|
||||||
| Param | Description |
|
| Param | Description |
|
||||||
|--------------------|-------------------------------------------------------------------------------------------------------|
|
|--------------------|-------------------------------------------------------------------------------------------------------|
|
||||||
| data_configuration | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` |
|
| stream_parameters | Any parameters (keys/values) set by the user in any `[DEVICE_DATA_STREAMS][stream_name]` key of `config.yaml`. For example, `[DATABASE_GROUP]` inside `[FITBIT_DATA_STREAMS][fitbitjson_mysql]` |
|
||||||
| device | The device id that you need to infer the OS for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device is a uuid|
|
| device | The device id that you need to infer the OS for (this is set by the user in the [participant files](../../setup/configuration/#participant-files)). For example, in AWARE this device is a uuid|
|
||||||
|
|
||||||
|
|
||||||
|
@@ -83,8 +96,9 @@ The `container.R` script of a data stream should be implemented in R. This scrip
|
||||||
This is the `infer_device_os` function we implemented for `aware_mysql`. Note that we can `message`, `warn` or `stop` the user during execution.
|
This is the `infer_device_os` function we implemented for `aware_mysql`. Note that we can `message`, `warn` or `stop` the user during execution.
|
||||||
|
|
||||||
```r
|
```r
|
||||||
infer_device_os <- function(data_configuration, device){
|
infer_device_os <- function(stream_parameters, device){
|
||||||
group <- data_configuration$SOURCE$DATABASE_GROUP # specified DB credentials group in config.yaml
|
# get_db_engine is an auxiliary function not shown here for brevity but can be found in src/data/streams/aware_mysql/container.R
|
||||||
|
group <- stream_parameters$DATABASE_GROUP
|
||||||
|
|
||||||
dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)
|
dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)
|
||||||
query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
|
query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
|
||||||
|
|
|
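The Python counterpart of `infer_device_os` can be sketched the same way. Assumptions: the hypothetical `DATABASE_FILE` parameter again points to a SQLite database holding an `aware_device(device_id, brand)` table, and a device whose `brand` is `iPhone` is treated as iOS. Raising an exception plays the role of `stop()`; reticulate should surface it to the R caller as an error.

```python
import sqlite3
from contextlib import closing


def infer_device_os(stream_parameters, device):
    """Return "android" or "ios" for device, raising if the OS cannot be inferred.

    Hypothetical stream: stream_parameters["DATABASE_FILE"] is a SQLite database with an
    aware_device(device_id, brand) table; brand "iPhone" is assumed to mean iOS.
    """
    query = "SELECT device_id, brand FROM aware_device WHERE device_id = ?"
    print("Executing the following query to infer phone OS: " + query)
    with closing(sqlite3.connect(stream_parameters["DATABASE_FILE"])) as connection:
        rows = connection.execute(query, (device,)).fetchall()
    if not rows:
        # Python's equivalent of stop(): the user must assign the OS in the participant file
        raise ValueError("We could not infer the OS of " + device +
                         ". Assign it manually in the participant file.")
    brand = rows[0][1]
    return "ios" if brand == "iPhone" else "android"
```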
@@ -41,7 +41,7 @@
|
||||||
## Every time I run force the download_dataset rule all rules are executed
|
## Every time I run force the download_dataset rule all rules are executed
|
||||||
|
|
||||||
???+ failure "Problem"
|
???+ failure "Problem"
|
||||||
When running `snakemake -j1 -R download_phone_data` or `./rapids -j1 -R download_phone_data`, all the rules and files are re-computed
|
When running `snakemake -j1 -R pull_phone_data` or `./rapids -j1 -R pull_phone_data`, all the rules and files are re-computed
|
||||||
|
|
||||||
???+ done "Solution"
|
???+ done "Solution"
|
||||||
This is expected behavior. The advantage of using `snakemake` under the hood is that every time a file containing data is modified, every rule that depends on that file will be re-executed to update its results. In this case, since `download_dataset` updates all the raw data and you are forcing the rule with the flag `-R`, every single rule that depends on those raw files will be executed.
|
This is expected behavior. The advantage of using `snakemake` under the hood is that every time a file containing data is modified, every rule that depends on that file will be re-executed to update its results. In this case, since `download_dataset` updates all the raw data and you are forcing the rule with the flag `-R`, every single rule that depends on those raw files will be executed.
|
||||||
|
@@ -215,7 +215,7 @@
|
||||||
```bash
|
```bash
|
||||||
R -e 'renv::install("RMySQL")'
|
R -e 'renv::install("RMySQL")'
|
||||||
```
|
```
|
||||||
- Go to `src/data/download_phone_data.R` or `src/data/download_fitbit_data.R` and replace `library(RMariaDB)` with `library(RMySQL)`
|
- Go to `src/data/streams/pull_phone_data.R` or `src/data/streams/pull_fitbit_data.R` and replace `library(RMariaDB)` with `library(RMySQL)`
|
||||||
- In the same file(s) replace `dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)` with `dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)`
|
- In the same file(s) replace `dbEngine <- dbConnect(MariaDB(), default.file = "./.env", group = group)` with `dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)`
|
||||||
## There is no package called `RMariaDB`
|
## There is no package called `RMariaDB`
|
||||||
|
|
||||||
|
|
|
@@ -30,26 +30,33 @@ def get_phone_sensor_names():
|
||||||
phone_sensor_names.append(config_key)
|
phone_sensor_names.append(config_key)
|
||||||
return phone_sensor_names
|
return phone_sensor_names
|
||||||
|
|
||||||
def download_phone_data_input_with_mutation_scripts(wilcards):
|
def pull_phone_data_input_with_mutation_scripts(wilcards):
|
||||||
import yaml
|
import yaml
from pathlib import Path
|
||||||
input = dict()
|
input = dict()
|
||||||
phone_source_type = config["PHONE_DATA_CONFIGURATION"]["SOURCE"]["TYPE"]
|
phone_stream = config["PHONE_DATA_STREAMS"]["USE"]
|
||||||
|
|
||||||
input["participant_file"] = "data/external/participant_files/{pid}.yaml"
|
input["participant_file"] = "data/external/participant_files/{pid}.yaml"
|
||||||
input["rapids_schema_file"] = "src/data/streams/rapids_columns.yaml"
|
input["rapids_schema_file"] = "src/data/streams/rapids_columns.yaml"
|
||||||
input["source_schema_file"] = "src/data/streams/" + phone_source_type + "/format.yaml"
|
input["stream_format"] = "src/data/streams/" + phone_stream + "/format.yaml"
|
||||||
input["source_download_file"] = "src/data/streams/"+ phone_source_type + "/container.R"
|
|
||||||
|
|
||||||
schema = yaml.load(open(input.get("source_schema_file"), 'r'), Loader=yaml.FullLoader)
|
if Path("src/data/streams/"+ phone_stream + "/container.R").exists():
|
||||||
|
input["stream_container"] = "src/data/streams/"+ phone_stream + "/container.R"
|
||||||
|
elif Path("src/data/streams/"+ phone_stream + "/container.py").exists():
|
||||||
|
input["stream_container"] = "src/data/streams/"+ phone_stream + "/container.py"
|
||||||
|
else:
|
||||||
|
raise ValueError("The container script for {stream} is missing: src/data/streams/{stream}/container.[py|R]".format(stream=phone_stream))
|
||||||
|
|
||||||
|
schema = yaml.load(open(input.get("stream_format"), 'r'), Loader=yaml.FullLoader)
|
||||||
sensor = ("phone_" + wilcards.sensor).upper()
|
sensor = ("phone_" + wilcards.sensor).upper()
|
||||||
if sensor not in schema:
|
if sensor not in schema:
|
||||||
raise ValueError("{sensor} is not defined in the schema {schema}".format(sensor=sensor, schema=input.get("source_schema_file")))
|
raise ValueError("{sensor} is not defined in the schema {schema}".format(sensor=sensor, schema=input.get("stream_format")))
|
||||||
|
|
||||||
for device_os in ["ANDROID", "IOS"]:
|
for device_os in ["ANDROID", "IOS"]:
|
||||||
scripts = schema[sensor][device_os]["MUTATION_SCRIPTS"]
|
scripts = schema[sensor][device_os]["MUTATION_SCRIPTS"]
|
||||||
if isinstance(scripts, list):
|
if isinstance(scripts, list):
|
||||||
for idx, script in enumerate(scripts):
|
for idx, script in enumerate(scripts):
|
||||||
if not script.lower().endswith((".py", ".r")):
|
if not script.lower().endswith((".py", ".r")):
|
||||||
raise ValueError("Mutate scripts can only be Python or R scripts (.py, .R).\n Instead we got {script} in \n [{sensor}][{device_os}] of {schema}".format(script=script, sensor=sensor, device_os=device_os, schema=input.get("source_schema_file")))
|
raise ValueError("Mutate scripts can only be Python or R scripts (.py, .R).\n Instead we got {script} in \n [{sensor}][{device_os}] of {schema}".format(script=script, sensor=sensor, device_os=device_os, schema=input.get("stream_format")))
|
||||||
input["mutationscript"+str(idx)] = script
|
input["mutationscript"+str(idx)] = script
|
||||||
return input
|
return input
|
||||||
|
|
||||||
|
|
|
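The container lookup in `pull_phone_data_input_with_mutation_scripts` can be tried in isolation with the sketch below. `find_stream_container` is a hypothetical helper that applies the same order (`container.R` first, then `container.py`) and names the stream that was actually requested when neither script exists.

```python
from pathlib import Path


def find_stream_container(stream, streams_dir="src/data/streams"):
    """Return the container script of a stream, preferring container.R over container.py."""
    stream_dir = Path(streams_dir) / stream
    for script in ("container.R", "container.py"):
        candidate = stream_dir / script
        if candidate.exists():
            return str(candidate)
    # Name the requested stream so the user knows which folder is missing a container
    raise ValueError("The container script for {stream} is missing: "
                     "{folder}/container.[py|R]".format(stream=stream, folder=stream_dir))
```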
@@ -23,16 +23,16 @@ rule create_participants_files:
|
||||||
script:
|
script:
|
||||||
"../src/data/create_participants_files.R"
|
"../src/data/create_participants_files.R"
|
||||||
|
|
||||||
rule download_phone_data:
|
rule pull_phone_data:
|
||||||
input: unpack(download_phone_data_input_with_mutation_scripts)
|
input: unpack(pull_phone_data_input_with_mutation_scripts)
|
||||||
params:
|
params:
|
||||||
data_configuration = config["PHONE_DATA_CONFIGURATION"],
|
data_configuration = config["PHONE_DATA_STREAMS"][config["PHONE_DATA_STREAMS"]["USE"]],
|
||||||
sensor = "phone_" + "{sensor}",
|
sensor = "phone_" + "{sensor}",
|
||||||
tables = lambda wildcards: config["PHONE_" + str(wildcards.sensor).upper()]["TABLE"],
|
tables = lambda wildcards: config["PHONE_" + str(wildcards.sensor).upper()]["TABLE"],
|
||||||
output:
|
output:
|
||||||
"data/raw/{pid}/phone_{sensor}_raw.csv"
|
"data/raw/{pid}/phone_{sensor}_raw.csv"
|
||||||
script:
|
script:
|
||||||
"../src/data/download_phone_data.R"
|
"../src/data/streams/pull_phone_data.R"
|
||||||
|
|
||||||
rule download_fitbit_data:
|
rule download_fitbit_data:
|
||||||
input:
|
input:
|
||||||
|
@@ -275,7 +275,7 @@ rule pull_empatica_data:
|
||||||
output:
|
output:
|
||||||
"data/raw/{pid}/empatica_{sensor}_raw.csv"
|
"data/raw/{pid}/empatica_{sensor}_raw.csv"
|
||||||
script:
|
script:
|
||||||
"../src/data/pull_empatica_data.R"
|
"../src/data/streams/pull_empatica_data.R"
|
||||||
|
|
||||||
rule empatica_readable_datetime:
|
rule empatica_readable_datetime:
|
||||||
input:
|
input:
|
||||||
|
|
|
@@ -33,12 +33,12 @@ get_db_engine <- function(group){
|
||||||
#' If you don't have a way to infer the OS, call stop("Error Message") so other users know they can't use "infer" or the inference failed,
|
#' If you don't have a way to infer the OS, call stop("Error Message") so other users know they can't use "infer" or the inference failed,
|
||||||
#' and they have to assign the OS manually in the participant file
|
#' and they have to assign the OS manually in the participant file
|
||||||
#'
|
#'
|
||||||
#' @param data_configuration The PHONE_DATA_CONFIGURATION key in config.yaml. If you need specific parameters add them there.
|
#' @param stream_parameters The [PHONE_DATA_STREAMS][stream_name] key in config.yaml. If you need specific parameters, add them there.
|
||||||
#' @param device A device ID string
|
#' @param device A device ID string
|
||||||
#' @return The OS the device ran, "android" or "ios"
|
#' @return The OS the device ran, "android" or "ios"
|
||||||
|
|
||||||
infer_device_os <- function(data_configuration, device){
|
infer_device_os <- function(stream_parameters, device){
|
||||||
dbEngine <- get_db_engine(data_configuration$SOURCE$DATABASE_GROUP)
|
dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
|
||||||
query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
|
query <- paste0("SELECT device_id,brand FROM aware_device WHERE device_id = '", device, "'")
|
||||||
message(paste0("Executing the following query to infer phone OS: ", query))
|
message(paste0("Executing the following query to infer phone OS: ", query))
|
||||||
os <- dbGetQuery(dbEngine, query)
|
os <- dbGetQuery(dbEngine, query)
|
||||||
|
@@ -55,14 +55,14 @@ infer_device_os <- function(data_configuration, device){
|
||||||
#' @description
|
#' @description
|
||||||
#' Gets the sensor data for a specific device id from a database table, file or whatever source you want to query
|
#' Gets the sensor data for a specific device id from a database table, file or whatever source you want to query
|
||||||
#'
|
#'
|
||||||
#' @param data_configuration The PHONE_DATA_CONFIGURATION key in config.yaml. If you need specific parameters add them there.
|
#' @param stream_parameters The [PHONE_DATA_STREAMS][stream_name] key in config.yaml. If you need specific parameters, add them there.
|
||||||
#' @param device A device ID string
|
#' @param device A device ID string
|
||||||
#' @param sensor_container database table or file containing the sensor data for all participants. This is the PHONE_SENSOR[TABLE] key in config.yaml
|
#' @param sensor_container database table or file containing the sensor data for all participants. This is the PHONE_SENSOR[TABLE] key in config.yaml
|
||||||
#' @param columns the columns needed from this sensor (we recommend returning only these columns instead of every column in sensor_container)
|
#' @param columns the columns needed from this sensor (we recommend returning only these columns instead of every column in sensor_container)
|
||||||
#' @return A dataframe with the sensor data for device
|
#' @return A dataframe with the sensor data for device
|
||||||
|
|
||||||
download_data <- function(data_configuration, device, sensor_container, columns){
|
pull_data <- function(stream_parameters, device, sensor_container, columns){
|
||||||
dbEngine <- get_db_engine(data_configuration$SOURCE$DATABASE_GROUP)
|
dbEngine <- get_db_engine(stream_parameters$DATABASE_GROUP)
|
||||||
query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
|
query <- paste0("SELECT ", paste(columns, collapse = ",")," FROM ", sensor_container, " WHERE device_id = '", device,"'")
|
||||||
# Letting the user know what we are doing
|
# Letting the user know what we are doing
|
||||||
message(paste0("Executing the following query to download data: ", query))
|
message(paste0("Executing the following query to download data: ", query))
|
||||||
|
|
|
@@ -63,7 +63,7 @@ load_container_script <- function(stream_container){
|
||||||
return(container$pull_data)
|
return(container$pull_data)
|
||||||
} else if(language == "r"){
|
} else if(language == "r"){
|
||||||
source(stream_container)
|
source(stream_container)
|
||||||
if(exists("pull_data"))
|
if(!exists("pull_data"))
|
||||||
stop(paste0("The following container.R script does not have a pull_data function: ", stream_container))
|
stop(paste0("The following container.R script does not have a pull_data function: ", stream_container))
|
||||||
return(pull_data)
|
return(pull_data)
|
||||||
}
|
}
|
|
@@ -17,9 +17,9 @@ validate_deviceid_platforms <- function(device_ids, platforms, participant){
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
validate_inferred_os <- function(source_download_file, participant_file, device, device_os){
|
validate_inferred_os <- function(stream_container, participant_file, device, device_os){
|
||||||
if(!is.na(device_os) && device_os != "android" && device_os != "ios")
|
if(!is.na(device_os) && device_os != "android" && device_os != "ios")
|
||||||
stop(paste0("We tried to infer the OS for ", device, " but 'infer_device_os' function inside '",source_download_file,"' returned '",device_os,"' instead of 'android' or 'ios'. You can assign the OS manually in the participant file or report this bug on GitHub.\nParticipant file ", participant_file))
|
stop(paste0("We tried to infer the OS for ", device, " but 'infer_device_os' function inside '",stream_container,"' returned '",device_os,"' instead of 'android' or 'ios'. You can assign the OS manually in the participant file or report this bug on GitHub.\nParticipant file ", participant_file))
|
||||||
}
|
}
|
||||||
|
|
||||||
mutate_data <- function(scripts, data){
|
mutate_data <- function(scripts, data){
|
||||||
|
@@ -58,7 +58,7 @@ rename_columns <- function(name_maps, data){
|
||||||
return(data)
|
return(data)
|
||||||
}
|
}
|
||||||
|
|
||||||
validate_expected_columns_mapping <- function(schema, rapids_schema, sensor, rapids_schema_file){
|
validate_expected_columns_mapping <- function(schema, rapids_schema, sensor, rapids_schema_file, stream_format){
|
||||||
android_columns <- names(schema[[sensor]][["ANDROID"]][["COLUMN_MAPPINGS"]])
|
android_columns <- names(schema[[sensor]][["ANDROID"]][["COLUMN_MAPPINGS"]])
|
||||||
android_columns <- android_columns[(android_columns != "FLAG_AS_EXTRA")]
|
android_columns <- android_columns[(android_columns != "FLAG_AS_EXTRA")]
|
||||||
|
|
||||||
|
@@ -69,29 +69,47 @@ validate_expected_columns_mapping <- function(schema, rapids_schema, sensor, rap
|
||||||
if(is.null(rapids_columns))
|
if(is.null(rapids_columns))
|
||||||
stop(paste(sensor, " columns are not listed in RAPIDS' column specification. If you are adding support for a new phone sensor, add any mandatory columns in ", rapids_schema_file))
|
stop(paste(sensor, " columns are not listed in RAPIDS' column specification. If you are adding support for a new phone sensor, add any mandatory columns in ", rapids_schema_file))
|
||||||
if(length(setdiff(rapids_columns, android_columns)) > 0)
|
if(length(setdiff(rapids_columns, android_columns)) > 0)
|
||||||
stop(paste(sensor," mappings are missing one or more mandatory columns for ANDROID. The missing column mappings are for ", paste(setdiff(rapids_columns, android_columns), collapse=","),"in", rapids_schema_file))
|
stop(paste(sensor," mappings are missing one or more mandatory columns for ANDROID. The missing column mappings are for ", paste(setdiff(rapids_columns, android_columns), collapse=","),"in", stream_format, " (the mappings are case sensitive)"))
|
||||||
if(length(setdiff(rapids_columns, ios_columns)) > 0)
|
if(length(setdiff(rapids_columns, ios_columns)) > 0)
|
||||||
stop(paste(sensor," mappings are missing one or more mandatory columns for IOS. The missing column mappings are for ", paste(setdiff(rapids_columns, ios_columns), collapse=","),"in", rapids_schema_file))
|
stop(paste(sensor," mappings are missing one or more mandatory columns for IOS. The missing column mappings are for ", paste(setdiff(rapids_columns, ios_columns), collapse=","),"in", stream_format, " (the mappings are case sensitive)"))
|
||||||
if(length(setdiff(android_columns, rapids_columns)) > 0)
|
if(length(setdiff(android_columns, rapids_columns)) > 0)
|
||||||
stop(paste(sensor," mappings have more columns than required for ANDROID, add them as FLAG_AS_EXTRA instead. The extra column mappings are for ", paste(setdiff(android_columns, rapids_columns), collapse=","),"in", rapids_schema_file))
|
stop(paste(sensor," mappings have more columns than required for ANDROID, add them as FLAG_AS_EXTRA instead. The extra column mappings are for ", paste(setdiff(android_columns, rapids_columns), collapse=","),"in", stream_format, " (the mappings are case sensitive)"))
|
||||||
if(length(setdiff(ios_columns, rapids_columns)) > 0)
|
if(length(setdiff(ios_columns, rapids_columns)) > 0)
|
||||||
stop(paste(sensor," mappings have more columns than required for IOS, add them as FLAG_AS_EXTRA instead. The extra column mappings are for ", paste(setdiff(ios_columns, rapids_columns), collapse=","),"in", rapids_schema_file))
|
stop(paste(sensor," mappings have more columns than required for IOS, add them as FLAG_AS_EXTRA instead. The extra column mappings are for ", paste(setdiff(ios_columns, rapids_columns), collapse=","),"in", stream_format, " (the mappings are case sensitive)"))
|
||||||
}
|
}
|
||||||
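The rule enforced by `validate_expected_columns_mapping` boils down to two set differences per platform. A hedged Python sketch of that rule, with the schemas as plain dicts and assuming `FLAG_AS_EXTRA` is excluded for IOS the same way it is for ANDROID; it checks ANDROID completely before IOS, so the order of the errors differs slightly from the R function.

```python
def validate_expected_columns_mapping(stream_schema, rapids_schema, sensor, stream_format):
    """Raise ValueError unless both platforms map exactly RAPIDS' mandatory columns for sensor."""
    rapids_columns = rapids_schema.get(sensor)
    if rapids_columns is None:
        raise ValueError(sensor + " columns are not listed in RAPIDS' column specification")
    for device_os in ("ANDROID", "IOS"):
        # FLAG_AS_EXTRA entries are allowed on top of the mandatory columns
        mapped = set(stream_schema[sensor][device_os]["COLUMN_MAPPINGS"]) - {"FLAG_AS_EXTRA"}
        missing = sorted(set(rapids_columns) - mapped)
        extra = sorted(mapped - set(rapids_columns))
        if missing:
            raise ValueError("{} mappings are missing mandatory columns for {}: {} in {} "
                             "(the mappings are case sensitive)".format(
                                 sensor, device_os, ",".join(missing), stream_format))
        if extra:
            raise ValueError("{} mappings have more columns than required for {}, add them as "
                             "FLAG_AS_EXTRA instead: {} in {}".format(
                                 sensor, device_os, ",".join(extra), stream_format))
```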
|
|
||||||
download_phone_data <- function(){
|
load_container_script <- function(stream_container){
|
||||||
|
language <- if_else(endsWith(tolower(stream_container), "py"), "python", "r")
|
||||||
|
if(language == "python"){
|
||||||
|
library(reticulate)
|
||||||
|
container <- import_from_path(gsub(pattern = "\\.py$", "", basename(stream_container)), path = dirname(stream_container))
|
||||||
|
if(!py_has_attr(container, "pull_data"))
|
||||||
|
stop(paste0("The following container.py script does not have a pull_data function: ", stream_container))
|
||||||
|
if(!py_has_attr(container, "infer_device_os"))
|
||||||
|
stop(paste0("The following container.py script does not have an infer_device_os function: ", stream_container))
|
||||||
|
return(list("infer_device_os" = container$infer_device_os, "pull_data" = container$pull_data))
|
||||||
|
} else if(language == "r"){
|
||||||
|
source(stream_container)
|
||||||
|
if(!exists("pull_data"))
|
||||||
|
stop(paste0("The following container.R script does not have a pull_data function: ", stream_container))
|
||||||
|
if(!exists("infer_device_os"))
|
||||||
|
stop(paste0("The following container.R script does not have an infer_device_os function: ", stream_container))
|
||||||
|
return(list("infer_device_os" = infer_device_os, "pull_data" = pull_data))
|
||||||
|
}
|
||||||
|
}
|
||||||
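On the R side, `load_container_script` relies on reticulate's `import_from_path` and `py_has_attr`. For comparison, a pure-Python sketch of the same check (import a `container.py` from its path and make sure it defines both `pull_data` and `infer_device_os`) using `importlib`; this is an illustration of the idea, not code RAPIDS runs.

```python
import importlib.util
from pathlib import Path


def load_container_script(stream_container):
    """Import a container.py by path and return its pull_data and infer_device_os functions."""
    path = Path(stream_container)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    missing = [name for name in ("pull_data", "infer_device_os") if not hasattr(module, name)]
    if missing:
        raise ValueError("The following container.py script is missing "
                         + ", ".join(missing) + ": " + stream_container)
    return {"pull_data": module.pull_data, "infer_device_os": module.infer_device_os}
```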
|
|
||||||
|
pull_phone_data <- function(){
|
||||||
participant_file <- snakemake@input[["participant_file"]]
|
participant_file <- snakemake@input[["participant_file"]]
|
||||||
source_schema_file <- snakemake@input[["source_schema_file"]]
|
stream_format <- snakemake@input[["stream_format"]]
|
||||||
rapids_schema_file <- snakemake@input[["rapids_schema_file"]]
|
rapids_schema_file <- snakemake@input[["rapids_schema_file"]]
|
||||||
source_download_file <- snakemake@input[["source_download_file"]]
|
stream_container <- snakemake@input[["stream_container"]]
|
||||||
data_configuration <- snakemake@params[["data_configuration"]]
|
data_configuration <- snakemake@params[["data_configuration"]]
|
||||||
tables <- snakemake@params[["tables"]]
|
tables <- snakemake@params[["tables"]]
|
||||||
sensor <- toupper(snakemake@params[["sensor"]])
|
sensor <- toupper(snakemake@params[["sensor"]])
|
||||||
output_data_file <- snakemake@output[[1]]
|
output_data_file <- snakemake@output[[1]]
|
||||||
|
|
||||||
source(source_download_file)
|
|
||||||
|
|
||||||
participant_data <- read_yaml(participant_file)
|
participant_data <- read_yaml(participant_file)
|
||||||
schema <- read_yaml(source_schema_file)
|
stream_schema <- read_yaml(stream_format)
|
||||||
rapids_schema <- read_yaml(rapids_schema_file)
|
rapids_schema <- read_yaml(rapids_schema_file)
|
||||||
devices <- participant_data$PHONE$DEVICE_IDS
|
devices <- participant_data$PHONE$DEVICE_IDS
|
||||||
device_oss <- participant_data$PHONE$PLATFORMS
|
device_oss <- participant_data$PHONE$PLATFORMS
|
||||||
|
@@ -101,30 +119,34 @@ download_phone_data <- function(){
|
||||||
if(length(device_oss) == 1)
|
if(length(device_oss) == 1)
|
||||||
device_oss <- rep(device_oss, length(devices))
|
device_oss <- rep(device_oss, length(devices))
|
||||||
|
|
||||||
validate_expected_columns_mapping(schema, rapids_schema, sensor, rapids_schema_file)
|
validate_expected_columns_mapping(stream_schema, rapids_schema, sensor, rapids_schema_file, stream_format)
|
||||||
# ANDROID or IOS COLUMN_MAPPINGS are guaranteed to be the same at this point (see validate_expected_columns_mapping function)
|
# ANDROID or IOS COLUMN_MAPPINGS are guaranteed to be the same at this point (see validate_expected_columns_mapping function)
|
||||||
expected_columns <- tolower(names(schema[[sensor]][["ANDROID"]][["COLUMN_MAPPINGS"]]))
|
expected_columns <- tolower(names(stream_schema[[sensor]][["ANDROID"]][["COLUMN_MAPPINGS"]]))
|
||||||
expected_columns <- expected_columns[(expected_columns != "flag_extra")]
|
expected_columns <- expected_columns[(expected_columns != "flag_extra")]
|
||||||
participant_data <- setNames(data.frame(matrix(ncol = length(expected_columns), nrow = 0)), expected_columns)
|
participant_data <- setNames(data.frame(matrix(ncol = length(expected_columns), nrow = 0)), expected_columns)
|
||||||
|
|
||||||
|
container_functions <- load_container_script(stream_container)
|
||||||
|
infer_device_os_container <- container_functions$infer_device_os
|
||||||
|
pull_data_container <- container_functions$pull_data
|
||||||
|
|
||||||
for(idx in seq_along(devices)){ #TODO remove length
|
for(idx in seq_along(devices)){ #TODO remove length
|
||||||
|
|
||||||
device <- devices[idx]
|
device <- devices[idx]
|
||||||
message(paste0("\nProcessing ", sensor, " for ", device))
|
message(paste0("\nProcessing ", sensor, " for ", device))
|
||||||
device_os <- ifelse(device_oss[idx] == "infer", infer_device_os(data_configuration, device), device_oss[idx])
|
device_os <- ifelse(device_oss[idx] == "infer", infer_device_os_container(data_configuration, device), device_oss[idx])
|
||||||
validate_inferred_os(basename(source_download_file), participant_file, device, device_os)
|
validate_inferred_os(basename(stream_container), participant_file, device, device_os)
|
||||||
os_table <- ifelse(length(tables) > 1, tables[[toupper(device_os)]], tables) # some sensor tables have a different name for android and ios
|
os_table <- ifelse(length(tables) > 1, tables[[toupper(device_os)]], tables) # some sensor tables have a different name for android and ios
|
||||||
|
|
||||||
columns_to_download <- schema[[sensor]][[toupper(device_os)]][["COLUMN_MAPPINGS"]]
|
columns_to_download <- stream_schema[[sensor]][[toupper(device_os)]][["COLUMN_MAPPINGS"]]
|
||||||
columns_to_download <- columns_to_download[(columns_to_download != "FLAG_TO_MUTATE")]
|
columns_to_download <- columns_to_download[(columns_to_download != "FLAG_TO_MUTATE")]
|
||||||
data <- download_data(data_configuration, device, os_table, columns_to_download)
|
data <- pull_data_container(data_configuration, device, os_table, columns_to_download)
|
||||||
|
|
||||||
# Rename all COLUMN_MAPPINGS except those mapped as FLAG_AS_EXTRA or FLAG_TO_MUTATE
|
# Rename all COLUMN_MAPPINGS except those mapped as FLAG_AS_EXTRA or FLAG_TO_MUTATE
|
||||||
columns_to_rename <- schema[[sensor]][[toupper(device_os)]][["COLUMN_MAPPINGS"]]
|
columns_to_rename <- stream_schema[[sensor]][[toupper(device_os)]][["COLUMN_MAPPINGS"]]
|
||||||
columns_to_rename <- (columns_to_rename[(columns_to_rename != "FLAG_TO_MUTATE" & names(columns_to_rename) != "FLAG_AS_EXTRA")])
|
columns_to_rename <- (columns_to_rename[(columns_to_rename != "FLAG_TO_MUTATE" & names(columns_to_rename) != "FLAG_AS_EXTRA")])
|
||||||
renamed_data <- rename_columns(columns_to_rename, data)
|
renamed_data <- rename_columns(columns_to_rename, data)
|
||||||
|
|
||||||
mutation_scripts <- schema[[sensor]][[toupper(device_os)]][["MUTATION_SCRIPTS"]]
|
mutation_scripts <- stream_schema[[sensor]][[toupper(device_os)]][["MUTATION_SCRIPTS"]]
|
||||||
mutated_data <- mutate_data(mutation_scripts, renamed_data)
|
mutated_data <- mutate_data(mutation_scripts, renamed_data)
|
||||||
|
|
||||||
if(length(setdiff(expected_columns, colnames(mutated_data))) > 0)
|
if(length(setdiff(expected_columns, colnames(mutated_data))) > 0)
|
||||||
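`load_container_script(stream_container)` itself is not part of this diff, so how it bridges R and Python is not shown. Given the commit's goal of supporting `.py` containers, one plausible shape for the Python side is to import the container file as a module and hand back the two functions the loop uses. The sketch below is a Python analogue under that assumption; `load_container_module` and `REQUIRED_FUNCTIONS` are hypothetical names, not RAPIDS API.

```python
import importlib.util
from pathlib import Path

# Functions the R loop reads from the loaded container (see the diff above).
REQUIRED_FUNCTIONS = ("infer_device_os", "pull_data")


def load_container_module(container_path):
    """Import a Python container file and return its required functions by name.

    Hypothetical analogue of the R helper load_container_script(); R containers
    would need a separate code path and are rejected here.
    """
    path = Path(container_path)
    if path.suffix != ".py":
        raise ValueError(f"Expected a .py container, got {path.name}")
    spec = importlib.util.spec_from_file_location("container", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the container file's top-level code
    missing = [name for name in REQUIRED_FUNCTIONS
               if not callable(getattr(module, name, None))]
    if missing:
        raise AttributeError(f"{path} does not define: {', '.join(missing)}")
    return {name: getattr(module, name) for name in REQUIRED_FUNCTIONS}
```

Failing fast on a missing function mirrors why the R code pulls `infer_device_os` and `pull_data` out once, before the device loop, rather than looking them up per device.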
```diff
@@ -136,4 +158,4 @@ download_phone_data <- function(){
   write_csv(participant_data, output_data_file)
 }
 
-download_phone_data()
+pull_phone_data()
```
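In the per-device loop above, a stream's `COLUMN_MAPPINGS` feed two differently filtered selections: entries whose value is `FLAG_TO_MUTATE` are left out of the download, and the rename additionally skips the `FLAG_AS_EXTRA` key. The Python sketch below reproduces only that filtering. It treats the mappings as a dict from RAPIDS column name to source column name; that shape, and `rename_row` as a stand-in for the R `rename_columns` helper, are assumptions for illustration.

```python
FLAG_TO_MUTATE = "FLAG_TO_MUTATE"
FLAG_AS_EXTRA = "FLAG_AS_EXTRA"


def columns_to_download(column_mappings):
    """Drop mappings with no source column (value FLAG_TO_MUTATE); mutation scripts create those."""
    return {k: v for k, v in column_mappings.items() if v != FLAG_TO_MUTATE}


def columns_to_rename(column_mappings):
    """Also skip the FLAG_AS_EXTRA key: that column is downloaded but keeps its source name."""
    return {k: v for k, v in column_mappings.items()
            if v != FLAG_TO_MUTATE and k != FLAG_AS_EXTRA}


def rename_row(row, mappings):
    """Rename source columns to RAPIDS names; columns not in `mappings` pass through unchanged."""
    source_to_rapids = {source: rapids for rapids, source in mappings.items()}
    return {source_to_rapids.get(col, col): value for col, value in row.items()}
```

For example, with `{"TIMESTAMP": "timestamp", "BATTERY_STATUS": "FLAG_TO_MUTATE", "FLAG_AS_EXTRA": "battery_adaptor"}`, `battery_adaptor` is downloaded but not renamed, and `BATTERY_STATUS` is neither downloaded nor renamed.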
```diff
@@ -6,7 +6,7 @@ required:
   - PIDS
   - CREATE_PARTICIPANT_FILES
   - TIME_SEGMENTS
-  - PHONE_DATA_CONFIGURATION
+  - PHONE_DATA_STREAMS
   - PHONE_ACCELEROMETER
   - PHONE_ACTIVITY_RECOGNITION
   - PHONE_APPLICATIONS_CRASHES
```
```diff
@@ -209,16 +209,16 @@ properties:
       INCLUDE_PAST_PERIODIC_SEGMENTS:
         type: boolean
 
-  PHONE_DATA_CONFIGURATION:
-    allOf:
-      - $ref: "#/definitions/DATA_CONFIGURATION"
-      - properties:
-          SOURCE:
-            type: object
-            properties:
-              TYPE:
-                type: string
-                enum: [aware_mysql]
+  PHONE_DATA_STREAMS:
+    type: object
+    properties:
+      USE:
+        type: string
+      aware_mysql:
+        type: object
+        properties:
+          DATABASE_GROUP:
+            type: string
 
   PHONE_ACCELEROMETER:
     type: object
```
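The schema change replaces `PHONE_DATA_CONFIGURATION` (whose `SOURCE.TYPE` was an enum restricted to `aware_mysql`) with `PHONE_DATA_STREAMS`, where `USE` names the active stream and each stream has its own object of parameters. As shown in this hunk, `USE` is only typed as a string, so nothing in the schema checks that it names a configured stream. A minimal Python sketch of that extra check, assuming the config has already been parsed into a dict; `active_stream_parameters` is a hypothetical helper, not part of RAPIDS.

```python
def active_stream_parameters(config, section="PHONE_DATA_STREAMS"):
    """Return (stream_name, parameters) for the stream selected by USE.

    Checks what the schema above does not: that USE names a configured stream.
    """
    streams = config[section]
    name = streams.get("USE")
    if not isinstance(name, str):
        raise ValueError(f"{section}.USE must be a string")
    if name == "USE" or name not in streams:
        available = sorted(k for k in streams if k != "USE")
        raise ValueError(f"{section}.USE is '{name}' but only {available} are configured")
    params = streams[name]
    if not isinstance(params, dict):
        raise ValueError(f"{section}.{name} must be a mapping of stream parameters")
    return name, params
```

With the `config.yaml` values from this commit (`USE: aware_mysql`, `aware_mysql: DATABASE_GROUP: MY_GROUP`), this returns `("aware_mysql", {"DATABASE_GROUP": "MY_GROUP"})`.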