Add support for multi-platform participants
parent 0a6bef4257
commit 06edf53016
@ -53,6 +53,7 @@ if config["SCREEN"]["COMPUTE"]:
    raise ValueError("Error: Add your screen table (and as many sensor tables as you have) to TABLES_FOR_SENSED_BINS in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data)")
    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
    files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
    files_to_compute.extend(expand("data/processed/{pid}/screen_deltas.csv", pid=config["PIDS"]))
    files_to_compute.extend(expand("data/processed/{pid}/screen_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SCREEN"]["DAY_SEGMENTS"]))
@ -648,6 +648,7 @@ See `Screen Config Code`_
- Rule ``rules/preprocessing.snakefile/download_dataset``
- Rule ``rules/preprocessing.snakefile/readable_datetime``
- Rule ``rules/preprocessing.snakefile/unify_ios_android``
- Rule ``rules/features.snakefile/screen_deltas``
- Rule ``rules/features.snakefile/screen_features``
@ -683,7 +684,11 @@ firstuseafter minutes Minutes until the first unlock e
**Assumptions/Observations:**

An ``unlock`` episode is considered as the time between an ``unlock`` event and a ``lock`` event. iOS recorded these episodes reliably (albeit with some duplicated ``lock`` events within milliseconds of each other). However, in Android there are some events unrelated to the screen state because of multiple consecutive ``unlock``/``lock`` events, so we keep the closest pair. In our experiments these cases are less than 10% of the screen events collected. This happens because ``ACTION_SCREEN_OFF`` and ``ON`` are "sent when the device becomes non-interactive which may have nothing to do with the screen turning off". Additionally, in Android it is possible to measure the time spent on the ``lock`` screen before an ``unlock`` event as well as the total screen time (i.e., ``ON`` to ``OFF``), but we only keep ``unlock`` episodes (``unlock`` to ``OFF``) to be consistent with iOS.

In Android, ``lock`` events can happen right after an ``off`` event, a few seconds after an ``off`` event, or never, depending on the phone's settings; therefore, an ``unlock`` episode is defined as the time between an ``unlock`` and an ``off`` event. In iOS, ``on`` and ``off`` events do not exist, so an ``unlock`` episode is defined as the time between an ``unlock`` and a ``lock`` event.

Events in iOS are recorded reliably, albeit with some duplicated ``lock`` events within milliseconds of each other, so we only keep consecutive unlock/lock pairs. In Android you can find multiple consecutive ``unlock`` or ``lock`` events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected; this happens because ``ACTION_SCREEN_OFF`` and ``ACTION_SCREEN_ON`` are "sent when the device becomes non-interactive which may have nothing to do with the screen turning off". In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an ``unlock`` event as well as the total screen time (i.e., ``ON`` to ``OFF``), but these are not implemented at the moment.

To unify the screen processing and use the same code in RAPIDS, we replace LOCKED events with OFF events (2 with 0) in iOS. However, as mentioned above, this still computes ``unlock`` to ``lock`` episodes.

.. _conversation-sensor-doc:
@ -140,8 +140,8 @@ Once RAPIDS is installed, follow these steps to start processing mobile data.
#. **Manually**. Create one file per participant in the ``rapids/data/external/`` directory. The file should NOT have an extension (i.e., no .txt). The name of the file will become the label for that participant in the pipeline.

   - The first line of the file should be the Aware ``device_id`` for that participant. If one participant has multiple device_ids (i.e., Aware had to be re-installed), add all device_ids separated by commas.
   - The second line should list the device's operating system (``android`` or ``ios``)
   - The first line of the file should be the Aware ``device_id`` for that participant. If one participant has multiple device_ids (i.e., Aware had to be re-installed), add all device_ids separated by commas.
   - The second line should list the device's operating system (``android`` or ``ios``). If a participant used more than one device (i.e., the participant changed phones and/or platforms mid-study) you can: a) list each platform in the order that matches the device_ids on the first line (``android,ios``), b) use ``android`` or ``ios`` if all phones belong to the same platform, or c) if you have an ``aware_device`` table in your database, set this line to ``multiple`` and RAPIDS will infer each device's platform automatically.
   - The third line is an optional human-friendly label that will appear in any plots for that participant.
   - The fourth line is optional and contains a start and end date separated by a comma in ``YYYYMMDD,YYYYMMDD`` format (e.g., ``20200113,20200525``). If these dates are specified, only data within this range will be processed; otherwise, all data from the device(s) will be used.
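The four-line format above can be sketched end to end. The device_ids and label below are invented placeholders for illustration, not values from any real study:

```python
# Hypothetical participant file contents. Real files use actual Aware
# device_id values; the ids and label here are made up.
participant_file = """1234abcd-0000-0000-0000-000000000001,5678efgh-0000-0000-0000-000000000002
android,ios
p01
20200113,20200525
"""

lines = participant_file.splitlines()
device_ids = lines[0].split(",")            # line 1: one or more Aware device_ids
platforms = lines[1].split(",")             # line 2: platform(s) or 'multiple'
label = lines[2]                            # line 3: optional plot label
start_date, end_date = lines[3].split(",")  # line 4: optional date range
```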
@ -1,7 +1,12 @@
def optional_ar_input(wildcards):
    with open("data/external/"+wildcards.pid, encoding="ISO-8859-1") as external_file:
        external_file_content = external_file.readlines()
    platform = external_file_content[1].strip()
    platforms = external_file_content[1].strip().split(",")
    if platforms[0] == "multiple" or (len(platforms) > 1 and "android" in platforms and "ios" in platforms):
        platform = "android"
    else:
        platform = platforms[0]

    if platform == "android":
        return ["data/raw/{pid}/" + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "_with_datetime_unified.csv",
                "data/processed/{pid}/" + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "_deltas.csv"]
@ -9,16 +14,23 @@ def optional_ar_input(wildcards):
        return ["data/raw/{pid}/"+config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]+"_with_datetime_unified.csv",
                "data/processed/{pid}/"+config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]+"_deltas.csv"]
    else:
        return []
        raise ValueError("Platform (line 2) in a participant file should be 'android', 'ios', or 'multiple'. You typed '" + ",".join(platforms) + "'")

def optional_conversation_input(wildcards):
    with open("data/external/"+wildcards.pid, encoding="ISO-8859-1") as external_file:
        external_file_content = external_file.readlines()
    platform = external_file_content[1].strip()
    platforms = external_file_content[1].strip().split(",")
    if platforms[0] == "multiple" or (len(platforms) > 1 and "android" in platforms and "ios" in platforms):
        platform = "android"
    else:
        platform = platforms[0]

    if platform == "android":
        return ["data/raw/{pid}/" + config["CONVERSATION"]["DB_TABLE"]["ANDROID"] + "_with_datetime.csv"]
    else:
    elif platform == "ios":
        return ["data/raw/{pid}/" + config["CONVERSATION"]["DB_TABLE"]["IOS"] + "_with_datetime.csv"]
    else:
        raise ValueError("Platform (line 2) in a participant file should be 'android', 'ios', or 'multiple'. You typed '" + ",".join(platforms) + "'")

def optional_location_input(wildcards):
    if config["BARNETT_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
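Both input functions above repeat the same platform inference: ``multiple`` or a mixed android/ios list collapses to ``android`` because multi-platform data is unified into the Android tables first. A small helper capturing that rule (the function name is ours, not part of the commit):

```python
def infer_unified_platform(platforms):
    """Collapse the platform list from line 2 of a participant file into the
    single platform whose tables RAPIDS should read. Multi-platform
    participants resolve to 'android' because their data is unified there."""
    if platforms[0] == "multiple" or (
        len(platforms) > 1 and "android" in platforms and "ios" in platforms
    ):
        return "android"
    return platforms[0]
```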
@ -66,8 +78,7 @@ rule battery_deltas:
rule screen_deltas:
    input:
        screen = expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["SCREEN"]["DB_TABLE"]),
        participant_info = "data/external/{pid}"
        screen = expand("data/raw/{{pid}}/{sensor}_with_datetime_unified.csv", sensor=config["SCREEN"]["DB_TABLE"])
    output:
        "data/processed/{pid}/screen_deltas.csv"
    script:
@ -13,7 +13,9 @@ rule download_dataset:
    params:
        group = config["DOWNLOAD_DATASET"]["GROUP"],
        table = "{sensor}",
        timezone = config["TIMEZONE"]
        timezone = config["TIMEZONE"],
        aware_multiplatform_tables = config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "," + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"] + "," + config["CONVERSATION"]["DB_TABLE"]["ANDROID"] + "," + config["CONVERSATION"]["DB_TABLE"]["IOS"],
        unifiable_sensors = {"calls": config["CALLS"]["DB_TABLE"], "battery": config["BATTERY"]["DB_TABLE"], "screen": config["SCREEN"]["DB_TABLE"], "ios_activity_recognition": config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]}
    output:
        "data/raw/{pid}/{sensor}_raw.csv"
    script:
@ -62,7 +64,8 @@ rule unify_ios_android:
        sensor_data = "data/raw/{pid}/{sensor}_with_datetime.csv",
        participant_info = "data/external/{pid}"
    params:
        sensor = "{sensor}"
        sensor = "{sensor}",
        unifiable_sensors = {"calls": config["CALLS"]["DB_TABLE"], "battery": config["BATTERY"]["DB_TABLE"], "screen": config["SCREEN"]["DB_TABLE"], "ios_activity_recognition": config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]}
    output:
        "data/raw/{pid}/{sensor}_with_datetime_unified.csv"
    script:
@ -1,18 +1,54 @@
source("renv/activate.R")

source("src/data/unify_utils.R")
library(RMySQL)
library(stringr)
library(dplyr)
library(readr)

validate_deviceid_platforms <- function(device_ids, platforms){
  if(length(device_ids) == 1){
    if(length(platforms) > 1 || (platforms != "android" && platforms != "ios"))
      stop(paste0("If you have 1 device_id, its platform should be 'android' or 'ios' but you typed: '", paste0(platforms, collapse = ","), "'. Participant file: ", participant))
  } else if(length(device_ids) > 1 && length(platforms) == 1){
    if(platforms != "android" && platforms != "ios" && platforms != "multiple")
      stop(paste0("If you have more than 1 device_id, platform should be 'android', 'ios' OR 'multiple' but you typed: '", paste0(platforms, collapse = ","), "'. Participant file: ", participant))
  } else if(length(device_ids) > 1 && length(platforms) > 1){
    if(length(device_ids) != length(platforms))
      stop(paste0("The number of device_ids should match the number of platforms. Participant file: ", participant))
    if(!all(c("android", "ios") %in% unique(platforms)))
      stop(paste0("If you have more than 1 device_id and more than 1 platform, the platforms should be a mix of 'android' AND 'ios' but you typed: '", paste0(platforms, collapse = ","), "'. Participant file: ", participant))
  }
}

is_multiplaform_participant <- function(dbEngine, device_ids, platforms){
  # Multiple android and ios platforms, or the same platform (android, ios) for multiple devices
  if((length(device_ids) > 1 && length(platforms) > 1) || (length(device_ids) > 1 && length(platforms) == 1 && (platforms == "android" || platforms == "ios"))){
    return(TRUE)
  }
  # 'multiple' platforms for multiple devices: we search the platform for every device in the aware_device table
  if(length(device_ids) > 1 && length(platforms) == 1 && platforms == "multiple"){
    devices_platforms <- dbGetQuery(dbEngine, paste0("SELECT device_id,brand FROM aware_device WHERE device_id IN ('", paste0(device_ids, collapse = "','"), "')"))
    platforms <- devices_platforms %>% distinct(brand) %>% pull(brand)
    # Android phones have different brands, so we check that we got at least two different platforms and one of them is iPhone
    if(length(platforms) > 1 && "iPhone" %in% platforms){
      return(TRUE)
    }
  }
  return(FALSE)
}

participant <- snakemake@input[[1]]
group <- snakemake@params[["group"]]
table <- snakemake@params[["table"]]
timezone <- snakemake@params[["timezone"]]
aware_multiplatform_tables <- str_split(snakemake@params[["aware_multiplatform_tables"]], ",")[[1]]
unifiable_tables = snakemake@params[["unifiable_sensors"]]
sensor_file <- snakemake@output[[1]]

device_ids <- readLines(participant, n=1)
unified_device_id <- tail(strsplit(device_ids, ",")[[1]], 1)
device_ids <- strsplit(readLines(participant, n=1), ",")[[1]]
unified_device_id <- tail(device_ids, 1)
platforms <- strsplit(readLines(participant, n=2)[[2]], ",")[[1]]
validate_deviceid_platforms(device_ids, platforms)

# Read start and end date from the participant file to filter data within that range
start_date <- strsplit(readLines(participant, n=4)[4], ",")[[1]][1]
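The R validation above enforces three cases: one device (platform must be ``android`` or ``ios``), many devices with one platform entry (also allows ``multiple``), and many devices with many platform entries (lengths must match and the platforms must mix ``android`` and ``ios``). A hedged Python restatement of the same rules, for quick checking:

```python
def validate_deviceid_platforms(device_ids, platforms):
    """Raise ValueError if line 2 of a participant file is inconsistent
    with line 1, mirroring the R validation in download_dataset.R."""
    if len(device_ids) == 1:
        if len(platforms) > 1 or platforms[0] not in ("android", "ios"):
            raise ValueError("With one device_id, the platform must be 'android' or 'ios'")
    elif len(platforms) == 1:
        if platforms[0] not in ("android", "ios", "multiple"):
            raise ValueError("With several device_ids and one platform, it must be 'android', 'ios' or 'multiple'")
    else:
        if len(device_ids) != len(platforms):
            raise ValueError("The number of device_ids should match the number of platforms")
        if not {"android", "ios"} <= set(platforms):
            raise ValueError("Multiple platforms should be a mix of 'android' and 'ios'")
```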
@ -20,30 +56,32 @@ end_date <- strsplit(readLines(participant, n=4)[4], ",")[[1]][2]
start_datetime_utc = format(as.POSIXct(paste0(start_date, " 00:00:00"), format="%Y/%m/%d %H:%M:%S", origin="1970-01-01", tz=timezone), tz="UTC")
end_datetime_utc = format(as.POSIXct(paste0(end_date, " 23:59:59"), format="%Y/%m/%d %H:%M:%S", origin="1970-01-01", tz=timezone), tz="UTC")

stopDB <- dbConnect(MySQL(), default.file = "./.env", group = group)
dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)

# Get existent columns in table
query <- paste0("SELECT * FROM ", table, " LIMIT 1")
available_columns <- colnames(dbGetQuery(stopDB, query))
available_columns <- colnames(dbGetQuery(dbEngine, paste0("SELECT * FROM ", table, " LIMIT 1")))

if("device_id" %in% available_columns){
  query <- paste0("SELECT * FROM ", table, " WHERE device_id IN ('", gsub(",", "','", device_ids), "')")
  if("timestamp" %in% available_columns && !(is.na(start_datetime_utc)) && !(is.na(end_datetime_utc)) && start_datetime_utc < end_datetime_utc){
    query <- paste0(query, " AND timestamp BETWEEN 1000*UNIX_TIMESTAMP('", start_datetime_utc, "') AND 1000*UNIX_TIMESTAMP('", end_datetime_utc, "')")
  }
  sensor_data <- dbGetQuery(stopDB, query)

  if("timestamp" %in% available_columns){
    sensor_data <- sensor_data %>% arrange(timestamp)
  }

  # Unify device_id
  sensor_data <- sensor_data %>% mutate(device_id = unified_device_id)

  # Dropping duplicates on all columns except for _id or id
  sensor_data <- sensor_data %>% distinct(!!!syms(setdiff(names(sensor_data), c("_id", "id"))))
} else {
  print(paste0("Table ", table, " does not have a device_id column (Aware ID) to link its data to a participant"))
}
if(is_multiplaform_participant(dbEngine, device_ids, platforms)){
  sensor_data <- unify_raw_data(dbEngine, table, start_datetime_utc, end_datetime_utc, aware_multiplatform_tables, unifiable_tables, device_ids, platforms)
} else {
  query <- paste0("SELECT * FROM ", table, " WHERE device_id IN ('", paste0(device_ids, collapse = "','"), "')")
  if("timestamp" %in% available_columns && !(is.na(start_datetime_utc)) && !(is.na(end_datetime_utc)) && start_datetime_utc < end_datetime_utc)
    query <- paste0(query, " AND timestamp BETWEEN 1000*UNIX_TIMESTAMP('", start_datetime_utc, "') AND 1000*UNIX_TIMESTAMP('", end_datetime_utc, "')")
  sensor_data <- dbGetQuery(dbEngine, query)
}

if("timestamp" %in% available_columns)
  sensor_data <- sensor_data %>% arrange(timestamp)

# Unify device_id
sensor_data <- sensor_data %>% mutate(device_id = unified_device_id)

# Dropping duplicates on all columns except for _id or id
sensor_data <- sensor_data %>% distinct(!!!syms(setdiff(names(sensor_data), c("_id", "id"))))

} else
  stop(paste0("Table ", table, " does not have a device_id column (Aware ID) to link its data to a participant"))

write_csv(sensor_data, sensor_file)
dbDisconnect(stopDB)
dbDisconnect(dbEngine)
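The queries above filter by the millisecond timestamps AWARE stores: the participant's local start/end dates become UTC datetimes, and MySQL's ``1000*UNIX_TIMESTAMP(...)`` turns them into milliseconds. The same conversion can be sketched in Python (the default UTC zone stands in for the pipeline's ``TIMEZONE`` config value):

```python
from datetime import datetime, timezone

def day_range_to_ms(start_date, end_date, tz=timezone.utc):
    """Convert 'YYYY/MM/DD' start/end dates to the millisecond timestamps
    used in AWARE's `timestamp` column: start of the first day to the last
    second of the last day, mirroring 1000*UNIX_TIMESTAMP(...) in the SQL."""
    fmt = "%Y/%m/%d %H:%M:%S"
    start = datetime.strptime(start_date + " 00:00:00", fmt).replace(tzinfo=tz)
    end = datetime.strptime(end_date + " 23:59:59", fmt).replace(tzinfo=tz)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)
```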
@ -1,131 +1,14 @@
source("renv/activate.R")

library(dplyr)
library(stringr)

unify_ios_battery <- function(ios_battery){
  # We only need to unify battery data for iOS client V1. V2 does it out-of-the-box
  # V1 will not have rows where battery_status is equal to 4
  if(nrow(ios_battery %>% filter(battery_status == 4)) == 0)
    ios_battery <- ios_battery %>%
      mutate(battery_status = replace(battery_status, battery_status == 3, 5),
             battery_status = replace(battery_status, battery_status == 1, 3))
  return(ios_battery)
}
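The remap above substitutes 3 with 5 and then 1 with 3; because ``replace`` works on the already-modified column, the order matters in R. Mapping on the original value avoids that ordering concern entirely. A Python sketch (the status-code meanings follow the comments in the R code and are not re-verified here):

```python
def unify_ios_v1_battery_status(statuses):
    """Translate iOS client V1 battery_status codes to Android's scheme:
    3 -> 5 and 1 -> 3. Looking up the original value means a row that
    just became 3 can never be remapped again to 5."""
    remap = {3: 5, 1: 3}
    return [remap.get(s, s) for s in statuses]
```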
unify_ios_calls <- function(ios_calls){
  # Android's call types: 1=incoming, 2=outgoing, 3=missed
  # iOS' call status: 1=incoming, 2=connected, 3=dialing, 4=disconnected
  # iOS' call types based on call status: (1,2,4)=incoming=1, (3,2,4)=outgoing=2, (1,4) or (3,4)=missed=3
  # Sometimes (due to a possible bug in Aware) sequences get logged on the exact same timestamp, thus 3-item sequences can be 2,3,4 or 3,2,4
  # Even though iOS stores the duration of ringing/dialing for missed calls, we set it to 0 to match Android

  ios_calls <- ios_calls %>%
    arrange(trace, timestamp, call_type) %>%
    group_by(trace) %>%
    # search for the disconnect event, as it is common to outgoing, received and missed calls
    mutate(completed_call = ifelse(call_type == 4, 2, 0),
           # assign the same ID to all events before a 4
           completed_call = cumsum(c(1, head(completed_call, -1) != tail(completed_call, -1))),
           # hack to match ID of last event (4) to that of the previous rows
           completed_call = ifelse(call_type == 4, completed_call - 1, completed_call)) %>%
    summarise(call_type_sequence = paste(call_type, collapse = ","), # collapse all events before a 4
              # use this if Android measures calls' duration from pick up to hang up
              # duration = last(call_duration),
              # sanity check, timestamp_diff should be equal or close to duration sum
              # timestamp_diff = trunc((last(timestamp) - first(timestamp)) / 1000)
              # use this if Android measures calls' duration from dialing/ringing to hang up
              call_duration = sum(call_duration),

              timestamp = first(timestamp),
              utc_date_time = first(utc_date_time),
              local_date_time = first(local_date_time),
              local_date = first(local_date),
              local_time = first(local_time),
              local_hour = first(local_hour),
              local_minute = first(local_minute),
              local_day_segment = first(local_day_segment)
    ) %>%
    mutate(call_type = case_when(
      call_type_sequence == "1,2,4" | call_type_sequence == "2,1,4" ~ 1, # incoming
      call_type_sequence == "1,4" ~ 3, # missed
      call_type_sequence == "3,2,4" | call_type_sequence == "2,3,4" ~ 2, # outgoing
      call_type_sequence == "3,4" ~ 4, # outgoing missed, we create this temp missed state to assign a duration of 0 below
      TRUE ~ -1), # other; call sequences without a disconnect (4) event are discarded
      # assign a duration of 0 to incoming and outgoing missed calls
      call_duration = ifelse(call_type == 3 | call_type == 4, 0, call_duration),
      # get rid of the temp missed call type, set to 3 to match Android
      call_type = ifelse(call_type == 4, 3, call_type)
    ) %>%
    # discard sequences without an event 4 (disconnect)
    filter(call_type > 0) %>%
    ungroup() %>%
    arrange(timestamp)

  return(ios_calls)
}
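The ``case_when`` above boils down to a lookup from the collapsed event sequence to an Android-style call type (the R code routes "3,4" through a temporary type 4 before setting it to 3 with a zero duration; the net result is 3). A hedged Python equivalent of that mapping:

```python
def call_type_from_sequence(sequence):
    """Android-style call type from a collapsed iOS status sequence:
    1=incoming, 2=outgoing, 3=missed; -1 means no disconnect (4), discard.
    Same-timestamp orderings ('2,1,4', '2,3,4') are accepted as well."""
    mapping = {
        "1,2,4": 1, "2,1,4": 1,  # incoming, answered
        "3,2,4": 2, "2,3,4": 2,  # outgoing, answered
        "1,4": 3,                # incoming, missed
        "3,4": 3,                # outgoing, never connected (duration forced to 0)
    }
    return mapping.get(sequence, -1)
```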
clean_ios_activity_column <- function(ios_gar){
  ios_gar <- ios_gar %>%
    mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))

  existent_multiple_activities <- ios_gar %>%
    filter(str_detect(activities, ",")) %>%
    group_by(activities) %>%
    summarise(multiple_activities = unique(activities)) %>%
    pull(multiple_activities)

  known_multiple_activities <- c("stationary,automotive")
  unknown_multiple_activities <- setdiff(existent_multiple_activities, known_multiple_activities)
  if(length(unknown_multiple_activities) > 0){
    stop(paste0("There are unknown combinations of iOS activities, you need to implement the decision of the ones to keep: ", unknown_multiple_activities))
  }

  ios_gar <- ios_gar %>%
    mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))

  return(ios_gar)
}

unify_ios_gar <- function(ios_gar){
  # We only need to unify Google Activity Recognition data for iOS
  # discard rows where the activities column is blank; filter() is safe when
  # there are none (ios_gar[-which(...), ] would drop every row in that case)
  ios_gar <- ios_gar %>% filter(activities != "")
  # clean "activities" column of ios_gar
  ios_gar <- clean_ios_activity_column(ios_gar)

  # make it compatible with android version: generate "activity_name" and "activity_type" columns
  ios_gar <- ios_gar %>%
    mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
                                     activities == "cycling" ~ "on_bicycle",
                                     activities == "walking" | activities == "running" ~ "on_foot",
                                     activities == "stationary" ~ "still"),
           activity_type = case_when(activities == "automotive" ~ 0,
                                     activities == "cycling" ~ 1,
                                     activities == "walking" | activities == "running" ~ 2,
                                     activities == "stationary" ~ 3,
                                     activities == "unknown" ~ 4))

  return(ios_gar)
}
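The two ``case_when`` blocks above amount to a single lookup table; ``None`` below marks the combination the R code leaves as NA (``unknown`` gets a type but no name):

```python
# iOS motion-activity label -> (Android-style activity_name, activity_type)
IOS_TO_ANDROID_ACTIVITY = {
    "automotive": ("in_vehicle", 0),
    "cycling": ("on_bicycle", 1),
    "walking": ("on_foot", 2),
    "running": ("on_foot", 2),
    "stationary": ("still", 3),
    "unknown": (None, 4),  # case_when assigns no activity_name here
}
```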
source("src/data/unify_utils.R")

sensor_data <- read.csv(snakemake@input[["sensor_data"]], stringsAsFactors = FALSE)
participant_info <- snakemake@input[["participant_info"]]
sensor <- snakemake@params[["sensor"]]
platform <- readLines(participant_info, n=2)[[2]]
unifiable_sensors = snakemake@params[["unifiable_sensors"]]

platforms <- strsplit(readLines(participant_info, n=2)[[2]], ",")[[1]]
platform <- ifelse(platforms[1] == "multiple" | (length(platforms) > 1 & "android" %in% platforms & "ios" %in% platforms), "android", platforms[1])

sensor_data <- unify_data(sensor_data, sensor, platform, unifiable_sensors)

if(sensor == "calls"){
  if(platform == "ios"){
    sensor_data = unify_ios_calls(sensor_data)
  }
  # android calls remain unchanged
} else if(sensor == "battery"){
  if(platform == "ios"){
    sensor_data = unify_ios_battery(sensor_data)
  }
  # android battery remains unchanged
} else if(sensor == "plugin_ios_activity_recognition"){
  sensor_data = unify_ios_gar(sensor_data)
}
write.csv(sensor_data, snakemake@output[[1]], row.names = FALSE)
@ -0,0 +1,190 @@
library(dplyr)
library(stringr)

unify_ios_screen <- function(ios_screen){
  # In Android we only process UNLOCK to OFF episodes. In iOS we only process UNLOCK to LOCKED episodes,
  # thus, we replace LOCKED with OFF episodes (2 to 0) so we can use Android's code for iOS
  ios_screen <- ios_screen %>%
    # only keep consecutive pairs of 3,2 events
    filter( (screen_status == 3 & lead(screen_status) == 2) | (screen_status == 2 & lag(screen_status) == 3) ) %>%
    mutate(screen_status = replace(screen_status, screen_status == 2, 0))
  return(ios_screen)
}
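The ``lead()``/``lag()`` filter above keeps an event only when it forms a consecutive unlock(3)→lock(2) pair, then rewrites the surviving 2s as 0. The same logic in plain Python:

```python
def keep_unlock_off_pairs(statuses):
    """Keep only consecutive (3, 2) pairs and rewrite the lock event (2)
    as off (0), mirroring the dplyr lead()/lag() filter in unify_ios_screen."""
    kept = []
    for i, s in enumerate(statuses):
        if s == 3 and i + 1 < len(statuses) and statuses[i + 1] == 2:
            kept.append(3)   # unlock followed immediately by a lock
        elif s == 2 and i > 0 and statuses[i - 1] == 3:
            kept.append(0)   # lock preceded by an unlock -> off
    return kept
```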
unify_ios_battery <- function(ios_battery){
  # We only need to unify battery data for iOS client V1. V2 does it out-of-the-box
  # V1 will not have rows where battery_status is equal to 4
  if(nrow(ios_battery %>% filter(battery_status == 4)) == 0)
    ios_battery <- ios_battery %>%
      mutate(battery_status = replace(battery_status, battery_status == 3, 5),
             battery_status = replace(battery_status, battery_status == 1, 3))
  return(ios_battery)
}
unify_ios_calls <- function(ios_calls){
  # Android's call types: 1=incoming, 2=outgoing, 3=missed
  # iOS' call status: 1=incoming, 2=connected, 3=dialing, 4=disconnected
  # iOS' call types based on call status: (1,2,4)=incoming=1, (3,2,4)=outgoing=2, (1,4) or (3,4)=missed=3
  # Sometimes (due to a possible bug in Aware) sequences get logged on the exact same timestamp, thus 3-item sequences can be 2,3,4 or 3,2,4
  # Even though iOS stores the duration of ringing/dialing for missed calls, we set it to 0 to match Android

  ios_calls <- ios_calls %>%
    arrange(trace, timestamp, call_type) %>%
    group_by(trace) %>%
    # search for the disconnect event, as it is common to outgoing, received and missed calls
    mutate(completed_call = ifelse(call_type == 4, 2, 0),
           # assign the same ID to all events before a 4
           completed_call = cumsum(c(1, head(completed_call, -1) != tail(completed_call, -1))),
           # hack to match ID of last event (4) to that of the previous rows
           completed_call = ifelse(call_type == 4, completed_call - 1, completed_call))

  # We check utc_date_time and local_date_time exist because sometimes we call this function from
  # download_dataset to unify multi-platform participants. At that point such time columns are missing
  if("utc_date_time" %in% colnames(ios_calls) && "local_date_time" %in% colnames(ios_calls)){
    ios_calls <- ios_calls %>% summarise(call_type_sequence = paste(call_type, collapse = ","), # collapse all events before a 4
                                         # sanity check, timestamp_diff should be equal or close to duration sum
                                         # timestamp_diff = trunc((last(timestamp) - first(timestamp)) / 1000)
                                         # use duration = last(call_duration) if Android measures calls' duration from pick up to hang up
                                         # use call_duration = sum(call_duration) if Android measures calls' duration from dialing/ringing to hang up
                                         call_duration = sum(call_duration),
                                         timestamp = first(timestamp),
                                         utc_date_time = first(utc_date_time),
                                         local_date_time = first(local_date_time),
                                         local_date = first(local_date),
                                         local_time = first(local_time),
                                         local_hour = first(local_hour),
                                         local_minute = first(local_minute),
                                         local_day_segment = first(local_day_segment))
  } else {
    ios_calls <- ios_calls %>% summarise(call_type_sequence = paste(call_type, collapse = ","), call_duration = sum(call_duration), timestamp = first(timestamp))
  }
  ios_calls <- ios_calls %>% mutate(call_type = case_when(
      call_type_sequence == "1,2,4" | call_type_sequence == "2,1,4" ~ 1, # incoming
      call_type_sequence == "1,4" ~ 3, # missed
      call_type_sequence == "3,2,4" | call_type_sequence == "2,3,4" ~ 2, # outgoing
      call_type_sequence == "3,4" ~ 4, # outgoing missed, we create this temp missed state to assign a duration of 0 below
      TRUE ~ -1), # other; call sequences without a disconnect (4) event are discarded
    # assign a duration of 0 to incoming and outgoing missed calls
    call_duration = ifelse(call_type == 3 | call_type == 4, 0, call_duration),
    # get rid of the temp missed call type, set to 3 to match Android
    call_type = ifelse(call_type == 4, 3, call_type)
    ) %>%
    # discard sequences without an event 4 (disconnect)
    filter(call_type > 0) %>%
    ungroup() %>%
    arrange(timestamp)

  return(ios_calls)
}
clean_ios_activity_column <- function(ios_gar){
  ios_gar <- ios_gar %>%
    mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))

  existent_multiple_activities <- ios_gar %>%
    filter(str_detect(activities, ",")) %>%
    group_by(activities) %>%
    summarise(multiple_activities = unique(activities)) %>%
    pull(multiple_activities)

  known_multiple_activities <- c("stationary,automotive")
  unknown_multiple_activities <- setdiff(existent_multiple_activities, known_multiple_activities)
  if(length(unknown_multiple_activities) > 0){
    stop(paste0("There are unknown combinations of iOS activities, you need to implement the decision of the ones to keep: ", unknown_multiple_activities))
  }

  ios_gar <- ios_gar %>%
    mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))

  return(ios_gar)
}

unify_ios_gar <- function(ios_gar){
  # We only need to unify Google Activity Recognition data for iOS
  # discard rows where the activities column is blank; filter() is safe when
  # there are none (ios_gar[-which(...), ] would drop every row in that case)
  ios_gar <- ios_gar %>% filter(activities != "")
  # clean "activities" column of ios_gar
  ios_gar <- clean_ios_activity_column(ios_gar)

  # make it compatible with android version: generate "activity_name" and "activity_type" columns
  ios_gar <- ios_gar %>%
    mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
                                     activities == "cycling" ~ "on_bicycle",
                                     activities == "walking" | activities == "running" ~ "on_foot",
                                     activities == "stationary" ~ "still"),
           activity_type = case_when(activities == "automotive" ~ 0,
                                     activities == "cycling" ~ 1,
                                     activities == "walking" | activities == "running" ~ 2,
                                     activities == "stationary" ~ 3,
                                     activities == "unknown" ~ 4))

  return(ios_gar)
}
# This function is used in download_dataset.R
|
||||
unify_raw_data <- function(dbEngine, table, start_datetime_utc, end_datetime_utc, aware_multiplatform_tables, unifiable_tables, device_ids, platforms){
|
||||
# If platforms is 'multiple', fetch each device_id's platform from aware_device, otherwise, use those given by the user
|
||||
if(length(platforms) == 1 && platforms == "multiple")
|
||||
devices_platforms <- dbGetQuery(dbEngine, paste0("SELECT device_id,brand FROM aware_device WHERE device_id IN ('", paste0(device_ids, collapse = "','"), "')")) %>%
|
||||
mutate(platform = ifelse(brand == "iPhone", "ios", "android"))
|
||||
else
|
||||
devices_platforms <- data.frame(device_id = device_ids, platform = platforms)
|
||||
|
||||
# Get existent tables in database
|
||||
available_tables_in_db <- dbGetQuery(dbEngine, paste0("SELECT table_name FROM information_schema.tables WHERE table_type = 'base table' AND table_schema='", dbGetInfo(dbEngine)$dbname,"'")) %>% pull(table_name)
|
||||
|
||||
# Parse the table names for activity recognition and conversation plugins because they are different between android and ios
|
||||
ar_tables <- setNames(aware_multiplatform_tables[1:2], c("android", "ios"))
|
||||
conversation_tables <- setNames(aware_multiplatform_tables[3:4], c("android", "ios"))
|
||||
|
||||
participants_sensordata <- list()
|
||||
for(i in 1:nrow(devices_platforms)) {
|
||||
row <- devices_platforms[i,]
|
||||
device_id <- row$device_id
|
||||
platform <- row$platform
|
||||
|
||||
# Handle special cases when tables for the same sensor have different names for Android and iOS (AR and conversation)
|
||||
if(table %in% ar_tables)
|
||||
table <- ar_tables[[platform]]
|
||||
else if(table %in% conversation_tables)
|
||||
table <- conversation_tables[[platform]]
|
||||
|
||||
if(table %in% available_tables_in_db){
|
||||
query <- paste0("SELECT * FROM ", table, " WHERE device_id IN ('", device_id, "')")
|
||||
if("timestamp" %in% available_columns && !(is.na(start_datetime_utc)) && !(is.na(end_datetime_utc)) && start_datetime_utc < end_datetime_utc){
|
||||
query <- paste0(query, "AND timestamp BETWEEN 1000*UNIX_TIMESTAMP('", start_datetime_utc, "') AND 1000*UNIX_TIMESTAMP('", end_datetime_utc, "')")
|
||||
}
|
||||
sensor_data <- unify_data(dbGetQuery(dbEngine, query), table, platform, unifiable_tables)
|
||||
participants_sensordata <- append(participants_sensordata, list(sensor_data))
|
||||
}else{
|
||||
warning(paste0("Missing ", table, " table. We unified the data from ", paste0(devices_platforms$device_id, collapse = " and "), " but without records from this missing table for ", device_id))
|
||||
}
|
||||
}
|
||||
unified_data <- bind_rows(participants_sensordata)
|
||||
return(unified_data)
|
||||
|
||||
}
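The time filter above multiplies `UNIX_TIMESTAMP` by 1000 because AWARE stores `timestamp` in milliseconds while MySQL's `UNIX_TIMESTAMP()` returns seconds. A minimal Python sketch of the same query construction, assuming already-sanitized inputs (the function name `build_sensor_query` is illustrative, not part of RAPIDS):

```python
def build_sensor_query(table, device_id, start_utc=None, end_utc=None):
    """Build the per-device SELECT used to download one sensor table.

    AWARE stores `timestamp` in milliseconds, hence the 1000x factor
    applied to MySQL's UNIX_TIMESTAMP (which returns seconds).
    """
    query = f"SELECT * FROM {table} WHERE device_id IN ('{device_id}')"
    if start_utc and end_utc and start_utc < end_utc:
        query += (f" AND timestamp BETWEEN 1000*UNIX_TIMESTAMP('{start_utc}')"
                  f" AND 1000*UNIX_TIMESTAMP('{end_utc}')")
    return query

# Without a valid window the BETWEEN clause is omitted entirely
full_query = build_sensor_query("screen", "abc123", "2020-01-01", "2020-02-01")
```

In production the table name and values should be validated or bound as query parameters rather than interpolated into the string.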

# This function is used in unify_ios_android.R and in the unify_raw_data function
unify_data <- function(sensor_data, sensor, platform, unifiable_sensors){
  if(sensor == unifiable_sensors$calls){
    if(platform == "ios"){
      sensor_data = unify_ios_calls(sensor_data)
    }
    # Android calls remain unchanged
  } else if(sensor == unifiable_sensors$battery){
    if(platform == "ios"){
      sensor_data = unify_ios_battery(sensor_data)
    }
    # Android battery remains unchanged
  } else if(sensor == unifiable_sensors$ios_activity_recognition){
    sensor_data = unify_ios_gar(sensor_data)
  } else if(sensor == unifiable_sensors$screen){
    if(platform == "ios"){
      sensor_data = unify_ios_screen(sensor_data)
    }
    # Android screen remains unchanged
  }
  return(sensor_data)
}
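`unify_data` routes iOS tables through sensor-specific unifiers while Android data passes through unchanged. A table-driven Python sketch of that dispatch (the registry and the toy unifier below are hypothetical, not RAPIDS code):

```python
def unify_data(sensor_data, sensor, platform, ios_unifiers):
    """Apply the iOS unifier registered for this sensor, if any.

    Android rows, and sensors without a registered unifier,
    are returned unchanged.
    """
    unify = ios_unifiers.get(sensor)
    if platform == "ios" and unify is not None:
        return unify(sensor_data)
    return sensor_data

# Hypothetical registry; the real unifiers remap iOS columns/codes
# to the Android schema so downstream features see one format
ios_unifiers = {
    "screen": lambda rows: [{**r, "unified": True} for r in rows],
}
```

A registry keeps the pass-through default in one place instead of repeating the `if platform == "ios"` guard per sensor, which is why the if/else chain in the R version repeats that pattern.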

@@ -6,7 +6,6 @@ library(stringr)
screen <- read.csv(snakemake@input[["screen"]])
participant_info <- snakemake@input[["participant_info"]]
-platform <- readLines(participant_info, n=2)[[2]]

# Screen States
# Android: https://github.com/denzilferreira/aware-client/blob/78ccc22f0f822f8421bef9b1a73d36e71b8aa85b/aware-core/src/main/java/com/aware/Screen.java
@@ -25,42 +24,23 @@ swap_screen_status <- function(data, status1, status2, time_buffer){
                         screen_status = ifelse(screen_status == 800L, status1, screen_status))
}

-get_ios_screen_episodes <- function(screen){
-  episodes <- screen %>%
-    # only keep consecutive pairs of 3,2 events
-    filter( (screen_status == 3 & lead(screen_status) == 2) | (screen_status == 2 & lag(screen_status) == 3) ) %>%
-    # in iOS and after our filtering, screen episodes should end with a LOCK event (2)
-    mutate(episode_id = ifelse(screen_status == 2, 1:n(), NA_integer_)) %>%
-    fill(episode_id, .direction = "updown") %>%
-    group_by(episode_id) %>%
-    summarise(episode = "unlock",
-              screen_sequence = toString(screen_status),
-              time_diff = (last(timestamp) - first(timestamp)) / (1000 * 60),
-              local_start_date_time = first(local_date_time),
-              local_end_date_time = last(local_date_time),
-              local_start_date = first(local_date),
-              local_end_date = last(local_date),
-              local_start_day_segment = first(local_day_segment),
-              local_end_day_segment = last(local_day_segment))
-}
-
-get_android_screen_episodes <- function(screen){
-  # Aware logs LOCK events after turning the screen ON or OFF but we filter them out to simplify this analysis.
+get_screen_episodes <- function(screen){
+  # Aware Android logs LOCK events after turning the screen ON or OFF but we filter them out to simplify this analysis.
  # The code below only process UNLOCK to OFF episodes, but it's possible to modify it for ON to OFF (see line 61) or ON to UNLOCK episodes.

  episodes <- screen %>%
-    # filter out LOCK events (2) that come within 50 milliseconds of an ON (1) or OFF (0) event
+    # Relevant for Android. Remove LOCK events (2) that come within 50 milliseconds of an ON (1) or OFF (0) event
    filter(!(screen_status == 2 & lag(screen_status) == 1 & timestamp - lag(timestamp) < 50)) %>%
    filter(!(screen_status == 2 & lag(screen_status) == 0 & timestamp - lag(timestamp) < 50)) %>%
-    # in Android and after our filtering, screen episodes should end with a OFF event (0)
+    # After our filtering, screen episodes should end with a OFF event (0)
    mutate(episode_id = ifelse(screen_status == 0, 1:n(), NA_integer_)) %>%
    fill(episode_id, .direction = "updown") %>%
    group_by(episode_id) %>%
-    # Rarely, UNLOCK events (3) get logged just before ON events (1). If this happens within 800ms, swap them
+    # Relevant for Android. Rarely, UNLOCK events (3) get logged just before ON events (1). If this happens within 800ms, swap them
    swap_screen_status(3L, 1L, 800) %>%
-    # to be consistent with iOS we filter out events (and thus sequences) starting with an ON (1) event
+    # Relevant for Android. To be consistent with iOS we remove events (and thus sequences) starting with an ON (1) event
    filter(screen_status != 1) %>%
-    # only keep consecutive 3,0 pairs (UNLOCK, OFF)
+    # Only keep consecutive 3,0 pairs (UNLOCK, OFF)
    filter( (screen_status == 3 & lead(screen_status) == 0) | (screen_status == 0 & lag(screen_status) == 3) ) %>%
    summarise(episode = "unlock",
              screen_sequence = toString(screen_status),
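The dplyr pipeline above can be mirrored in plain code to make the pairing rule explicit. A Python sketch of the UNLOCK to OFF pairing, assuming events are sorted by time, statuses follow AWARE's Android codes (0=OFF, 1=ON, 2=LOCK, 3=UNLOCK), and timestamps are in milliseconds (the function name is illustrative, not RAPIDS code):

```python
OFF, ON, LOCK, UNLOCK = 0, 1, 2, 3

def unlock_episodes(events):
    """events: list of (timestamp_ms, status) tuples sorted by time.

    Returns one (start_ms, end_ms, minutes) tuple per UNLOCK->OFF episode,
    after dropping LOCK events logged within 50 ms of an ON/OFF event and
    ignoring ON events, mirroring the filters in the dplyr pipeline.
    """
    cleaned = []
    for ts, status in events:
        # Drop spurious LOCK right after ON or OFF (Android logging noise)
        if (status == LOCK and cleaned
                and cleaned[-1][1] in (ON, OFF)
                and ts - cleaned[-1][0] < 50):
            continue
        cleaned.append((ts, status))
    # Ignore ON events to stay consistent with iOS
    cleaned = [(ts, s) for ts, s in cleaned if s != ON]
    # Keep only consecutive (UNLOCK, OFF) pairs and convert ms to minutes
    return [(t1, t2, (t2 - t1) / 60000)
            for (t1, s1), (t2, s2) in zip(cleaned, cleaned[1:])
            if s1 == UNLOCK and s2 == OFF]
```

The rare 800 ms UNLOCK/ON swap handled by `swap_screen_status` is omitted here for brevity.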
@ -88,12 +68,8 @@ if(nrow(screen) < 2){
|
|||
local_end_date = character(),
|
||||
local_start_day_segment = character(),
|
||||
local_end_day_segment = character())
|
||||
} else if(platform == "ios"){
|
||||
episodes <- get_ios_screen_episodes(screen)
|
||||
} else if(platform == "android"){
|
||||
episodes <- get_android_screen_episodes(screen)
|
||||
} else {
|
||||
print(paste0("The platform (second line) in ", participant_info, " should be android or ios"))
|
||||
episodes <- get_screen_episodes(screen)
|
||||
}
|
||||
|
||||
write.csv(episodes, snakemake@output[[1]], row.names = FALSE)
|
||||
|