Add support for multi-platform participants

pull/95/head
JulioV 2020-06-30 17:34:18 -04:00
parent 0a6bef4257
commit 06edf53016
9 changed files with 299 additions and 192 deletions

View File

@@ -53,6 +53,7 @@ if config["SCREEN"]["COMPUTE"]:
raise ValueError("Error: Add your screen table (and as many sensor tables as you have) to TABLES_FOR_SENSED_BINS in config.yaml. This is necessary to compute phone_sensed_bins (bins of time when the smartphone was sensing data)")
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_with_datetime_unified.csv", pid=config["PIDS"], sensor=config["SCREEN"]["DB_TABLE"]))
files_to_compute.extend(expand("data/processed/{pid}/screen_deltas.csv", pid=config["PIDS"]))
files_to_compute.extend(expand("data/processed/{pid}/screen_{day_segment}.csv", pid = config["PIDS"], day_segment = config["SCREEN"]["DAY_SEGMENTS"]))

View File

@@ -648,6 +648,7 @@ See `Screen Config Code`_
- Rule ``rules/preprocessing.snakefile/download_dataset``
- Rule ``rules/preprocessing.snakefile/readable_datetime``
- Rule ``rules/preprocessing.snakefile/unify_ios_android``
- Rule ``rules/features.snakefile/screen_deltas``
- Rule ``rules/features.snakefile/screen_features``
@@ -683,7 +684,11 @@ firstuseafter minutes Minutes until the first unlock e
**Assumptions/Observations:**
An ``unlock`` episode is considered as the time between an ``unlock`` event and a ``lock`` event. iOS recorded these episodes reliably (albeit some duplicated ``lock`` events within milliseconds from each other). However, in Android there are some events unrelated to the screen state because of multiple consecutive ``unlock``/``lock`` events, so we keep the closest pair. In our experiments these cases are less than 10% of the screen events collected. This happens because ``ACTION_SCREEN_OFF`` and ``ON`` are "sent when the device becomes non-interactive which may have nothing to do with the screen turning off". Additionally, in Android it is possible to measure the time spent on the ``lock`` screen before an ``unlock`` event as well as the total screen time (i.e. ``ON`` to ``OFF``) but we are only keeping ``unlock`` episodes (``unlock`` to ``OFF``) to be consistent with iOS.
In Android, ``lock`` events can happen right after an ``off`` event, a few seconds after an ``off`` event, or never, depending on the phone's settings; therefore, an ``unlock`` episode is defined as the time between an ``unlock`` and an ``off`` event. In iOS, ``on`` and ``off`` events do not exist, so an ``unlock`` episode is defined as the time between an ``unlock`` and a ``lock`` event.
Events in iOS are recorded reliably, albeit with some duplicated ``lock`` events within milliseconds of each other, so we only keep consecutive unlock/lock pairs. In Android you can find multiple consecutive ``unlock`` or ``lock`` events, so we only keep consecutive unlock/off pairs. In our experiments these cases are less than 10% of the screen events collected; this happens because ``ACTION_SCREEN_OFF`` and ``ACTION_SCREEN_ON`` are "sent when the device becomes non-interactive which may have nothing to do with the screen turning off". In addition to unlock/off episodes, in Android it is possible to measure the time spent on the lock screen before an ``unlock`` event as well as the total screen time (i.e. ``ON`` to ``OFF``), but these are not implemented at the moment.
To unify the screen processing and use the same code in RAPIDS, we replace LOCKED episodes with OFF episodes (2 with 0) in iOS. However, as mentioned above, this still computes ``unlock`` to ``lock`` episodes.
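As a rough illustration of this unification (a hypothetical Python sketch, not RAPIDS code): keeping only consecutive ``unlock``/``lock`` (3, 2) pairs and recoding ``lock`` (2) as ``off`` (0) lets the same episode logic run on both platforms.

```python
def unify_ios_screen(statuses):
    # Keep only consecutive UNLOCK (3) -> LOCKED (2) pairs, then recode
    # LOCKED as OFF (0) so Android's UNLOCK -> OFF episode code applies.
    kept = []
    for i, s in enumerate(statuses):
        nxt = statuses[i + 1] if i + 1 < len(statuses) else None
        prev = statuses[i - 1] if i > 0 else None
        if (s == 3 and nxt == 2) or (s == 2 and prev == 3):
            kept.append(0 if s == 2 else s)
    return kept
```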
.. _conversation-sensor-doc:

View File

@ -141,7 +141,7 @@ Once RAPIDS is installed, follow these steps to start processing mobile data.
#. **Manually**. Create one file per participant in the ``rapids/data/external/`` directory. The file should NOT have an extension (i.e., no .txt). The name of the file will become the label for that participant in the pipeline.
- The first line of the file should be the Aware ``device_id`` for that participant. If one participant has multiple device_ids (i.e. Aware had to be re-installed), add all device_ids separated by commas.
- The second line should list the device's operating system (``android`` or ``ios``)
- The second line should list the device's operating system (``android`` or ``ios``). If a participant used more than one device (i.e., the participant changed phones and/or platforms mid-study) you can a) list each platform matching the order of the first line (``android,ios``), b) use ``android`` or ``ios`` if all phones belong to the same platform, or c) if you have an ``aware_device`` table in your database, set this line to ``multiple`` and RAPIDS will infer the multiple platforms automatically.
- The third line is an optional human-friendly label that will appear in any plots for that participant.
- The fourth line is optional and contains a start and end date separated by a comma ``YYYYMMDD,YYYYMMDD`` (e.g., ``20200113,20200525``). If these dates are specified, only data within this range will be processed; otherwise, all data from the device(s) will be used.
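For instance, a hypothetical participant file (all identifiers made up) for someone who switched from an Android phone to an iPhone could look like:

```
a748ee1a-1d0b-4ae9-9074-279a2b6ba524,7cb4c24b-2b53-4a7f-9f3a-0d0e3ebf1d72
android,ios
p01
20200601,20200831
```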

View File

@@ -1,7 +1,12 @@
def optional_ar_input(wildcards):
with open("data/external/"+wildcards.pid, encoding="ISO-8859-1") as external_file:
external_file_content = external_file.readlines()
platform = external_file_content[1].strip()
platforms = external_file_content[1].strip().split(",")
if platforms[0] == "multiple" or (len(platforms) > 1 and "android" in platforms and "ios" in platforms):
platform = "android"
else:
platform = platforms[0]
if platform == "android":
return ["data/raw/{pid}/" + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "_with_datetime_unified.csv",
"data/processed/{pid}/" + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "_deltas.csv"]
@@ -9,16 +14,23 @@ def optional_ar_input(wildcards):
return ["data/raw/{pid}/"+config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]+"_with_datetime_unified.csv",
"data/processed/{pid}/"+config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]+"_deltas.csv"]
else:
return []
raise ValueError("Platform (line 2) in a participant file should be 'android', 'ios', or 'multiple'. You typed '" + ",".join(platforms) + "'")
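The platform-resolution rule shared by these input functions can be sketched on its own (`resolve_platform` is a hypothetical helper name; it assumes line 2 of the participant file has already been split on commas):

```python
def resolve_platform(platforms):
    # "multiple", or an explicit mix of android and ios, resolves to
    # android because multi-platform data is unified into Android's format
    if platforms[0] == "multiple" or (
        len(platforms) > 1 and "android" in platforms and "ios" in platforms
    ):
        return "android"
    return platforms[0]
```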
def optional_conversation_input(wildcards):
with open("data/external/"+wildcards.pid, encoding="ISO-8859-1") as external_file:
external_file_content = external_file.readlines()
platform = external_file_content[1].strip()
platforms = external_file_content[1].strip().split(",")
if platforms[0] == "multiple" or (len(platforms) > 1 and "android" in platforms and "ios" in platforms):
platform = "android"
else:
platform = platforms[0]
if platform == "android":
return ["data/raw/{pid}/" + config["CONVERSATION"]["DB_TABLE"]["ANDROID"] + "_with_datetime.csv"]
else:
elif platform == "ios":
return ["data/raw/{pid}/" + config["CONVERSATION"]["DB_TABLE"]["IOS"] + "_with_datetime.csv"]
else:
raise ValueError("Platform (line 2) in a participant file should be 'android', 'ios', or 'multiple'. You typed '" + ",".join(platforms) + "'")
def optional_location_input(wildcards):
if config["BARNETT_LOCATION"]["LOCATIONS_TO_USE"] == "RESAMPLE_FUSED":
@@ -66,8 +78,7 @@ rule battery_deltas:
rule screen_deltas:
input:
screen = expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["SCREEN"]["DB_TABLE"]),
participant_info = "data/external/{pid}"
screen = expand("data/raw/{{pid}}/{sensor}_with_datetime_unified.csv", sensor=config["SCREEN"]["DB_TABLE"])
output:
"data/processed/{pid}/screen_deltas.csv"
script:

View File

@@ -13,7 +13,9 @@ rule download_dataset:
params:
group = config["DOWNLOAD_DATASET"]["GROUP"],
table = "{sensor}",
timezone = config["TIMEZONE"]
timezone = config["TIMEZONE"],
aware_multiplatform_tables = config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["ANDROID"] + "," + config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"] + "," + config["CONVERSATION"]["DB_TABLE"]["ANDROID"] + "," + config["CONVERSATION"]["DB_TABLE"]["IOS"],
unifiable_sensors = {"calls": config["CALLS"]["DB_TABLE"], "battery": config["BATTERY"]["DB_TABLE"], "screen": config["SCREEN"]["DB_TABLE"], "ios_activity_recognition": config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]}
output:
"data/raw/{pid}/{sensor}_raw.csv"
script:
@@ -62,7 +64,8 @@ rule unify_ios_android:
sensor_data = "data/raw/{pid}/{sensor}_with_datetime.csv",
participant_info = "data/external/{pid}"
params:
sensor = "{sensor}"
sensor = "{sensor}",
unifiable_sensors = {"calls": config["CALLS"]["DB_TABLE"], "battery": config["BATTERY"]["DB_TABLE"], "screen": config["SCREEN"]["DB_TABLE"], "ios_activity_recognition": config["ACTIVITY_RECOGNITION"]["DB_TABLE"]["IOS"]}
output:
"data/raw/{pid}/{sensor}_with_datetime_unified.csv"
script:

View File

@@ -1,18 +1,54 @@
source("renv/activate.R")
source("src/data/unify_utils.R")
library(RMySQL)
library(stringr)
library(dplyr)
library(readr)
validate_deviceid_platforms <- function(device_ids, platforms){
if(length(device_ids) == 1){
if(length(platforms) > 1 || (platforms != "android" && platforms != "ios"))
stop(paste0("If you have 1 device_id, its platform should be 'android' or 'ios' but you typed: '", paste0(platforms, collapse = ","), "'. Participant file: ", participant))
} else if(length(device_ids) > 1 && length(platforms) == 1){
if(platforms != "android" && platforms != "ios" && platforms != "multiple")
stop(paste0("If you have more than 1 device_id, platform should be 'android', 'ios' OR 'multiple' but you typed: '", paste0(platforms, collapse = ","), "'. Participant file: ", participant))
} else if(length(device_ids) > 1 && length(platforms) > 1){
if(length(device_ids) != length(platforms))
stop(paste0("The number of device_ids should match the number of platforms. Participant file:", participant))
if(!all(c("android", "ios") %in% platforms))
stop(paste0("If you have more than 1 device_id and more than 1 platform, the platforms should be a mix of 'android' AND 'ios' but you typed: '", paste0(platforms, collapse = ","), "'. Participant file: ", participant))
}
}
is_multiplaform_participant <- function(dbEngine, device_ids, platforms){
# Multiple android and ios platforms or the same platform (android, ios) for multiple devices
if((length(device_ids) > 1 && length(platforms) > 1) || (length(device_ids) > 1 && length(platforms) == 1 && (platforms == "android" || platforms == "ios"))){
return(TRUE)
}
# Multiple platforms for multiple devices, we search the platform for every device in the aware_device table
if(length(device_ids) > 1 && length(platforms) == 1 && platforms == "multiple"){
devices_platforms <- dbGetQuery(dbEngine, paste0("SELECT device_id,brand FROM aware_device WHERE device_id IN ('", paste0(device_ids, collapse = "','"), "')"))
platforms <- devices_platforms %>% distinct(brand) %>% pull(brand)
# Android phones have different brands so we check that we got at least two different platforms and one of them is iPhone
if(length(platforms) > 1 && "iPhone" %in% platforms){
return(TRUE)
}
}
return(FALSE)
}
participant <- snakemake@input[[1]]
group <- snakemake@params[["group"]]
table <- snakemake@params[["table"]]
timezone <- snakemake@params[["timezone"]]
aware_multiplatform_tables <- str_split(snakemake@params[["aware_multiplatform_tables"]], ",")[[1]]
unifiable_tables = snakemake@params[["unifiable_sensors"]]
sensor_file <- snakemake@output[[1]]
device_ids <- readLines(participant, n=1)
unified_device_id <- tail(strsplit(device_ids, ",")[[1]], 1)
device_ids <- strsplit(readLines(participant, n=1), ",")[[1]]
unified_device_id <- tail(device_ids, 1)
platforms <- strsplit(readLines(participant, n=2)[[2]], ",")[[1]]
validate_deviceid_platforms(device_ids, platforms)
# Read start and end date from the participant file to filter data within that range
start_date <- strsplit(readLines(participant, n=4)[4], ",")[[1]][1]
@@ -20,30 +56,32 @@ end_date <- strsplit(readLines(participant, n=4)[4], ",")[[1]][2]
start_datetime_utc = format(as.POSIXct(paste0(start_date, " 00:00:00"),format="%Y/%m/%d %H:%M:%S",origin="1970-01-01",tz=timezone), tz="UTC")
end_datetime_utc = format(as.POSIXct(paste0(end_date, " 23:59:59"),format="%Y/%m/%d %H:%M:%S",origin="1970-01-01",tz=timezone), tz="UTC")
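The local-to-UTC expansion above can be sketched in Python (a hypothetical helper; it assumes Python 3.9+ `zoneinfo` and mirrors the R code's `%Y/%m/%d` input format):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def date_range_to_utc(start_date, end_date, tz):
    # Expand local YYYY/MM/DD dates to inclusive UTC bounds:
    # start of the first day through the last second of the last day
    local = ZoneInfo(tz)
    fmt_in, fmt_out = "%Y/%m/%d %H:%M:%S", "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(start_date + " 00:00:00", fmt_in).replace(tzinfo=local)
    end = datetime.strptime(end_date + " 23:59:59", fmt_in).replace(tzinfo=local)
    utc = ZoneInfo("UTC")
    return (start.astimezone(utc).strftime(fmt_out),
            end.astimezone(utc).strftime(fmt_out))
```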
stopDB <- dbConnect(MySQL(), default.file = "./.env", group = group)
dbEngine <- dbConnect(MySQL(), default.file = "./.env", group = group)
# Get existent columns in table
query <- paste0("SELECT * FROM ", table, " LIMIT 1")
available_columns <- colnames(dbGetQuery(stopDB, query))
available_columns <- colnames(dbGetQuery(dbEngine, paste0("SELECT * FROM ", table, " LIMIT 1")))
if("device_id" %in% available_columns){
query <- paste0("SELECT * FROM ", table, " WHERE device_id IN ('", gsub(",", "','", device_ids), "')")
if("timestamp" %in% available_columns && !(is.na(start_datetime_utc)) && !(is.na(end_datetime_utc)) && start_datetime_utc < end_datetime_utc){
if(is_multiplaform_participant(dbEngine, device_ids, platforms)){
sensor_data <- unify_raw_data(dbEngine, table, start_datetime_utc, end_datetime_utc, aware_multiplatform_tables, unifiable_tables, device_ids, platforms)
}else {
query <- paste0("SELECT * FROM ", table, " WHERE device_id IN ('", paste0(device_ids, collapse = "','"), "')")
if("timestamp" %in% available_columns && !(is.na(start_datetime_utc)) && !(is.na(end_datetime_utc)) && start_datetime_utc < end_datetime_utc)
query <- paste0(query, "AND timestamp BETWEEN 1000*UNIX_TIMESTAMP('", start_datetime_utc, "') AND 1000*UNIX_TIMESTAMP('", end_datetime_utc, "')")
sensor_data <- dbGetQuery(dbEngine, query)
}
sensor_data <- dbGetQuery(stopDB, query)
if("timestamp" %in% available_columns){
if("timestamp" %in% available_columns)
sensor_data <- sensor_data %>% arrange(timestamp)
}
# Unify device_id
sensor_data <- sensor_data %>% mutate(device_id = unified_device_id)
# Dropping duplicates on all columns except for _id or id
sensor_data <- sensor_data %>% distinct(!!!syms(setdiff(names(sensor_data), c("_id", "id"))))
} else {
print(paste0("Table ", table, " does not have a device_id column (Aware ID) to link its data to a participant"))
}
} else
stop(paste0("Table ", table, " does not have a device_id column (Aware ID) to link its data to a participant"))
write_csv(sensor_data, sensor_file)
dbDisconnect(stopDB)
dbDisconnect(dbEngine)

View File

@@ -1,131 +1,14 @@
source("renv/activate.R")
library(dplyr)
library(stringr)
unify_ios_battery <- function(ios_battery){
# We only need to unify battery data for iOS client V1. V2 does it out-of-the-box
# V1 will not have rows where battery_status is equal to 4
if(nrow(ios_battery %>% filter(battery_status == 4)) == 0)
ios_battery <- ios_battery %>%
mutate(battery_status = replace(battery_status, battery_status == 3, 5),
battery_status = replace(battery_status, battery_status == 1, 3))
return(ios_battery)
}
unify_ios_calls <- function(ios_calls){
# Android's call types: 1=incoming, 2=outgoing, 3=missed
# iOS' call status 1=incoming, 2=connected, 3=dialing, 4=disconnected
# iOS' call types based on call status: (1,2,4)=incoming=1, (3,2,4)=outgoing=2, (1,4) or (3,4)=missed=3
# Sometimes (due to a possible bug in Aware) sequences get logged on the exact same timestamp, thus 3-item sequences can be 2,3,4 or 3,2,4
# Even though iOS stores the duration of ringing/dialing for missed calls, we set it to 0 to match Android
ios_calls <- ios_calls %>%
arrange(trace, timestamp, call_type) %>%
group_by(trace) %>%
# search for the disconnect event, as it is common to outgoing, received and missed calls
mutate(completed_call = ifelse(call_type == 4, 2, 0),
# assign the same ID to all events before a 4
completed_call = cumsum(c(1, head(completed_call, -1) != tail(completed_call, -1))),
# hack to match ID of last event (4) to that of the previous rows
completed_call = ifelse(call_type == 4, completed_call - 1, completed_call)) %>%
summarise(call_type_sequence = paste(call_type, collapse = ","), # collapse all events before a 4
# use this if Android measures calls' duration from pick up to hang up
# duration = last(call_duration),
# sanity check, timestamp_diff should be equal or close to duration sum
# timestamp_diff = trunc((last(timestamp) - first(timestamp)) / 1000)
# use this if Android measures calls' duration from dialing/ringing to hang up
call_duration = sum(call_duration),
timestamp = first(timestamp),
utc_date_time = first(utc_date_time),
local_date_time = first(local_date_time),
local_date = first(local_date),
local_time = first(local_time),
local_hour = first(local_hour),
local_minute = first(local_minute),
local_day_segment = first(local_day_segment)
) %>%
mutate(call_type = case_when(
call_type_sequence == "1,2,4" | call_type_sequence == "2,1,4" ~ 1, # incoming
call_type_sequence == "1,4" ~ 3, # missed
call_type_sequence == "3,2,4" | call_type_sequence == "2,3,4" ~ 2, # outgoing
call_type_sequence == "3,4" ~ 4, # outgoing missed, we create this temp missed state to assign a duration of 0 below
TRUE ~ -1), # other, call sequences without a disconnect (4) event are discarded
# assign a duration of 0 to incoming and outgoing missed calls
call_duration = ifelse(call_type == 3 | call_type == 4, 0, call_duration),
# get rid of the temp missed call type, set to 3 to match Android
call_type = ifelse(call_type == 4, 3, call_type)
) %>%
# discard sequences without an event 4 (disconnect)
filter(call_type > 0) %>%
ungroup() %>%
arrange(timestamp)
return(ios_calls)
}
clean_ios_activity_column <- function(ios_gar){
ios_gar <- ios_gar %>%
mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))
existent_multiple_activities <- ios_gar %>%
filter(str_detect(activities, ",")) %>%
group_by(activities) %>%
summarise(multiple_activities = unique(activities)) %>%
pull(multiple_activities)
known_multiple_activities <- c("stationary,automotive")
unknown_multiple_activities <- setdiff(existent_multiple_activities, known_multiple_activities)
if(length(unknown_multiple_activities) > 0){
stop(paste0("There are unknown combinations of iOS activities, you need to implement the decision of the ones to keep: ", paste0(unknown_multiple_activities, collapse = ",")))
}
ios_gar <- ios_gar %>%
mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))
return(ios_gar)
}
unify_ios_gar <- function(ios_gar){
# We only need to unify Google Activity Recognition data for iOS
# discard rows where activities column is blank
ios_gar <- ios_gar[ios_gar$activities != "", ]
# clean "activities" column of ios_gar
ios_gar <- clean_ios_activity_column(ios_gar)
# make it compatible with android version: generate "activity_name" and "activity_type" columns
ios_gar <- ios_gar %>%
mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
activities == "cycling" ~ "on_bicycle",
activities == "walking" | activities == "running" ~ "on_foot",
activities == "stationary" ~ "still"),
activity_type = case_when(activities == "automotive" ~ 0,
activities == "cycling" ~ 1,
activities == "walking" | activities == "running" ~ 2,
activities == "stationary" ~ 3,
activities == "unknown" ~ 4))
return(ios_gar)
}
source("src/data/unify_utils.R")
sensor_data <- read.csv(snakemake@input[["sensor_data"]], stringsAsFactors = FALSE)
participant_info <- snakemake@input[["participant_info"]]
sensor <- snakemake@params[["sensor"]]
platform <- readLines(participant_info, n=2)[[2]]
unifiable_sensors = snakemake@params[["unifiable_sensors"]]
platforms <- strsplit(readLines(participant_info, n=2)[[2]], ",")[[1]]
platform <- ifelse(platforms[1] == "multiple" | (length(platforms) > 1 & "android" %in% platforms & "ios" %in% platforms), "android", platforms[1])
sensor_data <- unify_data(sensor_data, sensor, platform, unifiable_sensors)
if(sensor == "calls"){
if(platform == "ios"){
sensor_data = unify_ios_calls(sensor_data)
}
# android calls remain unchanged
} else if(sensor == "battery"){
if(platform == "ios"){
sensor_data = unify_ios_battery(sensor_data)
}
# android battery remains unchanged
} else if(sensor == "plugin_ios_activity_recognition"){
sensor_data = unify_ios_gar(sensor_data)
}
write.csv(sensor_data, snakemake@output[[1]], row.names = FALSE)

View File

@@ -0,0 +1,190 @@
library(dplyr)
library(stringr)
unify_ios_screen <- function(ios_screen){
# In Android we only process UNLOCK to OFF episodes. In iOS we only process UNLOCK to LOCKED episodes,
# thus, we replace LOCKED with OFF episodes (2 to 0) so we can use Android's code for iOS
ios_screen <- ios_screen %>%
# only keep consecutive pairs of 3,2 events
filter( (screen_status == 3 & lead(screen_status) == 2) | (screen_status == 2 & lag(screen_status) == 3) ) %>%
mutate(screen_status = replace(screen_status, screen_status == 2, 0))
return(ios_screen)
}
unify_ios_battery <- function(ios_battery){
# We only need to unify battery data for iOS client V1. V2 does it out-of-the-box
# V1 will not have rows where battery_status is equal to 4
if(nrow(ios_battery %>% filter(battery_status == 4)) == 0)
ios_battery <- ios_battery %>%
mutate(battery_status = replace(battery_status, battery_status == 3, 5),
battery_status = replace(battery_status, battery_status == 1, 3))
return(ios_battery)
}
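The V1 status remapping in this function can be illustrated with a minimal Python sketch (a hypothetical helper operating on a plain list of statuses):

```python
def unify_ios_battery_statuses(statuses):
    # iOS client V1 never logs battery_status 4; only in that case
    # remap 3 -> 5 and 1 -> 3 to match the V2/Android coding
    if 4 in statuses:
        return statuses
    return [5 if s == 3 else 3 if s == 1 else s for s in statuses]
```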
unify_ios_calls <- function(ios_calls){
# Android's call types: 1=incoming, 2=outgoing, 3=missed
# iOS' call status 1=incoming, 2=connected, 3=dialing, 4=disconnected
# iOS' call types based on call status: (1,2,4)=incoming=1, (3,2,4)=outgoing=2, (1,4) or (3,4)=missed=3
# Sometimes (due to a possible bug in Aware) sequences get logged on the exact same timestamp, thus 3-item sequences can be 2,3,4 or 3,2,4
# Even though iOS stores the duration of ringing/dialing for missed calls, we set it to 0 to match Android
ios_calls <- ios_calls %>%
arrange(trace, timestamp, call_type) %>%
group_by(trace) %>%
# search for the disconnect event, as it is common to outgoing, received and missed calls
mutate(completed_call = ifelse(call_type == 4, 2, 0),
# assign the same ID to all events before a 4
completed_call = cumsum(c(1, head(completed_call, -1) != tail(completed_call, -1))),
# hack to match ID of last event (4) to that of the previous rows
completed_call = ifelse(call_type == 4, completed_call - 1, completed_call))
# We check utc_date_time and local_date_time exist because sometimes we call this function from
# download_dataset to unify multi-platform participants. At that point such time columns are missing
if("utc_date_time" %in% colnames(ios_calls) && "local_date_time" %in% colnames(ios_calls)){
ios_calls <- ios_calls %>% summarise(call_type_sequence = paste(call_type, collapse = ","), # collapse all events before a 4
# sanity check, timestamp_diff should be equal or close to duration sum
# timestamp_diff = trunc((last(timestamp) - first(timestamp)) / 1000)
# use duration = last(call_duration) if Android measures calls' duration from pick up to hang up
# use call_duration = sum(call_duration) if Android measures calls' duration from dialing/ringing to hang up
call_duration = sum(call_duration),
timestamp = first(timestamp),
utc_date_time = first(utc_date_time),
local_date_time = first(local_date_time),
local_date = first(local_date),
local_time = first(local_time),
local_hour = first(local_hour),
local_minute = first(local_minute),
local_day_segment = first(local_day_segment))
}
else {
ios_calls <- ios_calls %>% summarise(call_type_sequence = paste(call_type, collapse = ","), call_duration = sum(call_duration), timestamp = first(timestamp))
}
ios_calls <- ios_calls %>% mutate(call_type = case_when(
call_type_sequence == "1,2,4" | call_type_sequence == "2,1,4" ~ 1, # incoming
call_type_sequence == "1,4" ~ 3, # missed
call_type_sequence == "3,2,4" | call_type_sequence == "2,3,4" ~ 2, # outgoing
call_type_sequence == "3,4" ~ 4, # outgoing missed, we create this temp missed state to assign a duration of 0 below
TRUE ~ -1), # other, call sequences without a disconnect (4) event are discarded
# assign a duration of 0 to incoming and outgoing missed calls
call_duration = ifelse(call_type == 3 | call_type == 4, 0, call_duration),
# get rid of the temp missed call type, set to 3 to match Android
call_type = ifelse(call_type == 4, 3, call_type)
) %>%
# discard sequences without an event 4 (disconnect)
filter(call_type > 0) %>%
ungroup() %>%
arrange(timestamp)
return(ios_calls)
}
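The status-sequence mapping described in the comments above can be summarized as a small lookup (a hypothetical Python sketch, operating on the collapsed `call_type_sequence` string):

```python
def classify_ios_call(sequence):
    # Map a collapsed iOS call_status sequence to Android's call_type:
    # 1=incoming, 2=outgoing, 3=missed; -1 means no disconnect (4), discard
    if sequence in ("1,2,4", "2,1,4"):
        return 1
    if sequence in ("3,2,4", "2,3,4"):
        return 2
    if sequence in ("1,4", "3,4"):
        return 3  # missed calls (incoming or outgoing) also get call_duration 0
    return -1
```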
clean_ios_activity_column <- function(ios_gar){
ios_gar <- ios_gar %>%
mutate(activities = str_replace_all(activities, pattern = '("|\\[|\\])', replacement = ""))
existent_multiple_activities <- ios_gar %>%
filter(str_detect(activities, ",")) %>%
group_by(activities) %>%
summarise(multiple_activities = unique(activities)) %>%
pull(multiple_activities)
known_multiple_activities <- c("stationary,automotive")
unknown_multiple_activities <- setdiff(existent_multiple_activities, known_multiple_activities)
if(length(unknown_multiple_activities) > 0){
stop(paste0("There are unknown combinations of iOS activities, you need to implement the decision of the ones to keep: ", paste0(unknown_multiple_activities, collapse = ",")))
}
ios_gar <- ios_gar %>%
mutate(activities = str_replace_all(activities, pattern = "stationary,automotive", replacement = "automotive"))
return(ios_gar)
}
unify_ios_gar <- function(ios_gar){
# We only need to unify Google Activity Recognition data for iOS
# discard rows where activities column is blank
ios_gar <- ios_gar[ios_gar$activities != "", ]
# clean "activities" column of ios_gar
ios_gar <- clean_ios_activity_column(ios_gar)
# make it compatible with android version: generate "activity_name" and "activity_type" columns
ios_gar <- ios_gar %>%
mutate(activity_name = case_when(activities == "automotive" ~ "in_vehicle",
activities == "cycling" ~ "on_bicycle",
activities == "walking" | activities == "running" ~ "on_foot",
activities == "stationary" ~ "still"),
activity_type = case_when(activities == "automotive" ~ 0,
activities == "cycling" ~ 1,
activities == "walking" | activities == "running" ~ 2,
activities == "stationary" ~ 3,
activities == "unknown" ~ 4))
return(ios_gar)
}
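The label-to-Android mapping above amounts to two lookup tables (a hypothetical Python sketch for a single cleaned activity label; `unknown` has a type code but no name):

```python
def unify_ios_activity(activity):
    # Map a cleaned iOS activity label to Android's
    # (activity_name, activity_type) pair
    names = {"automotive": "in_vehicle", "cycling": "on_bicycle",
             "walking": "on_foot", "running": "on_foot",
             "stationary": "still"}
    types = {"automotive": 0, "cycling": 1, "walking": 2,
             "running": 2, "stationary": 3, "unknown": 4}
    return names.get(activity), types.get(activity)
```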
# This function is used in download_dataset.R
unify_raw_data <- function(dbEngine, table, start_datetime_utc, end_datetime_utc, aware_multiplatform_tables, unifiable_tables, device_ids, platforms){
# If platforms is 'multiple', fetch each device_id's platform from aware_device, otherwise, use those given by the user
if(length(platforms) == 1 && platforms == "multiple")
devices_platforms <- dbGetQuery(dbEngine, paste0("SELECT device_id,brand FROM aware_device WHERE device_id IN ('", paste0(device_ids, collapse = "','"), "')")) %>%
mutate(platform = ifelse(brand == "iPhone", "ios", "android"))
else
devices_platforms <- data.frame(device_id = device_ids, platform = platforms)
# Get existent tables in database
available_tables_in_db <- dbGetQuery(dbEngine, paste0("SELECT table_name FROM information_schema.tables WHERE table_type = 'base table' AND table_schema='", dbGetInfo(dbEngine)$dbname,"'")) %>% pull(table_name)
# Parse the table names for activity recognition and conversation plugins because they are different between android and ios
ar_tables <- setNames(aware_multiplatform_tables[1:2], c("android", "ios"))
conversation_tables <- setNames(aware_multiplatform_tables[3:4], c("android", "ios"))
participants_sensordata <- list()
for(i in 1:nrow(devices_platforms)) {
row <- devices_platforms[i,]
device_id <- row$device_id
platform <- row$platform
# Handle special cases when tables for the same sensor have different names for Android and iOS (AR and conversation)
if(table %in% ar_tables)
table <- ar_tables[[platform]]
else if(table %in% conversation_tables)
table <- conversation_tables[[platform]]
if(table %in% available_tables_in_db){
query <- paste0("SELECT * FROM ", table, " WHERE device_id IN ('", device_id, "')")
# Fetch this table's columns here because they can differ between the Android and iOS variants
available_columns <- colnames(dbGetQuery(dbEngine, paste0("SELECT * FROM ", table, " LIMIT 1")))
if("timestamp" %in% available_columns && !(is.na(start_datetime_utc)) && !(is.na(end_datetime_utc)) && start_datetime_utc < end_datetime_utc){
query <- paste0(query, "AND timestamp BETWEEN 1000*UNIX_TIMESTAMP('", start_datetime_utc, "') AND 1000*UNIX_TIMESTAMP('", end_datetime_utc, "')")
}
sensor_data <- unify_data(dbGetQuery(dbEngine, query), table, platform, unifiable_tables)
participants_sensordata <- append(participants_sensordata, list(sensor_data))
}else{
warning(paste0("Missing ", table, " table. We unified the data from ", paste0(devices_platforms$device_id, collapse = " and "), " but without records from this missing table for ", device_id))
}
}
unified_data <- bind_rows(participants_sensordata)
return(unified_data)
}
# This function is used in unify_ios_android.R and unify_raw_data function
unify_data <- function(sensor_data, sensor, platform, unifiable_sensors){
if(sensor == unifiable_sensors$calls){
if(platform == "ios"){
sensor_data = unify_ios_calls(sensor_data)
}
# android calls remain unchanged
} else if(sensor == unifiable_sensors$battery){
if(platform == "ios"){
sensor_data = unify_ios_battery(sensor_data)
}
# android battery remains unchanged
} else if(sensor == unifiable_sensors$ios_activity_recognition){
sensor_data = unify_ios_gar(sensor_data)
} else if(sensor == unifiable_sensors$screen){
if(platform == "ios"){
sensor_data = unify_ios_screen(sensor_data)
}
# android screen remains unchanged
}
return(sensor_data)
}

View File

@@ -6,7 +6,6 @@ library(stringr)
screen <- read.csv(snakemake@input[["screen"]])
participant_info <- snakemake@input[["participant_info"]]
platform <- readLines(participant_info, n=2)[[2]]
# Screen States
# Android: https://github.com/denzilferreira/aware-client/blob/78ccc22f0f822f8421bef9b1a73d36e71b8aa85b/aware-core/src/main/java/com/aware/Screen.java
@@ -25,42 +24,23 @@ swap_screen_status <- function(data, status1, status2, time_buffer){
screen_status = ifelse(screen_status == 800L, status1, screen_status))
}
get_ios_screen_episodes <- function(screen){
episodes <- screen %>%
# only keep consecutive pairs of 3,2 events
filter( (screen_status == 3 & lead(screen_status) == 2) | (screen_status == 2 & lag(screen_status) == 3) ) %>%
# in iOS and after our filtering, screen episodes should end with a LOCK event (2)
mutate(episode_id = ifelse(screen_status == 2, 1:n(), NA_integer_)) %>%
fill(episode_id, .direction = "updown") %>%
group_by(episode_id) %>%
summarise(episode = "unlock",
screen_sequence = toString(screen_status),
time_diff = (last(timestamp) - first(timestamp)) / (1000 * 60),
local_start_date_time = first(local_date_time),
local_end_date_time = last(local_date_time),
local_start_date = first(local_date),
local_end_date = last(local_date),
local_start_day_segment = first(local_day_segment),
local_end_day_segment = last(local_day_segment))
}
get_android_screen_episodes <- function(screen){
# Aware logs LOCK events after turning the screen ON or OFF but we filter them out to simplify this analysis.
get_screen_episodes <- function(screen){
# Aware Android logs LOCK events after turning the screen ON or OFF but we filter them out to simplify this analysis.
# The code below only process UNLOCK to OFF episodes, but it's possible to modify it for ON to OFF (see line 61) or ON to UNLOCK episodes.
episodes <- screen %>%
# filter out LOCK events (2) that come within 50 milliseconds of an ON (1) or OFF (0) event
# Relevant for Android. Remove LOCK events (2) that come within 50 milliseconds of an ON (1) or OFF (0) event
filter(!(screen_status == 2 & lag(screen_status) == 1 & timestamp - lag(timestamp) < 50)) %>%
filter(!(screen_status == 2 & lag(screen_status) == 0 & timestamp - lag(timestamp) < 50)) %>%
# in Android and after our filtering, screen episodes should end with a OFF event (0)
# After our filtering, screen episodes should end with a OFF event (0)
mutate(episode_id = ifelse(screen_status == 0, 1:n(), NA_integer_)) %>%
fill(episode_id, .direction = "updown") %>%
group_by(episode_id) %>%
# Rarely, UNLOCK events (3) get logged just before ON events (1). If this happens within 800ms, swap them
# Relevant for Android. Rarely, UNLOCK events (3) get logged just before ON events (1). If this happens within 800ms, swap them
swap_screen_status(3L, 1L, 800) %>%
# to be consistent with iOS we filter out events (and thus sequences) starting with an ON (1) event
# Relevant for Android. To be consistent with iOS we remove events (and thus sequences) starting with an ON (1) event
filter(screen_status != 1) %>%
# only keep consecutive 3,0 pairs (UNLOCK, OFF)
# Only keep consecutive 3,0 pairs (UNLOCK, OFF)
filter( (screen_status == 3 & lead(screen_status) == 0) | (screen_status == 0 & lag(screen_status) == 3) ) %>%
summarise(episode = "unlock",
screen_sequence = toString(screen_status),
@@ -88,12 +68,8 @@ if(nrow(screen) < 2){
local_end_date = character(),
local_start_day_segment = character(),
local_end_day_segment = character())
} else if(platform == "ios"){
episodes <- get_ios_screen_episodes(screen)
} else if(platform == "android"){
episodes <- get_android_screen_episodes(screen)
} else {
print(paste0("The platform (second line) in ", participant_info, " should be android or ios"))
episodes <- get_screen_episodes(screen)
}
write.csv(episodes, snakemake@output[[1]], row.names = FALSE)