stress_at_work_analysis/presentation/StressfulEvents.Rmd

---
title: "Stressful event detection"
output: html_notebook
---

```{r chunk_options, include=FALSE}
knitr::opts_chunk$set(
  comment = "#>", echo = FALSE, fig.width = 6
)
```

```{r libraries, include=FALSE}
library(knitr)
library(kableExtra)
library(stringr)
library(RColorBrewer)
library(magrittr)
library(tidyverse)
```

```{r fig_setup, include=FALSE}
accent <- RColorBrewer::brewer.pal(7, "Accent")
```


```{r read_data, include=FALSE}
podatki <- read_csv("E:/STRAWresults/stressfulness_event_with_target_0_ver2/input_appraisal_stressfulness_event_mean.csv")
podatki %<>% mutate(pid = as_factor(pid))
```

# Event descriptions

Participants were asked "Was there a particular event that created tension in you?" with the following options:

- 0 - No
- 1 - Yes, slightly
- 2 - Yes, moderately
- 3 - Yes, considerably
- 4 - Yes, extremely

If they answered anything but "No", they were also asked about the event's perceived threat (e.g. "Did this event make you feel anxious?") and challenge (e.g. "How eager are you to tackle this event?").
We only consider general "stressfulness" in this presentation.

Most of the time, nothing stressful happened:

```{r target_table}
kable(table(podatki$target), col.names = c("stressfulness", "frequency")) %>%
  kable_styling(full_width = FALSE)
```

Most participants had somewhere between 0 and 10 stressful events.

```{r target_distribution}
podatki %>%
  group_by(pid) %>%
  summarise(no_of_events = sum(target > 0)) %>%
  ggplot(aes(no_of_events)) +
  geom_histogram(binwidth = 1, fill = accent[1]) +
  coord_cartesian(expand = FALSE) +
  labs(x = "Number of events per participant") +
  theme_classic()
```

When a stressful event occurred, participants mostly perceived it as slightly to moderately stressful on average.

```{r mean_stressfulness_distribution}
podatki %>%
  filter(target > 0) %>%
  group_by(pid) %>%
  summarise(mean_stressfulness = mean(target)) %>%
  ggplot(aes(mean_stressfulness)) +
  geom_histogram(binwidth = 0.1, fill = accent[1]) +
  coord_cartesian(expand = FALSE) +
  labs(x = "Mean stressfulness per participant") +
  theme_classic()
```

# Problem description

We are trying to predict whether a stressful event occurred, i.e. stressfulness > 0, or not (stressfulness == 0).
First, set up a leave-one-subject-out validation and use original distribution of the class variable.

For this, the majority classifier has a mean accuracy of 0.85 (and median 0.90), while the F1-score, precision and recall are all 0.

We also have an option to validate the results differently, such as with "half-loso", i.e. leaving half of the subject's data in the training set and only use half for testing, or k-fold cross-validation.
Additionally, we can undersample the majority class to balance the dataset.

# Results
## Leave one subject out, original distribution

```{r event_detection}
scores <- read_csv("event_stressful_detection_loso.csv", col_types = "ffdd")
scores_wide <- scores %>%
  select(!max) %>%
  pivot_wider(names_from = metric,
              names_sep = "_",
              values_from = mean) %>%
  rename_all(~str_replace(.,"^test_",""))
kable(scores_wide, digits = 2) %>%
  column_spec(4, color = 'white', background = 'black') %>%
  kable_styling(full_width = TRUE)
```