Commit Graph

210 Commits (7504aa34cffdc6d8c84293eb5c8a91e88a256927)

Author SHA1 Message Date
Primoz 7504aa34cf Add additional categorical features (uncomment). 2022-11-28 13:42:46 +01:00
Primoz 9a218c8e2a Add a script for two class train test split clustering classification. 2022-11-25 14:44:11 +01:00
Primoz 98f78d72fc Create a classification models class and use it in the ml pipeline script. 2022-11-25 12:35:45 +01:00
Primoz 218b684514 Automate clustering classification logic and add parameters at the beginning of the scripts. General changes and improvements. 2022-11-24 16:12:20 +01:00
Primoz ddde80b421 Add classification with clustering ml pipeline script. 2022-11-24 09:24:13 +01:00
Primoz 7afef5582f Add TEMP lime_survey cols 2022-11-22 14:44:33 +01:00
Primoz 183758cd37 Improve general ml classification pipeline script. 2022-11-22 14:31:49 +01:00
Primoz 40029a8205 Add a script for ml classification pipeline. 2022-11-21 14:47:19 +01:00
Primoz ae0f54ecc2 Combine different segment scripts and set ml pipeline as a regression problem. 2022-11-21 11:41:11 +01:00
Primoz 8defb271c9 Extend ml pipeline scripts with two additional CV methods. 2022-11-21 11:23:47 +01:00
Primoz b59798df26 Add a new file tailored for stressfulness event regression. 2022-11-16 14:49:40 +01:00
Primoz 87ebb9f296 Delete files ... add to gitignore 2022-11-16 11:08:03 +01:00
Primoz 1d8dcf8b21 Add 30 min features data and modify script. 2022-11-02 15:16:19 +01:00
Primoz 9f7fa0c8e0 Add 18 hour daily data and slightly modify jupyter script. 2022-10-18 10:29:59 +02:00
Primoz cdff4da930 Merge branch 'ml_pipeline' of https://repo.ijs.si/junoslukan/straw2analysis into ml_pipeline 2022-10-17 22:15:17 +02:00
Primoz ad5f50babe Correctly imputed data uploaded on STRAW (all targets) 2022-10-12 12:48:10 +02:00
Primoz 466cd3dc23 Processing of a newly cleaned script. Addition of two ML models. And modifications with one hot encoding. 2022-10-10 16:47:00 +02:00
Primoz 27b2282ee0 Datasets (phone&E4 features) and Jupyter script of regression models. 2022-08-24 16:18:40 +02:00
junos a8fd96d2f1 Add analysis using RAPIDS. 2022-08-23 16:41:41 +02:00
junos e33a49c9fc Add a demo of pipeline. 2021-11-17 10:44:49 +01:00
junos d34c2ec5e9 Merge branch 'ambient' into ml_pipeline 2021-11-17 10:39:55 +01:00
junos 005b09cfdf [WIP] Fix tests to use pyprojroot. 2021-10-29 12:07:12 +02:00
junos 6fc0d962ae Remove low values of pressure. 2021-10-22 18:09:17 +02:00
junos 92fbda242b Explore barometer and temperature data. Add docstrings to models. 2021-10-14 17:59:33 +02:00
junos 6302a0f0d9 Merge ambient sensors into one file. Explore barometer sensor data for one phone. 2021-10-13 16:57:38 +02:00
junos a63a7eac99 [WIP] Add a test for SensorFeatures. Additional analysis for adherence. Small corrections. 2021-10-13 13:39:58 +02:00
junos b8c7606664 Add an option to read cached labels from a file. 2021-09-15 15:45:49 +02:00
junos ed062d25ee Add export capabilities to labels.py. 2021-09-15 15:36:36 +02:00
junos 20748890a8 Further refactor by moving helper functions. 2021-09-15 15:14:54 +02:00
junos 28699a0fdf Enable reading features from csv files. 2021-09-14 17:42:34 +02:00
junos af9e81fe40 Document the SensorFeatures class and its __init__ method. 2021-09-13 17:43:47 +02:00
junos b19eebbb92 Refactor machine_learning/pipeline.py by defining one class by file. 2021-09-13 11:41:57 +02:00
junos c1bb4ddf0f Save calculated features to csv files. 2021-08-23 16:36:26 +02:00
junos 0152fbe4ac Delete the leftover class. Add more prints. 2021-08-23 16:09:23 +02:00
junos 3611fc76f7 Fill NaNs after merging all features. 2021-08-21 19:48:57 +02:00
junos ee30c042ea Fill NaNs introduced in merge for proximity. 2021-08-21 19:40:42 +02:00
junos a71e132edf Prepare the first full pipeline. 2021-08-21 19:04:09 +02:00
junos 24c4bef7e2 Print some more messages. 2021-08-21 19:03:44 +02:00
junos 11381d6447 Add some print statements for monitoring progress. 2021-08-21 18:54:02 +02:00
junos d19995385d Account for the case when there is no data for days with labels. 2021-08-21 18:49:57 +02:00
junos f73f86486a Fill communication features with appropriate values. 2021-08-21 18:28:22 +02:00
junos aed73bb7ed Add fill values for communication for rows with no calls/smses. 2021-08-21 18:17:58 +02:00
junos 8507ff5761 Check for NaNs in the data, since sklearn.LinearRegression cannot handle them. 2021-08-21 17:46:00 +02:00
junos 0b85ee8fdc Merge branch 'master' into ml_pipeline 2021-08-21 17:37:45 +02:00
junos e2e268148d Fill in 0.5 for undefined ratio. When there are no calls and no smses (of a particular type), the ratio is undefined. But since their number is the same, I argue that the ratio can represent that with a 0.5, similarly to the case where no_calls_all = no_sms_all != 0. 2021-08-21 17:33:31 +02:00
junos 00015a3b8d Fill in zeroes when joining or unstacking. If there are no calls or smses for a particular day, there is no corresponding row in the features dataframe. When joining these, however, NaNs were introduced. Since a value of 0 is meaningful for all of these features, replace NaNs with 0's. 2021-08-21 17:31:15 +02:00
junos 065cd4347e [WIP] Add a class for model validation. 2021-08-20 19:44:50 +02:00
junos 0b98d59aad Aggregate labels using grouping_variable. 2021-08-20 19:17:22 +02:00
junos 08fdec34f1 Merge features into a common df. But first, group communication by the grouping_variable. 2021-08-20 17:59:00 +02:00
junos 72b16af75c Make group_by consistent with communication. 2021-08-20 17:52:31 +02:00