Commit Graph

64 Commits (35c09374dde3e961b4041133204160f4e6f82bb6)

Author SHA1 Message Date
junos 35c09374dd Free up memory during model building. 2023-05-10 21:44:40 +02:00
junos b505fb2b6a Thoroughly refactor regression runner. 2023-05-10 20:30:51 +02:00
junos 24744c288d Extract one step of preparation into a separate function. 2023-05-10 15:28:09 +02:00
junos caeaf03239 Provide data instead of csv input. 2023-05-10 15:20:33 +02:00
junos 3e38b64b45 Merge branch 'ml_pipeline' 2023-05-10 15:02:17 +02:00
junos c66e046014 Use methods in helper.py. 2023-04-21 21:41:00 +02:00
junos 583ee82e80 Add xgboost to dependencies and reformat helper.py. 2023-04-21 21:33:06 +02:00
Primoz 26804cf8ea Repair preprocessing one hot encoding of test set. 2023-04-21 13:24:31 +02:00
Primoz 259be708aa Improve the feature selection method with validations etc. 2023-04-20 13:26:20 +02:00
Primoz 0594993133 Add GroupKFold to feature selection CV. Start with generic metric calculation procedure. 2023-04-20 11:20:26 +02:00
Primoz 1cbc743cf7 Add kBest method to initially filter out the worst performing features. Update comments. 2023-04-20 10:12:16 +02:00
Primoz ce13a9e13b Implement feature selection method which is used in ML pipeline. 2023-04-19 15:56:34 +02:00
Primoz 10ca47583c Implement feature selection methods (WIP). 2023-04-14 17:20:22 +02:00
Primoz bccc1cd1de Clean and fix Preprocessing module. 2023-02-23 10:40:58 +01:00
Primoz 9ed863b7a1 Add a CrossValidation module with all the required methods. 2023-02-23 10:40:17 +01:00
Primoz f69cb25266 Add planning comments. 2023-02-22 18:12:52 +01:00
Primoz 7f6ae9b323 Add imputation and One-Hot Encoding Methods. 2023-02-22 18:05:01 +01:00
Primoz 8f6cb3f444 Add preprocessing class. 2023-02-22 13:44:03 +01:00
Primoz ef12f64fe5 Add feature selection Class skeleton. 2023-02-20 11:51:34 +01:00
junos 852e17afbe Define classification models method. 2022-12-08 10:00:14 +01:00
junos 8131626c4a Include more metrics in regression helper methods. 2022-12-07 21:25:05 +01:00
junos 95ab66fd81 Move to presentation. 2022-12-07 21:24:20 +01:00
junos 12f2c927fa Merge branch 'ml_pipeline' of repo.ijs.si:junoslukan/straw2analysis into ml_pipeline
# Conflicts:
#	exploration/ml_pipeline_daily.py - deleted
2022-12-07 15:36:52 +01:00
junos 71e1fcf8ca Save results. 2022-12-07 15:33:18 +01:00
Primoz 9a218c8e2a Add a script for two class train test split clustering classification. 2022-11-25 14:44:11 +01:00
Primoz 98f78d72fc Create a classification models class and use it in the ml pipeline script. 2022-11-25 12:35:45 +01:00
junos ae2d7a038d Present results. 2022-11-16 21:36:43 +01:00
junos 389198b17f Prepare data in a separate step.
Change categorical features.
2022-11-16 19:34:18 +01:00
junos c462d55096 Update function with imputation already handled. 2022-11-16 18:18:12 +01:00
junos a5c09a292f Move function to helper.py 2022-11-16 18:13:30 +01:00
junos e33a49c9fc Add a demo of pipeline. 2021-11-17 10:44:49 +01:00
junos 005b09cfdf [WIP] Fix tests to use pyprojroot. 2021-10-29 12:07:12 +02:00
junos a63a7eac99 [WIP] Add a test for SensorFeatures.
Additional analysis for adherence.
Small corrections.
2021-10-13 13:39:58 +02:00
junos b8c7606664 Add an option to read cached labels from a file. 2021-09-15 15:45:49 +02:00
junos ed062d25ee Add export capabilities to labels.py. 2021-09-15 15:36:36 +02:00
junos 20748890a8 Further refactor by moving helper functions. 2021-09-15 15:14:54 +02:00
junos 28699a0fdf Enable reading features from csv files. 2021-09-14 17:42:34 +02:00
junos af9e81fe40 Document the SensorFeatures class and its __init__ method. 2021-09-13 17:43:47 +02:00
junos b19eebbb92 Refactor machine_learning/pipeline.py by defining one class by file. 2021-09-13 11:41:57 +02:00
junos c1bb4ddf0f Save calculated features to csv files. 2021-08-23 16:36:26 +02:00
junos 0152fbe4ac Delete the leftover class.
Add more prints.
2021-08-23 16:09:23 +02:00
junos 3611fc76f7 Fill NaNs after merging all features. 2021-08-21 19:48:57 +02:00
junos ee30c042ea Fill NaNs introduced in merge for proximity. 2021-08-21 19:40:42 +02:00
junos a71e132edf Prepare the first full pipeline. 2021-08-21 19:04:09 +02:00
junos 24c4bef7e2 Print some more messages. 2021-08-21 19:03:44 +02:00
junos 11381d6447 Add some print statements for monitoring progress. 2021-08-21 18:54:02 +02:00
junos d19995385d Account for the case when there is no data for days with labels. 2021-08-21 18:49:57 +02:00
junos f73f86486a Fill communication features with appropriate values. 2021-08-21 18:28:22 +02:00
junos 8507ff5761 Check for NaNs in the data, since sklearn.LinearRegression cannot handle them. 2021-08-21 17:46:00 +02:00
junos 065cd4347e [WIP] Add a class for model validation. 2021-08-20 19:44:50 +02:00