177 Commits (0152fbe4acbdb8e33dea12f4d34f3c63034b7e5b)

Author SHA1 Message Date
junos 0152fbe4ac Delete the leftover class.
Add more prints.
2021-08-23 16:09:23 +02:00
junos 3611fc76f7 Fill NaNs after merging all features. 2021-08-21 19:48:57 +02:00
junos ee30c042ea Fill NaNs introduced in merge for proximity. 2021-08-21 19:40:42 +02:00
junos a71e132edf Prepare the first full pipeline. 2021-08-21 19:04:09 +02:00
junos 24c4bef7e2 Print some more messages. 2021-08-21 19:03:44 +02:00
junos 11381d6447 Add some print statements for monitoring progress. 2021-08-21 18:54:02 +02:00
junos d19995385d Account for the case when there is no data for days with labels. 2021-08-21 18:49:57 +02:00
junos f73f86486a Fill communication features with appropriate values. 2021-08-21 18:28:22 +02:00
junos aed73bb7ed Add fill values for communication for rows with no calls/smses. 2021-08-21 18:17:58 +02:00
junos 8507ff5761 Check for NaNs in the data, since sklearn.LinearRegression cannot handle them. 2021-08-21 17:46:00 +02:00
junos 0b85ee8fdc Merge branch 'master' into ml_pipeline 2021-08-21 17:37:45 +02:00
junos e2e268148d Fill in 0.5 for undefined ratio.
When there are no calls and no smses (of a particular type), the ratio is undefined. But since their number is the same, I argue that the ratio can represent that with a 0.5, similarly to the case where no_calls_all = no_sms_all != 0.
2021-08-21 17:33:31 +02:00
junos 00015a3b8d Fill in zeroes when joining or unstacking.
If there are no calls or smses for a particular day, there is no corresponding row in the features dataframe. When joining these, however, NaNs were introduced. Since a value of 0 is meaningful for all of these features, replace NaNs with 0's.
2021-08-21 17:31:15 +02:00
junos 065cd4347e [WIP] Add a class for model validation. 2021-08-20 19:44:50 +02:00
junos 0b98d59aad Aggregate labels using grouping_variable. 2021-08-20 19:17:22 +02:00
junos 08fdec34f1 Merge features into a common df.
But first, group communication by the grouping_variable.
2021-08-20 17:59:00 +02:00
junos 72b16af75c Make group_by consistent with communication. 2021-08-20 17:52:31 +02:00
junos d6337e82ac Merge branch 'master' into ml_pipeline 2021-08-20 17:43:53 +02:00
junos 9a319ac6e5 Add an option to group on other than just participant_id. 2021-08-20 17:41:12 +02:00
junos 6592612db7 Add a similar class for labels. 2021-08-19 17:44:04 +02:00
junos 97c693d252 Add a getter for communication data. 2021-08-19 17:36:26 +02:00
junos 93f136b080 Add a method to get communication features. 2021-08-19 17:32:02 +02:00
junos 5be3e82797 Accept nested feature configuration.
To do this, pass a dict as parameters to SensorFeatures class, rather than actually reading the object from yaml file.
2021-08-19 17:23:23 +02:00
junos 429aa43bd1 Add communication features to pipeline. 2021-08-19 17:05:44 +02:00
junos 0ed34e97b3 Convert the class into a YAML object.
Add an example config file and demonstrate its usage in ex_ml_pipeline.ipynb.
2021-08-19 16:31:42 +02:00
junos 52664eb40b Implement getters. 2021-08-19 11:47:59 +02:00
junos de92e1309d Merge branch 'master' into ml_pipeline 2021-08-18 17:30:36 +02:00
junos 777e6f0a58 calls_sms_features() now returns all communication features. 2021-08-18 15:41:47 +02:00
junos 2d78aacd18 Compile a list of contact features and add a test. 2021-08-18 15:35:42 +02:00
junos c88336481e Add a test for SMS features. 2021-08-18 15:28:46 +02:00
junos 1bc996413e Clarify names for no_all calls/sms feature.
Add another test.
2021-08-18 15:23:30 +02:00
junos a2a44c202a Calculate common features outside if...else. 2021-08-18 10:54:54 +02:00
junos 4740e94d37 Fix a bug introduced in e7fe4e8398. 2021-08-18 10:51:48 +02:00
junos b1ad8d1309 List calls features. 2021-08-17 16:27:34 +02:00
junos bb75abcb9b Add tests for proximity. 2021-08-17 16:07:52 +02:00
junos e7fe4e8398 Simplify merge into join. 2021-08-17 13:53:19 +02:00
Junos Lukan cf28aa547a Merge branch 'communication' into 'master'
separated features

See merge request junoslukan/straw2analysis!2
2021-08-17 11:42:03 +00:00
junos 4d73b9d5ff Add tests for proximity. 2021-08-17 10:51:51 +02:00
junos 3821314dd9 [WIP] Separate the features part from the pipeline. 2021-08-16 18:11:25 +02:00
junos d6f36ec8f8 [WIP] Finish the class by assigning columns and validating model. 2021-08-13 17:41:04 +02:00
junos b06ec6e1ae [WIP] Methods to get the labels and data plus aggregate them. 2021-08-12 19:07:14 +02:00
junos 622477f19f [WIP] Start merging steps into a class for a pipeline. 2021-08-12 17:38:08 +02:00
junos 577a874288 Add an example for linear regression. 2021-08-12 16:54:00 +02:00
junos c8bb481508 Add a parameter for grouping. 2021-08-12 15:07:20 +02:00
junos 98f1df81c6 Use the same function for ESM and other data. 2021-08-11 17:26:44 +02:00
junos ad85f79bc5 Move datetime calculation to a separate function. 2021-08-11 17:19:14 +02:00
junos 070cfdba80 Start machine learning pipeline example.
Select data and labels.
2021-08-11 16:42:30 +02:00
junos c6d0e4391e Add a couple of proximity features. 2021-08-11 16:40:19 +02:00
junos af65d0864f Add a simple function for recoding proximity. 2021-08-11 15:04:27 +02:00
junos a2180aee54 Fix assignment to use loc.
For assigning a value to selected rows (a subset), regular slicing using [] produces a KeyError.
2021-08-11 14:53:59 +02:00
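
Note on commits 00015a3b8d, e2e268148d and a2180aee54: their messages describe three pandas patterns (filling zeros after a join, substituting 0.5 for an undefined ratio, and assigning to a row subset with loc). The sketch below illustrates them on made-up data. It is not the repository's code: the frames, the column names (no_calls, no_sms, ratio_calls, has_sms) and the ratio definition calls / (calls + sms) are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical per-day counts; names and values are illustrative only.
calls = pd.DataFrame({"no_calls": [3, 0]}, index=pd.Index(["d1", "d2"], name="date"))
sms = pd.DataFrame({"no_sms": [1]}, index=pd.Index(["d1"], name="date"))

# A day without SMS has no row in `sms`, so the join introduces NaN there.
# A count of 0 is meaningful for these features, so NaN is replaced with 0.
features = calls.join(sms, how="outer").fillna(0)

# Assumed ratio definition: calls / (calls + sms). With no calls and no SMS
# this is 0/0, which pandas evaluates to NaN; fill it with 0.5, the same
# value the ratio takes when both counts are equal and nonzero.
total = features["no_calls"] + features["no_sms"]
features["ratio_calls"] = (features["no_calls"] / total).fillna(0.5)

# Assign to a subset of rows and one column in a single .loc call.
features["has_sms"] = True
features.loc[features["no_sms"] == 0, "has_sms"] = False

print(features)
```

Using a single .loc[rows, column] assignment writes into the original frame directly, which is why it is generally preferred over chained [] indexing for this kind of update.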