rapids/automl_test.py


Squashed commit of the following: commit 8a6b52a97c95dcd8b70b980b4f46421b1a847905 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 29 11:35:49 2022 +0000 Switch to 30_before ERS with corresponding targets. commit 244a05373014b14bc4c75db8ceb68a04dc5328df Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 29 11:19:43 2022 +0000 Change output files settings to nonstandardized. commit be0324fd01d70c58a9eefd84ccb23d06a42ab57c Author: Primoz <sisko.primoz@gmail.com> Date: Mon Nov 28 12:44:25 2022 +0000 Fix some bugs and set categorical columns as categories dtypes. commit 99c2fab8f9ab9cf2ed40a952019f994299b78c05 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Nov 16 09:50:18 2022 +0000 Fix a bug in the making of the individual model (when there is no target in the participants columns). commit 286de93bfd55710d77c2e4bab899e73381f9a4a3 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 15 11:21:51 2022 +0000 Fix some bugs and extend ERS and cleaning scripts with multiple stress event targets logic. commit ab803ee49c2898d2dd4d49de64f763f549c077e3 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 15 10:14:07 2022 +0000 Add additional appraisal targets. commit 621f11b2d98cb3e17d86c3be902ebd40f94d3079 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 15 09:53:31 2022 +0000 Fix a bug related to wrong user input (duplicated events). commit bd41f42a5da7e28b9de190a53b5f2195fad0920d Author: Primoz <sisko.primoz@gmail.com> Date: Mon Nov 14 15:07:36 2022 +0000 Rename target_ to segmenting_ method. commit a543ce372f1fd6cdd402be8ebdbe54610e449ce2 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Nov 14 15:04:16 2022 +0000 Add comments for event_related_script understanding. commit 74b454b07bf69bdcd0517a940c5296e5f77c492e Author: Primoz <sisko.primoz@gmail.com> Date: Fri Nov 11 09:15:12 2022 +0000 Apply changes to string answers to make them language-generic. 
commit 6ebe83e47ea4da0066f4cd9dedbdb726cef5d06c Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 10 12:42:52 2022 +0000 Improve the ERS extract method with a couple of validations. commit 00350ef8ca2ef43aed71f76609917021ce920f31 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 10 10:32:58 2022 +0000 Change config for stressfulness event target method. commit e4985c91214ac6ecd43501d52b67816d26d4fdef Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 10 10:29:11 2022 +0000 Override stressfulness event target with extracted values from csv. commit a668b6e8dad4dd393802ed4d5e948d6d42d60a01 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 10 09:37:27 2022 +0000 Extract ERS and stress event targets to csv files (completed). commit 9199b53ded1c5d858882d9826aa15e5e2102ab08 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Nov 9 15:11:51 2022 +0000 Get, join and start processing required ERS stress event data. commit f3c6a66da9a7ce92d4296436927f48b3d1a467b8 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 8 15:53:43 2022 +0000 Begin with stress events in the ERS script. commit 0b3e9226b3c683e87ef758756f0d83350642a716 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 8 14:44:24 2022 +0000 Make small corrections in ERS file. commit 2d83f7ddecbc3880e8d901fc2aae25621a280f75 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 8 11:32:05 2022 +0000 Begin the ERS logic for 90-minutes events. commit 1da72a7cbe3178ef911ddb263ea30e5da79407ad Author: Primoz <sisko.primoz@gmail.com> Date: Tue Nov 8 09:45:37 2022 +0000 Rename targets method in config. commit 9f441afc16ff9b75e131eb47d41edb570b51190c Author: Primoz <sisko.primoz@gmail.com> Date: Fri Nov 4 15:09:04 2022 +0000 Begin ERS logic for 90-minutes events. 
commit c1c9f4d05ac8bdf03f4759623165f8e795eaa5a9 Merge: 62f46ea3 7ab0280d Author: Primoz <sisko.primoz@gmail.com> Date: Fri Nov 4 09:11:58 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit 62f46ea3763ba47af2a7c62d6b0b453cfd549929 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Nov 4 09:11:53 2022 +0000 Prepare method-based logic for ERS generating. commit 7ab0280d7ed23022a4d2de0caa7bba3a43dff9d4 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Nov 4 08:58:08 2022 +0000 Correctly rename stressful event target variable. commit eefa9f3f4d4c95374dd31907787ed767c636fe37 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 3 14:49:54 2022 +0000 Add new target: stressfulness_event. commit 5e8174dd41f5a4c2aa35c74fb4bd9b19a918d1c4 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 3 13:52:45 2022 +0000 Add new target: stressfulness_period. commit 35c1a762e7179c7b11f8f1154c1a4f4133402324 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 3 13:51:18 2022 +0000 Improve filtering by esm_session and device_id. commit 02264b21fd43212dd4636d300321e254da392149 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Nov 3 09:30:12 2022 +0000 Add logic for target selection in ERS processing. commit 0ce8723bdb72bafa40924ff4fd3a2328e81808eb Author: Primoz <sisko.primoz@gmail.com> Date: Wed Nov 2 14:01:21 2022 +0000 Extend imputation logic within the cleaning script. commit 30b38bfc028b6e3261701feb915f9e3820a77c75 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Oct 28 09:00:13 2022 +0000 Fix the generating procedure of ERS file for participants with multiple devices. commit cd137af15a9f03724593595a152f34a24323088f Author: Primoz <sisko.primoz@gmail.com> Date: Thu Oct 27 14:20:15 2022 +0000 Config for 30 minute EMA segments. commit 3c0585a566ede91dd97e2ba0a6617a4f42e72617 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Oct 27 14:12:56 2022 +0000 Remove obsolete comments. 
commit 6b487fcf7b64a722e9c5486764155764332ba73c Author: Primoz <sisko.primoz@gmail.com> Date: Thu Oct 27 14:11:42 2022 +0000 Set E4 data yield to 1 if it is over 1. Optimize E4 data_yield script. commit 5d17c92e54427cc70a7f747be398ad59af98c5bf Merge: a31fdd14 0d143e6a Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 14:18:20 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit a31fdd1479193a67b6a6985cc42fdc66ac7be1a4 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 14:18:08 2022 +0000 Start to test empatica_data_yield precieved error. commit 936324d234b6636366b1a569793452a4cca54847 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 14:17:27 2022 +0000 Switch config for 30 minutes event related segments. commit da0a4596f814a40997efe782239cb296e89fe606 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 14:16:25 2022 +0000 Add additional ESM processing logic for ERS csv extraction. commit d4d74818e69eab2755f988391978400096902b5d Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 14:14:32 2022 +0000 Fix a bug - missing time_segment column when df is empty commit 14ff59914b4b3071ff7a40bf9daaba558ff66746 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 09:59:46 2022 +0000 Fix to correct dtypes. commit 6ab0ac5329d0be06bbe8357203422c7d981c325a Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 26 09:57:26 2022 +0000 Optimize memory consumption with dtype definition while reading csv file. commit 0d143e6aadaf9b67b2fbfd6cdf8c218d5720f157 Merge: 8acac501 b92a3aa3 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 25 15:28:27 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit 8acac501251105332d1d3d4863e88744b9528844 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 25 15:26:43 2022 +0000 Add safenet when features dataframe is empty. 
commit b92a3aa37a1968b7129cdb2b7ebaf9beda981428 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 25 15:25:22 2022 +0000 Remove unwanted output or other error producing code. commit bfd637eb9c40872b954b608aef420b24393d3e6f Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 25 08:53:44 2022 +0000 Improve strings formatting in straw_events file. commit 0d81ad5756c54f03b2b200b801672b9eb39f27d5 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 19 13:35:04 2022 +0000 Debug assignment of segments to rows commit cea451d344e3b0bcc110c77f2845a9077bbe5d75 Merge: e88bbd54 cf38d9f1 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 18 09:15:06 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit e88bbd548fcd2bb1f9057b27b572afa5a6d028a5 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 18 09:15:00 2022 +0000 Add new daily segment and filter by segment in the cleaning script. commit cf38d9f175c5bcc9d0f8c70f49e59d2b56d8ada7 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Oct 17 15:07:33 2022 +0000 Implement ERS generating logic. commit f3ca56cdbf22d7eed38f920889c4b19f49da7760 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Oct 14 14:46:28 2022 +0000 Start with ERS logic integration within Snakemake. commit 797aa98f4fe2faf411c5130d27c51c5c606e5c0e Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 12 15:51:50 2022 +0000 Config for ERS testing. commit 9baff159cd1a61019410838efefa4b59a6c4981c Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 12 15:51:23 2022 +0000 Changes needed for testing and starting of the Event-Related Segments. commit 0f21273508654133a51d1e20e485c3de33dc2779 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 12 12:32:51 2022 +0000 Bugs fix commit 55517eb737463e21c6f6ad18f4711d38bd02fec5 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 12 12:23:11 2022 +0000 Necessary commit before proceeding. 
commit de15a52dba43325c171584464995fa4311403e7e Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 11 08:36:23 2022 +0000 Bug fix commit 1ad25bb5727d30affd2ee063386f2f5ca52e0d63 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Oct 11 08:26:17 2022 +0000 Few modifications of some imputation values in cleaning script and feature extraction. commit 9884b383cf6e1b339738deea8cc54a6d264e5c87 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Oct 10 16:45:38 2022 +0000 Testing new data with AutoML. commit 2dc89c083c99d43da5e1eb9cbc1c07d39901f27d Author: Primoz <sisko.primoz@gmail.com> Date: Fri Oct 7 08:52:12 2022 +0000 Small changes in cleaning overall commit 001d40072973797f159b94b48ac3add10138f58e Author: Primoz <sisko.primoz@gmail.com> Date: Thu Oct 6 14:28:12 2022 +0000 Clean features and create input files based on all possible targets. commit 1e38d9bf1e4d1caa10ff8c5245a59a9b20d4d7f9 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Oct 6 13:27:38 2022 +0000 Standardization and correlation visualization in overall cleaning script. commit a34412a18dbd75aafa9f3bba9303ab6a962cec03 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 5 14:16:55 2022 +0000 E4 data yield corrections. Changes in overal cs - standardization. commit 437459648f16f71acec6794a07404bfeb781f907 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Oct 5 13:35:05 2022 +0000 Errors fix: individual script - treat participants missing data. commit 53f6cc60d5d8262a3bbb3cad82e2ebf49cb9e2a5 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Oct 3 13:06:39 2022 +0000 Config and cleaning script necessary changes ... commit bbeabeee6ff7e3f8cc429470404c2c4b246b6a22 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Oct 3 12:53:31 2022 +0000 Last changes before processing on the server. commit 44531c6d94705757110df6d087f3762699d7211a Author: Primoz <sisko.primoz@gmail.com> Date: Fri Sep 30 10:04:07 2022 +0000 Code cleaning, reworking cleaning individual based on changes in overall script. 
Changes in thresholds. commit 7ac7cd5a3714521fc9a36a8371554a921df132fe Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 29 14:33:21 2022 +0000 Preparation of the overall cleaning script. commit 68fd69dadab8c7e14095469d5ed1d9921b4c0ce7 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 29 11:55:25 2022 +0000 Cleaning script for individuals: corrections and comments. commit a4f0d056a047aaf3458b043eba5e98df76da039a Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 29 11:44:27 2022 +0000 Fillna for app foreground and activity recognition commit 6286e7a44c7e2e99116b83fc3c06ce4398d022e5 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 28 12:47:08 2022 +0000 firstuseafter column removed from contextual imputation commit 9b3447febd075f4dc5ede6a4071f72821835e7a2 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 28 12:40:05 2022 +0000 Contextual imputation correction commit d6adda30cf95e6d9660e6001225dee461bbe3704 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 28 12:37:51 2022 +0000 Contextual imputation on time(first/last) features. commit 8af4ef11dc711dffd86c71c99b15e9451aa94ecd Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 28 10:02:47 2022 +0000 Contextual imputation by feature type. commit 536b9494cdfbc6b31d41a74126ac6af1d8fe62e0 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 27 14:12:08 2022 +0000 Cleaning script corrections commit f0b87c9dd02d31223c2b69c2586221ffe202e21f Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 27 09:54:15 2022 +0000 Debugging of the empatica data yield integration. 
commit 7fcdb873fe910f7243fb71ffe56abfdd9ebb81e3 Merge: 5c7bb0f4 bd53dc16 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 27 07:50:29 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit 5c7bb0f4c14ab4413ed3682268bb7e65ac56fc83 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 27 07:48:32 2022 +0000 Config changes commit bd53dc1684b2b6fadf5653e2a9cc00c953410149 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Sep 26 15:54:00 2022 +0000 Empatica data yield usage in the cleaning script. commit d9a574c550f8beb1cf5662e164ffa66048106dc9 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Sep 23 13:24:50 2022 +0000 Changes in the cleaning script and preparation of empatica data yield method. commit 19aa8707c0b61790d9d4c1834944c1711d8777aa Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 22 13:45:51 2022 +0000 Redefined cleaning steps after revision commit 247d758cb7eaa93ed9c7df2b0711325ffa27d985 Merge: 90ee99e4 7493aaa6 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 21 07:18:01 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit 90ee99e4b99eeb32650df06d60e43c0ee1e66a87 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 21 07:16:00 2022 +0000 Remove TODO comments commit 7493aaa64368f81adbf478b43411063f56a9a87e Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 20 12:57:55 2022 +0000 Small changes in cleaning scrtipt and missing vals testing. commit eaf4340afd7db8a43174a4789cfb9ba5a831dac4 Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 20 08:03:48 2022 +0000 Small imputation and cleaning corrections. commit a96ea508c628d8c8c736126b65772c06c7cf4ddc Author: Primoz <sisko.primoz@gmail.com> Date: Mon Sep 19 07:34:02 2022 +0000 Fill NaN of Empatica's SD second order feature (must be tested). 
commit 52e11cdcab51ec8ef97d379304e9297dd233773f Author: Primoz <sisko.primoz@gmail.com> Date: Mon Sep 19 07:25:54 2022 +0000 Configurations for new standardization path. commit 92aff93e65f765ca531bf414c2bdb2e6d7ca1240 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Sep 19 07:25:16 2022 +0000 Remove standardization script. commit 18b63127deefb7cee0061ed741af179a4d6fdb01 Author: Primoz <sisko.primoz@gmail.com> Date: Mon Sep 19 06:16:26 2022 +0000 Removed all standardizaton rules and configurations. commit 62982866cd9f3f45a335d26b67c3fdf3035c6d1e Author: Primoz <sisko.primoz@gmail.com> Date: Fri Sep 16 13:24:21 2022 +0000 Phone wifi visible inspection (WIP) commit 0ce6da5444b7d0e3a30c8f6ca73422dc88d928b2 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Sep 16 11:30:08 2022 +0000 kNN imputation relocation and execution only on specific columns. commit e3b78c8a85084b4cd873350df5bc2b75009fae83 Author: Primoz <sisko.primoz@gmail.com> Date: Fri Sep 16 10:58:57 2022 +0000 Impute selected phone features with 0. Wifi visible, screen, and light. commit 7d85f75d218ef9b3af5ae077343a58e62bb5fa2c Author: Primoz <sisko.primoz@gmail.com> Date: Fri Sep 16 09:03:30 2022 +0000 Changes in phone features NaN values script. commit 385e21409d1d3978588fea56cc26a8d241e3af35 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 15 14:16:58 2022 +0000 Changes in NaN values testing script. commit 18002f59e1c0e161f68b2a4cc217e62a6a86e467 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 15 10:48:59 2022 +0000 Doryab bluetooth and locations features fill in NaN values. 
commit 3cf7ca41aac7772c89ac90ad0a26e938d39f3f34 Merge: d27a4a71 d5ab5a03 Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 14 15:38:32 2022 +0000 Merge branch 'imputation_and_cleaning' of https://repo.ijs.si/junoslukan/rapids into imputation_and_cleaning commit d5ab5a0394cdd395088723ff80d78f2257f8e37c Author: Primoz <sisko.primoz@gmail.com> Date: Wed Sep 14 14:13:03 2022 +0000 Writing testing scripts to determine the point of manual imputation. commit dfbb758902bebdbe86b42753b0252b96ec6e791f Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 13 13:54:06 2022 +0000 Changes in AutoML params and environment.yml commit 4ec371ed96102d4b1f3b85557cb5fc3c388c1cef Author: Primoz <sisko.primoz@gmail.com> Date: Tue Sep 13 09:51:03 2022 +0000 Testing auto-sklearn commit d27a4a71c81a50f4d30e5eb5bd9b664610da268a Author: Primoz <sisko.primoz@gmail.com> Date: Mon Sep 12 13:44:17 2022 +0000 Reorganisation and reordering of the cleaning script. commit 15d792089d3b37dd381b1cb02b11f1106920d304 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 1 10:33:36 2022 +0000 Changes in cleaning script: - target extracted from config to remove rows where target is nan - prepared sns.heatmap for further missing values analysis - necessary changes in config and participant p01 - picture of heatmap which shows the values state after cleaning commit cb351e0ff6f6325e86fb2867116a44505a44eae1 Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 1 10:06:57 2022 +0000 Unnecessary line (rows with no target value will be removed in cleaning script). commit 86299d346b5c3d706190db0dc14555309d17629f Author: Primoz <sisko.primoz@gmail.com> Date: Thu Sep 1 09:57:21 2022 +0000 Impute phone and sms NAs with 0 commit 3f7ec80c18b1237c49dfdd9a6736cdf518e0b7fb Author: Primoz <sisko.primoz@gmail.com> Date: Wed Aug 31 10:18:50 2022 +0000 Preparation a) phone_calls 0 imputation b) remove rows with NaN target
2022-12-08 17:04:39 +01:00
from pprint import pprint
import sklearn.metrics
import autosklearn.regression
import datetime
import importlib
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import yaml
from sklearn import linear_model, svm, kernel_ridge, gaussian_process
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score, train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.impute import SimpleImputer
model_input = pd.read_csv("data/processed/models/population_model/input_PANAS_negative_affect_mean.csv")  # Standardized data
model_input.dropna(axis=1, how="all", inplace=True)
model_input.dropna(axis=0, how="any", subset=["target"], inplace=True)
categorical_feature_colnames = ["gender", "startlanguage"]
categorical_feature_colnames += [col for col in model_input.columns if "mostcommonactivity" in col or "homelabel" in col]
categorical_features = model_input[categorical_feature_colnames].copy()
mode_categorical_features = categorical_features.mode().iloc[0]
categorical_features = categorical_features.fillna(mode_categorical_features)
categorical_features = categorical_features.apply(lambda col: col.astype("category"))
if not categorical_features.empty:
    categorical_features = pd.get_dummies(categorical_features)
numerical_features = model_input.drop(categorical_feature_colnames, axis=1)
model_in = pd.concat([numerical_features, categorical_features], axis=1)
index_columns = ["local_segment", "local_segment_label", "local_segment_start_datetime", "local_segment_end_datetime"]
model_in.set_index(index_columns, inplace=True)
X_train, X_test, y_train, y_test = train_test_split(model_in.drop(["target", "pid"], axis=1), model_in["target"], test_size=0.30)
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=7200,
    per_run_time_limit=120,
)
automl.fit(X_train, y_train, dataset_name='straw')
print(automl.leaderboard())
pprint(automl.show_models(), indent=4)
train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))
sys.exit()