stress_at_work_analysis/exploration/ml_pipeline_regression.py

# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py:percent
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.13.0
#   kernelspec:
#     display_name: straw2analysis
#     language: python
#     name: straw2analysis
# ---

# %%
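import os
import sys

import pandas as pd

# Add the parent of the current working directory to sys.path *before*
# importing the project-local machine_learning package, so that the import
# below resolves when the notebook is run from its own folder.
nb_dir = os.path.split(os.getcwd())[0]
if nb_dir not in sys.path:
    sys.path.append(nb_dir)

from machine_learning.helper import (  # noqa: E402
    impute_encode_categorical_features,
    prepare_cross_validator,
    prepare_sklearn_data_format,
    run_all_regression_models,
)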

# %%
model_input = pd.read_csv(
    "../data/intradaily_30_min_all_targets/input_JCQ_job_demand_mean.csv"
)

# %%
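# Cross-validation scheme passed to the helper functions below.
CV_METHOD = "half_logo"  # options: logo, half_logo, 5kfold

# Impute and encode the categorical features of the raw input table.
model_input_encoded = impute_encode_categorical_features(model_input)

# %%
# Split the encoded table into features, target, and group labels.
data_x, data_y, data_groups = prepare_sklearn_data_format(
    model_input_encoded, CV_METHOD
)
cross_validator = prepare_cross_validator(data_x, data_y, data_groups, CV_METHOD)

# %%
# Run every regression model with the chosen cross-validator.
scores = run_all_regression_models(data_x, data_y, data_groups, cross_validator)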

# Commit history of this file (from the blame view):
# 2022-10-10 16:47:00 +02:00  Processing of a newly cleaned script. Addition of two ML models. And modifications with one hot encoding.
# 2023-04-21 21:34:54 +02:00  Reformat ml_pipeline_regression.py
# 2023-05-10 20:30:51 +02:00  Thoroughly refactor regression runner.