From 17a34149872d4b77bea03d524b0c26460a7ad08e Mon Sep 17 00:00:00 2001
From: Meng Li <34143965+Meng6@users.noreply.github.com>
Date: Mon, 7 Dec 2020 10:56:37 -0500
Subject: [PATCH] Update analysis.md

---
 docs/workflow-examples/analysis.md  | 8 ++++----
 example_profile/example_config.yaml | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/workflow-examples/analysis.md b/docs/workflow-examples/analysis.md
index b8bd36c4..fe226f1a 100644
--- a/docs/workflow-examples/analysis.md
+++ b/docs/workflow-examples/analysis.md
@@ -37,10 +37,10 @@ In total, our example workflow has nine steps that are in charge of sensor data
 
 ## Configure and run the analysis workflow example
 1. [Install](../../setup/installation) RAPIDS
-2. Configure the [user credentials](../../setup/configuration/#database-credentials) of a local or remote MySQL server with writing permissions in your `.env` file.
+2. Configure the [user credentials](../../setup/configuration/#database-credentials) of a local or remote MySQL server with writing permissions in your `.env` file. The example config file is at `example_profile/example_config.yaml`.
 3. Unzip the [test database](https://osf.io/skqfv/files/) to `data/external/rapids_example.sql` and run:
 ```bash
-    ./rapids -j1 restore_sql_file
+    ./rapids -j1 restore_sql_file --profile example_profile
 ```
 4. Create the participant files for this example by running:
 ```bash
@@ -78,12 +78,12 @@ In total, our example workflow has nine steps that are in charge of sensor data
 ??? info "7. Merge features and targets."
     In this step we merge the cleaned features and target labels for our individual models in the `merge_features_and_targets_for_individual_model` rule in `rules/models.smk`. Additionally, we merge the cleaned features, target labels, and demographic features of our two participants for the population model in the `merge_features_and_targets_for_population_model` rule in `rules/models.smk`.
     These two merged files are the input for our individual and population models.
 
-??? info "8. Modeling."
+??? info "8. Modelling."
     This stage has three phases: model building, training and evaluation.
 
     In the building phase we impute, normalize and oversample our dataset. Missing numeric values in each column are imputed with their mean and we impute missing categorical values with their mode. We normalize each numeric column with one of three strategies (min-max, z-score, and scikit-learn package’s robust scaler) and we one-hot encode each categorial feature as a numerical array. We oversample our imbalanced dataset using SMOTE (Synthetic Minority Over-sampling Technique) or a Random Over sampler from scikit-learn. All these parameters are exposed in `example_profile/example_config.yaml`.
 
-    In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-participant-out strategy. Parameters for each model like weights and learning rates are exposed in `example_profile/example_config.yaml`.
+    In the training phase, we create eight models: logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, gradient boosting classifier, extreme gradient boosting classifier and a light gradient boosting machine. We cross-validate each model with an inner cycle to tune hyper-parameters based on the Macro F1 score and an outer cycle to predict the test set on a model with the best hyper-parameters. Both cross-validation cycles use a leave-one-out strategy. Parameters for each model like weights and learning rates are exposed in `example_profile/example_config.yaml`.
 
     Finally, in the evaluation phase we compute the accuracy, Macro F1, kappa, area under the curve and per class precision, recall and F1 score of all folds of the outer cross-validation cycle.
diff --git a/example_profile/example_config.yaml b/example_profile/example_config.yaml
index a7939096..2ace4951 100644
--- a/example_profile/example_config.yaml
+++ b/example_profile/example_config.yaml
@@ -1,6 +1,6 @@
 # See https://www.rapids.science/setup/configuration/#database-credentials
 DATABASE_GROUP: &database_group
-  RAPIDS_EXAMPLE
+  MY_GROUP
 
 # See https://www.rapids.science/setup/configuration/#timezone-of-your-study
 TIMEZONE: &timezone