Update docs with download_participants and phone sensed days

pull/95/head
JulioV 2020-03-09 14:50:06 -04:00
parent 1c413838ad
commit 1f6bc70479
3 changed files with 113 additions and 84 deletions

View File

@ -43,7 +43,15 @@ The following is documentation of the RAPIDS metrics settings in the configu
.. _phone-valid-sensed-days:
- ``PHONE_VALID_SENSED_DAYS``.
  Contains three attributes: ``BIN_SIZE``, ``MIN_VALID_HOURS``, and ``MIN_BINS_PER_HOUR``.
  On any given day, Aware could have sensed data for only a few minutes or for the full 24 hours. Daily estimates of metrics should be considered more reliable the more hours Aware was running and logging data (for example, 10 calls logged on a day when only one hour of data was recorded is a less reliable measurement than 10 calls on a day when 23 hours of data were recorded).
  Therefore, we define a valid hour as one that contains at least a certain number of valid bins; in turn, a valid bin is one that contains at least one row of data from any sensor logged within that period. We divide an hour into N bins of size ``BIN_SIZE`` (in minutes) and mark an hour as valid if it contains at least ``MIN_BINS_PER_HOUR`` valid bins (out of the total number of bins that fit in an hour, i.e. 60min/``BIN_SIZE`` bins). Days with fewer valid sensed hours than ``MIN_VALID_HOURS`` are excluded from the output of this file. See PHONE_VALID_SENSED_DAYS_ in ``config.yaml``.
  In RAPIDS, you will find that we use ``phone_sensed_bins`` (a list of all valid and invalid bins across all monitored days) to improve the estimation of metrics that are ratios over time periods, like ``episodepersensedminutes`` of :ref:`Screen<screen-sensor-doc>`.
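  For reference, a sketch of how these attributes might look in ``config.yaml`` (the values below are illustrative, not prescribed defaults):

  .. code-block:: yaml

      PHONE_VALID_SENSED_DAYS:
          BIN_SIZE: 5 # minutes per bin, i.e. 60/5 = 12 bins per hour
          MIN_BINS_PER_HOUR: 6 # an hour is valid if at least 6 of its 12 bins contain data
          MIN_VALID_HOURS: 16 # a day is kept only if it has at least 16 valid hours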
.. _individual-sensor-settings:

View File

@ -39,7 +39,7 @@ macOS (tested on Catalina 10.15)
- ``conda env create -f environment.yml -n rapids``
- ``conda activate rapids``
#. Install R packages and virtual environment:
- ``snakemake packrat_install``
- ``snakemake packrat_init``
@ -82,7 +82,7 @@ Linux (tested on Ubuntu 16.04)
- ``conda env create -f environment.yml -n MY_ENV_NAME``
- ``conda activate MY_ENV_NAME``
#. Install R packages and virtual environment:
- ``snakemake packrat_install``
- ``snakemake packrat_init``
@ -94,53 +94,74 @@ Linux (tested on Ubuntu 16.04)
Usage
======
Once you have completed the installation for your operating system, you can follow these steps to start using RAPIDS.
.. _db-configuration:
#. Configure the database connection:

   - Create an empty file called ``.env`` in the root directory (``rapids/``)
   - Add the following lines and replace your database-specific credentials (user, password, and host):

     .. code-block:: bash

        [MY_GROUP]
        user=MyUSER
        password=MyPassword
        host=MyIP
        port=3306

   .. note::

      ``MY_GROUP`` is a custom label you assign when setting up the database configuration. It has to match ``DATABASE_GROUP`` in the ``config.yaml`` file_. It does not have to relate to your database credentials.
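   For instance, if you labeled your credentials group ``[MY_GROUP]`` in ``.env``, the matching entry in ``config.yaml`` would look like this sketch (the real file contains many more settings):

   .. code-block:: yaml

      DATABASE_GROUP: MY_GROUP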
#. Configure the participants you want to analyze:

   - **Automatically**. You can automatically include all devices that are stored in the ``aware_device`` table. If you have special requirements, see the manual configuration below::

        snakemake download_participants
   - **Manually**. Create one file per participant in the ``rapids/data/external/`` directory. The file should NOT have an extension (i.e., no ``.txt``). The name of the file will become the label for that participant in the pipeline.

     - The first line of the file should be the Aware ``device_id`` for that participant. If one participant has multiple device_ids (i.e., Aware had to be re-installed), add all device_ids separated by commas.
     - The second line should list the device's operating system (``android`` or ``ios``)
     - The third line is a human-friendly label that will appear in any plots for that participant.
     - The fourth line contains a start and an end date separated by a comma (e.g., ``2020/02/01,2020/03/03``). Only data within these dates will be included in the pipeline.

     For example, let's say participant ``p01`` had two AWARE device_ids and was running Android between Feb 1st 2020 and March 3rd 2020. Their participant file would be named ``p01`` and contain:
     .. code-block:: bash

        3a7b0d0a-a9ce-4059-ab98-93a7b189da8a,44f20139-50cc-4b13-bdde-0d5a3889e8f9
        android
        Participant01
        2020/02/01,2020/03/03
#. Configure the sensors to process:

   - The variable ``SENSORS`` in the ``config.yaml`` file_ should match existing sensor tables in your Aware database (see :ref:`rapids-structure` for more information). Each item in this list will be processed in RAPIDS.

   .. note::

      It is beneficial to list all collected sensors even if you don't plan to include them in a model later on in the pipeline. This is because we use all available data to estimate whether the phone was sensing or not (i.e., to know if Aware crashed or the battery died). See :ref:`PHONE_VALID_SENSED_DAYS<phone-valid-sensed-days>` for more information.
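   As a sketch, and assuming your Aware database has tables named ``messages``, ``calls``, and ``screen`` (table names vary by deployment), the list might look like:

   .. code-block:: yaml

      SENSORS: [messages, calls, screen]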
#. Execute RAPIDS

   - Standard execution::

        snakemake

   - Standard execution over multiple cores::

        snakemake -j 8

   - Force a rule (useful if you modify your code and want to update its results)::

        snakemake -R RULE_NAME
.. _bug: https://github.com/Homebrew/linuxbrew-core/issues/17812
.. _instructions: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
.. _brew: https://docs.brew.sh/Homebrew-on-Linux
.. _AWARE: https://awareframework.com/what-is-aware/
.. _file: https://github.com/carissalow/rapids/blob/master/config.yaml#L22

View File

@ -3,60 +3,6 @@
RAPIDS Structure
=================
.. _the-snakefile-file:
The ``Snakefile`` File
@ -117,7 +63,7 @@ The configurations for the pipeline are defined in the ``config.yaml`` (See `con
- ``PIDS`` - This is the list of the participant IDs to include in the analysis. Create a file for each participant with a matching name ``pXXX`` containing the device_id in the ``data/external/`` directory. (Remember step 8 on the :ref:`install-page` page)
- ``DAY_SEGMENTS`` - A variable used to list all of the common day segments.
- ``TIMEZONE`` - Time variable. Use timezone names from the `List of Timezone`_ and double-check your code; for example, EST is not US Eastern Time.
- ``DATABASE_GROUP`` - Label for the database credentials group. (See :ref:`Configure the database connection <db-configuration>`.)
- ``DOWNLOAD_DATASET`` - Variable used to store the name of the dataset that will be downloaded for analysis.
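As a sketch, these global variables might be configured as follows (illustrative values; see the linked `config.yaml`_ for the real defaults):

.. code-block:: yaml

    PIDS: [p01, p02]
    DAY_SEGMENTS: [daily, morning, afternoon, evening, night]
    TIMEZONE: America/New_York
    DATABASE_GROUP: MY_GROUP
    DOWNLOAD_DATASET: MY_STUDY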
There are a number of other settings that are specific to the sensor/feature that will be pulled and analyzed by the pipeline. An example of the configuration settings for the :ref:`sms` data is shown below::
@ -242,3 +188,57 @@ This contains the reports of the results of the analysis done by the pipeline.
.. _`visualization directory`: https://github.com/carissalow/rapids/tree/master/src/visualization
.. _`config.yaml`: https://github.com/carissalow/rapids/blob/master/config.yaml
.. _`Snakefile`: https://github.com/carissalow/rapids/blob/master/Snakefile
::

    ├── LICENSE
    ├── Makefile           <- Makefile with commands like `make data` or `make train`
    ├── README.md          <- The top-level README for developers using this project.
    ├── config.yaml        <- The configuration settings for the pipeline.
    ├── environment.yml    <- Environment settings (channels and dependencies that are installed in the env)
    ├── data
    │   ├── external       <- Data from third party sources.
    │   ├── interim        <- Intermediate data that has been transformed.
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    ├── docs               <- A default Sphinx project; see sphinx-doc.org for details
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    ├── packrat            <- Installed R dependencies. (Packrat is a dependency management system for R)
    ├── references         <- Data dictionaries, manuals, and all other explanatory materials.
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting.
    ├── rules
    │   ├── features       <- Rules to process the feature data pulled into the pipeline.
    │   ├── packrat        <- Rules for setting up packrat.
    │   ├── preprocessing  <- Preprocessing rules to clean data before processing.
    │   ├── analysis       <- Analytic rules that are applied to the data.
    │   └── reports        <- Snakefile used to produce reports.
    ├── setup.py           <- Makes the project pip installable (pip install -e .) so src can be imported
    ├── Snakefile          <- The root snakemake file (the equivalent of a Makefile)
    ├── src                <- Source code for use in this project. Can be in any language, e.g. Python,
    │   │                     R, Julia, etc.
    │   │
    │   ├── data           <- Scripts to download or generate data. Can be in any language, e.g. Python,
    │   │                     R, Julia, etc.
    │   │
    │   ├── features       <- Scripts to turn raw data into features for modeling. Can be in any language,
    │   │                     e.g. Python, R, Julia, etc.
    │   │
    │   ├── models         <- Scripts to train models and then use trained models to make predictions. Can
    │   │                     be in any language, e.g. Python, R, Julia, etc.
    │   │
    │   └── visualization  <- Scripts to create exploratory and results-oriented visualizations. Can be
    │                         in any language, e.g. Python, R, Julia, etc.
    └── tox.ini            <- tox file with settings for running tox; see tox.testrun.org