Updated documentation.
parent
84b9bd4e12
commit
270eeabe3b
|
@ -0,0 +1,42 @@
|
|||
.. _test-cases:
|
||||
|
||||
Test Cases
|
||||
-----------
|
||||
|
||||
Along with the continued development of RAPIDS and the addition of new sensors and features to the pipeline, tests for the currently available sensors and features are being implemented. Since this is a work in progress, this page will be updated with the list of sensors and features for which testing is available. For each of the sensors listed, a description of the data used for testing (the test cases) is outlined. Currently, ``tests/data/raw/test01/`` contains all the test data files for testing Android data formats and ``tests/data/raw/test02/`` contains all the test data files for testing iOS data formats. It follows that the expected (verified) output files are contained in ``tests/data/processed/test01/`` and ``tests/data/processed/test02/`` for Android and iOS respectively.
|
||||
|
||||
List of Sensors with Tests
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following is a list of the sensors for which testing is currently available.
|
||||
|
||||
|
||||
Messages (SMS)
|
||||
"""""""""""""""
|
||||
|
||||
- The raw message data file contains data for 2 separate days.
- The data for the first day contains 5 records for every ``epoch``.
- The second day's data contains 6 records for each of only 2 epochs (currently ``morning`` and ``evening``).
- The raw message data contains records for both ``message_types`` (i.e. ``received`` and ``sent``) on both days and in all epochs. The number of records with each ``message_types`` value per epoch is randomly distributed, but there is at least one record with each ``message_types`` value per epoch.
- There is one raw message data file each for testing iOS and Android data.
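
The layout of the message test data described above can be sketched in Python. This is only an illustration of the record counts per day and epoch: the epoch start hours, the column names (``local_date_time``, ``epoch``, ``message_type``) and the helper names are assumptions made for this sketch and are not taken from the RAPIDS test files.

```python
import random
from datetime import datetime, timedelta

# Hypothetical epoch start hours; the real boundaries come from the RAPIDS config.
EPOCH_START_HOUR = {"night": 0, "morning": 6, "afternoon": 12, "evening": 18}
MESSAGE_TYPES = ["received", "sent"]


def make_epoch_records(day, epoch, n_records, rng):
    """Return n_records fake messages inside one epoch, covering every message type."""
    # Guarantee at least one record per type, then fill the rest at random.
    extra = [rng.choice(MESSAGE_TYPES) for _ in range(n_records - len(MESSAGE_TYPES))]
    types = MESSAGE_TYPES + extra
    rng.shuffle(types)
    start = day + timedelta(hours=EPOCH_START_HOUR[epoch])
    return [
        {"local_date_time": start + timedelta(minutes=10 * i), "epoch": epoch, "message_type": t}
        for i, t in enumerate(types)
    ]


def make_test_messages(seed=0):
    """Build two days of fake messages following the test case description."""
    rng = random.Random(seed)
    day1 = datetime(2020, 1, 1)
    day2 = day1 + timedelta(days=1)
    records = []
    for epoch in EPOCH_START_HOUR:        # day 1: 5 records in every epoch
        records += make_epoch_records(day1, epoch, 5, rng)
    for epoch in ("morning", "evening"):  # day 2: 6 records in two epochs only
        records += make_epoch_records(day2, epoch, 6, rng)
    return records


messages = make_test_messages()
print(len(messages))  # 4 epochs * 5 records + 2 epochs * 6 records = 32
```

Seeding the random generator keeps the fake data reproducible, which matters when the expected output files are verified by hand.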
|
||||
|
||||
Calls
|
||||
"""""""
|
||||
|
||||
Due to the difference in the format of the raw call data for iOS and Android (see the **Assumptions/Observations** section of :ref:`Calls<call-sensor-doc>`), the following describes the expected contents of ``calls_with_datetime_unified.csv``. This gives a better idea of the use cases being tested, since ``calls_with_datetime_unified.csv`` makes the iOS and Android data comparable.
|
||||
|
||||
- The call data contains data for 2 days.
- The data for the first day contains 6 records for every ``epoch``.
- The second day's data contains 6 records for each of only 2 epochs (currently ``morning`` and ``evening``).
- The call data contains records for all ``call_types`` (i.e. ``incoming``, ``outgoing`` and ``missed``) on both days and in all epochs. The number of records with each of the ``call_types`` per epoch is randomly distributed, but there is at least one record with each of the ``call_types`` per epoch.
- There is one call data file each for testing iOS and Android data.
|
||||
|
||||
Screen
|
||||
""""""""
|
||||
|
||||
Due to the difference in the format of the raw screen data for iOS and Android (see the **Assumptions/Observations** section of :ref:`Screen<screen-sensor-doc>`), the following describes the expected contents of ``screen_deltas.csv``. This gives a better idea of the use cases being tested, since ``screen_deltas.csv`` makes the iOS and Android data comparable. These files are used to calculate the features for the screen sensor.
|
||||
|
||||
- The screen delta data file contains data for 1 day.
- For every ``epoch``, the screen delta data contains 1 record representing an ``unlock`` episode that falls entirely within that epoch.
- The screen delta data contains 1 record representing an ``unlock`` episode that falls across the boundary of 2 epochs. Namely, the ``unlock`` episode starts in one epoch and ends in the next; thus there is a record for ``unlock`` episodes that fall across ``night`` to ``morning``, ``morning`` to ``afternoon`` and finally ``afternoon`` to ``night``.
- Testing is done for the ``unlock`` ``episode_type``.
- There is one screen data file each for testing iOS and Android data formats.
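
The two kinds of ``unlock`` episodes described above (inside one epoch, or across an epoch boundary) can be told apart with a check like the one below. This is a minimal sketch: the epoch start hours and the function names ``epoch_of`` and ``crosses_epoch_boundary`` are assumptions for illustration and do not come from the RAPIDS code.

```python
from datetime import datetime

# Hypothetical epoch boundaries (start hour, inclusive); not taken from RAPIDS.
EPOCHS = [("night", 0), ("morning", 6), ("afternoon", 12), ("evening", 18)]


def epoch_of(ts):
    """Return the name of the epoch that contains the timestamp ts."""
    name = EPOCHS[0][0]
    for epoch, start_hour in EPOCHS:
        if ts.hour >= start_hour:
            name = epoch
    return name


def crosses_epoch_boundary(start, end):
    """True when an episode starts in one epoch and ends in another."""
    return start.date() != end.date() or epoch_of(start) != epoch_of(end)


# An unlock episode fully inside the (assumed) morning epoch.
inside = (datetime(2020, 1, 1, 8, 0), datetime(2020, 1, 1, 8, 5))
# An unlock episode that starts at night and ends in the morning.
across = (datetime(2020, 1, 1, 5, 58), datetime(2020, 1, 1, 6, 3))

print(crosses_epoch_boundary(*inside))
print(crosses_epoch_boundary(*across))
```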
|
|
@ -6,13 +6,15 @@ The following is a simple guide to testing RAPIDS. All files necessary for testi
|
|||
::
|
||||
|
||||
├── tests
|
||||
│ ├── data <- Replication of the project root data directory for testing.
|
||||
│ │ ├── external <- Contains Data from third party sources used for testing.
|
||||
│ ├── data <- Replica of the project root data directory for testing.
|
||||
│ │ ├── external <- Contains the fake testing participant files.
|
||||
│ │ ├── interim <- The expected intermediate data that has been transformed.
|
||||
│ │ ├── processed <- The expected final, canonical data sets for modeling.
|
||||
│ │ └── raw <- The specially created raw input datasets that will be used for testing.
|
||||
│ │ ├── processed <- The expected final data, canonical data sets for modeling used to test/validate feature calculations.
|
||||
│ │ └── raw <- The specially created raw input datasets (fake data) that will be used for testing.
|
||||
│ │
|
||||
│ ├── scripts <- Scripts for testing. Add test scripts in this directory.
|
||||
│ │ ├── run_tests.sh <- The shell script that runs the RAPIDS pipeline on the test data and tests the results.
|
||||
│ │ ├── test_sensor_features.py <- The default test script for testing RAPIDS built-in sensor features.
|
||||
│ │ └── utils.py <- Contains any helper functions and methods.
|
||||
│ │
|
||||
│ ├── settings <- The directory contains the config and settings files for testing snakemake.
|
||||
|
@ -22,31 +24,28 @@ The following is a simple guide to testing RAPIDS. All files necessary for testi
|
|||
│ └── Snakefile <- The Snakefile for testing only. It contains the rules that you would be testing.
|
||||
│
|
||||
|
||||
To begin testing RAPIDS, place the input data ``csv`` files in ``tests/data/raw`` and ``data/raw``. The expected output files of RAPIDS after processing the input data should be placed in ``tests/data/processed``.
|
||||
|
||||
The Snakemake rule(s) that are to be tested must be placed in the ``tests/Snakemake`` file. The current ``tests/Snakemake`` is a good example of how to define them.
|
||||
Steps for Testing
|
||||
""""""""""""""""""
|
||||
|
||||
After storing your test scripts in ``tests/scripts``, you can run all rules in the ``tests/Snakemake`` with:
|
||||
#. To begin testing RAPIDS, place the fake raw input data ``csv`` files in ``tests/data/raw/``. The fake participant files should be placed in ``tests/data/external/``. The expected output files of RAPIDS after processing the input data should be placed in ``tests/data/processed/``.
|
||||
|
||||
#. The Snakemake rule(s) that are to be tested must be placed in the ``tests/Snakemake`` file. The current ``tests/Snakemake`` is a good example of how to define them. (At the time of writing, the Snakefile contains rules for messages (SMS), calls and screen.)
|
||||
|
||||
#. Edit the ``tests/settings/config.yaml``. Add and/or remove the rules to be run for testing from the ``forcerun`` list.
|
||||
|
||||
#. Edit the ``tests/settings/testing_config.yaml`` with the necessary configuration settings for running the rules to be tested.
|
||||
|
||||
#. Add any additional test scripts in ``tests/scripts``.
|
||||
|
||||
#. Comment or uncomment lines in the testing shell script ``tests/scripts/run_tests.sh`` as needed.
|
||||
|
||||
#. Run the testing shell script.
|
||||
|
||||
::
|
||||
|
||||
snakemake --profile tests/settings
|
||||
$ tests/scripts/run_tests.sh
|
||||
|
||||
Or run a single rule with
|
||||
|
||||
::
|
||||
|
||||
snakemake --profile tests/settings -R sms_features
|
||||
|
||||
The above example runs the ``sms_features`` rule that is defined in the ``tests/Snakemake`` file. Replace this with the name of the rule you want to test. The ``--profile`` flag is used to run Snakemake with the ``Snakefile`` and ``testing_config.yaml`` file stored in ``tests/settings``.
|
||||
|
||||
Once RAPIDS has processed the sample data, the next step is to test the output. Testing is implemented using Python's ``unittest`` module. To run all the test scripts stored in the ``tests/scripts`` directory, use the following command:
|
||||
|
||||
::
|
||||
|
||||
python -m unittest discover tests/scripts/ -v
|
||||
|
||||
The ``discover`` subcommand finds and runs all the test scripts within the ``tests/scripts`` directory that start with ``test_``. The names of all test methods in these scripts should also start with ``test_``.
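
A test script picked up this way could look roughly like the sketch below. It is an illustration only, not the contents of ``test_sensor_features.py``: the CSV columns and the ``read_rows`` helper are hypothetical, and the expected and actual files are written to a temporary directory so that the example runs on its own instead of reading real pipeline output.

```python
import csv
import tempfile
import unittest
from pathlib import Path


def read_rows(path):
    """Load a CSV file as a list of dictionaries (hypothetical helper)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


class TestSensorFeatures(unittest.TestCase):
    def setUp(self):
        # Stand-ins for the expected and the computed feature files.
        self.tmp = tempfile.TemporaryDirectory()
        root = Path(self.tmp.name)
        self.expected = root / "expected_messages.csv"
        self.actual = root / "actual_messages.csv"
        content = "local_date,count\n2020-01-01,5\n"
        self.expected.write_text(content)
        self.actual.write_text(content)

    def tearDown(self):
        self.tmp.cleanup()

    def test_sensors_files_exist(self):
        # The pipeline output must have been written.
        self.assertTrue(self.actual.exists())

    def test_sensors_features_calculations(self):
        # The computed features must match the verified expected values.
        self.assertEqual(read_rows(self.expected), read_rows(self.actual))


# Run the test case directly; "python -m unittest discover" would find it
# automatically if the file name started with "test_".
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSensorFeatures)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```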
|
||||
|
||||
The following is a snippet of the output you should see after running your test.
|
||||
|
||||
|
@ -61,4 +60,8 @@ The following is a snippet of the output you should see after running your test.
|
|||
|
||||
The results above show that the first test ``test_sensors_files_exist`` passed while ``test_sensors_features_calculations`` failed. In addition, you should get the traceback of the failure (not shown here). For more information on how to implement test scripts and use unittest, please see the `Unittest Documentation`_.
|
||||
|
||||
Testing of the RAPIDS sensors and features is a work in progress. Please see :ref:`test-cases` for a list of sensors and features that have testing currently available.
|
||||
|
||||
Currently, the repository is set up to test a number of sensors out of the box by simply running the ``tests/scripts/run_tests.sh`` command once the RAPIDS Python environment is active.
|
||||
|
||||
.. _`Unittest Documentation`: https://docs.python.org/3.7/library/unittest.html#command-line-interface
|
||||
|
|
|
@ -3,12 +3,14 @@
|
|||
RAPIDS Features
|
||||
===============
|
||||
|
||||
*How do I compute any of these features?* In your ``config.yaml``, go to the sensor section you are interested in and set the corresponding ``COMPUTE`` option to ``TRUE`` as well as ``DB_TABLE`` to the sensor's table name in your database (the default table name is the one assigned by Aware), for example:
|
||||
*How do I compute any of these features?* In your ``config.yaml``, go to the sensor section you are interested in and set the corresponding ``COMPUTE`` option to ``TRUE`` as well as ``DB_TABLE`` to the sensor's table name in your database (the default table name is the one assigned by Aware), for example
|
||||
::
|
||||
|
||||
| ``MESSAGES:``
|
||||
| ``COMPUTE: True``
|
||||
| ``DB_TABLE: messages``
|
||||
| ``...``
|
||||
MESSAGES:
|
||||
COMPUTE: True
|
||||
DB_TABLE: messages
|
||||
...
|
||||
|
||||
|
||||
If you want to extract ``phone_valid_sensed_days.csv``, screen features or location features based on fused location data, don't forget to configure ``TABLES_FOR_SENSED_BINS`` (see below).
|
||||
|
||||
|
@ -21,14 +23,6 @@ Global Parameters
|
|||
|
||||
- ``TABLES_FOR_SENSED_BINS`` - Add as many sensor tables as you have in your database. All sensors included are used to compute ``phone_sensed_bins.csv`` (bins of time when the smartphone was sensing data). In turn, these bins are used to compute ``PHONE_VALID_SENSED_DAYS`` (see below), ``episodepersensedminutes`` feature of :ref:`Screen<screen-sensor-doc>` and to resample fused location data if you configure Barnett's location features to use ``RESAMPLE_FUSED``. See TABLES_FOR_SENSED_BINS_ variable in ``config`` file (therefore, when you are extracting screen or Barnett's location features, screen and locations tables are mandatory).
|
||||
|
||||
.. _fitbit-table:
|
||||
|
||||
- ``FITBIT_TABLE`` - The table in your database that contains your Fitbit data in a field named `fitbit_data` in JSON format.
|
||||
|
||||
.. _fitbit-sensors:
|
||||
|
||||
- ``FITBIT_SENSORS`` - The list of sensors to be parsed from the fitbit table: ``heartrate``, ``steps``, ``sleep``.
|
||||
|
||||
.. _pid:
|
||||
|
||||
- ``PID`` - The list of participant ids to be included in the analysis. These should match the names of the files created in the ``data/external`` directory (:ref:`see more details<db-configuration>`).
|
||||
|
@ -75,10 +69,10 @@ Global Parameters
|
|||
.. _individual-sensor-settings:
|
||||
|
||||
|
||||
.. _sms-sensor-doc:
|
||||
.. _messages-sensor-doc:
|
||||
|
||||
Messages (SMS)
|
||||
"""""
|
||||
"""""""""""""""
|
||||
|
||||
See `Messages Config Code`_
|
||||
|
||||
|
@ -90,39 +84,40 @@ See `Messages Config Code`_
|
|||
|
||||
- Rule ``rules/preprocessing.snakefile/download_dataset``
|
||||
- Rule ``rules/preprocessing.snakefile/readable_datetime``
|
||||
- Rule ``rules/features.snakefile/sms_features``
|
||||
- Rule ``rules/features.snakefile/messages_features``
|
||||
|
||||
.. _sms-parameters:
|
||||
.. _messages-parameters:
|
||||
|
||||
**SMS Rule Parameters (sms_features):**
|
||||
**Messages Rule Parameters (messages_features):**
|
||||
|
||||
============ ===================
|
||||
Name Description
|
||||
============ ===================
|
||||
sms_type The particular ``sms_type`` that will be analyzed. The options for this parameter are ``received`` or ``sent``.
|
||||
day_segment The particular ``day_segment`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
|
||||
features Features to be computed, see table below
|
||||
============ ===================
|
||||
============== ===================
|
||||
Name Description
|
||||
============== ===================
|
||||
messages_type The particular ``messages_type`` that will be analyzed. The options for this parameter are ``received`` or ``sent``.
|
||||
day_segment The particular ``day_segment`` that will be analyzed. The available options are ``daily``, ``morning``, ``afternoon``, ``evening``, ``night``
|
||||
features Features to be computed, see table below
|
||||
============== ===================
|
||||
|
||||
.. _sms-available-features:
|
||||
.. _messages-available-features:
|
||||
|
||||
**Available SMS Features**
|
||||
**Available Message Features**
|
||||
|
||||
========================= ========= =============
|
||||
Name Units Description
|
||||
========================= ========= =============
|
||||
count SMS Number of SMS of type ``sms_type`` that occurred during a particular ``day_segment``.
|
||||
distinctcontacts contacts Number of distinct contacts that are associated with a particular ``sms_type`` during a particular ``day_segment``.
|
||||
timefirstsms minutes Number of minutes between 12:00am (midnight) and the first ``SMS`` of a particular ``sms_type``.
|
||||
timelastsms minutes Number of minutes between 12:00am (midnight) and the last ``SMS`` of a particular ``sms_type``.
|
||||
countmostfrequentcontact SMS Number of ``SMS`` messages from the contact with the most messages of ``sms_type`` during a ``day_segment`` throughout the whole dataset of each participant.
|
||||
count messages Number of messages of type ``messages_type`` that occurred during a particular ``day_segment``.
|
||||
distinctcontacts contacts Number of distinct contacts that are associated with a particular ``messages_type`` during a particular ``day_segment``.
|
||||
timefirstsms minutes Number of minutes between 12:00am (midnight) and the first ``message`` of a particular ``messages_type``.
|
||||
timelastsms minutes Number of minutes between 12:00am (midnight) and the last ``message`` of a particular ``messages_type``.
|
||||
countmostfrequentcontact messages Number of messages from the contact with the most messages of ``messages_type`` during a ``day_segment`` throughout the whole dataset of each participant.
|
||||
========================= ========= =============
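
To make the definitions in the table above concrete, here is a minimal Python sketch that computes four of the features for the records of a single ``day_segment``. It is not the RAPIDS implementation: the record fields (``timestamp``, ``contact``, ``message_type``) and the function name are assumptions for this example, and ``countmostfrequentcontact`` is left out because it depends on the whole dataset of each participant.

```python
from datetime import datetime


def messages_features(records, messages_type):
    """Compute a few message features for the records of one day segment."""
    selected = [r for r in records if r["message_type"] == messages_type]
    if not selected:
        return {"count": 0, "distinctcontacts": 0, "timefirstsms": None, "timelastsms": None}
    # Minutes elapsed since 12:00am (midnight) for every selected message.
    minutes = [r["timestamp"].hour * 60 + r["timestamp"].minute for r in selected]
    return {
        "count": len(selected),
        "distinctcontacts": len({r["contact"] for r in selected}),
        "timefirstsms": min(minutes),
        "timelastsms": max(minutes),
    }


# Hypothetical records for one day.
records = [
    {"timestamp": datetime(2020, 1, 1, 8, 30), "contact": "a", "message_type": "sent"},
    {"timestamp": datetime(2020, 1, 1, 9, 15), "contact": "b", "message_type": "sent"},
    {"timestamp": datetime(2020, 1, 1, 10, 0), "contact": "a", "message_type": "received"},
]

print(messages_features(records, "sent"))
# {'count': 2, 'distinctcontacts': 2, 'timefirstsms': 510, 'timelastsms': 555}
```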
|
||||
|
||||
**Assumptions/Observations:**
|
||||
|
||||
``TYPES`` and ``FEATURES`` keys in ``config.yaml`` need to match. For example, below the ``TYPE`` ``sent`` matches the ``FEATURES`` key ``sent``::
|
||||
|
||||
SMS:
|
||||
MESSAGES:
|
||||
...
|
||||
TYPES: [sent]
|
||||
FEATURES:
|
||||
sent: [count, distinctcontacts, timefirstsms, timelastsms, countmostfrequentcontact]
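
The matching rule between ``TYPES`` and ``FEATURES`` can be checked before running the pipeline. The sketch below assumes the sensor section has already been loaded into a Python dictionary; the function name ``unmatched_types`` is hypothetical and is not part of RAPIDS.

```python
def unmatched_types(sensor_config):
    """Return the entries of TYPES that have no matching key under FEATURES."""
    return [t for t in sensor_config["TYPES"] if t not in sensor_config["FEATURES"]]


# Mirrors the MESSAGES example: the type "sent" has a matching FEATURES key.
messages_config = {
    "TYPES": ["sent"],
    "FEATURES": {
        "sent": ["count", "distinctcontacts", "timefirstsms", "timelastsms", "countmostfrequentcontact"],
    },
}

# A misconfigured section: "received" is requested but has no FEATURES entry.
broken_config = {
    "TYPES": ["received", "sent"],
    "FEATURES": {"sent": ["count"]},
}

print(unmatched_types(messages_config))  # []
print(unmatched_types(broken_config))    # ['received']
```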
|
||||
|
@ -197,6 +192,7 @@ countmostfrequentcontact calls The number of ``missed`` calls during
|
|||
``TYPES`` and ``FEATURES`` keys in ``config.yaml`` need to match. For example, below the ``TYPE`` ``missed`` matches the ``FEATURES`` key ``missed``::
|
||||
|
||||
CALLS:
|
||||
...
|
||||
TYPES: [missed]
|
||||
FEATURES:
|
||||
missed: [count, distinctcontacts, timefirstcall, timelastcall, countmostfrequentcontact]
|
||||
|
@ -246,6 +242,7 @@ countscansmostuniquedevice scans Number of scans of the most scanned
|
|||
|
||||
**Assumptions/Observations:** N/A
|
||||
|
||||
|
||||
.. _wifi-sensor-doc:
|
||||
|
||||
WiFi
|
||||
|
@ -453,9 +450,12 @@ For Aware iOS client V1 we swap battery status 3 to 5 and 1 to 3, client V2 does
|
|||
|
||||
.. _activity-recognition-sensor-doc:
|
||||
|
||||
|
||||
Activity Recognition
|
||||
""""""""""""""""""""""""""""
|
||||
|
||||
See `Activity Recognition Config Code`_
|
||||
|
||||
**Available Epochs:** daily, morning, afternoon, evening, night
|
||||
|
||||
**Available Platforms:** Android and iOS
|
||||
|
@ -515,9 +515,9 @@ See `Light Config Code`_
|
|||
|
||||
**Rule Chain:**
|
||||
|
||||
- **Rule:** ``rules/preprocessing.snakefile/download_dataset`` - See the download_dataset_ rule.
|
||||
- **Rule:** ``rules/preprocessing.snakefile/readable_datetime`` - See the readable_datetime_ rule.
|
||||
- **Rule:** ``rules/features.snakefile/light_features`` - See the light_features_ rule.
|
||||
- Rule: ``rules/preprocessing.snakefile/download_dataset``
|
||||
- Rule: ``rules/preprocessing.snakefile/readable_datetime``
|
||||
- Rule: ``rules/features.snakefile/light_features``
|
||||
|
||||
.. _light-parameters:
|
||||
|
||||
|
@ -686,7 +686,7 @@ An ``unlock`` episode is considered as the time between an ``unlock`` event and
|
|||
.. _conversation-sensor-doc:
|
||||
|
||||
Conversation
|
||||
""""""""
|
||||
""""""""""""""
|
||||
|
||||
See `Conversation Config Code`_
|
||||
|
||||
|
@ -802,21 +802,21 @@ Only features from summary data are available at the momement.
|
|||
|
||||
The `fitbit_with_datetime` rule will extract Summary data (`fitbit_sleep_summary_with_datetime.csv`) and Intraday data (`fitbit_sleep_intraday_with_datetime.csv`). There are two versions of Fitbit's sleep API (`version 1`_ and `version 1.2`_), and each provides raw sleep data in a different format:
|
||||
|
||||
- Sleep level. In ``v1``, sleep level is an integer with three possible values (1, 2, 3) while in ``v1.2`` is a string. We convert integer levels to strings, "asleep", "restless" or "awake" respectively.
|
||||
- Count summaries. For Summary data, ``v1`` contains "count_awake", "duration_awake", "count_awakenings", "count_restless", and "duration_restless" fields for every sleep record while ``v1.2`` does not.
|
||||
- Types of sleep records. ``v1.2`` has two types of sleep records: "classic" and "stages". The "classic" type contains three sleep levels: "awake", "restless" and "asleep". The "stages" type contains four sleep levels: "wake", "deep", "light", and "rem". Sleep records from ``v1`` will have the same sleep levels as `v1.2` classic type; therefore we set their type to "classic".
|
||||
- Unified level of sleep. For intraday data, we unify sleep levels of each sleep record with a column named "unified_level". Based on `this Fitbit forum post`_ , we merge levels into two categories:
|
||||
- For the "classic" type unified_level is one of {0, 1} where 0 means awake and groups "awake" + "restless", while 1 means asleep and groups "asleep".
|
||||
- For the "stages" type, unified_level is one of {0, 1} where 0 means awake and groups "wake" while 1 means asleep and groups "deep" + "light" + "rem".
|
||||
- Short Data. In ``v1.2``, records of type "stages" contain "shortData" in addition to "data". We merge both to extract intraday data.
|
||||
- "data" contains sleep stages and any wake periods > 3 minutes (180 seconds).
|
||||
- "shortData" contains short wake periods representing physiological awakenings that are <= 3 minutes (180 seconds).
|
||||
- The following columns of Summary data are not computed by RAPIDS but taken directly from columns with a similar name provided by Fitbit's API: `efficiency`, `minutes_after_wakeup`, `minutes_asleep`, `minutes_awake`, `minutes_to_fall_asleep`, `minutes_in_bed`, `is_main_sleep` and `type`
|
||||
- The following columns of Intraday data are not computed by RAPIDS but taken directly from columns with a similar name provided by Fitbit's API: `original_level`, `is_main_sleep` and `type`. We compute `unified_level` as explained above.
|
||||
- Sleep level. In ``v1``, sleep level is an integer with three possible values (1, 2, 3) while in ``v1.2`` it is a string. We convert integer levels to strings: ``asleep``, ``restless`` or ``awake`` respectively.
|
||||
- Count summaries. For Summary data, ``v1`` contains ``count_awake``, ``duration_awake``, ``count_awakenings``, ``count_restless``, and ``duration_restless`` fields for every sleep record while ``v1.2`` does not.
|
||||
- Types of sleep records. ``v1.2`` has two types of sleep records: ``classic`` and ``stages``. The ``classic`` type contains three sleep levels: ``awake``, ``restless`` and ``asleep``. The ``stages`` type contains four sleep levels: ``wake``, ``deep``, ``light``, and ``rem``. Sleep records from ``v1`` will have the same sleep levels as the ``v1.2`` classic type; therefore we set their type to ``classic``.
|
||||
- Unified level of sleep. For intraday data, we unify sleep levels of each sleep record with a column named ``unified_level``. Based on `this Fitbit forum post`_ , we merge levels into two categories:
|
||||
- For the ``classic`` type unified_level is one of {0, 1} where 0 means awake and groups ``awake`` + ``restless``, while 1 means asleep and groups ``asleep``.
|
||||
- For the ``stages`` type, unified_level is one of {0, 1} where 0 means awake and groups ``wake`` while 1 means asleep and groups ``deep`` + ``light`` + ``rem``.
|
||||
- Short Data. In ``v1.2``, records of type ``stages`` contain ``shortData`` in addition to ``data``. We merge both to extract intraday data.
|
||||
- ``data`` contains sleep stages and any wake periods > 3 minutes (180 seconds).
|
||||
- ``shortData`` contains short wake periods representing physiological awakenings that are <= 3 minutes (180 seconds).
|
||||
- The following columns of Summary data are not computed by RAPIDS but taken directly from columns with a similar name provided by Fitbit's API: ``efficiency``, ``minutes_after_wakeup``, ``minutes_asleep``, ``minutes_awake``, ``minutes_to_fall_asleep``, ``minutes_in_bed``, ``is_main_sleep`` and ``type``.
|
||||
- The following columns of Intraday data are not computed by RAPIDS but taken directly from columns with a similar name provided by Fitbit's API: ``original_level``, ``is_main_sleep`` and ``type``. We compute ``unified_level`` as explained above.
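
The level conversion and the ``unified_level`` rule listed above can be summarised in a few lines of Python. This is a sketch of the mapping as described in this section, not the code used by the ``fitbit_with_datetime`` rule; the constant and function names are made up for the example.

```python
# v1 integer levels converted to strings, in the order given above.
V1_LEVELS = {1: "asleep", 2: "restless", 3: "awake"}

# Levels grouped as asleep (1) or awake (0) for each record type.
ASLEEP_LEVELS = {"classic": {"asleep"}, "stages": {"deep", "light", "rem"}}
AWAKE_LEVELS = {"classic": {"awake", "restless"}, "stages": {"wake"}}


def unified_level(record_type, level):
    """Return 1 for asleep or 0 for awake, for a "classic" or "stages" record."""
    if isinstance(level, int):
        # v1 records use integers and are treated as "classic" records.
        level = V1_LEVELS[level]
    if level in ASLEEP_LEVELS[record_type]:
        return 1
    if level in AWAKE_LEVELS[record_type]:
        return 0
    raise ValueError(f"unknown level {level!r} for type {record_type!r}")


print(unified_level("classic", 2))     # restless -> 0
print(unified_level("stages", "rem"))  # rem -> 1
```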
|
||||
|
||||
These are examples of intraday and summary data:
|
||||
|
||||
- Intraday data (at 30-second intervals for "stages" type or 60-second intervals for "classic" type)
|
||||
- Intraday data (at 30-second intervals for ``stages`` type or 60-second intervals for ``classic`` type)
|
||||
|
||||
========= ============== ============= ============= ====== =================== ========== =========== ========= ================= ========== ========== ============ =================
|
||||
device_id original_level unified_level is_main_sleep type local_date_time local_date local_month local_day local_day_of_week local_time local_hour local_minute local_day_segment
|
||||
|
@ -837,6 +837,7 @@ did 90 0 381 54
|
|||
did 88 0 498 86 0 584 1 stages 2020-05-22 22:03:00 2020-05-23 07:47:03 2020-05-22 2020-05-23 evening morning
|
||||
========= ========== ==================== ============== ============= ====================== ============== ============= ====== ===================== =================== ================ ============== ======================= =====================
|
||||
|
||||
|
||||
.. _fitbit-heart-rate-sensor-doc:
|
||||
|
||||
Fitbit: Heart Rate
|
||||
|
@ -892,6 +893,7 @@ There are four heart rate zones: ``out_of_range``, ``fat_burn``, ``cardio``, and
|
|||
|
||||
Calories' accuracy depends on the users’ Fitbit profile (weight, height, etc.).
|
||||
|
||||
|
||||
.. _fitbit-steps-sensor-doc:
|
||||
|
||||
Fitbit: Steps
|
||||
|
@ -957,75 +959,31 @@ Active and sedentary bouts. If the step count per minute is smaller than ``THRES
|
|||
|
||||
.. -------------------------Links ------------------------------------ ..
|
||||
|
||||
.. _TABLES_FOR_SENSED_BINS: https://github.com/carissalow/rapids/blob/f22d1834ee24ab3bcbf051bc3cc663903d822084/config.yaml#L2
|
||||
.. _`SMS Config Code`: https://github.com/carissalow/rapids/blob/f22d1834ee24ab3bcbf051bc3cc663903d822084/config.yaml#L38
|
||||
.. _TABLES_FOR_SENSED_BINS: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L3
|
||||
.. _`Messages Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L35
|
||||
.. _AWARE: https://awareframework.com/what-is-aware/
|
||||
.. _`List of Timezones`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||
.. _sms_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L1
|
||||
.. _sms_features.R: https://github.com/carissalow/rapids/blob/master/src/features/sms_featues.R
|
||||
.. _download_dataset: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L9
|
||||
.. _download_dataset.R: https://github.com/carissalow/rapids/blob/master/src/data/download_dataset.R
|
||||
.. _readable_datetime: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L21
|
||||
.. _readable_datetime.R: https://github.com/carissalow/rapids/blob/master/src/data/readable_datetime.R
|
||||
.. _DAY_SEGMENTS: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L13
|
||||
.. _PHONE_VALID_SENSED_DAYS: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L60
|
||||
.. _`Call Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L46
|
||||
.. _call_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L13
|
||||
.. _call_features.R: https://github.com/carissalow/rapids/blob/master/src/features/call_features.R
|
||||
.. _`Bluetooth Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L76
|
||||
.. _bluetooth_feature: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L63
|
||||
.. _bluetooth_features.R: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/bluetooth_features.R
|
||||
.. _`Accelerometer Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L98
|
||||
.. _accelerometer_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L124
|
||||
.. _accelerometer_features.py: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/features/accelerometer_featues.py
|
||||
.. _`Applications Foreground Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L102
|
||||
.. _`Application Genres Config`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L54
|
||||
.. _application_genres: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L81
|
||||
.. _application_genres.R: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/src/data/application_genres.R
|
||||
.. _applications_foreground_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L135
|
||||
.. _applications_foreground_features.py: https://github.com/carissalow/rapids/blob/master/src/features/accelerometer_features.py
|
||||
.. _`Battery Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L84
|
||||
.. _battery_deltas: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L25
|
||||
.. _battery_deltas.R: https://github.com/carissalow/rapids/blob/master/src/features/battery_deltas.R
|
||||
.. _battery_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L86
|
||||
.. _battery_features.py : https://github.com/carissalow/rapids/blob/master/src/features/battery_features.py
|
||||
.. _`Google Activity Recognition Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L80
|
||||
.. _google_activity_recognition_deltas: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L41
|
||||
.. _google_activity_recognition_deltas.R: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition_deltas.R
|
||||
.. _activity_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L74
|
||||
.. _google_activity_recognition.py: https://github.com/carissalow/rapids/blob/master/src/features/google_activity_recognition.py
|
||||
.. _`Light Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L94
|
||||
.. _light_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L113
|
||||
.. _light_features.py: https://github.com/carissalow/rapids/blob/master/src/features/light_features.py
|
||||
.. _`Location (Barnett’s) Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L70
|
||||
.. _phone_sensed_bins: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L46
|
||||
.. _phone_sensed_bins.R: https://github.com/carissalow/rapids/blob/master/src/data/phone_sensed_bins.R
|
||||
.. _resample_fused_location: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L67
|
||||
.. _resample_fused_location.R: https://github.com/carissalow/rapids/blob/master/src/data/resample_fused_location.R
|
||||
.. _location_barnett_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L49
|
||||
.. _location_barnett_features.R: https://github.com/carissalow/rapids/blob/master/src/features/location_barnett_features.R
|
||||
.. _`Screen Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L88
|
||||
.. _screen_deltas: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L33
|
||||
.. _screen_deltas.R: https://github.com/carissalow/rapids/blob/master/src/features/screen_deltas.R
|
||||
.. _screen_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L97
|
||||
.. _screen_features.py: https://github.com/carissalow/rapids/blob/master/src/features/screen_features.py
|
||||
.. _fitbit_with_datetime: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/preprocessing.snakefile#L94
|
||||
.. _fitbit_readable_datetime.py: https://github.com/carissalow/rapids/blob/master/src/data/fitbit_readable_datetime.py
|
||||
.. _`Fitbit: Sleep Config Code`: https://github.com/carissalow/rapids/blob/e952e27350c7ae02703bd444e8f92979e37d9ba6/config.yaml#L129
|
||||
.. _fitbit_sleep_features: https://github.com/carissalow/rapids/blob/e952e27350c7ae02703bd444e8f92979e37d9ba6/rules/features.snakefile#L209
|
||||
.. _fitbit_sleep_features.py: https://github.com/carissalow/rapids/blob/master/src/features/fitbit_sleep_features.py
|
||||
.. _DAY_SEGMENTS: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L10
|
||||
.. _PHONE_VALID_SENSED_DAYS: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L61
|
||||
.. _`Call Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L45
|
||||
.. _`WiFi Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L169
|
||||
.. _`Bluetooth Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L81
|
||||
.. _`Accelerometer Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L115
|
||||
.. _`Applications Foreground Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L125
|
||||
.. _`Battery Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L95
|
||||
.. _`Activity Recognition Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L87
|
||||
.. _`Light Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L109
|
||||
.. _`Location (Barnett’s) Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L71
|
||||
.. _`Screen Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L101
|
||||
.. _`Fitbit: Sleep Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L162
|
||||
.. _`version 1`: https://dev.fitbit.com/build/reference/web-api/sleep-v1/
|
||||
.. _`version 1.2`: https://dev.fitbit.com/build/reference/web-api/sleep/
|
||||
.. _`Conversation Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L191
|
||||
.. _`this Fitbit forum post`: https://community.fitbit.com/t5/Alta/What-does-Restless-mean-in-sleep-tracking/td-p/2989011
|
||||
.. _ shortData: https://dev.fitbit.com/build/reference/web-api/sleep/#interpreting-the-sleep-stage-and-short-data
|
||||
.. _`Fitbit: Heart Rate Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L113
|
||||
.. _fitbit_heartrate_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L151
|
||||
.. _fitbit_heartrate_features.py: https://github.com/carissalow/rapids/blob/master/src/features/fitbit_heartrate_features.py
|
||||
.. _`Fitbit: Steps Config Code`: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L117
|
||||
.. _fitbit_step_features: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/rules/features.snakefile#L162
|
||||
.. _fitbit_step_features.py: https://github.com/carissalow/rapids/blob/master/src/features/fitbit_step_features.py
|
||||
.. _shortData: https://dev.fitbit.com/build/reference/web-api/sleep/#interpreting-the-sleep-stage-and-short-data
|
||||
.. _`Fitbit: Heart Rate Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L138
|
||||
.. _`Fitbit: Steps Config Code`: https://github.com/carissalow/rapids/blob/0c53fd275e628819cf79cf5b87006ce1ad9e597c/config.yaml#L145
|
||||
.. _`Fitbit documentation`: https://help.fitbit.com/articles/en_US/Help_article/1565
|
||||
.. _`Custom Catalogue File`: https://github.com/carissalow/rapids/blob/master/data/external/stachl_application_genre_catalogue.csv
|
||||
.. _top1global: https://github.com/carissalow/rapids/blob/765bb462636d5029a05f54d4c558487e3786b90b/config.yaml#L108
|
||||
.. _`Beiwe Summary Statistics`: http://wiki.beiwe.org/wiki/Summary_Statistics
|
||||
.. _`Pause-Flight Model`: https://academic.oup.com/biostatistics/advance-article/doi/10.1093/biostatistics/kxy059/5145908
|
||||
|
|
|
@ -33,4 +33,5 @@ Contents:
|
|||
develop/documentation
|
||||
develop/features
|
||||
develop/contributors
|
||||
develop/testing
|
||||
develop/testing
|
||||
develop/test_cases
|
|
@ -1,7 +1,7 @@
|
|||
.. _minimal-working-example:
|
||||
|
||||
Minimal Working Example
|
||||
=======================
|
||||
========================
|
||||
|
||||
This is a quick guide for creating and running a simple pipeline to extract call features for daily and night epochs of one participant monitored on the US East coast.
|
||||
|
||||
|
@ -13,23 +13,24 @@ This is a quick guide for creating and running a simple pipeline to extract call
|
|||
|
||||
#. Modify the following settings in the ``config.yaml`` file with the values shown below (leave all other settings as they are)
|
||||
|
||||
::
|
||||
PIDS: [p01]
|
||||
|
||||
DAY_SEGMENTS: &day_segments
|
||||
[daily, night]
|
||||
::
|
||||
|
||||
TIMEZONE: &timezone
|
||||
America/New_York
|
||||
|
||||
DATABASE_GROUP: &database_group
|
||||
MY_GROUP (change this if you added your DB credentials to .env with a different label)
|
||||
|
||||
CALLS:
|
||||
COMPUTE: True
|
||||
DB_TABLE: calls (only change DB_TABLE if your database calls table has a different name)
|
||||
PIDS: [p01]
|
||||
|
||||
For more information on the ``calls`` sensor see :ref:`call-sensor-doc`
|
||||
DAY_SEGMENTS: &day_segments
|
||||
[daily, night]
|
||||
|
||||
TIMEZONE: &timezone
|
||||
America/New_York
|
||||
|
||||
DATABASE_GROUP: &database_group
|
||||
MY_GROUP (change this if you added your DB credentials to .env with a different label)
|
||||
|
||||
CALLS:
|
||||
COMPUTE: True
|
||||
DB_TABLE: calls (only change DB_TABLE if your database calls table has a different name)
|
||||
|
||||
For more information on the ``calls`` sensor see :ref:`call-sensor-doc`
|
||||
|
||||
#. Run the following command to execute RAPIDS
|
||||
|
||||
|
|
|
@ -10,7 +10,7 @@ The ``config.yaml`` File
|
|||
|
||||
RAPIDS configuration settings are defined in ``config.yaml`` (See `config.yaml`_). This is the only file that you need to understand in order to compute the features that RAPIDS ships with.
|
||||
|
||||
It has global settings like ``TABLES_FOR_SENSED_BINS``, ``PIDS``, ``DAY_SEGMENTS``, among others (see :ref:`global-sensor-doc` for more information). As well as per sensor settings, for example, for the :ref:`sms-sensor-doc`::
|
||||
It has global settings like ``TABLES_FOR_SENSED_BINS``, ``PIDS``, ``DAY_SEGMENTS``, among others (see :ref:`global-sensor-doc` for more information). It also has per-sensor settings, for example, for the :ref:`messages-sensor-doc`::
|
||||
|
||||
| ``MESSAGES:``
|
||||
| ``COMPUTE: True``
|
||||
|
@ -21,7 +21,7 @@ It has global settings like ``TABLES_FOR_SENSED_BINS``, ``PIDS``, ``DAY_SEGMENTS
|
|||
|
||||
The ``Snakefile`` File
|
||||
----------------------
|
||||
The ``Snakefile`` file (see the actual `Snakefile`_) pulls the entire system together. The first line in this file identifies the configuration file. Next are a list of included directives that import the rules used to pull, clean, process, analyze and report data. Finally, the ``all`` rule lists the files that need to be computed (raw files, intermediate files, feature files, reports, etc).
|
||||
The ``Snakefile`` file (see the actual `Snakefile`_) pulls the entire system together. The first line in this file identifies the configuration file. Next is a list of include directives that import the rules used to pull, clean, process, analyze and report data. The file then initializes the ``files_to_compute`` list by checking the config file for the sensors whose ``COMPUTE`` setting is ``True``. Finally, the ``all`` rule is called with this list of files that need to be computed (raw files, intermediate files, feature files, reports, etc).
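The ``COMPUTE`` check described above can be sketched in plain Python. This is only an illustration under stated assumptions: the ``config`` dictionary below is a hypothetical stand-in for the parsed ``config.yaml``, and the loop is not copied from the actual ``Snakefile``.

```python
# Hypothetical stand-in for the parsed config.yaml (values are assumptions).
config = {
    "PIDS": ["p01", "p02"],
    "MESSAGES": {"COMPUTE": True, "DB_TABLE": "messages"},
    "CALLS": {"COMPUTE": False, "DB_TABLE": "calls"},
}

files_to_compute = []

# Only sensors whose COMPUTE flag is True contribute files.
for sensor in ("MESSAGES", "CALLS"):
    if config[sensor]["COMPUTE"]:
        table = config[sensor]["DB_TABLE"]
        files_to_compute.extend(
            f"data/raw/{pid}/{table}_raw.csv" for pid in config["PIDS"]
        )

print(files_to_compute)
```

With ``CALLS`` disabled in this sketch, only the raw message files of the two participants end up in the list handed to the ``all`` rule.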
|
||||
|
||||
.. _includes-section:
|
||||
|
||||
|
@ -42,13 +42,15 @@ Includes are relative to the root directory.
|
|||
|
||||
``Rule all:``
|
||||
"""""""""""""
|
||||
In RAPIDS the ``all`` rule lists the output files we expect the pipeline to compute using the ``expand`` directive. The ``expand`` function allows us to generate a list of file paths that have a common structure except for PIDS or other parameters. Consider the following::
|
||||
In RAPIDS the ``all`` rule lists the output files we expect the pipeline to compute using the ``expand`` directive. Before the ``all`` rule is called, Snakemake checks ``config.yaml`` and adds the files for every sensor whose ``COMPUTE`` parameter is ``True``. The ``expand`` function allows us to generate a list of file paths that have a common structure except for PIDS or other parameters. Consider the following::
|
||||
|
||||
expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["SENSORS"]),
|
||||
|
||||
If ``pids = ['p01','p02']`` and ``sensor = ['sms', 'calls']`` then the above directive would produce::
|
||||
files_to_compute.extend(expand("data/raw/{pid}/{sensor}_raw.csv", pid=config["PIDS"], sensor=config["MESSAGES"]["DB_TABLE"]))
|
||||
|
||||
["data/raw/p01/sms_raw.csv", "data/raw/p01/calls_raw.csv", "data/raw/p02/sms_raw.csv", "data/raw/p02/calls_raw.csv"]
|
||||
If ``pids = ['p01','p02']`` and ``sensor = ['messages', 'calls']`` then the above directive would produce::
|
||||
|
||||
["data/raw/p01/messages_raw.csv", "data/raw/p01/calls_raw.csv", "data/raw/p02/messages_raw.csv", "data/raw/p02/calls_raw.csv"]
|
||||
|
||||
Thus, this allows us to define all the desired output files without having to manually list each path for every participant and every sensor. The way Snakemake works is that it looks for the rule that produces the desired output files and then executes that rule. For more information on ``expand`` see `The Expand Function`_
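To make the behaviour of ``expand`` concrete, the following is a minimal pure-Python sketch of the idea: it fills a path pattern with every combination of the supplied values. ``expand_sketch`` is a hypothetical helper written for this illustration; it is not Snakemake's own implementation, which supports more options.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    value_lists = [wildcards[name] for name in names]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


paths = expand_sketch(
    "data/raw/{pid}/{sensor}_raw.csv",
    pid=["p01", "p02"],
    sensor=["messages", "calls"],
)
print(paths)
```

Each participant is combined with each sensor, so two participants and two sensors give four paths.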
|
||||
|
||||
|
@ -86,44 +88,44 @@ A Snakemake workflow is defined by rules (See the features_ snakefile as an actu
|
|||
|
||||
A sample rule from the RAPIDS source code is shown below::
|
||||
|
||||
rule sms_features:
|
||||
rule messages_features:
|
||||
input:
|
||||
"data/raw/{pid}/messages_with_datetime.csv"
|
||||
expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["MESSAGES"]["DB_TABLE"])
|
||||
params:
|
||||
sms_type = "{sms_type}",
|
||||
messages_type = "{messages_type}",
|
||||
day_segment = "{day_segment}",
|
||||
features = lambda wildcards: config["SMS"]["FEATURES"][wildcards.sms_type]
|
||||
features = lambda wildcards: config["MESSAGES"]["FEATURES"][wildcards.messages_type]
|
||||
output:
|
||||
"data/processed/{pid}/sms_{sms_type}_{day_segment}.csv"
|
||||
"data/processed/{pid}/messages_{messages_type}_{day_segment}.csv"
|
||||
script:
|
||||
"../src/features/sms_features.R"
|
||||
"../src/features/messages_features.R"
|
||||
|
||||
|
||||
The ``rule`` directive specifies the name of the rule that is being defined. ``params`` defines additional parameters for the rule's script. In the example above, the parameters are passed to the ``sms_features.R`` script as an dictionary. Instead of ``script`` a ``shell`` command call can also be called by replacing the ``script`` directive of the rule and replacing it with::
|
||||
The ``rule`` directive specifies the name of the rule that is being defined. ``params`` defines additional parameters for the rule's script. In the example above, the parameters are passed to the ``messages_features.R`` script as a dictionary. Instead of ``script``, a ``shell`` command can be called by replacing the ``script`` directive of the rule with::
|
||||
|
||||
shell: "somecommand {input} {output}"
|
||||
|
||||
It should be noted that rules can defined without input and output as seen in the ``renv.snakemake``. For more information see `Rules documentation`_ and for an actual example see the `renv`_ snakefile.
|
||||
It should be noted that rules can be defined without input and output, as seen in ``renv.snakefile``. For more information see `Rules documentation`_ and for an actual example see the `renv`_ snakefile.
|
||||
|
||||
.. _wildcards:
|
||||
|
||||
Wildcards
|
||||
""""""""""
|
||||
There are times when the same rule should be applied to different participants and day segments. For this we use wildcards ``{my_wildcard}``. All wildcards are inferred from the files listed in the ``all` rule of the ``Snakefile`` file and therfore from the output of any rule::
|
||||
There are times when the same rule should be applied to different participants and day segments. For this we use wildcards ``{my_wildcard}``. All wildcards are inferred from the files listed in the ``all`` rule of the ``Snakefile`` file and therefore from the output of any rule::
|
||||
|
||||
rule sms_features:
|
||||
rule messages_features:
|
||||
input:
|
||||
"data/raw/{pid}/messages_with_datetime.csv"
|
||||
expand("data/raw/{{pid}}/{sensor}_with_datetime.csv", sensor=config["MESSAGES"]["DB_TABLE"])
|
||||
params:
|
||||
sms_type = "{sms_type}",
|
||||
messages_type = "{messages_type}",
|
||||
day_segment = "{day_segment}",
|
||||
features = lambda wildcards: config["SMS"]["FEATURES"][wildcards.sms_type]
|
||||
features = lambda wildcards: config["MESSAGES"]["FEATURES"][wildcards.messages_type]
|
||||
output:
|
||||
"data/processed/{pid}/sms_{sms_type}_{day_segment}.csv"
|
||||
"data/processed/{pid}/messages_{messages_type}_{day_segment}.csv"
|
||||
script:
|
||||
"../src/features/sms_features.R"
|
||||
"../src/features/messages_features.R"
|
||||
|
||||
If the rule’s output matches a requested file, the substrings matched by the wildcards are propagated to the input and params directives. For example, if another rule in the workflow requires the file ``data/processed/p01/sms_sent_daily.csv``, Snakemake recognizes that the above rule is able to produce it by setting ``pid=p01``, ``sms_type=sent`` and ``day_segment=daily``. Thus, it requests the input file ``data/raw/p01/messages_with_datetime.csv`` as input, sets ``sms_type=sent``, ``day_segment=daily`` in the ``params`` directive and executes the script. ``../src/features/sms_features.R``. See the preprocessing_ snakefile for an actual example.
|
||||
If the rule’s output matches a requested file, the substrings matched by the wildcards are propagated to the input and params directives. For example, if another rule in the workflow requires the file ``data/processed/p01/messages_sent_daily.csv``, Snakemake recognizes that the above rule is able to produce it by setting ``pid=p01``, ``messages_type=sent`` and ``day_segment=daily``. Thus, it requests the input file ``data/raw/p01/messages_with_datetime.csv``, sets ``messages_type=sent`` and ``day_segment=daily`` in the ``params`` directive, and executes the script ``../src/features/messages_features.R``. See the preprocessing_ snakefile for an actual example.
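The wildcard matching described above can be approximated with a regular expression. The sketch below assumes each ``{name}`` placeholder matches one or more characters; ``match_wildcards`` is a hypothetical helper for illustration and does not reproduce Snakemake's actual matching code.

```python
import re


def match_wildcards(pattern, path):
    """Return the wildcard values that make `pattern` equal `path`, or None."""
    regex = ""
    pos = 0
    for placeholder in re.finditer(r"\{(\w+)\}", pattern):
        # Literal text between placeholders must match exactly.
        regex += re.escape(pattern[pos:placeholder.start()])
        # Each placeholder becomes a named group matching one or more characters.
        regex += f"(?P<{placeholder.group(1)}>.+)"
        pos = placeholder.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


output_pattern = "data/processed/{pid}/messages_{messages_type}_{day_segment}.csv"
requested = "data/processed/p01/messages_sent_daily.csv"

wildcards = match_wildcards(output_pattern, requested)
# The matched values are then propagated to the rule's input pattern.
input_file = "data/raw/{pid}/messages_with_datetime.csv".format(**wildcards)
print(wildcards, input_file)
```

The matched values for ``pid``, ``messages_type`` and ``day_segment`` are the ones that the rule's ``input`` and ``params`` directives receive.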
|
||||
|
||||
|
||||
.. _the-data-directory:
|
||||
|
@ -152,33 +154,7 @@ The ``src`` directory holds all the scripts used by the pipeline for data manipu
|
|||
- ``visualization`` - This directory contains the scripts to create plots and reports. See `visualization directory`_
|
||||
|
||||
|
||||
.. _the-report-directory:
|
||||
|
||||
The ``reports`` Directory
|
||||
--------------------------
|
||||
|
||||
This directory contains reports and visualizations.
|
||||
|
||||
.. _Python: https://www.python.org/
|
||||
.. _Julia: https://julialang.org/
|
||||
.. _R: https://www.r-project.org/
|
||||
.. _`List of Timezone`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||
.. _`The Expand Function`: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-expand-function
|
||||
.. _`example snakefile`: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile
|
||||
.. _renv: https://github.com/carissalow/rapids/blob/master/rules/renv.snakefile
|
||||
.. _preprocessing: https://github.com/carissalow/rapids/blob/master/rules/preprocessing.snakefile
|
||||
.. _features: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile
|
||||
.. _models: https://github.com/carissalow/rapids/blob/master/rules/models.snakefile
|
||||
.. _reports: https://github.com/carissalow/rapids/blob/master/rules/reports.snakefile
|
||||
.. _mystudy: https://github.com/carissalow/rapids/blob/master/rules/mystudy.snakefile
|
||||
.. _`Rules documentation`: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#rules
|
||||
.. _`data directory`: https://github.com/carissalow/rapids/tree/master/src/data
|
||||
.. _`features directory`: https://github.com/carissalow/rapids/tree/master/src/features
|
||||
.. _`models directory`: https://github.com/carissalow/rapids/tree/master/src/models
|
||||
.. _`visualization directory`: https://github.com/carissalow/rapids/tree/master/src/visualization
|
||||
.. _`config.yaml`: https://github.com/carissalow/rapids/blob/master/config.yaml
|
||||
.. _`Snakefile`: https://github.com/carissalow/rapids/blob/master/Snakefile
|
||||
|
||||
.. _RAPIDS_directory_structure:
|
||||
|
||||
::
|
||||
|
||||
|
@ -241,4 +217,25 @@ This directory contains reports and visualizations.
|
|||
│ ├── settings <- The config and settings files for running tests.
|
||||
│ └── Snakefile <- The Snakefile for testing only.
|
||||
│
|
||||
└── tox.ini <- tox file with settings for running tox; see tox.testrun.org
|
||||
└── tox.ini <- tox file with settings for running tox; see tox.testrun.org
|
||||
|
||||
|
||||
.. _Python: https://www.python.org/
|
||||
.. _Julia: https://julialang.org/
|
||||
.. _R: https://www.r-project.org/
|
||||
.. _`List of Timezone`: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||
.. _`The Expand Function`: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#the-expand-function
|
||||
.. _`example snakefile`: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile
|
||||
.. _renv: https://github.com/carissalow/rapids/blob/master/rules/renv.snakefile
|
||||
.. _preprocessing: https://github.com/carissalow/rapids/blob/master/rules/preprocessing.snakefile
|
||||
.. _features: https://github.com/carissalow/rapids/blob/master/rules/features.snakefile
|
||||
.. _models: https://github.com/carissalow/rapids/blob/master/rules/models.snakefile
|
||||
.. _reports: https://github.com/carissalow/rapids/blob/master/rules/reports.snakefile
|
||||
.. _mystudy: https://github.com/carissalow/rapids/blob/master/rules/mystudy.snakefile
|
||||
.. _`Rules documentation`: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#rules
|
||||
.. _`data directory`: https://github.com/carissalow/rapids/tree/master/src/data
|
||||
.. _`features directory`: https://github.com/carissalow/rapids/tree/master/src/features
|
||||
.. _`models directory`: https://github.com/carissalow/rapids/tree/master/src/models
|
||||
.. _`visualization directory`: https://github.com/carissalow/rapids/tree/master/src/visualization
|
||||
.. _`config.yaml`: https://github.com/carissalow/rapids/blob/master/config.yaml
|
||||
.. _`Snakefile`: https://github.com/carissalow/rapids/blob/master/Snakefile
|
||||
|
|
|
@ -3,16 +3,32 @@
|
|||
|
||||
echo Setting up for testing...
|
||||
|
||||
echo Copying files...
|
||||
# Uncomment the section below if necessary to remove old files when testing locally
|
||||
# echo deleting old data...
|
||||
# rm -rf data/raw/*
|
||||
# rm -rf data/processed/*
|
||||
# rm -rf data/interim/*
|
||||
# rm -rf data/external/test*
|
||||
|
||||
echo Copying files...
|
||||
cp -r tests/data/raw/* data/raw
|
||||
cp tests/data/external/* data/external
|
||||
|
||||
# Uncomment the section below to back up the preprocessing snakefile when testing locally
|
||||
# echo Backing up preprocessing...
|
||||
# cp rules/preprocessing.snakefile bak
|
||||
|
||||
echo Disabling downloading of dataset...
|
||||
sed -e '10,20 s/^/#/' -e 's/rules.download_dataset.output/"data\/raw\/\{pid\}\/\{sensor\}_raw\.csv"/' rules/preprocessing.snakefile > tmp
|
||||
cp tmp rules/preprocessing.snakefile
|
||||
|
||||
echo Disabling downloading of dataset...
|
||||
snakemake --profile tests/settings
|
||||
echo Running RAPIDS Pipeline on testdata...
|
||||
snakemake --profile tests/settings
|
||||
|
||||
echo Running tests on data produced...
|
||||
python -m unittest discover tests/scripts/ -v
|
||||
python -m unittest discover tests/scripts/ -v
|
||||
|
||||
# Uncomment to return snakemake back to the original version when testing locally
|
||||
# echo Cleaning up...
|
||||
# mv bak rules/preprocessing.snakefile
|
||||
# rm tmp
|