Update add-new-features.md

pull/103/head
Meng Li 2020-11-30 15:43:03 -05:00
parent 016bdbfe8c
commit 8daa5ef20e
1 changed files with 14 additions and 8 deletions

@@ -26,12 +26,18 @@ As a tutorial, we will add a new provider for `PHONE_ACCELEROMETER` called `VEGA
 - Phone Bluetooth
 - Phone Calls
 - Phone Conversation
+- Phone Data Yield
 - Phone Light
 - Phone Locations
 - Phone Messages
 - Phone Screen
 - Phone WiFI Connected
 - Phone WiFI Visible
+- Fitbit Heart Rate Summary
+- Fitbit Heart Rate Intraday
+- Fitbit Sleep Summary
+- Fitbit Steps Summary
+- Fitbit Steps Intraday
 ### Modify the `config.yaml` file
@@ -129,7 +135,7 @@ The code to extract your behavioral features should be implemented in your provi
 Thus `filter_data_by_segment()` comes in handy, it will return a data frame that contains the rows that were logged during a day segment plus an extra column called `local_segment`. This new column will have as many unique values as day segment instances exist (14, 2, and 2 for our `p01`'s `my_days`, `my_weeks`, and `my_weekends` examples). After filtering, **you should group the data frame by this column and compute any desired features**, for example:
 ```python
-acc_features["acc_rapids_maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
+acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
 ```
 The reason RAPIDS does not filter the participant's data set for you is because your code might need to compute something based on a participant's complete dataset before computing their features. For example, you might want to identify the number that called a participant the most throughout the study before computing a feature with the number of calls the participant received from this number.
@@ -139,7 +145,7 @@ The code to extract your behavioral features should be implemented in your provi
 - One row per day segment instance (e.g. 14 our `p01`'s `my_days` example)
 - The `local_segment` column added by `filter_data_by_segment()`
-- One column per feature. By convention the name of your features should only contain letters or numbers (`feature1`). RAPIDS will automatically add the right sensor and provider prefix (`accelerometr_vega_`)
+- One column per feature. By convention the name of your features should only contain letters or numbers (`feature1`). RAPIDS will automatically add the right sensor and provider prefix (`phone_accelerometr_vega_`)
 ??? example "`PHONE_ACCELEROMETER` Provider Example"
 For your reference, this a short example of our own provider (`RAPIDS`) for `PHONE_ACCELEROMETER` that computes five acceleration features
@@ -154,7 +160,7 @@ The code to extract your behavioral features should be implemented in your provi
 # the subset of requested features this function can compute
 features_to_compute = list(set(requested_features) & set(base_features_names))
-acc_features = pd.DataFrame(columns=["local_segment"] + ["acc_rapids_" + x for x in features_to_compute])
+acc_features = pd.DataFrame(columns=["local_segment"] + features_to_compute)
 if not acc_data.empty:
     acc_data = filter_data_by_segment(acc_data, day_segment)
@@ -165,15 +171,15 @@ The code to extract your behavioral features should be implemented in your provi
 acc_data = acc_data.assign(magnitude = magnitude.values)
 if "maxmagnitude" in features_to_compute:
-    acc_features["acc_rapids_maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
+    acc_features["maxmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].max()
 if "minmagnitude" in features_to_compute:
-    acc_features["acc_rapids_minmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].min()
+    acc_features["minmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].min()
 if "avgmagnitude" in features_to_compute:
-    acc_features["acc_rapids_avgmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].mean()
+    acc_features["avgmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].mean()
 if "medianmagnitude" in features_to_compute:
-    acc_features["acc_rapids_medianmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].median()
+    acc_features["medianmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].median()
 if "stdmagnitude" in features_to_compute:
-    acc_features["acc_rapids_stdmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].std()
+    acc_features["stdmagnitude"] = acc_data.groupby(["local_segment"])["magnitude"].std()
 acc_features = acc_features.reset_index()
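
Note on this change: provider code now returns bare feature names (`maxmagnitude` instead of `acc_rapids_maxmagnitude`), and the docs state that RAPIDS adds the sensor and provider prefix afterwards. The sketch below illustrates that division of labor on plain column names. It is only an illustration under assumptions: `prefix_feature_columns` is a hypothetical helper, not a RAPIDS function, and the lowercase `<sensor>_<provider>_` prefix format is inferred from the example prefix in the updated bullet.

```python
def prefix_feature_columns(columns, sensor, provider, keep=("local_segment",)):
    """Prepend a lowercase '<sensor>_<provider>_' prefix to feature columns.

    Hypothetical helper for illustration only; columns listed in `keep`
    (here the segment label) are passed through unchanged.
    """
    prefix = f"{sensor}_{provider}_".lower()
    return [col if col in keep else prefix + col for col in columns]


# Column names as a provider would return them after this change (no prefix).
provider_columns = ["local_segment", "maxmagnitude", "minmagnitude"]

prefixed = prefix_feature_columns(provider_columns, "PHONE_ACCELEROMETER", "VEGA")
print(prefixed)
```

Keeping the prefix out of provider code means a provider can be renamed or reused for another sensor without editing every feature assignment.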