# Calculating features
## Usage
- Install the library with:
```
pip install pep517
python -m pep517.build .
# Alternative:
pip install build
python -m build
```
- Basic usage is:
```
from calculatingfeatures.CalculatingFeatures.helper_functions import convert1DEmpaticaToArray, convertInputInto2d, frequencyFeatureNames, hrvFeatureNames
from calculatingfeatures.CalculatingFeatures.calculate_features import calculateFeatures
import pandas as pd
pathToHrvCsv = "example_data/S2_E4_Data/BVP.csv"
windowLength = 500
# Get an array of values from the Empatica HRV (BVP) file
hrv_data, startTimeStamp, sampleRate = convert1DEmpaticaToArray(pathToHrvCsv)
# Convert the HRV data into 2D array
hrv_data_2D = convertInputInto2d(hrv_data, windowLength)
# Create a list with feature names
featureNames = []
featureNames.extend(hrvFeatureNames)
featureNames.extend(frequencyFeatureNames)
pd.set_option('display.max_columns', None)
# Calculate features
calculatedFeatures = calculateFeatures(hrv_data_2D, fs=int(sampleRate), featureNames=featureNames)
```
- More usage examples are in the **usage_examples.ipynb** notebook.
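
To make the windowing step in the basic usage more concrete, here is a minimal sketch of splitting a 1D signal into fixed-length windows. It is not the implementation of `convertInputInto2d`; the helper name `split_into_windows` is hypothetical, and it assumes non-overlapping windows with the trailing, incomplete window dropped, which may differ from what the library does.

```python
import numpy as np

def split_into_windows(signal, window_length):
    """Split a 1D signal into non-overlapping rows of window_length samples.

    Hypothetical helper: samples that do not fill a whole window are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // window_length
    return signal[: n_windows * window_length].reshape(n_windows, window_length)

# 1050 samples with a window length of 500 give two full windows
windows = split_into_windows(np.arange(1050), 500)
print(windows.shape)  # (2, 500)
```

Each row of the resulting 2D array then corresponds to one window for which features would be calculated.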
## Features
- Features are returned by the `calculateFeatures()` function as a Pandas DataFrame.
- If a feature cannot be calculated (for example, because the input signal is invalid), NaN is returned for it.
- The rest of this section lists and describes all available features.
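
Because features that cannot be calculated come back as NaN, it can help to filter such rows before further processing. The sketch below uses a made-up DataFrame in place of a real `calculateFeatures()` result; the column names are taken from the HRV list, while the values and the one-row-per-window layout are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical result: one row per window, one column per feature.
# The numbers are invented for illustration.
features = pd.DataFrame({
    "meanHr": [72.4, np.nan, 68.1],
    "sdnn": [0.051, np.nan, 0.047],
})

# Mark rows in which every feature is NaN (for example, an invalid window)
invalid_rows = features.isna().all(axis=1)

# Keep only rows with at least one calculated feature
clean = features[~invalid_rows]
print(int(invalid_rows.sum()))  # 1
```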
### GSR features:
These features are useful for 1D GSR (EDA) signals.
- `mean`: mean of the signal
- `std`: standard deviation of signal
- `q25`: 0.25 quantile
- `q75`: 0.75 quantile
- `qd`: q75 - q25
- `deriv`: sum of gradients of the signal
- `power`: power of the signal (mean of squared signal)
- `numPeaks`: number of EDA peaks
- `ratePeaks`: average number of peaks per second
- `powerPeaks`: power of peaks (mean of signal at indexes of peaks)
- `sumPosDeriv`: sum of positive derivatives divided by number of all derivatives
- `propPosDeriv`: proportion of positive derivatives per all derivatives
- `derivTonic`: sum of gradients of the tonic
- `sigTonicDifference`: mean of tonic subtracted from signal
- `freqFeats`:
- `maxPeakAmplitudeChangeBefore`: maximum peak amplitude change before peak
- `maxPeakAmplitudeChangeAfter`: maximum peak amplitude change after peak
- `avgPeakAmplitudeChangeBefore`: average peak amplitude change before peak
- `avgPeakAmplitudeChangeAfter`: average peak amplitude change after peak
- `avgPeakChangeRatio`: avg_peak_increase_time / avg_peak_decrease_time
- `maxPeakIncreaseTime`: maximum peak increase time
- `maxPeakDecreaseTime`: maximum peak decrease time
- `maxPeakDuration`: maximum peak duration
- `maxPeakChangeRatio`: max_peak_increase_time / max_peak_decrease_time
- `avgPeakIncreaseTime`: average peak increase time
- `avgPeakDecreaseTime`: average peak decrease time
- `avgPeakDuration`: average peak duration
- `maxPeakResponseSlopeBefore`: maximum peak response slope before peak
- `maxPeakResponseSlopeAfter`: maximum peak response slope after peak
- `signalOverallChange`: maximum difference between samples (max(sig)-min(sig))
- `changeDuration`: duration between maximum and minimum values
- `changeRate`: change_duration / signal_overall_change
- `significantIncrease`:
- `significantDecrease`:
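
As an illustration of the simpler statistical features above, the following sketch computes a few of them with NumPy. It is not the library's code: the helper name `basic_gsr_features` is hypothetical, gradients are assumed to come from `np.gradient`, and the standard deviation is assumed to be the population one (`ddof=0`).

```python
import numpy as np

def basic_gsr_features(signal):
    """Compute a few of the listed statistical features (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    q25, q75 = np.quantile(signal, [0.25, 0.75])
    derivative = np.gradient(signal)  # assumption: gradient via np.gradient
    return {
        "mean": signal.mean(),
        "std": signal.std(),               # assumption: population std (ddof=0)
        "q25": q25,
        "q75": q75,
        "qd": q75 - q25,                   # q75 - q25
        "deriv": derivative.sum(),         # sum of gradients
        "power": np.mean(signal ** 2),     # mean of squared signal
        "propPosDeriv": np.mean(derivative > 0),  # share of positive derivatives
    }

feats = basic_gsr_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(feats["qd"], feats["power"])
```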
### HRV features:
These features are useful for 1D HRV (BVP) signals.
If the number of RR intervals (`numRR`) is less than `length of sample / (2 * sampling rate)` (30 BPM) or greater than `length of sample / (sampling rate / 4)` (240 BPM), the BPM value is considered incorrect and all other HRV features are set to NaN.
- `meanHr`: mean heart rate
- `ibi`: mean interbeat interval
- `sdnn`: standard deviation of the ibi
- `sdsd`: standard deviation of the differences between all subsequent R-R intervals
- `rmssd`: root of the mean of the list of squared differences
- `pnn20`: the proportion of NN20 intervals to all intervals
- `pnn50`: the proportion of NN50 intervals to all intervals
- `sd`:
- `sd2`:
- `sd1/sd2`: sd / sd2 ratio
- `numRR`: number of RR intervals
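
The plausibility bounds on `numRR` described at the start of this section can be sketched as below, together with `rmssd` as defined in the list. Both function names are hypothetical, and the 64 Hz sampling rate and 60 second window in the usage lines are chosen only for illustration; the library's own check may be implemented differently.

```python
import numpy as np

def rr_count_is_plausible(num_rr, n_samples, sampling_rate):
    """Return True if num_rr lies between the 30 BPM and 240 BPM bounds."""
    duration_s = n_samples / sampling_rate
    min_rr = duration_s / 2  # n_samples / (2 * sampling_rate), i.e. 30 BPM
    max_rr = duration_s * 4  # n_samples / (sampling_rate / 4), i.e. 240 BPM
    return min_rr <= num_rr <= max_rr

def rmssd(ibi):
    """Root of the mean of squared differences between successive intervals."""
    diffs = np.diff(np.asarray(ibi, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

# A 60 s window sampled at 64 Hz has 3840 samples: bounds are 30 and 240 intervals
print(rr_count_is_plausible(70, 3840, 64))   # True
print(rr_count_is_plausible(20, 3840, 64))   # False
```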
### Accelerometer features:
These features are useful for 3D signals from accelerometer
- `meanLow`: mean of low-pass filtered signal
- `areaLow`: area under the low-pass filtered signal
- `totalAbsoluteAreaBand`: sum of absolute areas under the band-pass filtered x, y and z signal
- `totalMagnitudeBand`: square root of sum of squared band-pass filtered x, y and z components
- `entropyBand`: entropy of band-pass filtered signal
- `skewnessBand`: skewness of band-pass filtered signal
- `kurtosisBand`: kurtosis of band-pass filtered signal
- `postureDistanceLow`: calculates difference between mean values for a given sensor (low-pass filtered)
- `absoluteMeanBand`: mean of band-pass filtered signal
- `absoluteAreaBand`: area under the band-pass filtered signal
- `quartilesBand`: quartiles of band-pass filtered signal
- `interQuartileRangeBand`: inter quartile range of band-pass filtered signal
- `varianceBand`: variance of band-pass filtered signal
- `coefficientOfVariationBand`: dispersion of band-pass filtered signal
- `amplitudeBand`: difference between maximum and minimum sample of band-pass filtered signal
- `totalEnergyBand`: total magnitude of band-pass filtered signal
- `dominantFrequencyEnergyBand`: ratio of energy in dominant frequency
- `meanCrossingRateBand`: the number of signal crossings with mean of band-pass filtered signal
- `correlationBand`: Pearson's correlation between band-pass filtered axis
- `quartilesMagnitudesBand`: quartiles at 25%, 50% and 75% per band-pass filtered signal
- `interQuartileRangeMagnitudesBand`: interquartile range of band-pass filtered signal
- `areaUnderAccelerationMagnitude`: area under acceleration magnitude
- `peaksDataLow`: number of peaks, sum of peak values, peak avg, amplitude avg
- `sumPerComponentBand`: sum per component of band-pass filtered signal
- `velocityBand`: velocity of the band-pass filtered signal
- `meanKineticEnergyBand`: mean kinetic energy 1/2*mV^2 of band-pass filtered signal
- `totalKineticEnergyBand`: total kinetic energy 1/2*mV^2 for all axes (band-pass filtered)
- `squareSumOfComponent`: squared sum of component
- `sumOfSquareComponents`: sum of squared components
- `averageVectorLength`: mean of magnitude vector
- `averageVectorLengthPower`: square mean of magnitude vector
- `rollAvgLow`: maximum difference of low-pass filtered roll samples
- `pitchAvgLow`: maximum difference of low-pass filtered pitch samples
- `rollStdDevLow`: standard deviation of roll (calculated from low-pass filtered signal)
- `pitchStdDevLow`: standard deviation of pitch (calculated from low-pass filtered signal)
- `rollMotionAmountLow`: amount of wrist roll (from low-pass filtered signal) motion
- `rollMotionRegularityLow`: regularity of wrist roll motion
- `manipulationLow`: manipulation of low-pass filtered signals
- `rollPeaks`: number of roll peaks, sum of roll peak values, roll peak avg, roll amplitude avg
- `pitchPeaks`: number of pitch peaks, sum of pitch peak values, pitch peak avg, pitch amplitude avg
- `rollPitchCorrelation`: correlation between roll and pitch (obtained from low-pass filtered signal)
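
Two of the descriptions above, the mean of the magnitude vector and the number of crossings of the mean, can be sketched as follows. The helper names are hypothetical and the sketch skips the low-pass and band-pass filtering that the library applies, so the results will not match the library's `...Low` or `...Band` features.

```python
import numpy as np

def average_vector_length(x, y, z):
    """Mean of the magnitude vector sqrt(x^2 + y^2 + z^2) (no filtering applied)."""
    magnitude = np.sqrt(np.square(x) + np.square(y) + np.square(z))
    return float(magnitude.mean())

def mean_crossings(signal):
    """Count how often the signal moves from one side of its mean to the other."""
    signal = np.asarray(signal, dtype=float)
    above = signal > signal.mean()
    return int(np.count_nonzero(np.diff(above.astype(int))))

x = np.array([3.0, 0.0])
y = np.array([4.0, 0.0])
z = np.array([0.0, 2.0])
print(average_vector_length(x, y, z))        # 3.5 (magnitudes are 5 and 2)
print(mean_crossings([0.0, 2.0, 0.0, 2.0]))  # 3
```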
### Gyroscope features:
These features are useful for 3D signals from gyroscope
- `meanLow`: mean of low-pass filtered signal
- `areaLow`: area under the low-pass filtered signal
- `totalAbsoluteAreaLow`: sum of absolute areas under the low-pass filtered x, y and z signal
- `totalMagnitudeLow`: square root of sum of squared low-pass filtered x, y and z components
- `entropyLow`: entropy of low-pass filtered signal
- `skewnessLow`: skewness of low-pass filtered signal
- `kurtosisLow`: kurtosis of low-pass filtered signal
- `quartilesLow`: quartiles of low-pass filtered signal
- `interQuartileRangeLow`: inter quartile range of low-pass filtered signal
- `varianceLow`: variance of low-pass filtered signal
- `coefficientOfVariationLow`: dispersion of low-pass filtered signal
- `amplitudeLow`: difference between maximum and minimum sample of low-pass filtered signal
- `totalEnergyLow`: total magnitude of low-pass filtered signal
- `dominantFrequencyEnergyLow`: ratio of energy in dominant frequency
- `meanCrossingRateLow`: the number of signal crossings with mean of low-pass filtered signal
- `correlationLow`: Pearson's correlation between low-pass filtered axis
- `quartilesMagnitudeLow`: quartiles at 25%, 50% and 75% per low-pass filtered signal
- `interQuartileRangeMagnitudesLow`: interquartile range of low-pass filtered signal
- `areaUnderMagnitude`: area under magnitude
- `peaksCountLow`: number of peaks in low-pass filtered signal
- `averageVectorLengthLow`: mean of low-pass filtered magnitude vector
- `averageVectorLengthPowerLow`: square mean of low-pass filtered magnitude vector
### Generic features:
These are generic features, useful for many different types of signals
- `autocorrelations`: autocorrelations of the given signal with lags 5, 10, 20, 30, 50, 75 and 100
- `countAboveMean`: number of values in signal that are higher than the mean of signal
- `countBelowMean`: number of values in signal that are lower than the mean of signal
- `maximum`: maximum value of the signal
- `minimum`: minimum value of the signal
- `meanAbsChange`: the mean of absolute differences between subsequent time series values
- `longestStrikeAboveMean`: longest part of signal above mean
- `longestStrikeBelowMean`: longest part of signal below mean
- `stdDev`: standard deviation of the signal
- `median`: median of the signal
- `meanChange`: the mean over the differences between subsequent time series values
- `numberOfZeroCrossings`: number of crossings of signal on 0
- `absEnergy`: the absolute energy of the time series which is the sum over the squared values
- `linearTrendSlope`: a linear least-squares regression for the values of the time series versus the sequence from 0 to length of the time series minus one
- `ratioBeyondRSigma`: ratio of values that are more than r*std(x) (r sigma) away from the mean of the signal, with r = 2.5
- `binnedEntropy`: entropy of binned values
- `numOfPeaksAutocorr`: number of peaks of autocorrelations
- `numberOfZeroCrossingsAutocorr`: number of crossings of autocorrelations on 0
- `areaAutocorr`: area under autocorrelations
- `calcMeanCrossingRateAutocorr`: the number of autocorrelation crossings with mean
- `countAboveMeanAutocorr`: number of values in signal that are higher than the mean of autocorrelation
- `sumPer`: sum per component
- `sumSquared`: squared sum per component
- `squareSumOfComponent`: square sum of component
- `sumOfSquareComponents`: sum of square components
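
Several of the generic features follow directly from their descriptions. The sketch below computes a handful of them with NumPy under the assumption that the standard deviation is the population one (`ddof=0`); the helper name `generic_features` is hypothetical and this is not the library's implementation.

```python
import numpy as np

def generic_features(signal, r=2.5):
    """Compute a few of the listed generic features (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    mean = signal.mean()
    diffs = np.diff(signal)  # differences between subsequent values
    return {
        "countAboveMean": int(np.sum(signal > mean)),
        "countBelowMean": int(np.sum(signal < mean)),
        "meanAbsChange": float(np.mean(np.abs(diffs))),
        "meanChange": float(np.mean(diffs)),
        "absEnergy": float(np.sum(signal ** 2)),
        # share of values more than r * std away from the mean
        "ratioBeyondRSigma": float(np.mean(np.abs(signal - mean) > r * signal.std())),
    }

feats = generic_features([1.0, 3.0, 2.0, 6.0])
print(feats["countAboveMean"], feats["absEnergy"])  # 1 50.0
```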
### Frequency features:
These are frequency features, useful for many different types of signals. The signal is first converted to its power spectral density (PSD), and the features are calculated on the PSD.
- `fqHighestPeakFreqs`: three frequencies corresponding to the largest peaks added to features
- `fqHighestPeaks`: three largest peaks added to features
- `fqEnergyFeat`: energy calculated as the sum of the squared FFT component magnitudes, and normalized
- `fqEntropyFeat`: entropy of the FFT of the signal
- `fqHistogramBins`: Binned distribution (histogram)
- `fqAbsMean`: absolute mean of the raw signal
- `fqSkewness`: skewness of the power spectrum of the data
- `fqKurtosis`: kurtosis of the power spectrum of the data
- `fqInterquart`: inter quartile range of the raw signal
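
To illustrate the idea of calculating features on the power spectral density, here is a minimal sketch that estimates the PSD with a plain periodogram and returns the frequencies of the largest spectral bins. The helper name `dominant_frequencies` is hypothetical, and both the periodogram estimate and the use of the largest bins (rather than detected peaks) are assumptions; the library may estimate the PSD and pick peaks differently.

```python
import numpy as np

def dominant_frequencies(signal, fs, n_bins=3):
    """Return frequencies and PSD values of the n_bins largest spectral bins."""
    signal = np.asarray(signal, dtype=float)
    # Periodogram estimate of the PSD (assumption: the mean is removed first)
    spectrum = np.fft.rfft(signal - signal.mean())
    psd = (np.abs(spectrum) ** 2) / (fs * len(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    top = np.argsort(psd)[::-1][:n_bins]  # indices of the largest PSD values
    return freqs[top], psd[top]

# 4 seconds of a 2 Hz sine sampled at 64 Hz (values chosen for illustration)
fs = 64
t = np.arange(4 * fs) / fs
top_freqs, top_power = dominant_frequencies(np.sin(2 * np.pi * 2.0 * t), fs)
print(top_freqs[0])  # 2.0
```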