
Calculating features

Usage

  • Build the library (the built package is written to dist/ and can then be installed with pip):
    pip install pep517
    python -m pep517.build .
    
    Alternative:
    pip install build
    python -m build
    
  • Basic usage is:
    from calculatingfeatures.CalculatingFeatures.helper_functions import (
        convert1DEmpaticaToArray,
        convertInputInto2d,
        frequencyFeatureNames,
        hrvFeatureNames,
    )
    from calculatingfeatures.CalculatingFeatures.calculate_features import calculateFeatures
    import pandas as pd
    
    pathToHrvCsv = "example_data/S2_E4_Data/BVP.csv"
    windowLength = 500
    
    # get an array of values from HRV empatica file
    hrv_data, startTimeStamp, sampleRate = convert1DEmpaticaToArray(pathToHrvCsv)
    
    # Convert the HRV data into 2D array
    hrv_data_2D = convertInputInto2d(hrv_data, windowLength)
    
    # Create a list with feature names
    featureNames = []
    featureNames.extend(hrvFeatureNames)
    featureNames.extend(frequencyFeatureNames)
    
    pd.set_option('display.max_columns', None)
    
    # Calculate features
    calculatedFeatures = calculateFeatures(hrv_data_2D, fs=int(sampleRate), featureNames=featureNames)
    
  • More usage examples are located in the usage_examples.ipynb notebook.
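
The basic usage above reshapes the 1D signal into fixed-length windows before features are calculated. The sketch below only illustrates that idea. split_into_windows is a hypothetical helper, not the library's convertInputInto2d, and it assumes non-overlapping windows with an incomplete trailing window dropped, which may differ from what the library does.

```python
def split_into_windows(signal, window_length):
    """Split a 1D sequence into non-overlapping windows (hypothetical helper).

    Assumption: samples that do not fill a complete window are discarded.
    """
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    n_windows = len(signal) // window_length
    return [
        list(signal[i * window_length:(i + 1) * window_length])
        for i in range(n_windows)
    ]


# Ten samples with a window length of 4 give two full windows.
windows = split_into_windows(list(range(10)), 4)
```

Each inner list corresponds to one window, for which one set of features would be calculated.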

Features

  • Features are returned by the calculateFeatures() function as a pandas DataFrame.
  • If a feature cannot be calculated (for example, because the input signal is invalid), NaN is returned for it.
  • The rest of this section lists and describes all available features.
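
Because uncomputable features come back as NaN, it can help to inspect the result before using it. The sketch below uses a hand-made DataFrame as a stand-in for the output of calculateFeatures(). The values are invented, and the layout (one row per window, one column per feature) is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in for the DataFrame returned by calculateFeatures().
# Assumed layout: one row per window, one column per feature.
features = pd.DataFrame(
    {
        "meanHr": [72.0, np.nan, 65.5],
        "sdnn": [48.1, np.nan, 51.3],
        "numRR": [6.0, 1.0, 5.0],
    }
)

# Names of the features that could not be calculated for at least one window
columns_with_nan = features.columns[features.isna().any()].tolist()

# Keep only the windows for which every feature was calculated
valid_windows = features.dropna(axis=0, how="any")
```

Whether to drop such windows or impute the missing values depends on the downstream analysis.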

GSR features:

These features are useful for 1D GSR (EDA) signals.

  • mean: mean of the signal
  • std: standard deviation of the signal
  • q25: 0.25 quantile
  • q75: 0.75 quantile
  • qd: q75 - q25
  • deriv: sum of gradients of the signal
  • power: power of the signal (mean of squared signal)
  • numPeaks: number of EDA peaks
  • ratePeaks: average number of peaks per second
  • powerPeaks: power of peaks (mean of signal at indexes of peaks)
  • sumPosDeriv: sum of positive derivatives divided by number of all derivatives
  • propPosDeriv: proportion of positive derivatives per all derivatives
  • derivTonic: sum of gradients of the tonic
  • sigTonicDifference: mean of tonic subtracted from signal
  • freqFeats:
  • maxPeakAmplitudeChangeBefore: maximum peak amplitude change before peak
  • maxPeakAmplitudeChangeAfter: maximum peak amplitude change after peak
  • avgPeakAmplitudeChangeBefore: average peak amplitude change before peak
  • avgPeakAmplitudeChangeAfter: average peak amplitude change after peak
  • avgPeakChangeRatio: avgPeakIncreaseTime / avgPeakDecreaseTime
  • maxPeakIncreaseTime: maximum peak increase time
  • maxPeakDecreaseTime: maximum peak decrease time
  • maxPeakDuration: maximum peak duration
  • maxPeakChangeRatio: maxPeakIncreaseTime / maxPeakDecreaseTime
  • avgPeakIncreaseTime: average peak increase time
  • avgPeakDecreaseTime: average peak decrease time
  • avgPeakDuration: average peak duration
  • maxPeakResponseSlopeBefore: maximum peak response slope before peak
  • maxPeakResponseSlopeAfter: maximum peak response slope after peak
  • signalOverallChange: maximum difference between samples (max(sig)-min(sig))
  • changeDuration: duration between maximum and minimum values
  • changeRate: changeDuration / signalOverallChange
  • significantIncrease:
  • significantDecrease:
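
A few of the simpler statistics in this list follow directly from their descriptions. The sketch below is not the library's implementation. basic_gsr_features is a hypothetical helper, and the choice of population standard deviation and linearly interpolated quantiles is an assumption.

```python
import statistics


def basic_gsr_features(signal):
    """Compute some of the simple statistics listed above for one window."""
    # "inclusive" quantiles interpolate linearly between data points (assumption)
    q25, _, q75 = statistics.quantiles(signal, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.pstdev(signal),  # population std (assumption)
        "q25": q25,
        "q75": q75,
        "qd": q75 - q25,
        "power": statistics.fmean([x * x for x in signal]),  # mean of squares
        "signalOverallChange": max(signal) - min(signal),
    }


feats = basic_gsr_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Peak-based and tonic features need peak detection and signal decomposition, which this sketch does not attempt.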

HRV features:

These features are useful for 1D HRV (BVP) signals.

If the number of RR intervals (numRR) is less than (length of sample) / (2 * sampling rate), which corresponds to 30 BPM, or greater than (length of sample) / (sampling rate / 4), which corresponds to 240 BPM, the heart rate is considered implausible and all other HRV features are set to NaN.
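
The plausibility check described above can be written out as a short sketch. is_plausible_beat_count is a hypothetical name used for illustration, and the 64 Hz sampling rate in the example is an assumed value.

```python
def is_plausible_beat_count(num_rr, num_samples, sampling_rate):
    """Return True if num_rr corresponds to a heart rate of 30 to 240 BPM."""
    # The window lasts num_samples / sampling_rate seconds.
    min_rr = num_samples / (2 * sampling_rate)  # 30 BPM = 0.5 beats per second
    max_rr = num_samples / (sampling_rate / 4)  # 240 BPM = 4 beats per second
    return min_rr <= num_rr <= max_rr


# A 500-sample window at an assumed 64 Hz lasts about 7.8 s,
# so between 3.90625 and 31.25 RR intervals are accepted.
too_few = is_plausible_beat_count(3, 500, 64)
normal = is_plausible_beat_count(8, 500, 64)
too_many = is_plausible_beat_count(32, 500, 64)
```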

  • meanHr: mean heart rate
  • ibi: mean interbeat interval
  • sdnn: standard deviation of the interbeat intervals
  • sdsd: standard deviation of the differences between successive RR intervals
  • rmssd: square root of the mean of the squared differences between successive RR intervals
  • pnn20: proportion of successive RR interval differences greater than 20 ms (NN20) among all intervals
  • pnn50: proportion of successive RR interval differences greater than 50 ms (NN50) among all intervals
  • sd:
  • sd2:
  • sd1/sd2: sd / sd2 ratio
  • numRR: number of RR intervals
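
Several of these statistics can be illustrated directly from a list of RR intervals. The sketch below is not the library's code. time_domain_hrv is a hypothetical helper that assumes intervals in milliseconds, uses the population standard deviation, and divides the NN20 and NN50 counts by the number of successive differences.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Time-domain HRV statistics from RR intervals given in milliseconds."""
    # Differences between successive RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return {
        "ibi": statistics.fmean(rr_ms),
        "sdnn": statistics.pstdev(rr_ms),
        "sdsd": statistics.pstdev(diffs),
        "rmssd": math.sqrt(statistics.fmean([d * d for d in diffs])),
        "pnn20": sum(abs(d) > 20 for d in diffs) / len(diffs),
        "pnn50": sum(abs(d) > 50 for d in diffs) / len(diffs),
        "numRR": len(rr_ms),
    }


# Successive differences here are 10, -20, 60 and -10 ms.
hrv = time_domain_hrv([800, 810, 790, 850, 840])
```

The input needs at least two RR intervals, otherwise there are no successive differences to work with.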

Accelerometer features:

These features are useful for 3D signals from an accelerometer.

  • meanLow: mean of low-pass filtered signal
  • areaLow: area under the low-pass filtered signal
  • totalAbsoluteAreaBand: sum of absolute areas under the band-pass filtered x, y and z signal
  • totalMagnitudeBand: square root of sum of squared band-pass filtered x, y and z components
  • entropyBand: entropy of band-pass filtered signal
  • skewnessBand: skewness of band-pass filtered signal
  • kurtosisBand: kurtosis of band-pass filtered signal
  • postureDistanceLow: calculates difference between mean values for a given sensor (low-pass filtered)
  • absoluteMeanBand: mean of band-pass filtered signal
  • absoluteAreaBand: area under the band-pass filtered signal
  • quartilesBand: quartiles of band-pass filtered signal
  • interQuartileRangeBand: inter quartile range of band-pass filtered signal
  • varianceBand: variance of band-pass filtered signal
  • coefficientOfVariationBand: dispersion of band-pass filtered signal
  • amplitudeBand: difference between maximum and minimum sample of band-pass filtered signal
  • totalEnergyBand: total magnitude of band-pass filtered signal
  • dominantFrequencyEnergyBand: ratio of energy in dominant frequency
  • meanCrossingRateBand: the number of signal crossings with mean of band-pass filtered signal
  • correlationBand: Pearson's correlation between band-pass filtered axes
  • quartilesMagnitudesBand: quartiles at 25%, 50% and 75% per band-pass filtered signal
  • interQuartileRangeMagnitudesBand: interquartile range of band-pass filtered signal
  • areaUnderAccelerationMagnitude: area under acceleration magnitude
  • peaksDataLow: number of peaks, sum of peak values, peak avg, amplitude avg
  • sumPerComponentBand: sum per component of band-pass filtered signal
  • velocityBand: velocity of the band-pass filtered signal
  • meanKineticEnergyBand: mean kinetic energy 1/2*mV^2 of band-pass filtered signal
  • totalKineticEnergyBand: total kinetic energy 1/2*mV^2 for all axes (band-pass filtered)
  • squareSumOfComponent: squared sum of component
  • sumOfSquareComponents: sum of squared components
  • averageVectorLength: mean of magnitude vector
  • averageVectorLengthPower: square mean of magnitude vector
  • rollAvgLow: maximum difference of low-pass filtered roll samples
  • pitchAvgLow: maximum difference of low-pass filtered pitch samples
  • rollStdDevLow: standard deviation of roll (calculated from low-pass filtered signal)
  • pitchStdDevLow: standard deviation of pitch (calculated from low-pass filtered signal)
  • rollMotionAmountLow: amount of wrist roll motion (from low-pass filtered signal)
  • rollMotionRegularityLow: regularity of wrist roll motion
  • manipulationLow: manipulation of low-pass filtered signals
  • rollPeaks: number of roll peaks, sum of roll peak values, roll peak avg, roll amplitude avg
  • pitchPeaks: number of pitch peaks, sum of pitch peak values, pitch peak avg, pitch amplitude avg
  • rollPitchCorrelation: correlation between roll and pitch (obtained from low-pass filtered signal)
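
Two of the ideas that recur in this list, the magnitude vector and mean crossings, are sketched below. The sketch assumes the axes have already been filtered. The helper names are hypothetical, and the crossing count treats only strict sign changes around the mean as crossings.

```python
import math


def magnitude_vector(x, y, z):
    """Per-sample Euclidean norm of the three axes."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def mean_crossings(signal):
    """Count strict sign changes of the signal around its mean."""
    mean = sum(signal) / len(signal)
    centered = [v - mean for v in signal]
    return sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)


x = [3.0, 0.0, 0.0]
y = [4.0, 0.0, 5.0]
z = [0.0, 2.0, 12.0]

mag = magnitude_vector(x, y, z)
# Mean of the magnitude vector, as described for averageVectorLength
average_vector_length = sum(mag) / len(mag)
crossings = mean_crossings([1.0, 3.0, 1.0, 3.0])
```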

Gyroscope features:

These features are useful for 3D signals from a gyroscope.

  • meanLow: mean of low-pass filtered signal
  • areaLow: area under the low-pass filtered signal
  • totalAbsoluteAreaLow: sum of absolute areas under the low-pass filtered x, y and z signal
  • totalMagnitudeLow: square root of sum of squared low-pass filtered x, y and z components
  • entropyLow: entropy of low-pass filtered signal
  • skewnessLow: skewness of low-pass filtered signal
  • kurtosisLow: kurtosis of low-pass filtered signal
  • quartilesLow: quartiles of low-pass filtered signal
  • interQuartileRangeLow: inter quartile range of low-pass filtered signal
  • varianceLow: variance of low-pass filtered signal
  • coefficientOfVariationLow: dispersion of low-pass filtered signal
  • amplitudeLow: difference between maximum and minimum sample of low-pass filtered signal
  • totalEnergyLow: total magnitude of low-pass filtered signal
  • dominantFrequencyEnergyLow: ratio of energy in dominant frequency
  • meanCrossingRateLow: the number of signal crossings with mean of low-pass filtered signal
  • correlationLow: Pearson's correlation between low-pass filtered axes
  • quartilesMagnitudeLow: quartiles at 25%, 50% and 75% per low-pass filtered signal
  • interQuartileRangeMagnitudesLow: interquartile range of low-pass filtered signal
  • areaUnderMagnitude: area under magnitude
  • peaksCountLow: number of peaks in low-pass filtered signal
  • averageVectorLengthLow: mean of low-pass filtered magnitude vector
  • averageVectorLengthPowerLow: square mean of low-pass filtered magnitude vector

Generic features:

These are generic features, useful for many different types of signals.

  • autocorrelations: autocorrelations of the given signal with lags 5, 10, 20, 30, 50, 75 and 100
  • countAboveMean: number of values in signal that are higher than the mean of signal
  • countBelowMean: number of values in signal that are lower than the mean of signal
  • maximum: maximum value of the signal
  • minimum: minimum value of the signal
  • meanAbsChange: the mean of absolute differences between subsequent time series values
  • longestStrikeAboveMean: longest part of signal above mean
  • longestStrikeBelowMean: longest part of signal below mean
  • stdDev: standard deviation of the signal
  • median: median of the signal
  • meanChange: the mean over the differences between subsequent time series values
  • numberOfZeroCrossings: number of times the signal crosses 0
  • absEnergy: the absolute energy of the time series which is the sum over the squared values
  • linearTrendSlope: slope of a linear least-squares regression of the time series values against the sequence from 0 to the length of the time series minus one
  • ratioBeyondRSigma: ratio of values that are more than r * std(x) (that is, r sigma) away from the mean of the signal; here r = 2.5
  • binnedEntropy: entropy of binned values
  • numOfPeaksAutocorr: number of peaks of autocorrelations
  • numberOfZeroCrossingsAutocorr: number of times the autocorrelations cross 0
  • areaAutocorr: area under autocorrelations
  • calcMeanCrossingRateAutocorr: the number of autocorrelation crossings with mean
  • countAboveMeanAutocorr: number of autocorrelation values that are higher than the mean of the autocorrelations
  • sumPer: sum per component
  • sumSquared: squared sum per component
  • squareSumOfComponent: square sum of component
  • sumOfSquareComponents: sum of square components
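
Several of the generic features above are simple enough to write out by hand. The sketch below follows the descriptions in the list rather than the library's source. generic_features is a hypothetical helper, and the population standard deviation is an assumption.

```python
def generic_features(x, r=2.5):
    """Compute a handful of the generic features for one 1D window."""
    n = len(x)
    mean = sum(x) / n
    std = (sum((v - mean) ** 2 for v in x) / n) ** 0.5  # population std
    diffs = [b - a for a, b in zip(x, x[1:])]

    def longest_run(flags):
        """Length of the longest run of consecutive True values."""
        best = run = 0
        for flag in flags:
            run = run + 1 if flag else 0
            best = max(best, run)
        return best

    return {
        "countAboveMean": sum(v > mean for v in x),
        "countBelowMean": sum(v < mean for v in x),
        "meanAbsChange": sum(abs(d) for d in diffs) / len(diffs),
        "meanChange": sum(diffs) / len(diffs),
        "longestStrikeAboveMean": longest_run(v > mean for v in x),
        "longestStrikeBelowMean": longest_run(v < mean for v in x),
        "absEnergy": sum(v * v for v in x),
        "ratioBeyondRSigma": sum(abs(v - mean) > r * std for v in x) / n,
    }


generic = generic_features([1.0, 2.0, 3.0, 4.0, 0.0, 2.0])
```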

Frequency features:

These are frequency features, useful for many different types of signals. The signal is first converted to a power spectral density (PSD), and the features are calculated on it.

  • fqHighestPeakFreqs: the three frequencies corresponding to the largest peaks
  • fqHighestPeaks: the three largest peaks
  • fqEnergyFeat: energy calculated as the sum of the squared FFT component magnitudes, and normalized
  • fqEntropyFeat: entropy of the FFT of the signal
  • fqHistogramBins: binned distribution (histogram)
  • fqAbsMean: absolute mean of the raw signal
  • fqSkewness: skewness of the power spectrum of the data
  • fqKurtosis: kurtosis of the power spectrum of the data
  • fqInterquart: inter quartile range of the raw signal
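
To illustrate how a peak frequency can be read off a power spectrum, the sketch below uses a plain FFT periodogram. This is an assumption made for illustration. The library may estimate the power spectral density differently, and highest_peak_frequency is a hypothetical helper.

```python
import numpy as np


def highest_peak_frequency(signal, fs):
    """Frequency in Hz of the largest bin in a simple periodogram."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()              # remove the DC offset
    spectrum = np.abs(np.fft.rfft(signal)) ** 2  # unnormalized power
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


# Two seconds of a 5 Hz sine sampled at 100 Hz
fs = 100
t = np.arange(200) / fs
sine = np.sin(2 * np.pi * 5.0 * t)
peak = highest_peak_frequency(sine, fs)
```

With 200 samples at 100 Hz the frequency resolution is 0.5 Hz, so the 5 Hz tone falls exactly on a bin.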