Example 0 - Input data description

This notebook describes the input data format expected by the classes EMGMeasurement, EMGMeasurementCollection, and DataProcessingManager, which are used to process EMG signals.

This package stores data as numpy arrays. The notebook also discusses how to convert raw data from different file types into numpy arrays.

[1]:

import numpy as np

Input data format of a single trial

Signal data of one channel

Signal data of a single-channel trial should be stored in either a 1d ndarray with shape (n_samples,) or a 2d ndarray with shape (n_samples, 1).

Here is sample data in a 1d ndarray with shape (n_samples,), where n_samples is 20.

[2]:

data = np.array([20.3, 41.0, 53.9, 63.3, 39.5, 24.9, 26.1, 24.0, 44.1, 42.0,
                 37.4, 24.6, -21.8, -56.3, -48.1, -45.0, -29.1, -9.6, 5.3, 1.4])
print(data.shape)
data

(20,)

[2]:

array([ 20.3,  41. ,  53.9,  63.3,  39.5,  24.9,  26.1,  24. ,  44.1,
        42. ,  37.4,  24.6, -21.8, -56.3, -48.1, -45. , -29.1,  -9.6,
         5.3,   1.4])

Here is sample data in a 2d ndarray with shape (n_samples, 1), where n_samples is 20.

[3]:

data = np.array([[20.3], [41.0], [53.9], [63.3], [39.5], [24.9], [26.1], [24.0], [44.1], [42.0],
                 [37.4], [24.6], [-21.8], [-56.3], [-48.1], [-45.0], [-29.1], [-9.6], [ 5.3], [ 1.4]])
print(data.shape)
data

(20, 1)

[3]:

array([[ 20.3],
       [ 41. ],
       [ 53.9],
       [ 63.3],
       [ 39.5],
       [ 24.9],
       [ 26.1],
       [ 24. ],
       [ 44.1],
       [ 42. ],
       [ 37.4],
       [ 24.6],
       [-21.8],
       [-56.3],
       [-48.1],
       [-45. ],
       [-29.1],
       [ -9.6],
       [  5.3],
       [  1.4]])
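The two single-channel layouts hold the same values and differ only in whether a channel axis is present. As a minimal sketch (the variable names are illustrative and not part of the package), one layout can be converted into the other with standard numpy reshaping:

```python
import numpy as np

# Single-channel signal as a 1d array with shape (n_samples,)
data_1d = np.array([20.3, 41.0, 53.9, 63.3, 39.5])

# Add a channel axis to obtain shape (n_samples, 1)
data_2d = data_1d.reshape(-1, 1)

# Flatten again to return to shape (n_samples,)
data_back = data_2d.ravel()

print(data_1d.shape, data_2d.shape, data_back.shape)
```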

Signal data of multiple channels

Signal data of a multi-channel trial should be stored in a 2d ndarray with shape (n_samples, n_channels).

Here is sample data in a 2d ndarray with shape (n_samples, n_channels), where n_samples is 20 and n_channels is 2.

[4]:

data = np.array([[20.3, 1.1], [41.0, 2.9], [53.9, 1.4], [63.3, -0.2], [39.5, 4.4],
                 [24.9, 7.2], [26.1, 9.9], [24.0, 19.1], [44.1, 14.2], [42.0, 18.8],
                 [37.4, 17.2], [24.6, 17.9], [-21.8, 11.1], [-56.3, 13.9], [-48.1, 15.4],
                 [-45.0, 19.4], [-29.1, 12.1], [-9.6, 16.9], [5.3, 12.4], [1.4, 9.0]])
print(data.shape)
data

(20, 2)

[4]:

array([[ 20.3,   1.1],
       [ 41. ,   2.9],
       [ 53.9,   1.4],
       [ 63.3,  -0.2],
       [ 39.5,   4.4],
       [ 24.9,   7.2],
       [ 26.1,   9.9],
       [ 24. ,  19.1],
       [ 44.1,  14.2],
       [ 42. ,  18.8],
       [ 37.4,  17.2],
       [ 24.6,  17.9],
       [-21.8,  11.1],
       [-56.3,  13.9],
       [-48.1,  15.4],
       [-45. ,  19.4],
       [-29.1,  12.1],
       [ -9.6,  16.9],
       [  5.3,  12.4],
       [  1.4,   9. ]])
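If each channel is available as a separate 1d array, the (n_samples, n_channels) layout can be assembled by stacking the channels as columns. The sketch below assumes two equally long channels with illustrative names; it is one possible way to build the array, not a step required by the package.

```python
import numpy as np

# Two channels recorded over the same samples (names are illustrative)
channel_a = np.array([20.3, 41.0, 53.9, 63.3, 39.5])
channel_b = np.array([1.1, 2.9, 1.4, -0.2, 4.4])

# Each 1d array becomes one column, giving shape (n_samples, n_channels)
data = np.column_stack((channel_a, channel_b))

print(data.shape)
```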

Timestamp data

If provided by the user, the timestamp data of a trial should be stored in a 1d ndarray with shape (n_samples,).

Here is sample data in a 1d ndarray with shape (n_samples,), where n_samples is 20.

[5]:

timestamp = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                      0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019])
print(timestamp.shape)
timestamp

(20,)

[5]:

array([0.   , 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008,
       0.009, 0.01 , 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017,
       0.018, 0.019])

Timestamps are not required to start from 0 or to be equally spaced. Their values come from the actual data.

For example, both of the following timestamp arrays are valid: neither starts from 0, and the second is also not equally spaced.

[6]:

timestamp = np.array([0.006, 0.007, 0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015,
                      0.016, 0.017, 0.018, 0.019, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025])

[7]:

timestamp = np.array([0.002, 0.007, 0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015,
                      0.016, 0.017, 0.018, 0.019, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025])

If the user does not provide timestamps, they are generated starting from 0 in increments of 1/hz, where hz is the given sampling frequency.

For example, when n_samples is 20 and hz is 1000, the following timestamps are generated by default.

[8]:

timestamp = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                      0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019])
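The default rule described above (start from 0, step 1/hz) can be reproduced in a single numpy expression. This is only a sketch of the rule, assuming n_samples and hz are known; it is not necessarily how the package computes the values internally.

```python
import numpy as np

n_samples = 20
hz = 1000  # sampling frequency in Hz

# Sample index divided by the sampling frequency gives the time in seconds
timestamp = np.arange(n_samples) / hz

print(timestamp.shape)
```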

Input data format of multiple trials

Signal data of multiple trials is organized into a list in which each element is the signal data of one trial, in the single-trial format described in the previous section.

Here is sample data for 3 trials. Each trial has 2 channels, and the trials have 18, 22, and 19 samples respectively.

[9]:

all_data = [np.array([[8.698, -9.613], [7.172, -2.594], [3.51, -5.951], [8.087, -2.899], [5.035, -2.289],
                      [10.529, -0.763], [-2.289, 4.73], [4.12, 1.068], [0.153, -4.12], [12.665, -9.918],
                      [9.613, -7.782], [15.106, -9.918], [9.003, -12.665], [7.477, -22.43], [2.899, -10.223],
                      [5.646, -11.749], [-5.646, -8.698], [1.373, -9.918]]),
            np.array([[3.51, -5.951], [10.529, -7.172], [13.275, -8.087], [11.444, -13.275], [12.054, -13.275],
                      [11.139, -6.561], [9.308, -7.782], [8.087, -7.477], [18.463, -1.068], [1.984, -4.12],
                      [-14.801, -12.97], [-21.21, -10.834], [-27.313, 7.477], [-42.878, -5.646], [-42.573, -5.646],
                      [-35.859, -9.613], [-42.268, -10.834], [-23.041, -10.529], [-16.022, -9.613], [-12.97, -11.749],
                      [3.204, -5.951], [6.866, -6.866]]),
            np.array([[-9.613, -12.665], [-22.125, -16.937], [-23.346, -11.749], [-42.268, -8.087], [-53.864, -9.613],
                      [-41.047, -7.477], [-27.924, -16.632], [-28.839, -8.698], [-19.684, -8.392], [-17.242, -4.73],
                      [6.866, -8.698], [37.995, 3.204], [78.279, -3.815], [102.694, -3.204], [132.296, -9.003],
                      [141.756, -9.308], [113.375, -4.12], [102.388, -12.97], [86.823, -6.866]])]
print(f'Number of trials = {len(all_data)}')
for k, trial in enumerate(all_data):
    print(f'Shape of signal data of trial {k} is {trial.shape}')

Number of trials = 3
Shape of signal data of trial 0 is (18, 2)
Shape of signal data of trial 1 is (22, 2)
Shape of signal data of trial 2 is (19, 2)

If provided, timestamp data of multiple trials is organized into a list in which each element is the timestamp data of one trial, in the single-trial format described in the previous section.

Here is sample data for 3 trials, whose timestamp arrays have 18, 22, and 19 samples respectively.

[10]:

all_timestamp = [np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                           0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017]),
                 np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                           0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019,
                           0.020, 0.021]),
                 np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                           0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018])]
print(f'Number of trials = {len(all_timestamp)}')
for k, trial_timestamp in enumerate(all_timestamp):
    print(f'Shape of timestamp data of trial {k} is {trial_timestamp.shape}')

Number of trials = 3
Shape of timestamp data of trial 0 is (18,)
Shape of timestamp data of trial 1 is (22,)
Shape of timestamp data of trial 2 is (19,)
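Because each trial's timestamp array has shape (n_samples,), the signal and timestamp lists should agree trial by trial. The helper below is a hypothetical pre-check written for this notebook (check_trials is not part of the package, which may perform its own validation); the zero-filled arrays are stand-ins with the same shapes as the sample data above.

```python
import numpy as np

def check_trials(all_data, all_timestamp):
    """Hypothetical helper: verify that signals and timestamps line up per trial."""
    if len(all_data) != len(all_timestamp):
        raise ValueError("Number of trials differs between signals and timestamps")
    for k, (signal, timestamp) in enumerate(zip(all_data, all_timestamp)):
        if timestamp.ndim != 1:
            raise ValueError(f"Timestamp of trial {k} must be a 1d array")
        if signal.shape[0] != timestamp.shape[0]:
            raise ValueError(f"Trial {k}: {signal.shape[0]} signal samples "
                             f"but {timestamp.shape[0]} timestamps")

# Stand-in data with the same shapes as the 3 sample trials
all_data = [np.zeros((n, 2)) for n in (18, 22, 19)]
all_timestamp = [np.arange(n) / 1000 for n in (18, 22, 19)]

check_trials(all_data, all_timestamp)  # raises ValueError on a mismatch
```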

Converting raw data from files into numpy arrays

Raw data in structured format

If the raw data files are structured text files, for example in csv format, the user can load them into numpy arrays with the function numpy.genfromtxt.

Example 3 loads the data with numpy.genfromtxt.
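As a small self-contained illustration, the sketch below reads csv text from an in-memory buffer instead of a file on disk. The column names and layout are hypothetical; a real file needs the delimiter and skip_header values that match its own structure.

```python
import io
import numpy as np

# Hypothetical csv content: one header line, then time and two channels
csv_text = (
    "time,ch1,ch2\n"
    "0.000,20.3,1.1\n"
    "0.001,41.0,2.9\n"
    "0.002,53.9,1.4\n"
)

# genfromtxt accepts a file name or a file-like object
table = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(table.shape)
```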

If the raw data files are MATLAB files (i.e., .mat), the user can load them into numpy arrays with the function scipy.io.loadmat.

Example 4 loads the data with scipy.io.loadmat.
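scipy.io.loadmat returns a dictionary that maps MATLAB variable names to numpy arrays. The sketch below writes a small array to an in-memory buffer with scipy.io.savemat and reads it back, so it runs without a real .mat file; the variable name "emg" is an assumption made for the illustration.

```python
import io
import numpy as np
from scipy.io import loadmat, savemat

data = np.array([[20.3, 1.1], [41.0, 2.9], [53.9, 1.4]])

# Write a MATLAB-style file into memory, then rewind the buffer
buffer = io.BytesIO()
savemat(buffer, {"emg": data})
buffer.seek(0)

# loadmat returns a dict keyed by variable name
mat = loadmat(buffer)
loaded = mat["emg"]

print(loaded.shape)
```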

After the data is loaded from a file, if only part of it is needed, ndarray indexing can be used to extract the needed parts.
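A brief sketch of such indexing, assuming a loaded table whose first column holds time and whose remaining columns hold the channels (this layout is hypothetical and depends on the file):

```python
import numpy as np

# Stand-in for an array loaded from a file: time, channel 1, channel 2
loaded = np.array([[0.000, 20.3, 1.1],
                   [0.001, 41.0, 2.9],
                   [0.002, 53.9, 1.4],
                   [0.003, 63.3, -0.2]])

timestamp = loaded[:, 0]        # first column, shape (n_samples,)
data = loaded[:, 1:]            # remaining columns, shape (n_samples, n_channels)
segment = loaded[1:3, 1:]       # rows 1 and 2 only, both channels

print(timestamp.shape, data.shape, segment.shape)
```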

Raw data in a free-form format

If the raw data files are in a free-form format, the user can open them with Python’s io module and extract the signal (and timestamp) data according to the structure of each individual file.

For example, the data sample below has several header lines at the beginning of the file. Handling this type of data is shown in Example 1 and Example 2.

File Name: marcha_2.log
Channel 4: 'Recto Femoral', 43665 values, engineering units: mV, no filters.
Channel 5: 'Biceps Femoral', 43665 values, engineering units: mV, no filters.
Channel 6: 'Vasto Medial', 43665 values, engineering units: mV, no filters.
Channel 7: 'EMG Semitendinoso', 43665 values, engineering units: mV, no filters.
Channel 8: 'Flexo-Extension', 43665 values, engineering units: deg, no filters.

0.0067  -0.021  0.0675  -0.0195 4.7
-0.0053 -0.0368 0.1372  -0.0225 4.9
-0.0053 0.003   0.1365  -0.0143 4.8
-0.0046 0.0082  0.135   -0.003  4.7
............
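One way to handle a file like the sample above is to skip everything up to the first blank line and parse the remaining whitespace-separated rows. The sketch below uses the sample lines as in-memory text and a hypothetical helper, read_signal; it assumes the columns follow the channel order listed in the header and is not necessarily the approach taken in Example 1 or Example 2.

```python
import io
import numpy as np

raw_text = """File Name: marcha_2.log
Channel 4: 'Recto Femoral', 43665 values, engineering units: mV, no filters.
Channel 5: 'Biceps Femoral', 43665 values, engineering units: mV, no filters.
Channel 6: 'Vasto Medial', 43665 values, engineering units: mV, no filters.
Channel 7: 'EMG Semitendinoso', 43665 values, engineering units: mV, no filters.
Channel 8: 'Flexo-Extension', 43665 values, engineering units: deg, no filters.

0.0067  -0.021  0.0675  -0.0195 4.7
-0.0053 -0.0368 0.1372  -0.0225 4.9
-0.0053 0.003   0.1365  -0.0143 4.8
-0.0046 0.0082  0.135   -0.003  4.7
"""

def read_signal(fh):
    """Hypothetical helper: skip the header block, then parse numeric rows."""
    # The header ends at the first blank line
    for line in fh:
        if not line.strip():
            break
    # Every remaining non-empty line is one sample across all channels
    rows = [[float(v) for v in line.split()] for line in fh if line.strip()]
    return np.array(rows)

with io.StringIO(raw_text) as fh:
    values = read_signal(fh)

emg = values[:, :4]    # the four channels in mV
angle = values[:, 4]   # the channel in deg

print(values.shape, emg.shape, angle.shape)
```

For a file on disk, the same helper could be called on the object returned by open(path), with the path supplied by the user.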