Example 0 - Input data description

This notebook describes the input data format expected by the classes EMGMeasurement, EMGMeasurementCollection, and DataProcessingManager, which are used to process EMG signals.

Data in this package is stored as numpy arrays. This notebook also discusses how to convert raw data from different file types into numpy arrays.

[1]:
import numpy as np

Input data format of a single trial

Signal data of one channel

The signal data of a single-channel trial should be stored in either a 1d ndarray with shape (n_samples,) or a 2d ndarray with shape (n_samples, 1).

Here is sample data in a 1d ndarray with shape (n_samples,), where n_samples is 20.

[2]:
data = np.array([20.3, 41.0, 53.9, 63.3, 39.5, 24.9, 26.1, 24.0, 44.1, 42.0,
                 37.4, 24.6, -21.8, -56.3, -48.1, -45.0, -29.1, -9.6, 5.3, 1.4])
print(data.shape)
data
(20,)
[2]:
array([ 20.3,  41. ,  53.9,  63.3,  39.5,  24.9,  26.1,  24. ,  44.1,
        42. ,  37.4,  24.6, -21.8, -56.3, -48.1, -45. , -29.1,  -9.6,
         5.3,   1.4])

Here is sample data in a 2d ndarray with shape (n_samples, 1), where n_samples is 20.

[3]:
data = np.array([[20.3], [41.0], [53.9], [63.3], [39.5], [24.9], [26.1], [24.0], [44.1], [42.0],
                 [37.4], [24.6], [-21.8], [-56.3], [-48.1], [-45.0], [-29.1], [-9.6], [ 5.3], [ 1.4]])
print(data.shape)
data
(20, 1)
[3]:
array([[ 20.3],
       [ 41. ],
       [ 53.9],
       [ 63.3],
       [ 39.5],
       [ 24.9],
       [ 26.1],
       [ 24. ],
       [ 44.1],
       [ 42. ],
       [ 37.4],
       [ 24.6],
       [-21.8],
       [-56.3],
       [-48.1],
       [-45. ],
       [-29.1],
       [ -9.6],
       [  5.3],
       [  1.4]])
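The two single-channel layouts hold the same values, so one can be converted to the other with ordinary numpy reshaping. The following is a minimal sketch; the names data_1d, data_2d, and data_back are illustrative and not part of the package.

```python
import numpy as np

# First five samples of the single-channel example above
data_1d = np.array([20.3, 41.0, 53.9, 63.3, 39.5])   # shape (n_samples,)

# Add a channel axis: (n_samples,) -> (n_samples, 1)
data_2d = data_1d.reshape(-1, 1)

# Drop the channel axis again: (n_samples, 1) -> (n_samples,)
data_back = data_2d[:, 0]

print(data_1d.shape, data_2d.shape, data_back.shape)
```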

Signal data of multiple channels

The signal data of a multi-channel trial should be stored in a 2d ndarray with shape (n_samples, n_channels).

Here is sample data in a 2d ndarray with shape (n_samples, n_channels), where n_samples is 20 and n_channels is 2.

[4]:
data = np.array([[20.3, 1.1], [41.0, 2.9], [53.9, 1.4], [63.3, -0.2], [39.5, 4.4],
                 [24.9, 7.2], [26.1, 9.9], [24.0, 19.1], [44.1, 14.2], [42.0, 18.8],
                 [37.4, 17.2], [24.6, 17.9], [-21.8, 11.1], [-56.3, 13.9], [-48.1, 15.4],
                 [-45.0, 19.4], [-29.1, 12.1], [-9.6, 16.9], [5.3, 12.4], [1.4, 9.0]])
print(data.shape)
data
(20, 2)
[4]:
array([[ 20.3,   1.1],
       [ 41. ,   2.9],
       [ 53.9,   1.4],
       [ 63.3,  -0.2],
       [ 39.5,   4.4],
       [ 24.9,   7.2],
       [ 26.1,   9.9],
       [ 24. ,  19.1],
       [ 44.1,  14.2],
       [ 42. ,  18.8],
       [ 37.4,  17.2],
       [ 24.6,  17.9],
       [-21.8,  11.1],
       [-56.3,  13.9],
       [-48.1,  15.4],
       [-45. ,  19.4],
       [-29.1,  12.1],
       [ -9.6,  16.9],
       [  5.3,  12.4],
       [  1.4,   9. ]])
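In the (n_samples, n_channels) layout, rows are samples and columns are channels, so a single channel can be pulled out by column indexing. The sketch below uses only the first four samples of the array above; the variable names are illustrative.

```python
import numpy as np

data = np.array([[20.3, 1.1], [41.0, 2.9], [53.9, 1.4], [63.3, -0.2]])
n_samples, n_channels = data.shape

# An integer column index returns a 1d array with shape (n_samples,)
first_channel = data[:, 0]

# A list column index keeps the channel axis: shape (n_samples, 1)
second_channel = data[:, [1]]

print(n_samples, n_channels, first_channel.shape, second_channel.shape)
```

Both results are valid single-channel inputs in the sense of the previous subsection.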

Timestamp data

If the user provides it, the timestamp data of a trial should be stored in a 1d ndarray with shape (n_samples,).

Here is sample data in a 1d ndarray with shape (n_samples,), where n_samples is 20.

[5]:
timestamp = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                      0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019])
print(timestamp.shape)
timestamp
(20,)
[5]:
array([0.   , 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008,
       0.009, 0.01 , 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017,
       0.018, 0.019])

Timestamps are not required to start from 0 or to be equally spaced. Their values come from the actual data.

For example, the following timestamps are valid.

[6]:
# Valid: the timestamps do not start from 0
timestamp = np.array([0.006, 0.007, 0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015,
                      0.016, 0.017, 0.018, 0.019, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025])
[7]:
# Valid: the timestamps are not equally spaced (0.002 is followed by 0.007)
timestamp = np.array([0.002, 0.007, 0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015,
                      0.016, 0.017, 0.018, 0.019, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025])
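Although equal spacing is not required, it can be useful to know whether a recorded timestamp array is equally spaced. One possible check uses numpy.diff. The helper is_equally_spaced below is hypothetical and is not a function of this package.

```python
import numpy as np

def is_equally_spaced(timestamp):
    """Return True if all consecutive time steps are (numerically) equal."""
    steps = np.diff(timestamp)
    return bool(np.allclose(steps, steps[0]))

shifted = np.array([0.006, 0.007, 0.008, 0.009, 0.010])     # does not start from 0
irregular = np.array([0.002, 0.007, 0.008, 0.009, 0.010])   # first step is larger

print(is_equally_spaced(shifted))
print(is_equally_spaced(irregular))
```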

If the user does not provide timestamps, they are generated starting from 0 in increments of 1/hz, where hz is the given sampling frequency.

For example, when n_samples is 20 and hz is 1000, the default timestamps are generated as follows.

[8]:
timestamp = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                      0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019])
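The same default values can be reproduced with numpy.arange. This is a sketch of the rule stated above (start at 0, step 1/hz), not necessarily the code the package runs internally.

```python
import numpy as np

n_samples = 20
hz = 1000  # sampling frequency in Hz

# Sample k is assigned the time k / hz
timestamp = np.arange(n_samples) / hz
print(timestamp.shape)
```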

Input data format of multiple trials

Signal data of multiple trials is organized into a list in which each element is the signal data of one trial, in the single-trial format described in the previous section.

Here is sample data for 3 trials. The signal data in each trial has 2 channels, and the trials have 18, 22, and 19 samples respectively.

[9]:
all_data = [np.array([[8.698, -9.613], [7.172, -2.594], [3.51, -5.951], [8.087, -2.899], [5.035, -2.289],
                      [10.529, -0.763], [-2.289, 4.73], [4.12, 1.068], [0.153, -4.12], [12.665, -9.918],
                      [9.613, -7.782], [15.106, -9.918], [9.003, -12.665], [7.477, -22.43], [2.899, -10.223],
                      [5.646, -11.749], [-5.646, -8.698], [1.373, -9.918]]),
            np.array([[3.51, -5.951], [10.529, -7.172], [13.275, -8.087], [11.444, -13.275], [12.054, -13.275],
                      [11.139, -6.561], [9.308, -7.782], [8.087, -7.477], [18.463, -1.068], [1.984, -4.12],
                      [-14.801, -12.97], [-21.21, -10.834], [-27.313, 7.477], [-42.878, -5.646], [-42.573, -5.646],
                      [-35.859, -9.613], [-42.268, -10.834], [-23.041, -10.529], [-16.022, -9.613], [-12.97, -11.749],
                      [3.204, -5.951], [6.866, -6.866]]),
            np.array([[-9.613, -12.665], [-22.125, -16.937], [-23.346, -11.749], [-42.268, -8.087], [-53.864, -9.613],
                      [-41.047, -7.477], [-27.924, -16.632], [-28.839, -8.698], [-19.684, -8.392], [-17.242, -4.73],
                      [6.866, -8.698], [37.995, 3.204], [78.279, -3.815], [102.694, -3.204], [132.296, -9.003],
                      [141.756, -9.308], [113.375, -4.12], [102.388, -12.97], [86.823, -6.866]])]
print(f'Number of trials = {len(all_data)}')
for k, trial in enumerate(all_data):
    print(f'Shape of signal data of trial {k} is {trial.shape}')
Number of trials = 3
Shape of signal data of trial 0 is (18, 2)
Shape of signal data of trial 1 is (22, 2)
Shape of signal data of trial 2 is (19, 2)

If given, timestamp data of multiple trials is organized into a list in which each element is the timestamp data of one trial, in the single-trial format described in the previous section.

Here is sample data for 3 trials, in which the timestamp data has 18, 22, and 19 samples respectively.

[10]:
all_timestamp = [np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                           0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017]),
                 np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                           0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019,
                           0.020, 0.021]),
                 np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
                           0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018])]
print(f'Number of trials = {len(all_timestamp)}')
for k, trial_timestamp in enumerate(all_timestamp):
    print(f'Shape of timestamp data of trial {k} is {trial_timestamp.shape}')
Number of trials = 3
Shape of timestamp data of trial 0 is (18,)
Shape of timestamp data of trial 1 is (22,)
Shape of timestamp data of trial 2 is (19,)
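Because each trial's timestamp array has shape (n_samples,), its length should match the number of rows of that trial's signal data. The sketch below checks this for lists with the same trial lengths as above. Zero-filled placeholder arrays stand in for the real signals, and the sampling frequency hz = 1000 is an assumption made for illustration.

```python
import numpy as np

# Placeholder signal data: 3 trials with 2 channels and 18, 22, 19 samples
all_data = [np.zeros((18, 2)), np.zeros((22, 2)), np.zeros((19, 2))]

hz = 1000  # assumed sampling frequency
all_timestamp = [np.arange(trial.shape[0]) / hz for trial in all_data]

# One timestamp array per trial, each with one value per sample
lengths_match = len(all_data) == len(all_timestamp) and all(
    ts.shape == (trial.shape[0],)
    for trial, ts in zip(all_data, all_timestamp)
)
print(lengths_match)
```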

Converting raw data from files into numpy arrays

Raw data in a structured format

If the raw data files are structured text files, for example in csv format, the function numpy.genfromtxt can be used to load them into numpy arrays.

Example 3 loads the data with numpy.genfromtxt.
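As a small self-contained illustration, the sketch below reads csv text from an in-memory buffer instead of a file. The column layout (one time column followed by two channels) and the header names are hypothetical; a real file would be passed to numpy.genfromtxt by its path.

```python
import io
import numpy as np

# Hypothetical csv content: a header line, then time and two channels
csv_text = """time,ch1,ch2
0.000,20.3,1.1
0.001,41.0,2.9
0.002,53.9,1.4
"""

# skip_header=1 skips the line with the column names
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

timestamp = raw[:, 0]   # shape (n_samples,)
data = raw[:, 1:]       # shape (n_samples, n_channels)
print(timestamp.shape, data.shape)
```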

If the raw data files are MATLAB files (i.e., .mat), the function scipy.io.loadmat can be used to load them into numpy arrays.

Example 4 loads the data with scipy.io.loadmat.
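scipy.io.loadmat returns a dictionary that maps MATLAB variable names to numpy arrays. To keep the sketch below self-contained, it first writes a small array to an in-memory buffer with scipy.io.savemat and then reads it back. The variable name "emg" is hypothetical; a real file will have its own variable names.

```python
import io
import numpy as np
from scipy.io import loadmat, savemat

# Create a small in-memory .mat file with one variable named "emg" (assumed name)
buffer = io.BytesIO()
savemat(buffer, {"emg": np.array([[20.3, 1.1], [41.0, 2.9], [53.9, 1.4]])})
buffer.seek(0)

# loadmat returns a dict; the signal is looked up by its MATLAB variable name
mat = loadmat(buffer)
data = mat["emg"]   # shape (n_samples, n_channels)
print(data.shape)
```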

After the data is loaded from a file, if only part of it is needed, ndarray indexing can be used to extract the needed parts.

Raw data in a more free format

If the raw data files are in a freer format, the user can use Python’s io module to open the files and extract the signal (and timestamp) data according to the structure of each individual file.

For example, the data sample below has several header lines at the beginning of the file. Handling this type of data is shown in Example 1 and Example 2.

File Name: marcha_2.log
Channel 4: 'Recto Femoral', 43665 values, engineering units: mV, no filters.
Channel 5: 'Biceps Femoral', 43665 values, engineering units: mV, no filters.
Channel 6: 'Vasto Medial', 43665 values, engineering units: mV, no filters.
Channel 7: 'EMG Semitendinoso', 43665 values, engineering units: mV, no filters.
Channel 8: 'Flexo-Extension', 43665 values, engineering units: deg, no filters.

0.0067  -0.021  0.0675  -0.0195 4.7
-0.0053 -0.0368 0.1372  -0.0225 4.9
-0.0053 0.003   0.1365  -0.0143 4.8
-0.0046 0.0082  0.135   -0.003  4.7
............
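One way such a file could be parsed is sketched below. It assumes that the header ends at the first blank line and that the data rows are whitespace-separated. This is not the code used in Example 1 or Example 2. The text is embedded in a string, and only the four data rows shown above are included.

```python
import io
import numpy as np

sample = """File Name: marcha_2.log
Channel 4: 'Recto Femoral', 43665 values, engineering units: mV, no filters.
Channel 5: 'Biceps Femoral', 43665 values, engineering units: mV, no filters.
Channel 6: 'Vasto Medial', 43665 values, engineering units: mV, no filters.
Channel 7: 'EMG Semitendinoso', 43665 values, engineering units: mV, no filters.
Channel 8: 'Flexo-Extension', 43665 values, engineering units: deg, no filters.

0.0067  -0.021  0.0675  -0.0195 4.7
-0.0053 -0.0368 0.1372  -0.0225 4.9
-0.0053 0.003   0.1365  -0.0143 4.8
-0.0046 0.0082  0.135   -0.003  4.7
"""

channel_names = []
rows = []
in_header = True  # assumption: the header ends at the first blank line

with io.StringIO(sample) as f:
    for line in f:
        line = line.strip()
        if in_header:
            if line.startswith("Channel"):
                # The channel name is the text between the single quotes
                channel_names.append(line.split("'")[1])
            elif line == "":
                in_header = False
            continue
        if line:
            rows.append([float(value) for value in line.split()])

raw = np.array(rows)   # shape (n_samples, 5)
emg = raw[:, :4]       # the four channels in mV
angle = raw[:, 4]      # 'Flexo-Extension' channel in deg
print(channel_names)
print(emg.shape, angle.shape)
```

For a file on disk, the io.StringIO(sample) call would be replaced by open() with the file path, and the rest of the loop would stay the same.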