{"cells":[{"cell_type":"markdown","source":["[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aalhossary/pyemgpipeline/blob/master/docs/notebooks/ex0_input_data_description.ipynb)"],"metadata":{"id":"yHQ14ntqEaGs"}},{"cell_type":"markdown","metadata":{"tags":[],"id":"Lj9p-iOusNVd"},"source":["# Example 0 - Input data description"]},{"cell_type":"markdown","metadata":{"id":"s2T2BkTAsNVi"},"source":["This notebook introduces the input data format needed to feed in classes [EMGMeasurement](https://aalhossary.github.io/pyemgpipeline/api/pyemgpipeline.wrappers.html#emgmeasurement), [EMGMeasurementCollection](https://aalhossary.github.io/pyemgpipeline/api/pyemgpipeline.wrappers.html#emgmeasurementcollection), and [DataProcessingManager](https://aalhossary.github.io/pyemgpipeline/api/pyemgpipeline.wrappers.html#dataprocessingmanager) to process the EMG signals.\n","\n","The data in this package is saved as numpy arrays. Converting raw data from different types of files into numpy arrays is also discussed in this notebook."]},{"cell_type":"code","source":["import numpy as np"],"metadata":{"id":"umE1jtvWvzzd","executionInfo":{"status":"ok","timestamp":1648746756653,"user_tz":-480,"elapsed":7,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}}},"execution_count":1,"outputs":[]},{"cell_type":"markdown","source":["## Input data format of single trial"],"metadata":{"id":"GiNwQfRcMX25"}},{"cell_type":"markdown","source":["### Signal data of one channel"],"metadata":{"id":"BpOKNZveMgdU"}},{"cell_type":"markdown","source":["Signal data of the trial which has one channel should be stored in either \"a 1d ndarray with shape (*n_samples*,)\", or \"a 2d ndarray with shape (*n_samples*, 1)\".\n","\n","Here is a sample data in a 1d ndarray with shape (*n_samples*,), where *n_samples* is 20."],"metadata":{"id":"7gQXkhviMkPl"}},{"cell_type":"code","source":["data = np.array([20.3, 41.0, 53.9, 63.3, 39.5, 24.9, 26.1, 24.0, 44.1, 42.0,\n"," 37.4, 24.6, -21.8, -56.3, -48.1, -45.0, -29.1, -9.6, 5.3, 1.4])\n","print(data.shape)\n","data"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"VzPBZTM0NYHz","executionInfo":{"status":"ok","timestamp":1648746757225,"user_tz":-480,"elapsed":56,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}},"outputId":"04d13cc8-2e75-44a0-d2b2-493aac0e94a7"},"execution_count":2,"outputs":[{"output_type":"stream","name":"stdout","text":["(20,)\n"]},{"output_type":"execute_result","data":{"text/plain":["array([ 20.3, 41. , 53.9, 63.3, 39.5, 24.9, 26.1, 24. , 44.1,\n"," 42. , 37.4, 24.6, -21.8, -56.3, -48.1, -45. , -29.1, -9.6,\n"," 5.3, 1.4])"]},"metadata":{},"execution_count":2}]},{"cell_type":"markdown","source":["Here is a sample data in a 2d ndarray with shape (*n_samples*, 1), where *n_samples* is 20."],"metadata":{"id":"QcKu-Oc7OvgX"}},{"cell_type":"code","source":["data = np.array([[20.3], [41.0], [53.9], [63.3], [39.5], [24.9], [26.1], [24.0], [44.1], [42.0],\n"," [37.4], [24.6], [-21.8], [-56.3], [-48.1], [-45.0], [-29.1], [-9.6], [ 5.3], [ 1.4]])\n","print(data.shape)\n","data"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"DIKHoiHpNoJj","executionInfo":{"status":"ok","timestamp":1648746757226,"user_tz":-480,"elapsed":48,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}},"outputId":"5f6c883e-a9c4-4814-f97f-30358badd294"},"execution_count":3,"outputs":[{"output_type":"stream","name":"stdout","text":["(20, 1)\n"]},{"output_type":"execute_result","data":{"text/plain":["array([[ 20.3],\n"," [ 41. ],\n"," [ 53.9],\n"," [ 63.3],\n"," [ 39.5],\n"," [ 24.9],\n"," [ 26.1],\n"," [ 24. ],\n"," [ 44.1],\n"," [ 42. ],\n"," [ 37.4],\n"," [ 24.6],\n"," [-21.8],\n"," [-56.3],\n"," [-48.1],\n"," [-45. ],\n"," [-29.1],\n"," [ -9.6],\n"," [ 5.3],\n"," [ 1.4]])"]},"metadata":{},"execution_count":3}]},{"cell_type":"markdown","source":["### Signal data of multiple channels"],"metadata":{"id":"EN6hdx7LO4_M"}},{"cell_type":"markdown","source":["Signal data of the trial should be stored in a 2d ndarray with shape (*n_samples*, *n_channels*).\n","\n","Here is a sample data in a 2d ndarray with shape (*n_samples*, *n_channels*), where *n_samples* is 20 and *n_channels* is 2."],"metadata":{"id":"Wm9ziID1O5BW"}},{"cell_type":"code","source":["data = np.array([[20.3, 1.1], [41.0, 2.9], [53.9, 1.4], [63.3, -0.2], [39.5, 4.4],\n"," [24.9, 7.2], [26.1, 9.9], [24.0, 19.1], [44.1, 14.2], [42.0, 18.8],\n"," [37.4, 17.2], [24.6, 17.9], [-21.8, 11.1], [-56.3, 13.9], [-48.1, 15.4],\n"," [-45.0, 19.4], [-29.1, 12.1], [-9.6, 16.9], [5.3, 12.4], [1.4, 9.0]])\n","print(data.shape)\n","data"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"_KpMxuLHNYKF","executionInfo":{"status":"ok","timestamp":1648746757227,"user_tz":-480,"elapsed":38,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}},"outputId":"95b2d0d3-a9c7-4a7e-cb15-d13b08ed8a08"},"execution_count":4,"outputs":[{"output_type":"stream","name":"stdout","text":["(20, 2)\n"]},{"output_type":"execute_result","data":{"text/plain":["array([[ 20.3, 1.1],\n"," [ 41. , 2.9],\n"," [ 53.9, 1.4],\n"," [ 63.3, -0.2],\n"," [ 39.5, 4.4],\n"," [ 24.9, 7.2],\n"," [ 26.1, 9.9],\n"," [ 24. , 19.1],\n"," [ 44.1, 14.2],\n"," [ 42. , 18.8],\n"," [ 37.4, 17.2],\n"," [ 24.6, 17.9],\n"," [-21.8, 11.1],\n"," [-56.3, 13.9],\n"," [-48.1, 15.4],\n"," [-45. , 19.4],\n"," [-29.1, 12.1],\n"," [ -9.6, 16.9],\n"," [ 5.3, 12.4],\n"," [ 1.4, 9. ]])"]},"metadata":{},"execution_count":4}]},{"cell_type":"markdown","source":["### Timestamp data"],"metadata":{"id":"eno3fq_kYlaY"}},{"cell_type":"markdown","source":["If given by the users, timestamp data of the trial should be stored in a 1d ndarray with shape (*n_samples*,).\n","\n","Here is a sample data in a 1d ndarray with shape (*n_samples*,), where *n_samples* is 20."],"metadata":{"id":"y8vNXEwJbrAj"}},{"cell_type":"code","source":["timestamp = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,\n"," 0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019])\n","print(timestamp.shape)\n","timestamp"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nmvVKdJJYo1q","executionInfo":{"status":"ok","timestamp":1648746757228,"user_tz":-480,"elapsed":33,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}},"outputId":"144bc886-69cd-4964-fd80-00da26db4514"},"execution_count":5,"outputs":[{"output_type":"stream","name":"stdout","text":["(20,)\n"]},{"output_type":"execute_result","data":{"text/plain":["array([0. , 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008,\n"," 0.009, 0.01 , 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017,\n"," 0.018, 0.019])"]},"metadata":{},"execution_count":5}]},{"cell_type":"markdown","source":["Timestamps are not required to start from 0 or be equally spaced. Its values come from the actual data.\n","\n","For example, the following timestamps are valid."],"metadata":{"id":"r-yfM7ayc0kc"}},{"cell_type":"code","source":["timestamp = np.array([0.006, 0.007, 0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015,\n"," 0.016, 0.017, 0.018, 0.019, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025])"],"metadata":{"id":"nO10AYPXYo4Q","executionInfo":{"status":"ok","timestamp":1648746757229,"user_tz":-480,"elapsed":28,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}}},"execution_count":6,"outputs":[]},{"cell_type":"code","source":["timestamp = np.array([0.002, 0.007, 0.008, 0.009, 0.010, 0.011, 0.012, 0.013, 0.014, 0.015,\n"," 0.016, 0.017, 0.018, 0.019, 0.020, 0.021, 0.022, 0.023, 0.024, 0.025])"],"metadata":{"id":"aKdPl_R8dS4N","executionInfo":{"status":"ok","timestamp":1648746757230,"user_tz":-480,"elapsed":28,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}}},"execution_count":7,"outputs":[]},{"cell_type":"markdown","source":["If the users do not give the values of timestamp, they will be generated starting from 0 and in increments of 1/*hz*, where *hz* is the given sample frequency.\n","\n","For example, when *n_samples* is 20 and *hz* is 1000, timestamp will be generated as the following by default."],"metadata":{"id":"IB2PO7hYeW6g"}},{"cell_type":"code","source":["timestamp = np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,\n"," 0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019])"],"metadata":{"id":"kYQbKIfpdS6K","executionInfo":{"status":"ok","timestamp":1648746757232,"user_tz":-480,"elapsed":29,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}}},"execution_count":8,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"cBhS2F5isNV2"},"source":["## Input data format of multiple trials"]},{"cell_type":"markdown","source":["Signal data of multiple trials are organized into a list, in which each element is the signal data of one trial and its format follows the format of single trial described in the previous section.\n","\n","Here is a sample data of 3 trials, and the signal data in each trial has 2 channels and 18, 22, 19 samples respectively."],"metadata":{"id":"FznVOs7ugloG"}},{"cell_type":"code","source":["all_data = [np.array([[8.698, -9.613], [7.172, -2.594], [3.51, -5.951], [8.087, -2.899], [5.035, -2.289],\n"," [10.529, -0.763], [-2.289, 4.73], [4.12, 1.068], [0.153, -4.12], [12.665, -9.918],\n"," [9.613, -7.782], [15.106, -9.918], [9.003, -12.665], [7.477, -22.43], [2.899, -10.223],\n"," [5.646, -11.749], [-5.646, -8.698], [1.373, -9.918]]),\n"," np.array([[3.51, -5.951], [10.529, -7.172], [13.275, -8.087], [11.444, -13.275], [12.054, -13.275],\n"," [11.139, -6.561], [9.308, -7.782], [8.087, -7.477], [18.463, -1.068], [1.984, -4.12],\n"," [-14.801, -12.97], [-21.21, -10.834], [-27.313, 7.477], [-42.878, -5.646], [-42.573, -5.646],\n"," [-35.859, -9.613], [-42.268, -10.834], [-23.041, -10.529], [-16.022, -9.613], [-12.97, -11.749],\n"," [3.204, -5.951], [6.866, -6.866]]),\n"," np.array([[-9.613, -12.665], [-22.125, -16.937], [-23.346, -11.749], [-42.268, -8.087], [-53.864, -9.613],\n"," [-41.047, -7.477], [-27.924, -16.632], [-28.839, -8.698], [-19.684, -8.392], [-17.242, -4.73],\n"," [6.866, -8.698], [37.995, 3.204], [78.279, -3.815], [102.694, -3.204], [132.296, -9.003],\n"," [141.756, -9.308], [113.375, -4.12], [102.388, -12.97], [86.823, -6.866]])]\n","print(f'Number of trials = {len(all_data)}')\n","for k in range(len(all_data)):\n"," print(f'Shape of signal data of trial {k} is {all_data[k].shape}')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"bepdnU5TPbDc","executionInfo":{"status":"ok","timestamp":1648746757233,"user_tz":-480,"elapsed":29,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}},"outputId":"43338d48-044f-4350-9f3a-f62900ca7047"},"execution_count":9,"outputs":[{"output_type":"stream","name":"stdout","text":["Number of trials = 3\n","Shape of signal data of trial 0 is (18, 2)\n","Shape of signal data of trial 1 is (22, 2)\n","Shape of signal data of trial 2 is (19, 2)\n"]}]},{"cell_type":"markdown","source":["If given, timestamp data of multiple trials are organized into a list, in which each element is the timestamp data of one trial and its format follows the format of single trial described in the previous section.\n","\n","Here is a sample data of 3 trials, and the timestamp data in each trial has 18, 22, 19 samples respectively."],"metadata":{"id":"hXrkqvY7rlK3"}},{"cell_type":"code","source":["all_timestamp = [np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,\n"," 0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017]),\n"," np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,\n"," 0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019,\n"," 0.020, 0.021]),\n"," np.array([0.000, 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,\n"," 0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018])]\n","print(f'Number of trials = {len(all_timestamp)}')\n","for k in range(len(all_timestamp)):\n"," print(f'Shape of timestamp data of trial {k} is {all_timestamp[k].shape}')"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"xCCFLUCWqpL7","executionInfo":{"status":"ok","timestamp":1648746757234,"user_tz":-480,"elapsed":26,"user":{"displayName":"Tsung-Lin Wu","userId":"08812001621921766841"}},"outputId":"2797d6c8-0666-4c4d-a6c9-e62b33f2eb89"},"execution_count":10,"outputs":[{"output_type":"stream","name":"stdout","text":["Number of trials = 3\n","Shape of timestamp data of trial 0 is (18,)\n","Shape of timestamp data of trial 1 is (22,)\n","Shape of timestamp data of trial 2 is (19,)\n"]}]},{"cell_type":"markdown","source":["## Coverting raw data from files into numpy array"],"metadata":{"id":"9p71CMsVmsDW"}},{"cell_type":"markdown","source":["### Raw data in structured format"],"metadata":{"id":"nbe6PFymtOVM"}},{"cell_type":"markdown","source":["If the raw data files are structured text file, for example, in csv format, then the users can use the function [numpy.genfromtxt](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html) to load the files into numpy arrays.\n","\n","[Example 3](https://aalhossary.github.io/pyemgpipeline/notebooks/ex3_DataProcessingManager.html) loads the data with numpy.genfromtxt."],"metadata":{"id":"J0ikDFRjtpLu"}},{"cell_type":"markdown","source":["If the raw data files are MATLAB files (i.e., .mat), then the users can use the function [scipy.io.loadmat](https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html) to load the files into numpy arrays.\n","\n","[Example 4](https://aalhossary.github.io/pyemgpipeline/notebooks/ex4_DataProcessingManager.html) loads the data with scipy.io.loadmat."],"metadata":{"id":"5JY3r52wu-Hz"}},{"cell_type":"markdown","source":["After the data is loaded from a file, if not the whole data is needed, [indexing on ndarrays](https://numpy.org/doc/stable/user/basics.indexing.html) can be used to extract the needed parts of the data."],"metadata":{"id":"RfAUKdjlQ4h7"}},{"cell_type":"markdown","source":["### Raw data in a more free format"],"metadata":{"id":"qWMJ0xJctQt8"}},{"cell_type":"markdown","source":["If the raw data files are in a more free format, the users can use Python's [io module](https://docs.python.org/3/library/io.html) to open files and extract the signal (and timestamp) data according to the structure of each individual file.\n","\n","For example, the data sample below shows that there are several header lines in the beginning of the file. Handling this type of data is shown in [Example 1](https://aalhossary.github.io/pyemgpipeline/notebooks/ex1_EMGMeasurement.html) and [Example 2](https://aalhossary.github.io/pyemgpipeline/notebooks/ex2_EMGMeasurementCollection.html)."],"metadata":{"id":"kxYFNfucOkNv"}},{"cell_type":"markdown","source":["```\n","File Name: marcha_2.log\n","Channel 4: 'Recto Femoral', 43665 values, engineering units: mV, no filters.\n","Channel 5: 'Biceps Femoral', 43665 values, engineering units: mV, no filters.\n","Channel 6: 'Vasto Medial', 43665 values, engineering units: mV, no filters.\n","Channel 7: 'EMG Semitendinoso', 43665 values, engineering units: mV, no filters.\n","Channel 8: 'Flexo-Extension', 43665 values, engineering units: deg, no filters.\n","\n","0.0067\t-0.021\t0.0675\t-0.0195\t4.7\n","-0.0053\t-0.0368\t0.1372\t-0.0225\t4.9\n","-0.0053\t0.003\t0.1365\t-0.0143\t4.8\n","-0.0046\t0.0082\t0.135\t-0.003\t4.7\n","............\n","```"],"metadata":{"id":"sEezCOIVwFnY"}}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.6.6"},"colab":{"name":"ex0_input_data_description.ipynb","provenance":[],"collapsed_sections":[]}},"nbformat":4,"nbformat_minor":0}