Overview¶
This tutorial notebook walks you through preparing a fictional EMG-BIDS dataset using the Python-based MUniverse package.
The dataset contains both surface and invasive EMG, together with some motor unit spike time labels (extracted via manual decomposition from the invasive EMG). At the time of writing this notebook, invasive EMG is not yet officially standardized by EMG-BIDS. Here a custom approach is used, in which metadata specific to invasive EMG is encoded in user-defined metadata fields, so that we still obtain a valid BIDS dataset.
To illustrate arbitrary electrode combinations, the described setup (see Figure below) is intentionally more complex than most regular datasets. In total, there are three high-density surface EMG arrays, two invasive EMG arrays (thin filament electrodes), two invasive fine-wire electrodes, two concentric needle electrodes, as well as two reference electrodes and one ground electrode. Furthermore, the EMG grids have different shapes (i.e., numbers of rows and columns). The dataset contains 6 subjects who each performed 3 tasks, i.e., a resting task to measure the baseline noise and two trapezoidal isometric contractions at 30% and 50% of the subject's maximum voluntary contraction (MVC) intensity.
Introduction to BIDS¶
Why BIDS?¶
BIDS (Brain Imaging Data Structure) is a community standard for organising neuroscience data so that it is FAIR: Findable, Accessible, Interoperable, and Reusable. A BIDS-formatted dataset is self-describing: anyone who picks it up, regardless of which lab or software they use, can understand what was recorded, how it was recorded, and who the participants were, without having to contact the authors. This means every recording file is accompanied by a set of metadata files that describe the hardware, the electrode layout, the coordinate system, and the channel list. None of this lives in a separate Word document or lab notebook; it travels with the data. The BIDS standard covers various data acquisition modalities that are relevant to neuroscience (e.g., EEG, MEG, MRI, ...). EMG-BIDS is a relatively new addition and currently is the only standardized format for reporting EMG experiments.
Structure of a BIDS dataset¶
BIDS organizes data and metadata hierarchically using a well-defined folder structure and file naming convention. Below, you can see the final folder structure of this tutorial dataset. The actual data is stored in .edf files (one option for data files in EMG-BIDS). Since data files are just a matrix of numbers, accompanying metadata files are essential to make sense of the data. All other files are such metadata files, and they are organized in different layers:
- Dataset level for global information on the experiment (e.g., README.md, dataset_description.json and participants.tsv)
- Subject/session level for information specific to one experimental session (in this example _electrodes.tsv and _coordsystem.json)
- Recording level for information that is specific to each individual data acquisition (in this example _channels.tsv, _emg.json, _events.tsv)
This notebook will guide you through all required file types (and thereby outline the purpose of each file) and show how you can use the MUniverse package to generate and validate your own EMG-BIDS dataset.
FictionalDatasetExample/
├── dataset_description.json
├── electrodes.json
├── events.json
├── participants.json
├── participants.tsv
├── README.md
├── sub-01/
│   └── emg/
│       ├── sub-01_electrodes.tsv
│       ├── sub-01_space-forearmCoordSys_coordsystem.json
│       ├── sub-01_space-grid1CoordSys_coordsystem.json
│       ├── sub-01_space-grid2CoordSys_coordsystem.json
│       ├── sub-01_space-grid3CoordSys_coordsystem.json
│       ├── sub-01_space-intraGrid1CoordSys_coordsystem.json
│       ├── sub-01_space-intraGrid2CoordSys_coordsystem.json
│       ├── sub-01_task-restNoise_run-01_channels.json
│       ├── sub-01_task-restNoise_run-01_channels.tsv
│       ├── sub-01_task-restNoise_run-01_emg.edf
│       ├── sub-01_task-restNoise_run-01_emg.json
│       ├── sub-01_task-trapezoidalContraction30PercentMVC_run-01_channels.json
│       ├── sub-01_task-trapezoidalContraction30PercentMVC_run-01_channels.tsv
│       ├── sub-01_task-trapezoidalContraction30PercentMVC_run-01_emg.edf
│       ├── sub-01_task-trapezoidalContraction30PercentMVC_run-01_emg.json
│       ├── sub-01_task-trapezoidalContraction30PercentMVC_run-01_events.tsv
│       ├── sub-01_task-trapezoidalContraction50PercentMVC_run-01_channels.json
│       ├── sub-01_task-trapezoidalContraction50PercentMVC_run-01_channels.tsv
│       ├── sub-01_task-trapezoidalContraction50PercentMVC_run-01_emg.edf
│       ├── sub-01_task-trapezoidalContraction50PercentMVC_run-01_emg.json
│       └── sub-01_task-trapezoidalContraction50PercentMVC_run-01_events.tsv
├── sub-02/
│   └── ...
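The file names in this tree follow the BIDS pattern of key-value entities joined by underscores and closed by a suffix and an extension. As a minimal sketch of that pattern (the helper `bids_filename` is hypothetical and not part of MUniverse, and only the entities sub, ses, task, acq and run are considered here), such a name could be assembled as follows:

```python
def bids_filename(suffix, extension, subject, task=None, session=None, acq=None, run=None):
    """Assemble a BIDS-style file name from entity labels (illustrative helper)."""
    # Entities appear in a fixed order; entities set to None are skipped.
    entities = [("sub", subject), ("ses", session), ("task", task), ("acq", acq), ("run", run)]
    parts = [f"{key}-{value}" for key, value in entities if value is not None]
    return "_".join(parts + [suffix]) + f".{extension}"


print(bids_filename("emg", "edf", subject="01", task="restNoise", run="01"))
print(bids_filename("electrodes", "tsv", subject="01"))
```

In practice the MUniverse classes build these names from the labels you provide, so a helper like this is only useful for understanding or checking the convention.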
Electrodes vs Channels¶
Before considering the actual metadata files, let's quickly recall what we mean when talking about channels and electrodes. An electrode is a physical object attached to the skin or, for invasive EMG, inserted into the muscle. A channel is the digitised signal that the acquisition software stores to disk. Typically there is one channel per electrode, though referencing schemes mean the relationship is not always one-to-one. BIDS keeps these concepts separate because knowing where a channel was recorded is not the same as knowing what signal it carries.
The BIDS-MUniverse utilities¶
We could now go through the definitions of the BIDS standard and manually generate each file. That quickly becomes time-consuming and error-prone. To help you with generating, reading and validating EMG-BIDS datasets, the MUniverse package provides the BIDSDataset and EMGBIDSRecording classes, which can be considered templates of the components of an EMG-BIDS dataset. These classes come with built-in read, write and validate functions (together with a few more utilities), so the only thing left to do is to fill these templates with life. The built-in set_metadata function allows you to populate the template with information stored in dictionaries, DataFrames, or .json and .tsv files. To generate your own dataset, make sure to install the MUniverse package.
Building an example EMG-BIDS dataset¶
With some basic Python skills, you should be able to easily adapt the provided example to your own dataset. First we need to import the required Python dependencies:
# Basic imports
import numpy as np
import pandas as pd
import copy
from muniverse.utils.bids_routines import (
    BIDSDataset,
    EMGBIDSRecording,
    BIDSDecompositionDerivative
)
Global (dataset wide) metadata¶
As we have seen, BIDS contains both global (dataset-level) and recording-specific data and metadata. The BIDSDataset class takes care of the dataset-level metadata and data files. It has the following attributes (you might recognize some patterns from the example dataset structure):
- root (str): The root folder of a BIDS dataset
- datasetname (str): The name of your BIDS dataset
- readme (str): The README file of your BIDS dataset stored as a string
- dataset_sidecar (dict): A dictionary capturing the content of the dataset_description.json file
- subjects_data (pd.DataFrame): A table with subject information and pre-defined columns "participant_id", "age", "sex", "handedness", "weight" and "height"
- subjects_sidecar (dict): A dictionary capturing the content of the participants.json file
- BIDSIGNORE (list of str): A list of files/file types to be ignored by the validator
Readme File¶
The readme file is essentially the first thing to be inspected when accessing a BIDS dataset and thus should provide guidance through your dataset. This ranges from general information on how to access the data, a description of independent, dependent and control variables, the experimental setup and tasks, data quality assessment and missing data points, to information that did not fit into any of the other metadata files (also see template file).
To annotate an EMG-BIDS dataset with the MUniverse package, we define this information in a string, in which Markdown syntax can be used.
readme = """
# Header 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
# Header 2
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
"""
Dataset sidecar¶
The dataset sidecar (dataset_description.json) contains general information about the dataset in a structured way (e.g., who created the dataset, under which license the dataset is published, or the corresponding ethics approval). To build an EMG-BIDS dataset with the MUniverse package, we can simply define the associated metadata in a dictionary (or store a template in a .json file):
dataset_sidecar = {
    "Name": "FictionalDatasetExample", # The name of your dataset
    "License": "CC BY 4.0", # License this dataset will be available under.
    "DatasetType": "raw", # Indicates that this is unprocessed data
    "Authors": [
        "alice",
        "bob"
    ], # List of individuals who contributed to the creation/curation of the dataset. Has to be a list even if it has only one entry
    "ReferencesAndLinks": [
        "citation of related publication as text",
        "related publication as DOI"
    ], # E.g. the name of the related publication and the corresponding DOI.
    "EthicsApprovals": [
        "number of ethics approval."
    ], # List of ethics committee approvals of the research protocols and/or protocol identifiers. Has to be a list even if it has only one entry
    "GeneratedBy": [
        {
            "Name":"MUniverse"
        }
    ], # A list of tools used to generate this dataset (must be a list of objects)
}
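Several fields of the dataset sidecar must be lists even when they hold a single entry. A small, purely illustrative check can catch such mistakes before the file is written. The helper `find_non_list_fields` and the selection of fields are assumptions made for this sketch; they are not part of MUniverse or of the BIDS validator. `json.dumps` then shows how the dictionary would look once serialized to `dataset_description.json`:

```python
import json

# Fields that are expected to be lists (selection assumed for this sketch)
LIST_FIELDS = ("Authors", "ReferencesAndLinks", "EthicsApprovals", "GeneratedBy")


def find_non_list_fields(sidecar, list_fields=LIST_FIELDS):
    """Return the names of fields that are present but not stored as a list."""
    return [key for key in list_fields if key in sidecar and not isinstance(sidecar[key], list)]


example_sidecar = {
    "Name": "FictionalDatasetExample",
    "Authors": "alice",  # wrong on purpose: should be ["alice"]
    "GeneratedBy": [{"Name": "MUniverse"}],
}

print(find_non_list_fields(example_sidecar))
print(json.dumps(example_sidecar, indent=4))
```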
Subjects data & sidecar¶
Next we define a tabular file, participants.tsv, describing the properties of the participants (e.g., age, biological sex or experimental groups). To be processed with the MUniverse software, this information can be stored in a dictionary (containing, for each metadata field, a list whose length equals the number of subjects) or in a DataFrame. In this example dataset, we have 6 subjects, each identified by a unique ID, for which we report the age, biological sex, handedness, weight and height, and which are subdivided into two experimental groups (treatment and placebo):
subjects_data = {
"participant_id": ["sub-01", "sub-02", "sub-03", "sub-04", "sub-05", "sub-06"], # Unique subject identifier. Every participant_id must start with "sub-"
"age": [42, 43, 44, 45, 46, 47], # optional
"sex": ["M", "F", "M", "F", "M", "F"], # optional
"handedness": ["R", "L", "R", "R", "L", "R"], # optional
"weight": [70, 68, 66, 64, 62, 60], # optional
"height": [1.7, 1.72, 1.74, 1.76, 1.78, 1.8], # optional
"group": ["T", "T", "T", "P", "P", "P"], # optional
}
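Because every list in this dictionary must have one entry per subject, a quick consistency check before handing the data to MUniverse can save some debugging. The following is a minimal sketch using only the standard library; the helpers `check_participants` and `to_tsv` are hypothetical names introduced for illustration and are not MUniverse functions:

```python
def check_participants(subjects):
    """Raise a ValueError if the participant dictionary is inconsistent."""
    lengths = {key: len(values) for key, values in subjects.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Columns differ in length: {lengths}")
    ids = subjects["participant_id"]
    if not all(pid.startswith("sub-") for pid in ids):
        raise ValueError("Every participant_id must start with 'sub-'")
    if len(set(ids)) != len(ids):
        raise ValueError("participant_id values must be unique")
    return len(ids)


def to_tsv(subjects):
    """Render the dictionary as tab-separated text (one row per subject)."""
    header = "\t".join(subjects)
    rows = ["\t".join(str(value) for value in row) for row in zip(*subjects.values())]
    return "\n".join([header] + rows)


example = {"participant_id": ["sub-01", "sub-02"], "age": [42, 43], "group": ["T", "P"]}
n_subjects = check_participants(example)
print(n_subjects)
print(to_tsv(example))
```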
The tabular subjects file needs to be accompanied by a corresponding participants.json file, which describes the columns stored in the tabular file. The MUniverse package already includes a built-in template for the columns age, sex, handedness, weight and height. Thus, we only need to add the additional group field:
subjects_sidecar = {
"group": {
"Description": "Group the subject belongs to.",
"Levels": {
"T": "Treatment",
"P": "Placebo"
}
}
}
Bidsignore¶
Similar to a .gitignore file, we can create a file named .bidsignore to list any files/file types that should be ignored by the BIDS validator. Here, we will include motor unit spike time labels, which can be treated as a BIDS derivative. However, derivatives (i.e., any information extracted from the raw data) can be highly specific and are hard to standardize, so for now we exclude them from validation (hopefully there will be some consensus in the future on how to report decomposed motor unit spikes). In the MUniverse framework, we can put all ignored files/folders/file types in a list of strings.
BIDSIGNORE = ["derivatives/"]
Storing the dataset level metadata¶
After annotating all required metadata fields, we can now assemble them with the BIDSDataset class:
# (i) Init the BIDS-dataset class
FictionalDatasetExample = BIDSDataset(
datasetname="FictionalDatasetExample",
path="./" # Where do you want to store the dataset
)
# (ii) Set metadata
FictionalDatasetExample.set_metadata(field_name='subjects_data', source=subjects_data)
FictionalDatasetExample.get_default_participant_sidecar() # Pre-implemented template
FictionalDatasetExample.set_metadata(field_name='subjects_sidecar', source=subjects_sidecar) # Add your custom fields
FictionalDatasetExample.set_metadata(field_name='dataset_sidecar', source=dataset_sidecar)
FictionalDatasetExample.readme = readme
FictionalDatasetExample.BIDSIGNORE = BIDSIGNORE
# (iii) Write
FictionalDatasetExample.write()
Local (recording specific) metadata¶
After finalizing the global dataset-level metadata, we will have a closer look at the actual recordings included in this dataset and how to describe them. Here, each subject performed two trapezoidal isometric tasks (at 30 and 50 percent of the maximum voluntary contraction intensity), and for each subject EMG was also measured at rest to quantify the noise. The electrode configuration is fixed for each subject (more precisely, for one experimental session), so we can make use of the BIDS inheritance principle to limit the number of files. As many other parameters are also fixed across all recordings (e.g., in this example all recordings were collected with the same data acquisition system and electrode configuration), it is useful to first build templates for the recording-specific metadata files and later update only the recording-specific metadata fields. For illustrative purposes, we will use random numbers as the actual data. To add your own data, you simply need to import the recorded data as a NumPy array.
For the recording-specific data and metadata we make use of the EMGBIDSRecording class. The attributes of this class cover the data itself (emg.edf for 16-bit data or emg.bdf for 24-bit data), where the data is stored (datapath), the connected metadata files (emg.json, channels.tsv, electrodes.tsv, events.tsv, coordsystem.json), the labels (subject, session, task, ...) used to specify the recording, as well as some information needed to link the recording to a dataset (root, datasetname):
- root (str): The root folder of your BIDS dataset
- datasetname (str): The name of a BIDS dataset
- datapath (str): The folder where the recording file is/will be stored
- subject_label (str): The label of the subject the recording belongs to
- session_label (str or None): The label of the session the recording belongs to (optional)
- task_label (str): The label of the task performed in this recording
- acq_label (str or None): An optional label to distinguish multiple acquisition modes (e.g., high vs. low sampling rate)
- run_label (str or None): An optional label to distinguish multiple repetitions of the same task
- recording_label (str or None): An optional label to distinguish data files from different acquisition systems
- data (np.ndarray): The actual data matrix (n_channels, n_samples)
- fsamp (float): The sampling rate in Hz
- fileformat ({"edf", "bdf"}): The file format used to store the data matrix (EMG-BIDS allows edf and bdf)
- emg_sidecar (dict): Dictionary corresponding to the "_emg.json" file
- channels (pd.DataFrame): Table with channel-specific metadata
- channels_sidecar (dict): A dictionary corresponding to the "_channels.json" file
- electrodes (pd.DataFrame): Table with electrode-specific metadata
- electrodes_sidecar (dict): A dictionary corresponding to the "_electrodes.json" file
- coord_sidecar (dict of dict): Dictionary of dictionaries, whereby each key corresponds to one coordinate system
- events (pd.DataFrame): Table of events describing the experiment. Must contain the columns "onset" and "duration"
- events_sidecar (dict): Dictionary corresponding to the "_events.json" file
- inherited_metadata (dict): Dictionary of inherited metadata files
- inherited_levels (dict): Dictionary with the levels of the inherited metadata files
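The choice between edf (16-bit samples) and bdf (24-bit samples) determines how finely the amplitude range of a channel is quantized. As a rough sketch, the step size per digital level can be estimated from the physical range and the bit depth. The helper `amplitude_resolution` and the range of -10 mV to 10 mV are assumptions chosen for illustration only:

```python
def amplitude_resolution(physical_min, physical_max, n_bits):
    """Amplitude step per digital level when a range is stored with n_bits."""
    n_levels = 2 ** n_bits
    return (physical_max - physical_min) / (n_levels - 1)


for file_format, n_bits in (("edf", 16), ("bdf", 24)):
    step_mv = amplitude_resolution(-10.0, 10.0, n_bits)  # assumed range in mV
    print(f"{file_format}: {step_mv * 1000:.5f} microvolt per level")
```

If the step size of the 16-bit format is close to the noise floor of your amplifier, the 24-bit format is likely the safer choice.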
EMG sidecar¶
For an EMG recording, each data file needs to have an associated EMG sidecar file (_emg.json). This file describes the parameters of the recording system (e.g., the sampling rate or the data acquisition hardware used) and the experimental setup (e.g., what the task was or how the electrodes were placed). In many datasets, the hardware is identical across recordings, so the EMG sidecar files easily end up containing a lot of redundant information, with only a few recording-specific fields (e.g., those related to the specific task). Thus, it is convenient to make a template and, for each recording, update only the recording-specific fields.
emg_sidecar_template = {
"EMGPlacementScheme": "Measured", # Must be the keyword "Measured" if you have grids and specify electrode locations in coordinate systems.
"EMGPlacementSchemeDescription": "(i) Surface EMG: lorem ipsum. (ii) Invasive thin film EMG: lorem ipsum. (iii) Invasive Fine Wire EMG: lorem ipsum. (iv) Invasive Needle EMG: lorem ipsum.", # Describe how electrodes are placed. Include anatomical landmarks used to position. Include the measurement method for placement. Include placement of reference electrode(s). Include placement of ground electrode. Include if a dry linear array for fiber alignment was used or not. Include if innervation zone was measured and how electrodes are positioned relative to it. For different types of electrodes (surface grid, invasive grid, fine wire, etc) use i), ii), iii), ... to separate placement description (similar to our provided example).
"EMGReference": "ChannelSpecific",
"EMGGround": "G1", # The name of the ground electrode (as specified in electrodes metadata).
"SamplingFrequency": 2048, # The main sampling frequency (in Hz) of your data. If your dataset contains more than one sampling frequency contact us.
"PowerLineFrequency": 50, # Frequency (in Hz) of the power grid where the data was recorded.
"RecordingType": "continuous",
"HardwareFilters": {
"Low-pass filter": {
"Frequency": 500,
"Roll-off": "6dB/Octave"
},
"High-pass filter": {
"Frequency": 20,
"Roll-off": "6dB/Octave"
}
}, # A json object containing filter parameters. Use "n/a" if no filter was used.
"SoftwareFilters": "n/a", # A json object containing filter parameters. Use "n/a" if no filter was used.
"EMGChannelCount":45,
"TaskName": "restNoise",
"TaskDescription": "Relaxed muscle for 5 seconds.",
"Instructions": "Relax your muscle completely.",
"Preamplification": 1, # Amplification built into an EMG bipolar sensor, electrode grid, or other device.
"Gain": 100, # Signal gain from an in-line amplifier, applied between the EMG sensor/device and the data acquisition computer.
"Manufacturer": "some amplifier manufacturer", # Manufacturer of the amplifier used to collect the data.
"ManufacturersModelName": "some amplifier model name" # Model name of the amplifier.
}
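When a template like this is reused for several recordings, it matters how the copy is made. The short example below uses a reduced stand-in for the template (not the full dictionary above) to show that `dict.copy()` shares nested dictionaries such as "HardwareFilters" with the template, whereas `copy.deepcopy` yields a fully independent copy:

```python
import copy

# Reduced stand-in for the EMG sidecar template
template = {
    "TaskName": "restNoise",
    "HardwareFilters": {"High-pass filter": {"Frequency": 20}},
}

shallow = template.copy()
shallow["TaskName"] = "trapezoidalContraction30PercentMVC"  # safe: top-level key
shallow["HardwareFilters"]["High-pass filter"]["Frequency"] = 10  # also alters the template

deep = copy.deepcopy(template)
deep["HardwareFilters"]["High-pass filter"]["Frequency"] = 5  # template is not affected

print(template["TaskName"])  # restNoise
print(template["HardwareFilters"]["High-pass filter"]["Frequency"])  # 10
```

Updating only top-level fields such as "TaskName" is safe with either kind of copy; a deep copy simply removes the risk once nested fields are edited as well.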
Coordinate system sidecar¶
To specify the positions of the electrodes, one first needs to define suitable coordinate systems. For high-density EMG arrays, the recommended approach is to define the electrode positions in a local, grid-specific coordinate system (child coordinate system) and to store the global information by anchoring the local coordinate system inside an anatomical parent coordinate system. Here, we define one global forearm coordinate system and one local coordinate system for every grid/electrode. Both child and parent coordinate systems end up in individual files ending with "_coordsystem.json". Within the MUniverse framework, we store all coordinate systems as a dictionary of coordinate systems (with keys representing coordinate system names and the values being dictionaries of all required coordinate system metadata fields).
In our example, the parent coordinate system is defined through anatomical landmarks on the forearm, with percent of the distance between landmarks as length units. The grid-specific systems are then localized by defining an anchor electrode together with its location and orientation in the parent coordinate system.
The positions of the individual electrodes are specified in the next step, inside the _electrodes.tsv files.
coord_sidecar = {
"forearmCoordSys" : {
"EMGCoordinateSystem": "Other",
"EMGCoordinateUnits": "percent",
"EMGCoordinateSystemDescription": "x: Radial Styloid Process (RSP) -> Ulnar Styloid Process (USP); y: Right-hand rule (limits: Olecranon Process -> Cubital Fossa); z: midpoint RSP-USP -> Lateral Humerus Epicondyle (LHE)"
},
"grid1CoordSys" : {
"EMGCoordinateSystem": "Other",
"EMGCoordinateUnits": "mm",
"EMGCoordinateSystemDescription": " The x-axis is left to right, the y-axis is bottom to top. Note: the z-axis is not used.",
"ParentCoordinateSystem": "forearmCoordSys",
"AnchorElectrode": "E001",
"AnchorCoordinates": [30, 50, 80]
},
"grid2CoordSys" : {
"EMGCoordinateSystem": "Other",
"EMGCoordinateUnits": "mm",
"EMGCoordinateSystemDescription": "The x-axis is left to right, the y-axis is bottom to top. Note: the z-axis is not used.",
"ParentCoordinateSystem": "forearmCoordSys",
"AnchorElectrode": "E009",
"AnchorCoordinates": [30, 50, 60]
},
"grid3CoordSys" : {
"EMGCoordinateSystem": "Other",
"EMGCoordinateUnits": "mm",
"EMGCoordinateSystemDescription": "The x-axis is left to right, the y-axis is bottom to top. Note: the z-axis is not used.",
"ParentCoordinateSystem": "forearmCoordSys",
"AnchorElectrode": "E017",
"AnchorCoordinates": [30, 50, 40]
},
"intraGrid1CoordSys" : {
"EMGCoordinateSystem": "Other",
"EMGCoordinateUnits": "mm",
"EMGCoordinateSystemDescription": "The x-axis is left to right, the y-axis is bottom to top. Note: the z-axis is not used.",
"ParentCoordinateSystem": "forearmCoordSys",
"AnchorElectrode": "iE001",
"AnchorCoordinates": [30, 50, 70]
},
"intraGrid2CoordSys" : {
"EMGCoordinateSystem": "Other",
"EMGCoordinateUnits": "mm",
"EMGCoordinateSystemDescription": "The x-axis is left to right, the y-axis is bottom to top. Note: the z-axis is not used.",
"ParentCoordinateSystem": "forearmCoordSys",
"AnchorElectrode": "iE009",
"AnchorCoordinates": [30, 50, 50]
}
}
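With several child systems pointing to one parent, a typo in a coordinate system name is easy to miss. The following sketch checks that every referenced parent exists and that each child system defines its anchor, and it derives the file name each system ends up in. Both helpers are hypothetical, and the set of required anchor fields is an assumption based on the example above rather than on the BIDS validator:

```python
def check_coordinate_systems(coord_systems):
    """Return a list of problems found in a dict of coordinate systems."""
    problems = []
    for name, system in coord_systems.items():
        parent = system.get("ParentCoordinateSystem")
        if parent is None:
            continue  # a parent (top-level) system needs no anchor
        if parent not in coord_systems:
            problems.append(f"{name}: unknown parent '{parent}'")
        for key in ("AnchorElectrode", "AnchorCoordinates"):
            if key not in system:
                problems.append(f"{name}: missing '{key}'")
    return problems


def coordsystem_filename(subject_label, name):
    """File name of one coordinate system, following the tree shown earlier."""
    return f"sub-{subject_label}_space-{name}_coordsystem.json"


example_systems = {
    "forearmCoordSys": {"EMGCoordinateUnits": "percent"},
    "grid1CoordSys": {
        "ParentCoordinateSystem": "forearmCoordSys",
        "AnchorElectrode": "E001",
        "AnchorCoordinates": [30, 50, 80],
    },
    "gridX": {"ParentCoordinateSystem": "legCoordSys", "AnchorElectrode": "E100"},
}

print(check_coordinate_systems(example_systems))
print(coordsystem_filename("01", "grid1CoordSys"))
```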
Electrode data & sidecar¶
The actual electrode positions are stored in a tabular file named _electrodes.tsv. BIDS requires the columns "name" (unique electrode name), "x" (x-coordinate), "y" (y-coordinate), optionally "z" (z-coordinate) and "coordinate_system" (the name of the coordinate system in which the coordinates are reported). Additional electrode-specific information, such as type, material, diameter, surface area or manufacturer, can also be defined. There is a set of metadata fields that are already defined by EMG-BIDS and that can be used without any additional considerations. Sometimes you might need to report metadata that is rather specific to your experiment. This is possible, but it requires describing the content in an additional file named _electrodes.json. Here, we make use of that possibility, as invasive EMG requires reporting metadata that is not yet standardized through BIDS.
EMG arrays typically come in a few fixed geometries. Thus it can be convenient to store this information once in a file or a callable function. In this example, we have pre-defined the electrode positions in a .tsv file. In the MUniverse framework you can import the content of the electrode file either through a dictionary or a DataFrame.
Some notes on this:
- Electrode names must be unique.
- "x", "y" and "z" specify position inside "coordinate_system" (whose names must match the defined coordinate systems), "z" can usually be left empty.
- If you have fine wires, please specify the diameter of the fine wire tip in "electrode_diameter", as well as the uninsulated length of the fine wire tip in "electrode_tip_length".
- "interelectrode_distance" in a grid means distance between neighboring electrodes, while in a fine wire it means distance between the wire tips.
- If you have concentric needles, please specify length and diameter in the columns "cannula_length" and "cannula_diameter".
el_metadata = pd.read_csv("fictionalDatasetExampleInputMetadata//electrodes.tsv", sep="\t")
el_metadata.head()
| | name | x | y | z | coordinate_system | type | material | group | interelectrode_distance | electrode_surface_area | electrode_diameter | electrode_tip_length | cannula_diameter | cannula_length | manufacturer | manufacturers_model_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E001 | 0 | 0 | NaN | grid1CoordSys | HDsEMG | gold | Grid1 | 10 | 0,1 | NaN | NaN | NaN | NaN | manufacturerPlaceholder | manufacturerModelNamePlaceholder |
| 1 | E002 | 10 | 0 | NaN | grid1CoordSys | HDsEMG | gold | Grid1 | 10 | 0,1 | NaN | NaN | NaN | NaN | manufacturerPlaceholder | manufacturerModelNamePlaceholder |
| 2 | E003 | 20 | 0 | NaN | grid1CoordSys | HDsEMG | gold | Grid1 | 10 | 0,1 | NaN | NaN | NaN | NaN | manufacturerPlaceholder | manufacturerModelNamePlaceholder |
| 3 | E004 | 30 | 0 | NaN | grid1CoordSys | HDsEMG | gold | Grid1 | 10 | 0,1 | NaN | NaN | NaN | NaN | manufacturerPlaceholder | manufacturerModelNamePlaceholder |
| 4 | E005 | 0 | 10 | NaN | grid1CoordSys | HDsEMG | gold | Grid1 | 10 | 0,1 | NaN | NaN | NaN | NaN | manufacturerPlaceholder | manufacturerModelNamePlaceholder |
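As an alternative to a pre-written file, the positions of a regular grid can be produced by a small function. The sketch below reproduces the pattern visible in the rows above (x varies fastest, 10 mm spacing). The helper `grid_electrodes` is hypothetical, and the 2 x 4 shape of the first grid is inferred from the electrode numbering rather than stated in the metadata:

```python
def grid_electrodes(n_rows, n_cols, ied_mm, coordinate_system, first_index=1, prefix="E"):
    """Generate name/x/y entries for a rectangular grid, with x varying fastest."""
    electrodes = []
    for k in range(n_rows * n_cols):
        row, col = divmod(k, n_cols)
        electrodes.append({
            "name": f"{prefix}{first_index + k:03d}",
            "x": col * ied_mm,
            "y": row * ied_mm,
            "coordinate_system": coordinate_system,
        })
    return electrodes


grid1 = grid_electrodes(n_rows=2, n_cols=4, ied_mm=10, coordinate_system="grid1CoordSys")
for electrode in grid1[:5]:
    print(electrode["name"], electrode["x"], electrode["y"])
```

A list of dictionaries like this can be turned into a table with `pd.DataFrame(grid1)` and then extended with the remaining columns (type, material, and so on).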
As we make use of custom metadata fields, we next define a corresponding electrodes sidecar file (_electrodes.json). Similar to the subjects sidecar, this is the place where we can define non-default columns, which we need here to report essential metadata on invasive EMG.
electrodes_sidecar = {
    "interelectrode_distance": "Distance between electrodes. In a grid this means distance between neighboring electrodes. In a fine wire it means distance between the wire tips.",
    "electrode_surface_area": "Surface area of the electrode in mm^2",
    "electrode_diameter": "Diameter of the electrode in mm",
    "electrode_tip_length": "Uninsulated length (in mm) of the fine wire tip",
    "cannula_diameter": "Diameter of Cannula in mm",
    "cannula_length": "Length of Cannula in mm",
    "manufacturer": "Name of electrode manufacturer",
    "manufacturers_model_name": "Model name of Electrode",
}
Channel data & sidecar¶
Another tabular file, _channels.tsv, maps the electrode configuration to the digitized data matrix. Each channel needs to have a unique name, which is reported together with the type and the unit of the channel. Further, we specify which electrode is associated with which channel, as well as some information about channel quality and applied filters. Here, we have prepared the content of the channels.tsv file in a file which we can load and link to our BIDS dataset.
A few detailed considerations are:
- Channel names must be unique. The electrode names in the "signal_electrode" and "reference" columns must exist in the electrodes.tsv file.
- The channel type must be one of the options specified in the BIDS documentation (keyword "EMG" for EMG channels; you can use "MISC" for anything that does not have an explicit correspondence and further specify the content in a free-text "description" field).
- "status" can be used to tag channels which should be ignored in data analysis. Must be "good", "bad" or left empty.
- "low_cutoff" and "high_cutoff" give the cutoff frequencies (in Hz) of the high-pass and low-pass filter, respectively.
ch_metadata = pd.read_csv("fictionalDatasetExampleInputMetadata//channels.tsv", sep="\t")
ch_metadata.head()
| | name | type | units | description | signal_electrode | reference | group | status | low_cutoff | high_cutoff | some_additional_column |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ch1 | EMG | mV | Monopolar surface EMG | E001 | R1 | Grid1 | good | 10.0 | 500.0 | none |
| 1 | Ch2 | EMG | mV | Monopolar surface EMG | E002 | R1 | Grid1 | good | 10.0 | 500.0 | none |
| 2 | Ch3 | EMG | mV | Monopolar surface EMG | E003 | R1 | Grid1 | good | 10.0 | 500.0 | none |
| 3 | Ch4 | EMG | mV | Monopolar surface EMG | E004 | R1 | Grid1 | good | 10.0 | 500.0 | none |
| 4 | Ch5 | EMG | mV | Monopolar surface EMG | E005 | R1 | Grid1 | bad | 10.0 | 500.0 | none |
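The constraints listed in the notes above (unique channel names, electrode names that exist in the electrodes table, a restricted set of status values) can be verified with a few lines of code before the files are written. The helper `check_channels` is a hypothetical illustration and not part of MUniverse; it works on plain dictionaries, one per channel row:

```python
def check_channels(channels, electrode_names):
    """Return a list of problems found in a list of channel rows (dicts)."""
    problems = []
    names = [row["name"] for row in channels]
    for name in sorted(set(names)):
        if names.count(name) > 1:
            problems.append(f"duplicate channel name: {name}")
    for row in channels:
        for column in ("signal_electrode", "reference"):
            electrode = row.get(column)
            if electrode and electrode != "n/a" and electrode not in electrode_names:
                problems.append(f"{row['name']}: unknown {column} '{electrode}'")
        if row.get("status") not in (None, "", "n/a", "good", "bad"):
            problems.append(f"{row['name']}: invalid status '{row['status']}'")
    return problems


electrode_names = {"E001", "E002", "R1"}
channels = [
    {"name": "Ch1", "signal_electrode": "E001", "reference": "R1", "status": "good"},
    {"name": "Ch2", "signal_electrode": "E003", "reference": "R1", "status": "bad"},
    {"name": "Ch2", "signal_electrode": "E002", "reference": "R1", "status": "noisy"},
]
problems = check_channels(channels, electrode_names)
print(problems)
```

For a DataFrame such as `ch_metadata`, the rows can be obtained with `ch_metadata.to_dict("records")`.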
As before, we can make use of a _channels.json to define non-default columns.
channels_sidecar = {
"some_additional_column": "some Description of this column" # we define an optional column for illustrative purposes
}
Events data & sidecar¶
Finally, there is an optional file type named _events.tsv, which is a powerful way to annotate the experimental protocol. This is a tabular file in which each row corresponds to an individual event. For each event, we must define "onset" and "duration". We can add arbitrary additional columns to further describe the events. For example, an event can be the onset of muscle activation, an electric stimulus or an artifact.
Here we use the events file to describe a trapezoidal contraction intensity profile. Thus we have the following events: "muscle_on" (the time when the muscle is activated), "linear_isometric_ramp" (the rising and falling phases of the trapezoidal contraction), "steady_isometric" (the plateau of the trapezoidal task) and "muscle_off" (the end of the contractile task). To fully describe the task, besides the timing we also include information on the event duration, the level of contraction (in % MVC) at the onset of the event, as well as the rate of change of the contraction level (in % MVC per second).
# the baseline noise recording does not get an events.tsv file, because there are no events we could write into it.
events_metadata1 = pd.read_csv("fictionalDatasetExampleInputMetadata//events30PercentMVC.tsv", sep="\t")
events_metadata2 = pd.read_csv("fictionalDatasetExampleInputMetadata//events50PercentMVC.tsv", sep="\t")
# To retrieve these cleanly later from inside a for-loop we pack them into a dict with tasknames as keys
events_metadata_taskDict = {
"trapezoidalContraction30PercentMVC": events_metadata1,
"trapezoidalContraction50PercentMVC": events_metadata2,
}
events_metadata1.head()
| | onset | duration | sample | mvc_rate | mvc_level | event_type | description |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.0 | 2048 | NaN | NaN | muscle_on | Time at which the muscle is activated. |
| 1 | 1.0 | 1.0 | 2048 | 30.0 | 0.0 | linear_isometric_ramp | Linear ramp (rate: 30 % MVC per s; duration: 1... |
| 2 | 2.0 | 1.0 | 4096 | 0.0 | 30.0 | steady_isometric | Steady isometric torque at 30% MVC for 15 s |
| 3 | 3.0 | 1.0 | 6144 | -30.0 | 50.0 | linear_isometric_ramp | Linear ramp (rate: -30 % MVC per s; duration: ... |
| 4 | 4.0 | 0.0 | 8192 | NaN | NaN | muscle_off | Time at which the muscle is deactivated. |
An additional events sidecar file is required to describe the non-mandatory columns (only "onset" and "duration" are pre-defined).
events_sidecar = {
"sample": {
"Description": "Sample index of the event onset (zero-indexing)",
"Unit": "samples"
},
"mvc_rate": {
"Description": "Rate at which the torque changes in percent MVC per second",
"Unit": "% MVC / s"
},
"mvc_level": {
"Description": "MVC (maximum voluntary contraction) level at the onset of the event",
"Unit": "% MVC"
},
"event_type": {
"Description": "Event label.",
"Levels": {
"muscle_on": "The muscle is activated.",
"muscle_off": "The muscle is deactivated.",
"linear_isometric_ramp": "The isometric torque changes linearly over time with a fixed rate.",
"steady_isometric": "Steady isometric contraction at a fixed MVC level."
}
},
"description": {
"Description": "Free text event description."
}
}
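For regular protocols, an events table does not have to be typed by hand. The sketch below derives the five events of a trapezoidal contraction from a handful of parameters and computes the "sample" column as onset times sampling rate. The helper `trapezoid_events` and its parameter names are hypothetical, and the values it produces reflect this sketch's own conventions (for example, the descending ramp starts at the plateau level):

```python
def trapezoid_events(onset_s, ramp_s, plateau_s, level_mvc, fsamp):
    """Build event rows (dicts) for one trapezoidal contraction."""
    rate = level_mvc / ramp_s  # % MVC per second
    t_plateau = onset_s + ramp_s
    t_down = t_plateau + plateau_s
    t_off = t_down + ramp_s
    # (onset, duration, mvc_rate, mvc_level, event_type)
    raw = [
        (onset_s, 0.0, None, None, "muscle_on"),
        (onset_s, ramp_s, rate, 0.0, "linear_isometric_ramp"),
        (t_plateau, plateau_s, 0.0, level_mvc, "steady_isometric"),
        (t_down, ramp_s, -rate, level_mvc, "linear_isometric_ramp"),
        (t_off, 0.0, None, None, "muscle_off"),
    ]
    return [
        {
            "onset": onset,
            "duration": duration,
            "sample": round(onset * fsamp),  # zero-based sample index of the onset
            "mvc_rate": mvc_rate,
            "mvc_level": mvc_level,
            "event_type": event_type,
        }
        for onset, duration, mvc_rate, mvc_level, event_type in raw
    ]


events = trapezoid_events(onset_s=1.0, ramp_s=1.0, plateau_s=1.0, level_mvc=30.0, fsamp=2048)
for event in events:
    print(event["onset"], event["sample"], event["event_type"])
```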
Building the dataset¶
Tasks¶
As all subjects performed the same tasks, it is useful to create a list of task labels, which we will use to loop over all recordings:
- restNoise: A baseline recording (i.e., no muscle activity is present) to estimate the recording noise
- trapezoidalContraction30PercentMVC: A trapezoidal contraction at 30% of the subject's MVC
- trapezoidalContraction50PercentMVC: A trapezoidal contraction at 50% of the subject's MVC
tasks = [
"restNoise",
"trapezoidalContraction30PercentMVC",
"trapezoidalContraction50PercentMVC"
]
task_descriptions = [
"The muscle is fully relaxed.",
"A trapezoidal contraction at 30 percent MVC, consisting of linear ramps up and down performed at 30 percent per second and a plateau maintained for 1 s.",
"A trapezoidal contraction at 50 percent MVC, consisting of linear ramps up and down performed at 50 percent per second and a plateau maintained for 1 s."
]
task_instructions = [
"Do nothing.",
"Follow path provided via visual feedback.",
"Follow path provided via visual feedback."
]
samplingFrequency = 2048 # samples per second
n_channels = len(ch_metadata.loc[:,"name"])
# Ensure reproducible random numbers
rng = np.random.default_rng(seed=12345)
# Loop over all recordings
for participant_id in subjects_data["participant_id"]:
    subject_label = participant_id.split("-")[-1]
    print(f"Bidsifying data of sub-{subject_label}") # Print progress
    for idx, task in enumerate(tasks):
        # Create a random array of the correct size to be our raw data.
        recordingLength = rng.integers(low=5, high=8) # in seconds
        n_samples = int(recordingLength * samplingFrequency) # array dimensions must be integers
        data = rng.uniform(low=0, high=1, size=(n_channels, n_samples))
        # Update the EMG sidecar
        emg_sidecar = emg_sidecar_template.copy()
        emg_sidecar["TaskName"] = task
        emg_sidecar["TaskDescription"] = task_descriptions[idx]
        emg_sidecar["Instructions"] = task_instructions[idx]
        # Init the bids recording class
        emg_recording = EMGBIDSRecording(
            parent_dataset=FictionalDatasetExample,
            subject_label=subject_label,
            task_label=task,
            datatype='emg',
            inherited_metadata=["coordsystem.json", "electrodes.tsv", "electrodes.json", "events.json"], # Here is where we define which files to inherit
            inherited_level=["subject", "subject", "dataset", "dataset"], # and here which level they are inherited to
        )
        # Set data and metadata
        emg_recording.set_metadata(field_name='emg_sidecar', source=emg_sidecar)
        emg_recording.set_data(field_name='data', mydata=data, fsamp=samplingFrequency)
        emg_recording.set_metadata(field_name='coord_sidecar', source=coord_sidecar, overwrite=True)
        emg_recording.set_metadata(field_name='electrodes_sidecar', source=electrodes_sidecar)
        emg_recording.set_metadata(field_name='electrodes', source=el_metadata)
        emg_recording.set_metadata(field_name='channels_sidecar', source=channels_sidecar)
        emg_recording.set_metadata(field_name='channels', source=ch_metadata)
        emg_recording.set_metadata(field_name="events_sidecar", source=events_sidecar)
        if task in events_metadata_taskDict: # Not needed for resting tasks
            emg_recording.set_metadata(field_name="events", source=events_metadata_taskDict[task])
        # Write metadata and data
        emg_recording.write()
Bidsifying data of sub-01
Bidsifying data of sub-02
Bidsifying data of sub-03
Bidsifying data of sub-04
Bidsifying data of sub-05
Bidsifying data of sub-06
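The placeholder signals written in the loop are plain NumPy arrays with one row per channel and one column per sample. As a standalone sketch of that step (the channel count of 4 is a made-up value, and the snake_case variable names are chosen for this example only):

```python
import numpy as np

rng = np.random.default_rng(seed=12345)  # fixed seed for reproducibility
sampling_frequency = 2048                # samples per second
n_channels = 4                           # hypothetical channel count

# Recording length in whole seconds; `high` is exclusive, so 5, 6 or 7
recording_length = rng.integers(low=5, high=8)

# The number of samples must be a Python int to be usable as an array dimension
n_samples = int(recording_length * sampling_frequency)

# Uniform noise in [0, 1) with shape (channels, samples)
data = rng.uniform(low=0, high=1, size=(n_channels, n_samples))
print(data.shape)
```

Because the recording length is drawn anew for every recording, the files in the dataset differ in length, which is typical for real experiments as well.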
Validation¶
Finally, we use the BIDS validator to check the integrity of our dataset. For more information, see https://github.com/bids-standard/bids-validator
The MUniverse wrapper for the BIDS validator allows us to ignore certain codes or fields. This can be helpful for decluttering the output of the validator.
err, warn, _ = FictionalDatasetExample.validate(
print_errors=True,
print_warnings=True,
ignored_codes=[
"EVENTS_TSV_MISSING", # we specify events.tsv files, but not for restNoise. In order to not get one warning per restNoise recording in the dataset, we ignore the code.
],
ignored_fields=[ # We ignore some fields that would produce warnings
"SourceDatasets", # there isn't one
"DeviceSerialNumber", # no hardware was used to record this dataset
"SoftwareVersions", # this also pertains to measurement hardware
"InstitutionName", # we also don't want to specify an affiliated institution
"InstitutionAddress",
"InstitutionalDepartmentName",
"HEDVersion", # we don't use HED.
"StimulusPresentation", # we also ignore this, to not clutter this jupyter notebook.
# We have different ElectrodeManufacturer and ElectrodeManufacturersModelName per electrode, so we specify it on a per electrode basis in the electrodes.tsv file, rather than inside of emg.json (as mandated by the BIDS documentation for this case). However at the time of writing this script, the Validator is not smart enough to understand this. So we ignore the raised warning.
"ElectrodeManufacturer",
"ElectrodeManufacturersModelName",
],
ignored_files=[],
)
print("The BIDS conversion has completed")
print(f"Your BIDS dataset contains {len(err)} errors and {len(warn)} warnings")
Number of detected errors: 0
[]
Number of detected warnings: 0
[]
The BIDS conversion has completed
Your BIDS dataset contains 0 errors and 0 warnings
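Conceptually, ignoring codes and fields amounts to filtering the list of reported issues. The sketch below illustrates the idea only: it is not the MUniverse implementation, the function `filter_issues` is hypothetical, and the issue dictionaries with `code` and `field` keys are a simplified stand-in for the validator's actual report format.

```python
def filter_issues(issues, ignored_codes=(), ignored_fields=()):
    """Drop issues whose code or affected field is on an ignore list."""
    ignored_codes = set(ignored_codes)
    ignored_fields = set(ignored_fields)
    kept = []
    for issue in issues:
        if issue.get("code") in ignored_codes:
            continue  # whole issue category is ignored
        if issue.get("field") in ignored_fields:
            continue  # issue concerns an ignored metadata field
        kept.append(issue)
    return kept

# Made-up issues for illustration
example_warnings = [
    {"code": "EVENTS_TSV_MISSING", "field": None},
    {"code": "JSON_KEY_RECOMMENDED", "field": "HEDVersion"},
    {"code": "JSON_KEY_RECOMMENDED", "field": "EMGReference"},
]

remaining = filter_issues(
    example_warnings,
    ignored_codes=["EVENTS_TSV_MISSING"],
    ignored_fields=["HEDVersion"],
)
print(remaining)
```

Ignoring a code hides every issue of that kind, so it is worth keeping the ignore lists short and commenting each entry, as done in the cell above.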
Spike labels¶
Lastly, we want to add labels of motor unit spike times, which we assume were derived from a manual decomposition of the invasive EMG. We therefore treat the decomposed spike times as a BIDS derivative. Such derived data (i.e., processed data or additional information extracted from the data) can either be stored as a standalone BIDS dataset or as a sub-folder of your BIDS dataset. The latter option is used here. As global information, we only add some basic details on the tools used (i.e., the "GeneratedBy" field) and state that the dataset is a derivative:
labels_dataset_sidecar = {
"DatasetType": "derivative", # Indicates that this is derived data
"GeneratedBy": [{
"Name": "Some Software Tool",
"Version": "3.11", # BIDS expects the version as a string
"URL": "Link to the repository",
"Commit": "123456"
}]
}
dataset_labels = BIDSDataset(
datasetname="manual_decomposition",
path=FictionalDatasetExample.root + "derivatives/"
)
dataset_labels.set_metadata(field_name="dataset_sidecar", source=labels_dataset_sidecar)
dataset_labels.write(overwrite=True)
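For orientation, the sketch below writes a comparable `dataset_description.json` by hand using only the standard library. It makes several assumptions: the folder layout `derivatives/manual_decomposition/` is inferred from the arguments above and may differ from what MUniverse actually writes, the `BIDSVersion` value is a placeholder, and a temporary directory stands in for the real dataset root.

```python
import json
import tempfile
from pathlib import Path

description = {
    "Name": "manual_decomposition",
    "BIDSVersion": "1.10.0",      # placeholder; use the version your dataset targets
    "DatasetType": "derivative",  # marks the folder as derived data
    "GeneratedBy": [{"Name": "Some Software Tool", "Version": "3.11"}],
}

with tempfile.TemporaryDirectory() as tmp:
    # Assumed layout: <dataset root>/derivatives/<derivative name>/
    deriv_root = Path(tmp) / "derivatives" / "manual_decomposition"
    deriv_root.mkdir(parents=True)

    out_file = deriv_root / "dataset_description.json"
    out_file.write_text(json.dumps(description, indent=4))

    # Read the file back to confirm that it round-trips
    reloaded = json.loads(out_file.read_text())

print(reloaded["DatasetType"])
```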
To add the spike labels, we can use the MUniverse BIDSDecompositionDerivative class. Spikes can be passed either as a tabular file (such as a BIDS events file) or as a dictionary:
for participant_id in subjects_data["participant_id"]:
    subject_label = participant_id.split("-")[-1]
    print(f"Bidsifying data of sub-{subject_label}") # Print progress
    for idx, task in enumerate(tasks):
        if task == "restNoise":
            continue # There are no labels for resting tasks
        # Make some random spike trains
        n_units = 3
        n_spikes = 20
        spikes_times = {i: [] for i in range(n_units)}
        for unit_idx in range(n_units):
            spikes_times[unit_idx] = rng.integers(
                low=2048, high=2048*4, size=n_spikes)
        decomposed_recording = BIDSDecompositionDerivative(
            parent_dataset=dataset_labels,
            subject_label=subject_label,
            task_label=task,
            datatype='emg',
            inherited_metadata=["events.json"],
            inherited_level=["dataset"],
        )
        decomposed_recording.add_spikes(spikes=spikes_times, fsamp=2048)
        decomposed_recording.write(overwrite=True)
Bidsifying data of sub-01
Bidsifying data of sub-02
Bidsifying data of sub-03
Bidsifying data of sub-04
Bidsifying data of sub-05
Bidsifying data of sub-06
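The spike dictionary used above maps each unit index to an array of spike sample indices. The following sketch shows how such a dictionary relates to a long-format table with spike times in seconds. The column names `unit_id`, `sample` and `time_s` are hypothetical and are not necessarily those written by MUniverse, and the spikes are sorted here only for readability.

```python
import numpy as np
import pandas as pd

fsamp = 2048  # sampling rate in Hz
rng = np.random.default_rng(seed=0)

# Random spike sample indices between 1 s and 4 s for three units
spike_samples = {
    unit: np.sort(rng.integers(low=2048, high=2048 * 4, size=20))
    for unit in range(3)
}

# One row per spike: unit index, sample index, and time in seconds
rows = [
    {"unit_id": unit, "sample": int(s), "time_s": s / fsamp}
    for unit, samples in spike_samples.items()
    for s in samples
]
spike_table = pd.DataFrame(rows)
print(spike_table.head())
```

A long-format table of this kind is convenient for filtering by unit or for computing discharge rates per unit.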