The new base HED-3G schema specification (current version, 8.0.0) clarifies and simplifies the structure of the upper-level HED vocabulary schema to better support annotation and readability. It also increases the precision of the HED syntax and expands the scope of the base HED schema to better support specification of experiment design and structure as well as participant task, intent, and expectation, although more fundamental work on these is still needed. Why is this additional information an essential part of event annotation? Because the aspects and attributes of events that are most important to document and apply in subsequent analysis are their relationships to participant task, intent, and expectation in the current temporal context, which in turn is intrinsically connected to experiment design and structure.
Unique Mapping
A key new concept in HED-3G is its unique mapping rule: in HED-3G, individual terms (node names in the schema trees) used in HED tags may appear in no more than one place in a schema. While this requirement may somewhat complicate the HED schema-design process, it offers great improvements in usability for HED users. Users can now use single (leaf) node names instead of complete tag path hierarchies during annotation; HED tools can then expand these ‘short form’ annotations to full tag paths.
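The unique mapping rule is easy to check mechanically. The sketch below is illustrative only: it represents a schema as nested Python dicts (term to children), which is an assumption rather than the HED tools' internal format, and `TOY_SCHEMA` is a hypothetical fragment, not the HED 8.0.0 vocabulary. It reports every term that occurs in more than one place.

```python
from collections import defaultdict

# Hypothetical toy schema fragment (term -> children); NOT the full HED vocabulary.
TOY_SCHEMA = {
    "Event": {"Sensory-event": {}, "Agent-action": {}},
    "Property": {
        "Task-property": {
            "Task-event-role": {"Experimental-stimulus": {}},
        },
    },
}


def find_duplicate_terms(tree):
    """Return {term: [full paths]} for every term that occurs more than once."""
    locations = defaultdict(list)

    def walk(subtree, prefix):
        for name, children in subtree.items():
            full_path = f"{prefix}/{name}" if prefix else name
            locations[name].append(full_path)
            walk(children, full_path)

    walk(tree, "")
    return {name: paths for name, paths in locations.items() if len(paths) > 1}


print(find_duplicate_terms(TOY_SCHEMA))  # {} : the rule is satisfied
# A schema violating the rule: "Visual" appears under two parents.
bad = {"Sensory": {"Visual": {}}, "Attribute": {"Visual": {}}}
print(find_duplicate_terms(bad))  # {'Visual': ['Sensory/Visual', 'Attribute/Visual']}
```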
User Definitions
Many research labs develop shorthand ‘lab jargon’ terms to refer to event types used in their experiments (‘targets’, ‘standards’, etc.). Such descriptions are not standardized across laboratories and omit many details crucial to efficient cross-study search and analysis. HED user definitions allow users to give detailed definitions of lab jargon terms once, early in the annotation process, thereby retaining the mnemonic advantages of jargon for the annotator while avoiding its vagueness in shared or archived data.
Event Duration and Context
Another key HED-3G advance is the introduction of comprehensive mechanisms for handling of events with different durations and overlapping time boundaries. The need for this capability is motivated by the important context sensitivity of brain dynamics, essential for enabling the human brain to adapt behavior and experience flexibly in light of ever-changing needs, threats, and opportunities. HED-3G definitions and organizational tags play a crucial role in supporting these mechanisms.
Library Schemas
HED-3G also introduces the concept of subsidiary HED schema libraries that expand the HED base schema tag vocabulary by providing terms for describing events needed by particular user communities (clinical practice, language research, etc.). Though extensive, this formal schema reorganization does not significantly impact the use of HED in BIDS, as the HED validation tools have been built to validate against any specified HED schema.
Reorganization of the HED Tag Vocabulary
As the HED-2G vocabulary expanded, some branches of the HED-2G schema hierarchy became quite deep and detailed, while other branches remained relatively bare, making search through the tag term forest frustrating and posing a significant usability barrier. A more compact and easily searchable HED schema format was needed to improve HED system effectiveness and usability. Figure 1 displays the redesigned schema using the new online HTML schema browser that allows users to explore any available version of the HED schema with expandable or collapsible views. In the fully expanded view users can use the browser find-in-page search features to find particular items.
Computer menu usability guidelines suggest limiting sub-categories to fewer than 10 items (Carliner, 1987), ideally 3 to 7. As part of the redesign, the HED-3G vocabulary was therefore significantly reorganized for clarity into the following six top-level categories (with numbers in parentheses indicating the number of second-level categories): Event (7), Agent (6), Action (5), Item (4), Property (7), and Relation (5). This organization reflects the trade-off between hierarchy balance and depth under the constraints of orthogonality. In addition to vocabulary reorganization, the schema description and purpose of each tag are being improved, and suggestedTag and relatedTag attributes are being added for individual tags. In Fig. 1, the suggestedTag value Property/Task-property/Task-event-role is displayed in the details box on the right when the user hovers the cursor over an element in the schema tree on the left. These attributes will allow tool-builders to easily incorporate hints to assist users during annotation, review, and analysis. The planned addition of an ID to each node in the schema hierarchy will allow future development of databases of examples relevant to tags as well as links to external information sources and ontologies.
Unique Mapping and the Introduction of Short Forms
In previous versions, HED strings were always built, displayed, and reviewed in fully elaborated format. In HED-3G, a node’s full-path annotation is now referred to as its long form. However, when researchers wish to detail the nature of not-yet-annotated events or review how events have been annotated, full long-form HED strings can be difficult to read quickly. If the individual nodes in a schema hierarchy have unique names, it is easy to expand any node name, or any partial path ending in that node name, into its full path. Such a node name or partial path, which omits some or all of the node’s ancestors up to the schema tree root, is referred to as a short form.
HED-3G requires that all tools from validation through analysis support short form and provides library functions in Python, JavaScript, and MATLAB to support translation. HED-3G short form syntax compresses the HED string syntax to enable quick composition and review. The concise representation is designed to make HED-3G annotations easier to read, write, and review than their complete long forms as illustrated by the following example.
The two HED string annotation versions in Example 3 above describe the same event as Examples 1 and 2. Their difference in ease of comprehension is evident, yet the full long form can be automatically derived from the short form because it is built on a base HED schema (8.0.0ph) that satisfies the HED-3G unique mapping rule. Uniqueness allows HED tools to present and translate HED strings interchangeably between long and short forms. For added clarity, the composer can include as much of the relevant tag prefix as desired (for example, using Sensory-presentation/Visual-presentation rather than just Visual-presentation above).
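Because every term is unique, translating between long and short forms reduces to a dictionary lookup. The following is a minimal sketch, not the API of the HED Python, JavaScript, or MATLAB libraries; the nested-dict schema and `TOY_SCHEMA` contents are hypothetical, and tags carrying values (e.g., a Duration value) are deliberately not handled.

```python
# Hypothetical toy schema fragment (term -> children); illustrative only.
TOY_SCHEMA = {
    "Event": {"Sensory-event": {}},
    "Property": {
        "Sensory-property": {
            "Sensory-presentation": {"Visual-presentation": {}},
        },
    },
}


def build_index(tree):
    """Map every term to its full path (list of terms from the root)."""
    index = {}

    def walk(subtree, prefix):
        for name, children in subtree.items():
            if name in index:
                raise ValueError(f"term {name!r} violates the unique mapping rule")
            index[name] = prefix + [name]
            walk(children, prefix + [name])

    walk(tree, [])
    return index


def to_long(tag, index):
    """Expand a short-form tag (a term or a partial path ending in it)."""
    parts = tag.split("/")
    full = index.get(parts[-1])
    # The partial path must be a suffix of the term's full path.
    if full is None or full[-len(parts):] != parts:
        raise ValueError(f"{tag!r} is not a valid short form")
    return "/".join(full)


def to_short(tag, index):
    """Reduce a long-form tag to its final (unique) term."""
    parts = tag.split("/")
    if index.get(parts[-1]) != parts:
        raise ValueError(f"{tag!r} is not a valid long form")
    return parts[-1]


index = build_index(TOY_SCHEMA)
print(to_long("Sensory-presentation/Visual-presentation", index))
# Property/Sensory-property/Sensory-presentation/Visual-presentation
```

Building the index also enforces uniqueness, so a schema that breaks the rule is rejected before any translation is attempted.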
Notice that although the HED-2G and HED-3G HED strings in Examples 2 and 3 are equivalent, the HED-3G schema has been somewhat reorganized to satisfy uniqueness and orthogonality. The Event type in the HED-3G string (Sensory-event in Example 3) indicates that an environmental sensory event has occurred in the participant’s field of view. In contrast, HED-2G directly specifies that an Experiment-stimulus has occurred (Example 2). Since the same sensory event can be a stimulus in one task and not in another, the HED-2G organization makes it difficult to annotate sensory events in a consistent manner.
In HED-3G, the relationship of the sensory event to the intent of the experiment (recording the role of the event in the experiment structure) is specified using further tags. This separation in HED-3G between sensory events and experiment design is based on the recognition that brain and behavioral dynamics are affected by sensory input in complex ways that are highly dependent on the participant's perceived significance of the event within the currently evolving context.
Expanding the HED Vocabulary with HED Library Schemas
A major shortcoming of HED-2G was the tendency for users, when faced with a new concept, to add overly specific terms and jargon to the base schema: for example, adding musical terms to tag events in music-based experiments, video markup terms for experiments involving movie viewing, traffic control terms for experiments involving virtual driving, and so forth. Clinical fields using neuroimaging also have their own specific vocabularies of terms for noting data features of clinical interest (e.g., ‘seizure’, ‘sleep stage IV’). Including all possible research-area-specific terms in the base HED schema would quickly make the vocabulary wholly unwieldy and practically unusable. In building the base HED-3G schema, therefore, we have tried to remove terms with an overly specific field of use.
To accommodate the annotation needs of specific research and clinical subfields, HED-3G introduces HED library schemas. To use a programming language analogy: when programmers write a C or Python module, its code does not become part of the standard C or Python library. Instead the module is embedded within an application library that is included when needed by an application. Similarly, in addition to the base HED-3G schema, users may use tags from one or more HED library schemas to describe events in their data. HED library schemas must conform to the same syntax as the base HED schema, and should follow four basic rules:
1. Schema terms should be readily understood by most users (Clarity).
2. Within a library schema, every term must be unique (Uniqueness).
3. Terms used independently must be in different sub-trees (Orthogonality).
4. Term hierarchies should have a moderate number of subcategories at each node, ideally in the range 3 to 7 subcategories (Structural sparsity).
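The structural sparsity rule can be checked automatically during library development. This is a minimal sketch under stated assumptions: the schema is modeled as nested dicts, the 3 to 7 range is taken from rule 4, and `TOY_LIBRARY` is a hypothetical music library echoing the musical-term example above, not a real HED library schema.

```python
# Hypothetical music library schema (term -> children); illustrative only.
TOY_LIBRARY = {
    "Musical-event": {"Note": {}, "Chord": {}, "Rest": {}, "Tempo-change": {}},
    "Instrument": {"Piano": {}, "Violin": {}},
}


def sparsity_report(tree, low=3, high=7):
    """Return (path, child_count) for the root and for every non-leaf node
    whose number of direct children lies outside [low, high]."""
    flagged = []
    if not low <= len(tree) <= high:
        flagged.append(("<root>", len(tree)))

    def walk(subtree, prefix):
        for name, children in subtree.items():
            path = f"{prefix}/{name}" if prefix else name
            if children and not low <= len(children) <= high:
                flagged.append((path, len(children)))
            walk(children, path)

    walk(tree, "")
    return flagged


print(sparsity_report(TOY_LIBRARY))  # [('<root>', 2), ('Instrument', 2)]
```

Leaf nodes are never flagged, since the rule concerns how many subcategories a node offers, not whether it has any.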
As with C or Python libraries, we anticipate that many different HED schema libraries may be defined and used in conjunction with the base HED schema to annotate details of events in experiments designed to answer questions of interest to particular research or clinical communities. Since it would be impossible to avoid naming conflicts across schema libraries built in parallel by different user communities, HED-3G supports distinct schema library namespaces. Users can define a local namespace identifier within their file and associate it with an external library schema. Annotations then mark a term as coming from that library schema by prepending the namespace designator (format Library_identifier:Tag-term), so that Tag-term is looked up in the library schema bound to the brief namespace identifier Library_identifier.
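Resolving a namespaced tag amounts to splitting off the prefix and looking the term up in the schema bound to it. The sketch below assumes a simple model in which each schema is a set of terms and the empty key stands for the base schema; the identifier `sc`, the function `resolve_tag`, and all term lists are hypothetical placeholders, not the real SCORE or base vocabularies.

```python
# Hypothetical term sets. "" stands for the base schema; "sc" is an
# illustrative local namespace identifier bound to a clinical library schema.
SCHEMAS = {
    "": {"Sensory-event", "Visual-presentation", "Onset", "Offset"},
    "sc": {"Seizure", "Sleep-stage-IV"},
}


def resolve_tag(tag, schemas):
    """Split an optional 'namespace:' prefix from a tag and check that the
    term exists in the schema bound to that namespace."""
    namespace, sep, term = tag.partition(":")
    if not sep:  # no prefix: the term must come from the base schema
        namespace, term = "", tag
    if namespace not in schemas:
        raise KeyError(f"unknown namespace {namespace!r}")
    if term not in schemas[namespace]:
        raise KeyError(f"{term!r} not found in namespace {namespace!r}")
    return namespace, term


print(resolve_tag("sc:Seizure", SCHEMAS))     # ('sc', 'Seizure')
print(resolve_tag("Sensory-event", SCHEMAS))  # ('', 'Sensory-event')
```

Note that an unprefixed `Seizure` would be rejected here, because it is only defined in the library namespace; this is what keeps identically named terms in different libraries from colliding.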
The first HED library schema, now under construction, will implement the standardized SCORE vocabulary used by clinical neurophysiologists and neurologists worldwide in reporting their visual (and/or software-aided) evaluation of clinical EEG data (Beniczky et al., 2013, 2017). The development of a HED library schema for SCORE will allow archiving of annotated clinical EEG data in BIDS or other formats that accept HED annotations, hopefully enabling large quantities of such data to be accumulated for clinical and basic exploration and discovery using now rapidly advancing machine learning methods.
The SCORE library schema will be the first to be included in a planned central HED library schema registry (https://github.com/hed-standard/hed-schema-library). Although private HED schemas may also be used, annotations of shared data using registered and openly shared HED library schemas will be of value to more users for more purposes, and will thus be encouraged.
Definitions, Experimental-structure, and Time
HED-3G also introduces a number of structural enhancements that allow annotators to capture richer information about experiment events in ways that are both human- and machine-actionable. This information includes the nature and structure of the control variables, the temporal organization of the recordings, and detailed contextual information describing the conditions under which each event occurs. HED-3G introduces user-developed Definition tags not only to facilitate tag reuse and minimize tag repetition, but also as the foundation for annotating complex experiment structure and temporal evolution.
Definition tags allow users to use terms they normally use in the laboratory to describe their data, while mapping them into standardized annotations appropriate for sharing. Users specify a named Definition tag associated with a tag group of elaborative HED tags. The defined name can then be used to represent that group of tags during annotations. HED tools automatically handle the translation during validation, event-related data search, and analysis.
Once ScreenSetup is defined, the tag Def/ScreenSetup can be used in annotations to avoid repeating these screen description tags in every screen-presented visual event. The ‘Def/’ prefix is required in the annotations to allow the HED validator and analysis tools to identify ScreenSetup as an unexpanded definition name. During analysis, tools will insert the entire definition in place of Def/ScreenSetup to create a fully elaborated HED string annotation for each event. However, the Definition/ScreenSetup tag in the definition will be replaced by Def-expand/ScreenSetup so that the inserted tags retain an association with the definition but are not confused with the definition itself. In practice, a lab-specific set of definitions can be built and used for tagging all relevant lab data sets, further speeding annotation of new and existing data.
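The expansion step described above can be sketched as a recursive substitution over tag groups. This is not the HED tool libraries' implementation: it assumes tag groups are modeled as Python tuples of strings, and the two tags placed in `DEFINITIONS["ScreenSetup"]` are placeholders standing in for the actual ScreenSetup definition, which is not reproduced here.

```python
# Placeholder contents for the ScreenSetup definition (hypothetical tags).
DEFINITIONS = {
    "ScreenSetup": ("Visual-presentation", "Computer-screen"),
}


def expand_defs(annotation, definitions):
    """Return a copy of a tag group in which each Def/Name is replaced by the
    group (Def-expand/Name, (definition tags...)). Nested groups are expanded."""
    expanded = []
    for item in annotation:
        if isinstance(item, tuple):  # nested tag group
            expanded.append(expand_defs(item, definitions))
        elif item.startswith("Def/"):
            name = item[len("Def/"):]
            if name not in definitions:
                raise KeyError(f"Def/{name} has no matching Definition")
            expanded.append((f"Def-expand/{name}", definitions[name]))
        else:  # ordinary tag, copied unchanged
            expanded.append(item)
    return tuple(expanded)


event = ("Sensory-event", "Def/ScreenSetup")
print(expand_defs(event, DEFINITIONS))
# ('Sensory-event', ('Def-expand/ScreenSetup', ('Visual-presentation', 'Computer-screen')))
```

Using a distinct `Def-expand/` prefix means a later pass can still tell which tags came from a definition without mistaking them for a second Definition.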
Annotating Event Duration
Events without explicit temporal extent (e.g., onset, offset or duration) are modeled as instantaneous (i.e., occurring at a single instant). In HED-3G, the ability to give tag groups explicit Definition names also provides a foundation for specifying the temporal extent (time span or temporal scope) of ‘enduring’ events having measurable temporal extent. Tagging an enduring event’s temporal extent explicitly allows HED tools to support analysis of events modeled (more flexibly and often more realistically) as processes unfolding through time. For example, in a reach-to-touch gesture in a touchscreen task or in a step cycle during a treadmill walking task, each participant action has an appreciable duration within which various critical stage events may be annotated for analysis (e.g., stimulus or movement onset, offset, points of max acceleration or velocity, etc.).
Enduring events may be indicated explicitly using pairs of instantaneous Onset and Offset events linked to each other by a common tag-group definition name. A defined name grouped with an Onset tag marks the beginning of the enduring event. The enduring event ends either when the defined name is grouped with an Offset tag or when it is next grouped with another Onset tag. All tags in a tag group containing a Duration or Onset tag are assumed to apply throughout the enduring event. Tags not appearing in a tag group containing Duration or Onset are assumed to apply only to the marked instant. During analysis, HED tools keep track of which enduring events are ongoing at each moment and add Event-context information to the HED string for each event, as detailed below.
An event string that is grouped with a Duration tag also represents an enduring event. The onset (i.e., the beginning of the time span) of this enduring event is the time of the event whose annotation contains the Duration tag group. The enduring event’s offset is not recorded explicitly as a separate event, but calculated by adding the duration value to the onset time. Multiple tag groups containing Duration tags with different duration values may appear in the same event annotation.
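The Onset, Offset, and Duration rules can be turned into explicit time intervals with a single pass over the events. The sketch below assumes a simplified event representation (a time plus a list of markers such as `("Onset", name)`), and it assumes that an enduring event left open at the end of the file lasts until `recording_end`; both the representation and that end-of-recording rule are illustrative choices, not the HED specification's wording.

```python
def enduring_intervals(events, recording_end):
    """events: time-ordered list of (time, markers); each marker is
    ("Onset", name), ("Offset", name), or ("Duration", name, seconds).
    Returns a list of (name, start, end) sorted by start time."""
    open_events = {}  # name -> onset time of the currently open enduring event
    intervals = []
    for time, markers in events:
        for marker in markers:
            kind, name = marker[0], marker[1]
            if kind == "Duration":  # offset = onset + duration
                intervals.append((name, time, time + marker[2]))
            elif kind == "Onset":
                if name in open_events:  # a new Onset also ends the previous one
                    intervals.append((name, open_events.pop(name), time))
                open_events[name] = time
            elif kind == "Offset":
                if name not in open_events:
                    raise ValueError(f"Offset for {name!r} without a matching Onset")
                intervals.append((name, open_events.pop(name), time))
            else:
                raise ValueError(f"unknown marker kind {kind!r}")
    # Assumption: events still open at the end last until the recording ends.
    for name, start in open_events.items():
        intervals.append((name, start, recording_end))
    return sorted(intervals, key=lambda iv: (iv[1], iv[0]))


events = [
    (20.0, [("Onset", "PlayMovie")]),
    (30.0, [("Duration", "Beep", 0.5)]),
    (100.0, [("Offset", "PlayMovie")]),
    (110.0, [("Onset", "Block")]),
    (150.0, [("Onset", "Block")]),
]
print(enduring_intervals(events, recording_end=200.0))
# [('PlayMovie', 20.0, 100.0), ('Beep', 30.0, 30.5), ('Block', 110.0, 150.0), ('Block', 150.0, 200.0)]
```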
Enduring Events and Experiment Design
An important addition to HED-3G is the capability to embed analysis-ready annotation of experiment design and task organization via enduring events. This embedding is accomplished using the HED-3G organizational tags Recording, Task, Condition-variable, Time-block, and Experimental-trial in conjunction with enduring events.
The Recording tag is a convenient organizational tag for grouping metadata and setup information relevant to the entire recording. The Recording tag is often associated with an enduring event spanning the entire recording. We anticipate developing tools tailored to specific dataset organizations such as BIDS that automatically gather relevant metadata and setup information stored in auxiliary files and insert this information in tagged form as such an enduring event.
A task is a limited set of structured and, typically, instructed mental and/or physical activities performed by the participant during the recording; usually these are integrally related to the planned data analysis. The Task tag is generally a top-level organizational concept used to organize the annotations of these activities and their relationship to recorded events.
A condition variable is an aspect of the experiment that is set or manipulated during the experiment to observe an effect or to control bias. Condition variables are sometimes called independent variables or contrasts. The Condition-variable tag is used to organize the annotations that describe these conditions. Often a Condition-variable is used as part of the annotation of an event to indicate that the specified experimental condition was in effect during that event.
Many electrophysiological experiments are organized into distinct blocks of contiguous time interspersed with breaks for participant relief and setup changes. The Time-block tag organizes tags used to annotate what is happening during such a block. Time-block tags are usually associated with enduring events marking the temporal span of the blocks.
In many electrophysiological experiments designed for event-related analysis, a specific set of events occurs in sequence (e.g., a stimulus presentation followed first by a behavioral response and then by some sensory feedback), and the contiguous data segment containing this sequence is extracted for analysis. The contiguous data block is sometimes referred to as an experimental trial. The Experimental-trial tag organizes annotations associated with an experimental trial. The Experimental-trial tag may be associated with an enduring event. Another use of the Experimental-trial tag is to group events associated with a given trial. For example, a tool could automatically identify which events are part of each trial based on a task specification. The tool could then insert the tag Experimental-trial#, where # is the trial number, in the HED annotation of each event.
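A trial-numbering tool of the kind described above could be as simple as the following sketch. It assumes a hypothetical task specification in which each trial begins at an event tagged `Experimental-stimulus` and includes every later event until the next trial start; it writes the trial number in the usual `Tag/value` form, which is an assumption about how the `#` value would be filled in. The function name `tag_trials` and the example tags are illustrative.

```python
def tag_trials(events, is_trial_start):
    """events: time-ordered list of (time, tags). Append an
    Experimental-trial/<n> tag to each event belonging to trial n.
    Events before the first trial start are left unchanged."""
    trial = 0
    tagged = []
    for time, tags in events:
        if is_trial_start(tags):
            trial += 1
        new_tags = list(tags)
        if trial > 0:
            new_tags.append(f"Experimental-trial/{trial}")
        tagged.append((time, new_tags))
    return tagged


# Hypothetical task specification: a trial starts at each stimulus event.
def starts_trial(tags):
    return "Experimental-stimulus" in tags


events = [
    (1.0, ["Sensory-event", "Experimental-stimulus"]),
    (1.4, ["Agent-action", "Participant-response"]),
    (1.9, ["Sensory-event", "Feedback"]),
    (3.0, ["Sensory-event", "Experimental-stimulus"]),
]
for time, tags in tag_trials(events, starts_trial):
    print(time, tags[-1])
```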
To understand how these organizational annotation terms may be used, consider the following simple example study in which the participants perform two main tasks, each in two different task conditions. A researcher can organize this experiment in many ways including those described in Example 5 and illustrated schematically in Fig. 2.
Example 5: Three possible experiment designs for the simple study.
- Design 1 (left): Each Recording includes a single Task and Condition-variable, but has two Time-block sections separated by a relief break. Counterbalancing of Task and Condition-variable is done at the study level over four Recordings in different orders for each participant. An Experimental-trial includes three events.
- Design 2 (center): Each Recording includes two Time-blocks in which the participant performs one of the two main Tasks. Each main task Time-block comprises a single Condition-variable. Task and Condition-variable counterbalancing is performed across the time blocks within each recording.
- Design 3 (right): Each Recording comprises one Task and one continuous Time-block, but here the Condition-variable is selected at random for each Experimental-trial.
A Sample Dataset Structure Viewer
A best practice for HED-3G tagging is to create Definition tags to represent the organization of the experiment, including definitions for each Task, Condition-variable and Time-block used in the study. These defined tags should then be grouped with Onset and Offset tags to mark where in the experiment the particular tagged aspect was in effect. Appropriate and consistent structural annotation can provide a wealth of information to automated data search and analysis tools. For example, a data repository could use this information to automatically produce a visualization of the dataset structure via a repository data browsing application. Figure 3 below shows a mock-up overview of such a visualization.
Such a timeline viewer application might be used by researchers to verify that the experiment was actually conducted according to the intended or documented specification. The availability of such annotations might also encourage researchers to more completely document items they might otherwise ignore or forget to tag (such as the administration of a survey between the two main task blocks). More details such as the presence of selected types of trial events might be optionally included in the lowest level of the timeline display when/if space permits.
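Part of this verification can also be automated once organizational definitions have been turned into time intervals. The sketch below is hypothetical throughout: it assumes intervals are available as `(name, start, end)` tuples, that the documented design is a planned order of block names, and the names `check_block_order`, `TaskA-block`, `Survey`, and `TaskB-block` are illustrative.

```python
def check_block_order(intervals, planned):
    """Compare (name, start, end) intervals with a planned order of names.
    Returns a list of human-readable problems (empty if none were found)."""
    ordered = sorted(intervals, key=lambda iv: iv[1])
    problems = []
    actual = [name for name, _, _ in ordered]
    if actual != planned:
        problems.append(f"expected order {planned}, found {actual}")
    # Consecutive blocks should not overlap in time.
    for (name_a, _, end_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        if start_b < end_a:
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


recorded = [("TaskA-block", 0, 300), ("Survey", 300, 360), ("TaskB-block", 360, 700)]
print(check_block_order(recorded, ["TaskA-block", "Survey", "TaskB-block"]))  # []
```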
Importantly, the organizational tags Condition-variable and Time-block make available information about changes in task and conditions at the supra-event level needed to inform any analysis, without requiring the annotator to include all their information when annotating every event during their time-span (see following paragraphs). Using these tags, automated tools could test whether there was a significant difference in some EEG measure across all available studies that included visual stimulus presentation conditions in which some control variable (e.g., stimulus rate) varied either within or across studies. One might also test across a set of HED-tagged datasets for subject traits or demographics that account for some feature variance (e.g., to test how available participant age may influence some measures of EEG dynamics or recorded behavior).
Context-Aware Analysis
To make effective use of the information provided by currently unfolding events, we are designing HED-3G analysis tools that perform tag remapping to document the ongoing events that contribute to the active context of each intervening event. For example, suppose PlayMovie is an identifier defined to document the presentation of a short movie to the participant. A (Def/PlayMovie, Onset) event occurs at 20 s from the beginning of the file, and a (Def/PlayMovie, Offset) event occurs at 100 s. All the intervening events in the interval [20, 100] seconds should inherit the information that the specified movie clip is playing (and perhaps that the participant has been asked to view the movie with some specified task intent), without requiring the user to tag this information explicitly in the HED string for each such event. However, this mapping of the ongoing context should not in any way suggest that events occurring during the movie presentation are associated with effects similar to those associated with the physical movie presentation onset and offset events.
HED-3G introduces the Event-context tag to capture this distinction. During analysis, compliant HED tools append a single (Event-context, ….) tag group to the HED string annotation of each annotated event. The tools then insert copies of the annotations of all then-ongoing enduring events into the Event-context tag group. Thus, an event occurring while the PlayMovie event is ongoing will have annotation including this information in its full-form Event-context tag group. This tag group may also hold many other types of information pertaining to the recording as a whole, as well as to the current task trial, block, and/or condition. While the actual mapping of an event’s context does not take place until the full-form annotation is assembled for analysis, the ability to use this facility for advanced analysis depends critically on the availability of appropriate annotations.
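The context-mapping step can be sketched using the PlayMovie example above. Assumptions, all labeled here rather than taken from the HED tools: enduring events are given as `(name, start, end)` intervals, their annotations as tuples in `annotations`, and an enduring event counts as ongoing strictly after its onset and strictly before its end, so the Onset and Offset events themselves do not receive it as context (how real tools treat the boundary instants is not specified here).

```python
def add_event_context(events, intervals, annotations):
    """Append one ("Event-context", ...) group to every event, holding copies
    of the annotations of all enduring events ongoing at that event's time."""
    result = []
    for time, tags in events:
        ongoing = [name for name, start, end in intervals if start < time < end]
        context = ("Event-context",) + tuple(annotations[name] for name in ongoing)
        result.append((time, tuple(tags) + (context,)))
    return result


intervals = [("PlayMovie", 20.0, 100.0)]
annotations = {"PlayMovie": ("Def/PlayMovie",)}  # hypothetical annotation
events = [(10.0, ("Sensory-event",)), (50.0, ("Agent-action",)), (100.0, ("Sensory-event",))]
for time, tags in add_event_context(events, intervals, annotations):
    print(time, tags)
```

Only the event at 50 s receives `("Def/PlayMovie",)` inside its Event-context group, which keeps the movie's influence as context separate from the physical onset and offset events.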
HED Tools and Development Process
User-Friendly Tagging Tools
The original CTagger tool has been completely redesigned to enhance the ease of navigation during the annotation process as illustrated in Fig. 4.
The CTagger main interface consists of two parts: on the left, a list of event types to be annotated; on the right, a HED string input text area. CTagger suggests tags as users start typing, and users can also browse through an expandable tag view to select appropriate tags to add to the event string during tagging. The new suggestedTag and relatedTag attributes in the HED schema will be used to provide tagging hints for users during annotation.
The HED Tool Libraries
As discussed previously, compliant HED-3G analysis tools should handle the mapping of events to event context across the recording. The HED analysis tools also must convert all short-form tags to long form and expand defined tag terms into the tag groups they represent. Tool libraries in Python, MATLAB, and JavaScript are under development to accomplish these expansions in easily callable formats. These libraries will provide a foundation for future tool development. Several other basic tools for searching and extracting time-locked data epochs are also already available or under development.
In addition to supporting common types of data search and collection operations, the structure of HED-3G may support future applications using more extensive knowledge-integration techniques including natural language processing. Future additions to the base HED schema could support inclusion of additional metadata into the HED schema, such as unique term identifiers and links to external resources and knowledge bases stored in external databases. Such links and identifiers, once created by domain experts, need not be visible during HED annotation and review. For example, there is a natural correspondence between HED schema elements and the Resource Description Framework (RDF; McBride, 2004) for interchange of web-linked data (Bigdely-Shamlo et al., 2016).
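To make that correspondence concrete, a schema tree can be emitted as RDF triples in N-Triples syntax. This is a sketch under explicit assumptions: each parent-child link is mapped to `rdfs:subClassOf` (treating the hierarchy as is-a), each term gets an `rdfs:label`, and `HED_NS` is a placeholder namespace, not an official HED IRI. No RDF library is used, so names containing quotes or spaces are not escaped.

```python
HED_NS = "https://example.org/hed#"  # placeholder namespace, not an official HED IRI
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def schema_to_triples(tree, parent=None):
    """Return (subject, predicate, object) triples for a nested-dict schema."""
    triples = []
    for name, children in tree.items():
        node = HED_NS + name
        triples.append((node, RDFS + "label", name))
        if parent is not None:  # assumption: parent-child link means is-a
            triples.append((node, RDFS + "subClassOf", HED_NS + parent))
        triples.extend(schema_to_triples(children, name))
    return triples


def to_ntriples(triples):
    """Serialize triples, writing rdfs:label objects as plain literals."""
    lines = []
    for s, p, o in triples:
        obj = f'"{o}"' if p == RDFS + "label" else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(schema_to_triples({"Event": {"Sensory-event": {}}})))
```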
Formalizing the Development Process
In order to put the development processes on a firmer footing for community contributions, we have moved the code for all projects to the hed-standard GitHub organization site (https://github.com/hed-standard) and instituted the standard GitHub fork-pull-review-merge mechanism for proposing and incorporating schema changes and code updates. The hed-specification repository (https://github.com/hed-standard/hed-specification) holds all versions of the HED schema. HED tools can download and cache any of these schema versions for use in validation and analysis. The base HED schema is stored in XML format for all machine processing purposes. The schema is also stored in a human-readable WYSIWYG MEDIAWIKI format to make it easier for developers to edit. Supported functions convert between MEDIAWIKI, XML, and JavaScript/HTML formats. A convenient JavaScript/HTML tool displays the schema in an interactive, expandable format in web browsers, facilitating schema search and review (Fig. 1). Issues, comments, and discussion are handled using the Issues mechanism of GitHub.
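Reading the XML schema into tag paths needs only the Python standard library. The snippet below assumes a simplified layout in which each term is a `<node>` element with a `<name>` child and nested `<node>` children, possibly wrapped in other container elements; this layout and the sample content are illustrative, and the hed-specification repository remains the authority on the actual XML format.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical sample in the assumed <node>/<name> layout.
SAMPLE = """<HED version="8.0.0"><schema>
  <node><name>Event</name>
    <node><name>Sensory-event</name></node>
  </node>
  <node><name>Property</name></node>
</schema></HED>"""


def node_paths(xml_text):
    """Return the full path of every <node> element, in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, prefix):
        for child in elem:
            if child.tag == "node":
                name = child.findtext("name").strip()
                path = f"{prefix}/{name}" if prefix else name
                paths.append(path)
                walk(child, path)
            else:  # pass through wrappers such as <schema>
                walk(child, prefix)

    walk(root, "")
    return paths


print(ET.fromstring(SAMPLE).get("version"))  # 8.0.0
print(node_paths(SAMPLE))  # ['Event', 'Event/Sensory-event', 'Property']
```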
Other repositories housed on the hed-standard organization site include hed-python (validation and analysis tools as well as Docker containers for online deployment), hed-javascript (npm validation module called by BIDS for validating HED), CTagger (portable GUI tagging tools), hed-matlab (HED validation and analysis tools as well as EEGLAB plug-ins), and hed-schema-library (repository for organizing community development of HED library schemas). Additional repositories hold examples, documentation, and other tools.