A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics

Eloe-Fadrosh, Emiley A.; Mungall, Christopher J.; Miller, Mark Andrew; Smith, Montana; Patil, Sujay Sanjeev; Kelliher, Julia M.; Johnson, Leah Y. D.; Rodriguez, Francisca E.; Chain, Patrick S. G.; Hu, Bin; Thornton, Michael B.; McCue, Lee Ann; McHardy, Alice Carolyn; Harris, Nomi L.; Reddy, T. B. K.; Mukherjee, Supratim; Hunter, Christopher I.; Walls, Ramona; Schriml, Lynn M.

doi:10.1007/978-1-0716-3838-5_20

Part of the book series: Methods in Molecular Biology (MIMB, volume 2802)

Abstract

Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC’s MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.

CRediT roles

Emiley A. Eloe-Fadrosh: Conceptualization, Writing - original draft, Writing - review & editing

Chris Mungall: Methodology, Conceptualization, Writing - original draft, Writing - review & editing

Mark Andrew Miller: Formal analysis, Software, Writing - original draft, Writing - review & editing

Montana Smith: Methodology, Software, Writing - original draft, Data curation, Writing - review & editing

Sujay Sanjeev Patil: Software, Writing - original draft, Writing - review & editing

Julia M. Kelliher: Writing - original draft, Writing - review & editing

Leah Y.D. Johnson: Writing - original draft, Writing - review & editing

Francisca E. Rodriguez: Writing - review & editing

Patrick S. G. Chain: Writing - review & editing

Bin Hu: Writing - review & editing

Alice C. McHardy: Writing - review & editing

Michael B. Thornton: Writing - review & editing

Lee Ann McCue: Writing - original draft, Writing - review & editing

Nomi L. Harris: Writing - review & editing

T.B.K. Reddy: Data curation, Writing - review & editing

Supratim Mukherjee: Data curation, Writing - review & editing

Christopher I. Hunter: Writing - review & editing

Ramona Walls: Writing - review & editing

Lynn M. Schriml: Writing - review & editing

1 Introduction

To make genomics data Findable, Accessible, Interoperable, and Reusable (FAIR) [1], it is necessary to have standards for describing the provenance of sequence data. Accurately recording information about factors like sequencing method and environmental conditions, referred to as metadata, allows for reanalysis, integrative meta-analyses, and accurate interpretation of results. The Genomic Standards Consortium (GSC) [2] is an open-membership working body formed nearly twenty years ago with the aim of supporting community-driven standards for sequence data. The primary standard produced by the GSC is the Minimum Information about any (x) Sequence (MIxS) [3], which allows researchers to capture extensive metadata on a per-sample basis. MIxS consists of a number of metadata elements (also called terms), each of which describes a particular characteristic of the sample or its source environment. These elements are attributes of different checklists (for describing sampling method and sequencing) and different extensions (for describing the source environment). The allowed values for each of these elements include free text, quantitative measurements, and value sets (picklists derived from different controlled vocabularies or ontologies, such as the Environment Ontology [4] for natural environments or the Uberon anatomy ontology [5] for metazoan host-derived samples). Most samples and their corresponding sequence data are described with a combination of a checklist and an extension.

Without metadata describing the environmental conditions, sample collection methods, or data generation approaches, (meta)genomic data would be meaningless [6]. As the volume and complexity of (meta)genomics data have dramatically increased and (meta)genomics has become a data-driven field [7], metadata provides the necessary contextual information for data use, reuse, and comparative analyses. Through the implementation of the MIxS standard across primary repositories, researchers are able to search and discover data of interest and perform comparative analyses such as correlating genes or functions of interest with environmental parameters. Further, as (meta)genomic catalogs across diverse samples and environments are generated, such as from human or other host-associated systems [8,9,10], soil [11, 12], and diverse aquatic habitats [13, 14], integration and synthesis will only be achieved through standardized metadata.

Here, we describe the components of the MIxS standard, discuss how to navigate specific ontologies that form the basis of mandatory terms, and provide an example of a soil metagenome collected as part of the National Science Foundation's National Ecological Observatory Network (NEON) continental-scale observation facility (https://data.neonscience.org/data-products/DP1.10107.001). We also provide updates on some recent developments in the evolution of MIxS that make the standard more FAIR and easier to use. While there are a variety of MIxS implementations for sample submission across the primary data repositories that form the International Nucleotide Sequence Database Collaboration (INSDC [15]) and other (meta)data platforms and knowledge bases [16, 17], the outlined methods aim to provide researchers with a practical approach to organizing their data from field sampling expeditions and an understanding of the terminology used in MIxS implementation. As MIxS is a community-driven standard, there are regular updates to terms on an approximately annual basis, and researchers are invited to contribute.

2 Overview of the Structure and Terminology of MIxS

The MIxS standard captures environmental information, sample collection methods, sample properties, nucleotide extraction method, quality, quantity, library preparation, and sequencing information, among other aspects. MIxS provides a number of terms (also called metadata elements) for describing these aspects of a sample. Some terms are generic and are applicable across all samples, while others are more specific to certain kinds of studies, environments, or sample collection methods. Examples of terms that are broadly applicable are depth, local environmental context, and geographic location (latitude and longitude). To ensure the MIxS standard is compliant with the FAIR guiding principles [1] and best practice for identifiers [18], all terms are assigned a resolvable, globally unique persistent identifier called a MIxS ID. For example, the term “depth” has the identifier MIXS:0000018, which is a compact uniform resource identifier (CURIE [18]). CURIEs expand to resolvable URLs by replacing the prefix (e.g., MIXS:) with a web location (e.g., https://w3id.org/mixs/). In addition to the definitions, the MIxS terms have both a long descriptive title and a short, computer-friendly name called the “structured comment name.” An example is “altitude” (MIXS:0000094), which has the structured comment name “alt” and the title “altitude.”
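The CURIE expansion described above can be sketched in a few lines of Python. This is a minimal illustration rather than an official GSC tool: the function name `expand_curie` and the prefix map are hypothetical, and the only mapping used is the MIXS prefix and web location quoted in the text.

```python
# Prefix map: the MIXS prefix and its web location are taken from the text;
# the dictionary itself is an illustrative assumption.
PREFIX_MAP = {"MIXS": "https://w3id.org/mixs/"}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand a CURIE such as 'MIXS:0000018' into a full URL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    # Replace the prefix (and colon) with the registered web location.
    return prefix_map[prefix] + local_id


print(expand_curie("MIXS:0000018"))  # https://w3id.org/mixs/0000018
```

Keeping the prefix map as data means further prefixes could be added without changing the function.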

There are two main components of the MIxS standard: checklists and extensions (previously referred to as “environmental packages”). These components are described below and outlined in Tables 1 and 2. Checklists and extensions are intended to be used in a combinatorial and modular manner. New checklists and extensions may be proposed and developed in coordination with the GSC community, stakeholders within the field, and the GSC Technical Working Group.

Table 1 The MIxS checklists. Checklists include metadata terms to minimally describe the sampling and sequencing methods. The six main checklists span genomes, marker genes, metagenomes, single amplified genomes, metagenome-assembled genomes, and uncultivated virus genomes. For marker genes and genomes, the checklists are classified under sub-checklists related to the organism or sequence as specified below. All checklists share the ten metadata terms listed in the right box. Additional type-specific descriptors not listed here are defined for each checklist and sub-checklist (https://genomicsstandardsconsortium.github.io/mixs/#checklists)

Table 2 The MIxS extensions. Extensions are a collection of context-specific terms developed by community experts to provide context about the sample and environment. The GSC describes an environment as any location in which a sample or organism is found. The extensions available currently are listed below with example terms specific to each extension. Additional information and a full list of terms is available through the GSC’s GitHub (https://genomicsstandardsconsortium.github.io/mixs/#extensions)

3 Checklists Describe Sampling and Sequencing Methods

A checklist is a collection of terms that minimally describe the sampling and sequencing method of a biological sample used to generate sequence data (https://genomicsstandardsconsortium.github.io/mixs/#checklists). Checklists include mandatory, recommended, and optional terms for specific types of sequencing data: genome, metagenome, marker gene, or more recently single-cell genomes, metagenome-assembled genomes, and predicted viral genomes [19, 20]. For genomic sequences (Minimum Information about any Genome Sequence, MIGS [21]), there are specific checklists for different taxa groups: Eukaryotes (EU), Bacteria and Archaea (BA), Plasmids (PL), Viruses (VI), and Organelles (ORG). Similarly, for marker gene sequences (Minimum Information about a MARKer gene Sequence, MIMARKS [3]), there are two checklists: Surveys (SU), which comprise sampling directly from environmental samples, and Specimens (SP), which are directly from cultured samples. There are ten mandatory terms that span all checklists: project name, sample name, taxonomy ID of DNA sample, geographic location (latitude and longitude), geographic location (country and/or sea, region), collection date, broad-scale environmental context, local environmental context, environmental medium, and sequencing method (Table 1). Beyond these ten mandatory terms, the type-specific checklists contain additional terms that have been developed in coordination with the corresponding research community.
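As a rough illustration of how the ten shared mandatory terms might be checked before submission, the sketch below flags any that are absent or empty in a sample record. The short keys (for example `samp_name` and `seq_meth`) are assumed structured comment names that do not appear in this chapter, and the example record is hypothetical; both should be confirmed against the MIxS documentation.

```python
# Assumed structured comment names mapped to the titles listed in the text.
MANDATORY_TERMS = {
    "project_name": "project name",
    "samp_name": "sample name",
    "samp_taxon_id": "taxonomy ID of DNA sample",
    "lat_lon": "geographic location (latitude and longitude)",
    "geo_loc_name": "geographic location (country and/or sea, region)",
    "collection_date": "collection date",
    "env_broad_scale": "broad-scale environmental context",
    "env_local_scale": "local environmental context",
    "env_medium": "environmental medium",
    "seq_meth": "sequencing method",
}


def missing_mandatory(record):
    """Return the mandatory term names that are absent or empty in a record."""
    missing = []
    for name in MANDATORY_TERMS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical, deliberately incomplete record.
example = {
    "project_name": "example soil survey",
    "samp_name": "plot-1-core-A",
    "collection_date": "2021-05-24",
    "env_medium": "soil [ENVO:00001998]",
}
print(missing_mandatory(example))
```

A check like this covers only the shared core; each checklist and extension adds its own required terms.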

4 Extensions Describe Sample and Sampling Contexts

Extensions (previously referred to as “environmental packages”) are collections of terms describing the specific environment, host, or context for a biological sample and are developed with domain experts (https://genomicsstandardsconsortium.github.io/mixs/Extension/). Extensions supplement checklists by providing additional terms to elaborate the context of the sample and/or sampling event (Table 2). For example, the soil extension has a number of terms to record attributes specific to soil environments, including soil depth (MIXS:0000018) for the vertical distance below the soil surface at which a sample was collected and FAO class (MIXS:0001083) for soil classification from the FAO World Reference Database for Soil Resources [22]. Similarly, environment and study-specific terms like “history/agrochemical additions” (MIXS:0000639) are important for experimental designs in which fertilizers or other agrochemicals are applied to the field site. Extensions are used in conjunction with checklists, and together they form a “Combination.” For example, if a researcher generates metagenome sequence data from a soil environment, the appropriate combination would be MIMS and the soil extension. A detailed description of this combination is provided below in the section, A Primer on Using MIxS: The MIMS Checklist and Soil Extension (Subheading 8).

5 Use of Ontologies and Value Sets

The use of ontologies in MIxS supports the standardization of terms, allowing different datasets to be combined and compared (an example is provided below in the section, A Primer on Using MIxS: The MIMS Checklist and Soil Extension (Subheading 8)). Ontologies also allow the submitter to describe values at the appropriate level of granularity. To standardize the use of categorical values, MIxS makes use of both ontologies and value sets for some terms (Table 3). For example, “host body site” (MIXS:0000867) can take values that are terms from the Uberon multi-species anatomy ontology. In some cases where there is no standardized ontology, a value set, i.e., a small set of enumerated values, is provided as an option. As another example, the term “host cellular location” (MIXS:0001313) takes a value set/enumeration that restricts the permissible values of “host cellular location” to “extracellular,” “intracellular,” or “not determined.” In the future, these value sets may be mapped to ontologies.
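A value set of this kind maps naturally onto an enumeration. The following sketch, assuming Python's standard `enum` module and hypothetical class and function names, encodes the three permissible values quoted above for “host cellular location” and rejects anything else.

```python
from enum import Enum


class HostCellularLocation(Enum):
    """Permissible values for 'host cellular location' (MIXS:0001313)."""
    EXTRACELLULAR = "extracellular"
    INTRACELLULAR = "intracellular"
    NOT_DETERMINED = "not determined"


def is_permissible(value):
    """Return True if the string is one of the enumerated values."""
    try:
        HostCellularLocation(value)
    except ValueError:
        return False
    return True


print(is_permissible("intracellular"))  # True
print(is_permissible("nuclear"))        # False
```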

Table 3 Examples of ontologies used in MIxS. A set of illustrative examples (not comprehensive) for ontologies used in MIxS together with example terms that demonstrate their usage and example values

When ontology term values are provided in MIxS, the standard requires that these be written using “termLabel [termID]” syntax, where the label is followed by the unique identifier in square brackets. This allows for both human readability as well as the best-practice use of identifiers. All ontology identifiers are prefixed identifiers (also known as CURIEs [18]), with the prefix registered in the bioregistry [23], and most of the ontologies belong to the Open Biological and Biomedical Ontology (OBO) Foundry [24]. MIxS uses ontologies that are openly available and can be browsed in standard ontology web portals such as BioPortal [25], OLS [26], or OntoBee [27]. When browsing these terms using these standard browsers, it is possible to see terms in the context of other terms, alongside their textual definitions, which makes it easier to select the correct term.
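The “termLabel [termID]” syntax can be checked and split with a regular expression. The pattern below is a sketch under stated assumptions, not the pattern published in the MIxS schema: it assumes a single space before the opening bracket and a CURIE made of an alphanumeric prefix, a colon, and a numeric local identifier.

```python
import re

# Assumed shape: label, one space, then a CURIE in square brackets.
TERM_PATTERN = re.compile(
    r"^(?P<label>\S.*?) \[(?P<curie>[A-Za-z][A-Za-z0-9_]*:\d+)\]$"
)


def parse_ontology_value(value):
    """Split 'label [PREFIX:ID]' into (label, curie); None if malformed."""
    match = TERM_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group("label"), match.group("curie")


print(parse_ontology_value("soil [ENVO:00001998]"))
print(parse_ontology_value("soil ENVO:00001998"))
```

Returning `None` for malformed values, rather than raising, makes it easy to count and report non-conforming entries across a whole table.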

6 MIxS Versions

MIxS is updated as new terms are suggested by domain experts or errors are found in the current version. MIxS follows the three-part Semantic Versioning practice [28], in the format X.Y.Z, where X is the major version, Y is the minor version, and Z is the patch version. A major version is released whenever new checklists, extensions, or terms are added. This requires approval from the GSC Board and major external stakeholders, such as INSDC and the Genomes OnLine Database for subsequent adoption [17]. The target frequency for major updates is roughly once per year. A minor version is released when errors have been fixed or refinements have been made without adding new checklists or extensions. Minor versions may include updates to terms or new terms that do not break any existing use of checklists and extensions. Minor version updates require review from the GSC Compliance and Interoperability Group. Patch versions are released when infrastructural changes are made to the MIxS code repository or to fix grammatical or spelling errors without making functional changes to MIxS content. Patches require review from the GSC Technical Working Group. The MIxS standard has moved to being hosted on GitHub, allowing full tracking of changes and the ability to easily retrieve older versions.
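The X.Y.Z scheme makes it straightforward to tell programmatically which kind of release separates two versions. The helper names below are hypothetical, and apart from v6.2.0, which is cited in this chapter, the version strings are made up for illustration.

```python
def parse_version(version):
    """Parse 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints."""
    if version.startswith("v"):
        version = version[1:]
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_type(old, new):
    """Classify the step from old to new as 'major', 'minor', or 'patch'."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError(f"{new} is not newer than {old}")
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "patch"


print(release_type("6.1.0", "v6.2.0"))  # minor
```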

7 Methods

7.1 How to Access the MIxS Standard

There are multiple ways to explore the MIxS standard. The web-based documentation at https://w3id.org/mixs has been optimized for exploring the large number of MIxS terms, checklists, and extensions. Recently, in collaboration with the National Microbiome Data Collaborative (NMDC [16]), the GSC updated the underlying representation of MIxS to use the Linked Data Modeling Language (LinkML [29]), which is expressed in the YAML format. This YAML representation allows MIxS to be automatically converted to different formats via the LinkML library for use in different tools by computational users. Resources in various data modeling frameworks (JSON-LD, JSON Schema, OWL, SQL) are provided in the MIxS GitHub repository, specifically at https://github.com/GenomicsStandardsConsortium/mixs/tree/v6.2.0/project. For example, there is a JSON Schema version of MIxS that allows data in JSON format to be validated. There are also semantic web representations, such as a Web Ontology Language (OWL) representation, which can be used in ontology browsers or editors like Protege. Since MIxS is written in LinkML, other technical representations could be added in the future.

As described previously, all MIxS terms are assigned a resolvable, globally unique persistent identifier that resolves to a page with full details about the term, including (1) both a structured comment name and a title; (2) a description of what the term represents and how it should be used; (3) which checklists and extensions the term can be used with; (4) the allowed values for that term; and (5) additional information of interest. An example is shown in Fig. 1.

A screenshot of MIxS terms. (a) MIMS checklist and soil extension. The third option, “reference or method used in vegetation classification,” is outlined. (b) Current vegetation method. It lists the properties below including range, cardinality, structured pattern, and regex pattern. — **Fig. 1**

Across the INSDC, the MIxS standard is made available in different forms. The ENA provides XML downloads for all extensions (https://www.ebi.ac.uk/ena/browser/checklists), and the NCBI BioSamples database provides both XML and Excel downloads of each extension and checklist combination (https://www.ncbi.nlm.nih.gov/biosample/docs/packages/). Further details for INSDC data submission are provided below.

7.2 How to Use the MIxS Standard for Data Use, Reuse, and Analysis

Standards are necessary to ensure data are interoperable and can be combined with data from other sources. By providing a standard set of data descriptor terms, together with constraints on how these can be used, MIxS allows different genomic datasets from multiple sources, including microbiome datasets, to be combined in meta-analyses. The ways in which MIxS is used vary depending on the database implementing it. Some resources, such as the NMDC Data Portal [16], allow for faceted search using a selected subset of MIxS terms (Fig. 2).

A screenshot displays the MIxS standard for data use, reuse, and analysis. The right pane includes a list of sample options. The left pane displays the depth measure. — **Fig. 2**

When downloading biosample data in bulk, MIxS terms may appear as column headers in tabular data downloads, as XML elements (NCBI BioSample), or JSON objects (NMDC). Note that additional processing may be required to make data comparable. Most MIxS measurement fields are string values containing both a numeric value and a unit (and in some cases, ranges may be allowed). These may need to be parsed before quantitative analysis. MIxS does not currently mandate the use of any one unit for measurement fields such as “depth,” which means that one study may measure depth in centimeters and another in meters, so it may be necessary to do basic unit conversions.
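The parsing and unit conversion mentioned above might look like the following sketch for “depth.” It assumes values written as a single number followed by a unit, handles only the three unit spellings listed in the table, and returns `None` for anything else (including ranges), so that such values can be reviewed by hand rather than silently misread.

```python
import re

# Conversion factors to meters; the unit spellings covered are an assumption.
TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001}

# A single number followed by a unit, e.g. "10 cm" or "0.1m".
MEASUREMENT = re.compile(
    r"^\s*(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)\s*$"
)


def depth_in_meters(raw):
    """Convert a 'value unit' string to meters; None if it cannot be parsed."""
    match = MEASUREMENT.match(raw)
    if match is None or match.group("unit") not in TO_METERS:
        return None
    return float(match.group("value")) * TO_METERS[match.group("unit")]


for raw in ["0.1 m", "10 cm", "0-10 cm", "shallow"]:
    print(raw, "->", depth_in_meters(raw))
```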

Many databases do not enforce all MIxS constraints, or they may have data that predates certain constraints. This means that some fields may have erroneous or ambiguous values. Care should be taken with any analysis, and decisions on how to clean or normalize the data need to be made on a case-by-case basis. An additional issue is data sparsity: although MIxS contains many terms to describe various aspects of a sample, in practice, very few are commonly used, and thus analyses will need to handle missing data appropriately. As awareness of MIxS increases and data validation tools such as the NMDC Submission Portal (https://data.microbiomedata.org/submission/home) become more widespread, the quality and completeness of sample metadata should increase, enabling more powerful meta-analyses.
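One simple way to gauge data sparsity before an analysis is to compute, for each term, the fraction of records that actually carry a value. The sketch below treats only absent keys, `None`, and empty strings as missing, which is an assumption; repositories may use other placeholders for missing values, and the records shown are hypothetical.

```python
from collections import Counter


def term_fill_rates(records):
    """Return, per term, the fraction of records with a non-empty value."""
    total = len(records)
    if total == 0:
        return {}
    filled = Counter()
    terms = set()
    for record in records:
        for term, value in record.items():
            terms.add(term)
            if value is not None and str(value).strip() != "":
                filled[term] += 1
    return {term: filled[term] / total for term in sorted(terms)}


# Hypothetical records with uneven coverage.
records = [
    {"depth": "0.1 m", "ph": ""},
    {"depth": "10 cm"},
    {"depth": None, "ph": "7.2"},
]
print(term_fill_rates(records))
```

Terms with very low fill rates can then be excluded or handled explicitly as missing, depending on the analysis.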

7.3 How to Use the MIxS Standard for Data Submission

The GSC works across public databases and primary repositories, namely the International Nucleotide Sequence Database Collaboration (INSDC, https://www.insdc.org/ [15]; comprising the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at the National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA). The INSDC integrates the MIxS standard, updating their nucleotide sequence and BioSample resources (NCBI Packages: https://www.ncbi.nlm.nih.gov/biosample/docs/packages/; EBI BioSamples https://www.ebi.ac.uk/biosamples/; EBI-ENA standards: https://www.ebi.ac.uk/ena/browser/about/data-standards; DDBJ: https://www.ddbj.nig.ac.jp/biosample/submission-e.html) to utilize the latest major release, ensuring backwards compatibility with previous MIxS versions and providing MIxS compliant metadata templates (NCBI: https://submit.ncbi.nlm.nih.gov/biosample/template/; ENA Sample checklists: https://www.ebi.ac.uk/ena/browser/checklists) to enable the selection of GSC checklists and extensions.

Form-based interfaces are used with different implementations across the INSDC repositories to collect information about samples using MIxS. Typically, the submitter is asked to provide both checklist and extension, and the combination of these determines which terms are provided, and what the constraints on these terms are. Data submission to EBI can be made via WebIn (https://www.ebi.ac.uk/ena/submit/webin/), while NCBI offers a different online submission tool (https://submit.ncbi.nlm.nih.gov/biosample/template/). DDBJ also offers the MIxS checklists as pre-formatted template files to be uploaded to their submission portal D-Way (https://ddbj.nig.ac.jp/D-way/). Similarly, for other databases such as GOLD, the implementation is through a web interface where a submitter chooses a MIxS checklist and extension. These different implementations are supported by expert curators to validate terms and ensure compliance. Alternatively, the NMDC uses a specialized data submission tool called DataHarmonizer [30], which provides real-time validation to users and aims to lower barriers for metadata submission (Fig. 3).

A screenshot displays the home page of the import XLSX file. It includes sample ID, source material identifier, analysis or data type, sample linkage, and broad-scale environment context. The left pane displays column help options including column name, description, guidance, and example. — **Fig. 3**

The different submission systems available can benefit the larger research ecosystem by providing resources via familiar settings and user interfaces for a variety of different users. However, the multiple interfaces may also present challenges for researchers who may not be familiar with the MIxS standard, as they offer multiple interpretations of the same terms. Regardless of the method used for submitting metadata, it is useful for researchers to be aware of the MIxS standard to help prospectively record the required information. For example, when carrying out a study involving soil biogeochemical analysis, we recommend reviewing the MIxS soil extension (see below) to ensure that all measurements can be mapped to a term and to plan for capturing these measurements prospectively, so they can be easily included as part of a submission.

7.4 How to Specify Sample Environments Using the EnvO Ecosystem Classification

The MIxS standard uses many different ontologies for different terms, as described previously. For environmental samples, the key ontology is the Environment Ontology (EnvO [4]), a community-led domain ontology that represents diverse environments and aims to promote standardization and interoperability through concise, controlled descriptions of environment types across several levels of granularity. It also ensures that datasets described using EnvO terms can be more easily integrated and analyzed in a reproducible manner. Since the meanings of the terms are precisely defined and accessible, humans and computers can easily connect EnvO terms across datasets. EnvO also serves as a bridge to other standards and vocabularies in the environmental sciences, including mappings to the SWEET vocabulary [31].

MIxS uses EnvO as a set of three mandatory terms in all extensions to specify the biome, environmental feature, and environmental material, colloquially referred to as the “EnvO triad.” These three terms are described as follows:

Broad-scale environmental context (MIXS:0000012): The major environmental system (e.g., EnvO’s biome) from which the sample or specimen was derived. The biome identified should be coarse-grained, meaning it is the broadest general environment from which the sample was taken. For example, the terrestrial biome is defined as “a biome which is primarily or completely situated on a landmass,” ENVO:00000446.
Local environmental context (MIXS:0000013): A more direct expression of the sample or specimen’s local vicinity, which likely has a significant influence on the sample or specimen. Taking the above terrestrial biome sample, a local environmental context could be an area of evergreen forest, which is defined as “an area of the planet's surface which is primarily covered by a forest in which the majority of trees maintain their foliage despite seasonal change,” ENVO:01000843.
Environmental medium (MIXS:0000014): The environmental material(s) immediately surrounding the sample or specimen prior to sampling. Subclasses within EnvO’s environmental material class (http://purl.obolibrary.org/obo/ENVO_00010483) should be used as values for this term. Using the previous example, a soil sample collected from an evergreen forest would simply use the environmental medium soil, defined as “environmental material which is primarily composed of minerals, varying proportions of sand, silt, and clay, organic material such as humus, interstitial gases, liquids, and a broad range of resident micro- and macroorganisms,” ENVO:00001998.

For host-associated samples, terms from a relevant anatomy ontology (UBERON for animals and PO for plants) can be used for the local environmental context. EnvO provides a detailed description with usage notes and general considerations (https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS) to further guide researchers.
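To make the triad concrete, the sketch below writes the evergreen forest soil example in “label [ID]” form and derives a browsable link for each value. The dictionary keys are assumed structured comment names, and the link pattern is inferred from the environmental material URL quoted above (prefix, underscore, local identifier), so both should be treated as assumptions.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def obo_purl(curie):
    """Turn 'ENVO:00001998' into an OBO-style URL (pattern inferred from the text)."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


# The example triad from the text; keys are assumed structured comment names.
triad = {
    "env_broad_scale": "terrestrial biome [ENVO:00000446]",
    "env_local_scale": "area of evergreen forest [ENVO:01000843]",
    "env_medium": "soil [ENVO:00001998]",
}

for slot, value in triad.items():
    # Take the identifier between the square brackets.
    curie = value[value.index("[") + 1 : value.rindex("]")]
    print(slot, "->", obo_purl(curie))
```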

8 A Primer on Using MIxS: The MIMS Checklist and Soil Extension

To illustrate the usage of MIxS, we outline below an example sample from NSF's NEON soil collection (https://data.neonscience.org/data-products/DP1.10107.001) that was sequenced through the Department of Energy’s (DOE) Joint Genome Institute under Award DOI 10.46936/10.25585/60008738. Since this is a soil metagenome sample, the MIMS checklist is used along with the soil extension with the full set of terms in the combination (https://genomicsstandardsconsortium.github.io/mixs/MimsSoil/). The soil metagenome sample name is “Terrestrial soil microbial communities from Great Basin, Onaqui, Utah, USA - ONAQ_008-M-20210524-comp-1.” This example is available through NCBI’s BioSample resource (https://www.ncbi.nlm.nih.gov/biosample/SAMN37862680), along with records available through the NMDC Data Portal (https://data.microbiomedata.org/details/sample/nmdc:bsm-11-357mga60) and GOLD (https://gold.jgi.doe.gov/biosamples?id=Gb0356145). Figure 4 shows the soil metagenome example deposited in NCBI’s BioSample repository and how metadata has been populated to conform to the MIxS standard.

A screenshot displays the National Library of Medicine options. It includes identifiers, organisms, packages, attributes, description links, submission, and accession. — **Fig. 4**

The MIMS checklist, together with the soil extension, contains a combined total of 97 terms to describe the soil environment and context for a given sample (https://github.com/GenomicsStandardsConsortium/mixs/releases/tag/v6.2.0). For each of these 97 terms, the cardinality indicates whether the term is mandatory, recommended, or optional, following the LinkML documentation on slot cardinality (https://linkml.io/linkml/schemas/slots.html#slot-cardinality). Accordingly, Table 4 provides the mandatory and recommended terms for this example, although we note that not all recommended terms have been submitted. Additionally, optional terms for additional metadata such as pH, soil horizon, and water content are shown. Term descriptions and formatting guidelines are provided at https://genomicsstandardsconsortium.github.io/mixs/MimsSoil/, and adhering to this standard ensures metadata is interoperable and machine-readable.
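Cardinality strings such as 1..1 and 0..1 can be read mechanically: the first number is the minimum number of values and the second the maximum. The sketch below, with hypothetical function names, treats a minimum of at least one as mandatory; it also accepts `*` as an unbounded maximum, which is an assumption about notation that may appear elsewhere in the schema.

```python
def parse_cardinality(cardinality):
    """Split 'min..max' into (min, max); max is None when written as '*'."""
    low, high = cardinality.split("..")
    return int(low), (None if high == "*" else int(high))


def is_mandatory(cardinality):
    """A term is treated as mandatory when at least one value is required."""
    minimum, _ = parse_cardinality(cardinality)
    return minimum >= 1


print(is_mandatory("1..1"))  # True
print(is_mandatory("0..1"))  # False
```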

Table 4 The soil metagenome sample “Terrestrial soil microbial communities from Great Basin, Onaqui, Utah, USA - ONAQ_008-M-20210524-comp-1” with MIxS compliant metadata using the combination of the MIMS checklist and soil extension. Mandatory terms are indicated with the cardinality 1..1, while 0..1 are recommended terms

As previously mentioned, the use of ontologies in MIxS supports the standardization of terms and can be leveraged for comparative (meta)genome analyses. An example using the MIMS checklist and soil extension is demonstrated in the NMDC Data Portal with the term environmental medium (MIXS:0000014), which can be used to identify samples across diverse soil types (Fig. 5). In this example, samples from pasture soil (ENVO:00005773), tropical soil (ENVO:00005778), meadow soil (ENVO:00005761), grassland soil (ENVO:00005750), and alpine soil (ENVO:00005741) can be identified and selected for comparative analyses across four separate studies and 187 biosample records. Using these EnvO terms, researchers have the ability to search and access data across diverse soil types for downstream meta-analyses of these metagenomes.
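Outside a portal, the same kind of selection can be made on downloaded records by matching the identifier embedded in the environmental medium value. In the sketch below the soil type identifiers are those listed above, while the sample records, the `env_medium` key, and the function name are hypothetical.

```python
# EnvO soil types named in the text.
SOIL_TYPES = {
    "ENVO:00005773": "pasture soil",
    "ENVO:00005778": "tropical soil",
    "ENVO:00005761": "meadow soil",
    "ENVO:00005750": "grassland soil",
    "ENVO:00005741": "alpine soil",
}


def select_by_medium(samples, wanted_ids):
    """Keep samples whose env_medium value carries one of the wanted IDs."""
    selected = []
    for sample in samples:
        value = sample.get("env_medium", "")
        if any(f"[{term_id}]" in value for term_id in wanted_ids):
            selected.append(sample)
    return selected


# Hypothetical records in 'label [ID]' form.
samples = [
    {"id": "sample-1", "env_medium": "pasture soil [ENVO:00005773]"},
    {"id": "sample-2", "env_medium": "soil [ENVO:00001998]"},
    {"id": "sample-3", "env_medium": "alpine soil [ENVO:00005741]"},
]
print([s["id"] for s in select_by_medium(samples, SOIL_TYPES)])
```

Matching on the identifier rather than the label avoids problems with spelling or capitalization differences between submissions. Note that a sample annotated only with the general term soil is not selected here; including parent or child terms would require consulting the ontology hierarchy.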

A screenshot displays the environmental ontology page. The right pane displays the active query terms including environmental medium. The middle page displays the environmental medium under the omics option. A map layout is provided on the left screen. — **Fig. 5**

9 Discussion

9.1 How to Contribute to Future Development of the MIxS Standard

The GSC engages researchers from around the globe and welcomes feedback from the community on every aspect of the MIxS standard. Researchers can directly submit GitHub tickets to the GSC GitHub issue tracker (https://github.com/GenomicsStandardsConsortium/mixs/issues) to propose changes or request new checklists, extensions, or terms; to correct errors in the standard; or to ask questions on how to apply MIxS.

Although an individual may submit requests for new or updated MIxS checklists or extensions, additions or updates are generally created by a community working within a specific research area or with a specific type of genetic or genomic data. Sometimes, a community may simply need to add new terms to an existing extension or checklist, or they may need to create an entirely new one that reuses some existing terms but requires many new terms to cover a new topic area. Often, research communities reach out to the GSC after having identified a set of metadata terms they need, but we encourage them to reach out to the GSC as early as possible to co-develop the expansion. To do so, community representative(s) should contact the GSC Compliance and Interoperability Group to set up an initial consultation. This can be done via the MIxS GitHub repository issue tracker or by emailing directly at gensc-cig@googlegroups.com. The GSC will guide the community through the process, but, in general, it will involve the following steps:

1. Complete a project proposal using the template (https://www.gensc.org/pages/projects/gsc-project-description-template.html).
2. Select any appropriate existing MIxS terms from the extensive catalog of terms already defined (https://genomicsstandardsconsortium.github.io/mixs/term_list/).
3. Propose any new terms required by the community for the new checklist or extension using the GitHub issue tracker template (https://github.com/GenomicsStandardsConsortium/mixs/issues/new?template=term-request.md).
4. If any existing terms need refinement of their examples, required or recommended rules, or comments for use in the new project, those changes should also be submitted as GitHub issues.
5. The GSC’s Compliance and Interoperability Group will review requests and liaise with the extended community to ensure that there is a legitimate need for the new checklist or extension and that terms are appropriately defined with suitable expected values.
6. Once consensus is reached between the community representatives and the GSC, the new terms will be incorporated into a MIxS release candidate for review by the GSC, repositories that implement GSC, and the general public.
7. New checklists, extensions, and terms only become officially part of MIxS once they are approved as part of a major release. Communities may begin to use them and their identifiers prior to the release, but must be aware that they are subject to change until approved by the broader GSC community.
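
Steps 2 and 3 amount to sorting a community's desired metadata fields into those already covered by MIxS and those that need a new term request. The following is a minimal sketch of that triage, assuming the community keeps its wish list as plain labels and has a locally saved lookup of existing term labels. The function names, the label normalization rule, and the third wish-list entry are hypothetical; the two identifiers are the ones cited in this chapter.

```python
def normalize(label):
    """Lowercase a label and treat underscores and repeated spaces as one space."""
    return " ".join(label.replace("_", " ").lower().split())


def triage_terms(wish_list, existing_terms):
    """Split wish_list into terms to reuse (with identifiers) and terms to propose."""
    existing = {normalize(label): term_id for label, term_id in existing_terms.items()}
    reuse, propose = {}, []
    for label in wish_list:
        key = normalize(label)
        if key in existing:
            reuse[label] = existing[key]
        else:
            propose.append(label)
    return reuse, propose


# A tiny excerpt standing in for the full MIxS term catalog.
existing_terms = {
    "environmental medium": "MIXS:0000014",
    "depth": "MIXS:0000018",
}

# Hypothetical community wish list.
wish_list = ["Depth", "environmental_medium", "hypothetical new field"]

reuse, propose = triage_terms(wish_list, existing_terms)
print("reuse:", reuse)
print("propose via term request:", propose)
```

Exact label matching will miss synonyms and near-duplicates, so the output is only a starting point for the manual review carried out with the Compliance and Interoperability Group.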

Because MIxS is a community standard, the GSC is committed to incorporating feedback and providing updates that meet community needs. The NMDC has helped facilitate these updates through user research. Following discussion with subject matter experts, some terms, such as “climate environment” (MIXS:0001040), have been identified as ambiguous or redundant and will be deprecated in a future version of MIxS. Concerns have also been raised about the widespread tolerance of open-ended units for some terms, such as depth (MIXS:0000018). Based on this feedback, future MIxS releases will require values for some terms to be given in specific units (for example, depth will be required in meters). This change will improve interoperability and machine readability, ensure consistent data capture, and reduce confusion.
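
To illustrate why fixed units help machine readability, the sketch below converts free-form depth strings to meters. It is an assumption-laden example rather than part of MIxS tooling: the accepted input format (a single number followed by a unit), the unit table, and the function name are all hypothetical, and depth ranges are not handled.

```python
import re

# Conversion factors to meters for a few common length units (illustrative subset).
TO_METERS = {"m": 1.0, "cm": 0.01, "mm": 0.001, "km": 1000.0}

# A single non-negative number followed by a unit, e.g. "10 cm" or "0.5m".
_DEPTH_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]+)\s*$")


def depth_in_meters(value):
    """Parse a depth string such as '10 cm' and return the depth in meters."""
    match = _DEPTH_PATTERN.match(value)
    if match is None:
        raise ValueError(f"cannot parse depth value: {value!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in TO_METERS:
        raise ValueError(f"unsupported depth unit: {unit!r}")
    return float(number) * TO_METERS[unit]


for raw in ["10 cm", "0.5m", "250 mm"]:
    print(raw, "->", depth_in_meters(raw), "m")
```

Every conversion rule of this kind is a place where two datasets can silently diverge, which is the interoperability cost that requiring a single unit at submission time removes.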

Outreach and community collaboration are supported through the GSC annual in-person meetings, which facilitate discussions among GSC board members, event attendees, and local researchers. These meetings consist of updates from the GSC and its working groups, talks centered around implementing standards, and workshops aimed at promoting practical skills for adopting best practices in standards. Themes for each annual meeting are devised to facilitate discussion and solutions to burgeoning data standards needs. Toward that aim, the GSC rotates the annual meetings among areas of the world, providing opportunities for engagement between the GSC and diverse local researchers and students.

Community members are encouraged to join two GSC-led working groups. The Compliance and Interoperability Group meets virtually once a month to discuss proposed changes to MIxS checklists, extensions, or specific terms, with a focus on biological topics. The Technical Working Group meets twice a month and focuses on the technical implementation and software development of the GSC standards, such as LinkML and ontologies. Both working groups are open to new participants, regardless of familiarity with the GSC, technical expertise, or level of continued participation. The GSC Google Group (https://groups.google.com/u/0/g/genomic-standards-consortium/about) is maintained for GSC members and the larger community; it distributes GSC-related emails, announces upcoming meetings, and provides a place for discussion of standards and GSC activities. To request to join one of the working groups, please send a message to the GSC Google Group.

The GSC continues to emphasize community engagement and educational initiatives across genomic research communities. Participation in national and international genomic research conferences (Intelligent Systems for Molecular Biology [ISMB], International Society for Microbial Ecology [ISME], American Society for Microbiology [ASM]) provides opportunities to connect with diverse genomic researchers. The breadth of environments studied and groups performing genome sequencing has continued to grow over the past two decades, and the GSC strives to engage new communities to support their growing metadata standardization needs and to facilitate ever greater reuse and discoverability of genomic datasets.

9.2 Beyond Sequence Data Standards: Metabolomics and Proteomics

Community-supported consortia have formed around developing standards for additional data types beyond (meta)genomics data. For example, researchers published several articles on establishing standard reporting requirements for metabolomics data in a formative volume of the journal Metabolomics [32], and the Human Proteome Organization (HUPO) developed a number of modules (https://www.psidev.info/miape) for reporting the minimum information about a proteomics experiment [33]. However, in the years since their development, the challenges associated with maintaining these standards and supporting community adoption have become clear. The Metabolomics Standards Initiative’s ontology and reporting standard (https://github.com/MSI-Metabolomics-Standards-Initiative) are not regularly updated, and analysis of data repositories revealed poor compliance with the established metabolomics standards [34, 35]. Similarly, HUPO’s proteomics standards have not been updated for a number of years.

The GSC has engaged members of the metabolomics community to begin exploring potential improvements to the standard terms, ontologies, and reporting formats that could be implemented to encourage broader adoption, particularly among the growing community of researchers generating metabolomics and genomics data from the same sample (i.e., multi-omics). In practice, the convention established by the GSC will be followed, such that a researcher would combine an extension and a checklist to provide context about their sample and the descriptors and standard terms necessary for a metabolomics experiment, respectively. The metabolomics checklist(s) will leverage previous Metabolomics Standards Initiative efforts and will encompass descriptors for sample processing (e.g., solvent extraction, derivatization), instrument analysis (e.g., chromatography separation, ionization source), and data processing (e.g., normalization, peak picking).
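
The checklist-plus-extension convention described above can be sketched as a simple combination of two term sets. The example below is purely illustrative: no metabolomics checklist has been released, so every term name, the required/optional split, and the `TermSet` structure are hypothetical placeholders loosely based on the descriptors listed in the preceding paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermSet:
    """A named group of metadata terms, split into required and optional."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()


def combine(checklist, extension):
    """Merge a checklist and an extension; a term required by either stays required."""
    required = checklist.required | extension.required
    optional = (checklist.optional | extension.optional) - required
    return TermSet(f"{checklist.name} + {extension.name}", required, optional)


def missing_required(record, term_set):
    """Return the required terms that are absent or empty in a sample record."""
    return sorted(term for term in term_set.required if record.get(term) in (None, ""))


# Hypothetical term names, for illustration only.
metabolomics_checklist = TermSet(
    name="metabolomics checklist (hypothetical)",
    required=frozenset({"extraction_solvent", "ionization_source"}),
    optional=frozenset({"derivatization", "peak_picking_method"}),
)
soil_extension = TermSet(
    name="soil extension (illustrative subset)",
    required=frozenset({"depth"}),
)

combined = combine(metabolomics_checklist, soil_extension)
record = {"depth": "0.1 m", "extraction_solvent": "methanol"}
missing = missing_required(record, combined)
print(combined.name, "is missing:", missing)
```

Keeping the method descriptors (checklist) separate from the sample context (extension) means the same soil context could be reused unchanged for a metagenome and a metabolome generated from one sample.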

9.3 Partnerships and Alignment with Other Standards

The GSC is committed to partnering with other programs and initiatives to further the reach of the MIxS standard and lower barriers to adoption. The GSC has partnered with the NMDC to host training events and communicate the value of microbiome data standards. As part of this collaboration, the NMDC provides extensive MIxS and data standards training to the annual cohort of NMDC Ambassadors, who are then tasked with hosting their own workshops and events to distribute this information and provide hands-on experiences with MIxS for microbiome data [36]. The GSC is planning to extend these activities with additional partners, such as the NFDI4Microbiota consortium (https://nfdi4microbiota.de/), which also offers a range of training courses relating to FAIR microbiome data generation, processing, and deposition. The GSC also meets regularly with the human biomedical-centric organization Global Alliance for Genomics and Health (GA4GH, https://www.ga4gh.org/) to ensure that our work on different standards is aligned and non-redundant.

Because MIxS has a broad scope, covering sequencing and other omics techniques, sample preparation, and sample sources from natural environments, human and animal subjects, experiments, and manufactured products, it naturally overlaps with other standards in domains such as genomics, environmental science, and biodiversity. The GSC’s strategy is to provide robust, curated mappings between these standards using frameworks such as the Simple Standard for Sharing Ontological Mappings (SSSOM) [37]. This strategy is exemplified by recent work to align MIxS with standards used in biodiversity informatics. The leading standards body in that field is the Biodiversity Information Standards Group (TDWG, https://www.tdwg.org/), which produces the Darwin Core standard [38] used by the biodiversity community in databases such as the Global Biodiversity Information Facility (GBIF, https://www.gbif.org/). The GSC has signed a memorandum of understanding with TDWG that commits both groups to maintaining a shared mapping between their vocabularies (MIxS and Darwin Core) [39].
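
An SSSOM mapping set is typically shared as a tab-separated table in which each row links a subject term to an object term through a predicate. The sketch below reads such a table with the Python standard library and looks up the Darwin Core counterparts of a MIxS term. The two rows are illustrative assumptions written for this example and are not taken from the published MIxS-Darwin Core mapping; the helper function names are likewise hypothetical. Lines beginning with `#` are skipped because SSSOM files may carry their metadata in a commented header.

```python
import csv

# Illustrative mapping rows only; NOT the curated MIxS-Darwin Core mapping.
SSSOM_TSV = (
    "# illustrative example, not a published mapping set\n"
    "subject_id\tsubject_label\tpredicate_id\tobject_id\tmapping_justification\n"
    "MIXS:0000018\tdepth\tskos:relatedMatch\tdwc:verbatimDepth\tsemapv:ManualMappingCuration\n"
    "MIXS:0000018\tdepth\tskos:relatedMatch\tdwc:minimumDepthInMeters\tsemapv:ManualMappingCuration\n"
)


def read_mappings(text):
    """Parse SSSOM-style TSV text into a list of row dictionaries."""
    lines = [line for line in text.splitlines() if line and not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def objects_for(mappings, subject_id):
    """Return (predicate, object) pairs recorded for one subject identifier."""
    return [
        (row["predicate_id"], row["object_id"])
        for row in mappings
        if row["subject_id"] == subject_id
    ]


mappings = read_mappings(SSSOM_TSV)
depth_targets = objects_for(mappings, "MIXS:0000018")
print(depth_targets)
```

Recording the predicate and the mapping justification alongside each pair lets downstream users distinguish exact equivalences from looser relationships before converting data between the two vocabularies.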

Other standards bodies of relevance include the Global Alliance for Genomics and Health (GA4GH) [40], which publishes the Phenopackets standard for representing metadata about patients and genomics research subjects [41]. Although the emphasis is on healthcare and research, there are many aspects where this standard can relate to MIxS host-associated extensions, including the representation of samples and their source tissues, the disease status of the source, as well as drug exposures or therapies. Aligning these and other standards in the clinical domain has yet to commence but will be valuable for the global interoperability of genomics data.

10 Conclusion

This chapter provides a practical overview of the MIxS standard with the aim of supporting its future use and development for FAIR comparative (meta)genome analysis. The structure and terminology presented here derive from the most recent MIxS version, 6.2, and will continue to evolve in future versions of this community-driven standard. We encourage the research community to continue working with the GSC and partner organizations like the NMDC to champion the use of standards to enable data discovery and research innovation.

References

1. Wilkinson MD, Dumontier M, Aalbersberg IJJ et al (2016) The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18
2. Field D, Amaral-Zettler L, Cochrane G et al (2011) The genomic standards consortium. PLoS Biol 9:e1001088. https://doi.org/10.1371/journal.pbio.1001088
3. Yilmaz P, Kottmann R, Field D et al (2011) Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 29:415–420. https://doi.org/10.1038/nbt.1823
4. Buttigieg PL, Pafilis E, Lewis SE et al (2016) The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation. J Biomed Semant 7:57. https://doi.org/10.1186/s13326-016-0097-6
5. Mungall CJ, Torniai C, Gkoutos GV et al (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol 13:R5. https://doi.org/10.1186/gb-2012-13-1-r5
6. Huttenhower C, Finn RD, McHardy AC (2023) Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol. https://doi.org/10.1038/s41564-023-01484-x
7. Kyrpides NC, Eloe-Fadrosh EA, Ivanova NN (2016) Microbiome data science: understanding our microbial planet. Trends Microbiol 24:425–427. https://doi.org/10.1016/j.tim.2016.02.011
8. Almeida A, Nayfach S, Boland M et al (2021) A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat Biotechnol 39:105–114. https://doi.org/10.1038/s41587-020-0603-3
9. Forster SC, Kumar N, Anonye BO et al (2019) A human gut bacterial genome and culture collection for improved metagenomic analyses. Nat Biotechnol 37:186–192. https://doi.org/10.1038/s41587-018-0009-7
10. Seshadri R, Leahy SC, Attwood GT et al (2018) Cultivation and sequencing of rumen microbiome members from the Hungate1000 collection. Nat Biotechnol 36:359–367. https://doi.org/10.1038/nbt.4110
11. Choi J, Yang F, Stepanauskas R et al (2017) Strategies to improve reference databases for soil microbiomes. ISME J 11:829–834. https://doi.org/10.1038/ismej.2016.168
12. Woodcroft BJ, Singleton CM, Boyd JA et al (2018) Genome-centric view of carbon processing in thawing permafrost. Nature 560:49–54. https://doi.org/10.1038/s41586-018-0338-1
13. A functional microbiome catalog crowdsourced from North American rivers. https://doi.org/10.1101/2023.07.22.550117
14. Sunagawa S, Acinas SG, Bork P et al (2020) Tara Oceans: towards global ocean ecosystems biology. Nat Rev Microbiol 18:428–445. https://doi.org/10.1038/s41579-020-0364-5
15. Arita M, Karsch-Mizrachi I, Cochrane G (2021) The international nucleotide sequence database collaboration. Nucleic Acids Res 49:D121–D124. https://doi.org/10.1093/nar/gkaa967
16. Eloe-Fadrosh EA, Ahmed F, Anubhav A et al (2021) The National Microbiome Data Collaborative Data Portal: an integrated multi-omics microbiome data resource. Nucleic Acids Res 50:D828–D836. https://doi.org/10.1093/nar/gkab990
17. Mukherjee S, Stamatis D, Li CT et al (2023) Twenty-five years of Genomes OnLine Database (GOLD): data updates and new features in v.9. Nucleic Acids Res 51:D957–D963. https://doi.org/10.1093/nar/gkac974
18. McMurry JA, Juty N, Blomberg N et al (2017) Identifiers for the 21st century: how to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLoS Biol 15:e2001414. https://doi.org/10.1371/journal.pbio.2001414
19. Bowers RM, Kyrpides NC, Stepanauskas R et al (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35:725–731. https://doi.org/10.1038/nbt.3893
20. Roux S, Adriaenssens EM, Dutilh BE et al (2019) Minimum information about an uncultivated virus genome (MIUViG). Nat Biotechnol 37:29–37. https://doi.org/10.1038/nbt.4306
21. Field D, Garrity G, Gray T et al (2008) The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26:541–547. https://doi.org/10.1038/nbt1360
22. Food and Agriculture Organization of the United Nations (2018) World reference base for soil resources 2014: International soil classification system for naming soils and creating legends for soil maps - update 2015. Food & Agriculture Org
23. Hoyt CT, Balk M, Callahan TJ, Domingo-Fernández D (2022) Unifying the identification of biomedical entities with the bioregistry. Sci Data 9:714
24. Jackson RC, Matentzoglu N, Overton JA et al (2021) OBO foundry in 2021: operationalizing open data principles to evaluate ontologies. bioRxiv 2021.06.01.446587
25. Whetzel PL, Noy NF, Shah NH et al (2011) BioPortal: enhanced functionality via new web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Res 39:W541–W545. https://doi.org/10.1093/nar/gkr469
26. Jupp S, Burdett T, Malone J, et al A new ontology lookup service at EMBL-EBI. http://ceur-ws.org/Vol-1546/paper_29.pdf. Accessed 3 Jan 2023
27. Ong E, Xiang Z, Zhao B et al (2016) Ontobee: a linked ontology data server to support ontology term dereferencing, linkage, query and integration. Nucleic Acids Res gkw918. https://doi.org/10.1093/nar/gkw918
28. Jonquet C, Poveda-Villalon M (2023) About versioning ontologies or any digital objects with clear semantics
29. Moxon S, Solbrig H, Unni D et al (2021) The linked data modeling language (LinkML): a general-purpose data modeling framework grounded in machine-readable semantics. In: 2021 international conference on biomedical ontologies, ICBO 2021. CEUR-WS, pp 148–151
30. Gill IS, Griffiths EJ, Dooley D et al (2023) The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information. Microb Genom 9. https://doi.org/10.1099/mgen.0.000908
31. DiGiuseppe N, Pouchard LC, Noy NF (2014) SWEET ontology coverage for earth system sciences. Earth Sci Inf 7:249–264. https://doi.org/10.1007/s12145-013-0143-1
32. Metabolomics. In: SpringerLink. https://link.springer.com/journal/11306/volumes-and-issues/3-3. Accessed 18 Oct 2023
33. Taylor CF, Paton NW, Lilley KS et al (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25:887–893. https://doi.org/10.1038/nbt1329
34. Spicer RA, Salek R, Steinbeck C (2017) A decade after the metabolomics standards initiative it’s time for a revision. Sci Data 4:170138
35. Kodra D, Pousinis P, Vorkas PA et al (2022) Is current practice adhering to guidelines proposed for metabolite identification in LC-MS untargeted metabolomics? A meta-analysis of the literature. J Proteome Res 21:590–598. https://doi.org/10.1021/acs.jproteome.1c00841
36. Kelliher JM, Rudolph M, Vangay P et al (2023) Cohort-based learning for microbiome research community standards. Nat Microbiol 8:751–753. https://doi.org/10.1038/s41564-023-01361-7
37. Matentzoglu N, Balhoff JP, Bello SM et al (2022) A simple standard for sharing ontological mappings (SSSOM). Database 2022. https://doi.org/10.1093/database/baac035
38. Wieczorek J, Bloom D, Guralnick R et al (2012) Darwin Core: an evolving community-developed biodiversity data standard. PLoS One 7:e29715. https://doi.org/10.1371/journal.pone.0029715
39. Meyer R, Appeltans W, Duncan WD et al (2023) Aligning standards communities for omics biodiversity data: sustainable Darwin Core-MIxS interoperability. Biodivers Data J 11:e112420. https://doi.org/10.3897/BDJ.11.e112420
40. Rehm HL, Page AJH, Smith L et al (2021) GA4GH: International policies and standards for data sharing across genomic research and healthcare. Cell Genom 1. https://doi.org/10.1016/j.xgen.2021.100029
41. Jacobsen JOB, Baudis M, Baynam GS et al (2022) The GA4GH Phenopacket schema defines a computable representation of clinical data. Nat Biotechnol 40:817–820. https://doi.org/10.1038/s41587-022-01357-4
42. Kottmann R, Gray T, Murphy S et al (2008) A standard MIGS/MIMS compliant XML schema: toward the development of the Genomic Contextual Data Markup Language (GCDML). OMICS 12:115–121. https://doi.org/10.1089/omi.2008.0A10
43. Schriml LM, Munro JB, Schor M et al (2022) The human disease ontology 2022 update. Nucleic Acids Res 50:D1255–D1261. https://doi.org/10.1093/nar/gkab1063
44. Hastings J, Owen G, Dekker A et al (2016) ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res 44:D1214–D1219. https://doi.org/10.1093/nar/gkv1031
45. Cooper L, Jaiswal P (2016) The plant ontology: a tool for plant genomics. Methods Mol Biol 1374:89–114. https://doi.org/10.1007/978-1-4939-3167-5_5
46. Bandrowski A, Brinkman R, Brochhausen M et al (2016) The ontology for biomedical investigations. PLoS One 11:e0154556. https://doi.org/10.1371/journal.pone.0154556
47. Malone J, Holloway E, Adamusiak T et al (2010) Modeling sample variables with an experimental factor ontology. Bioinformatics 26:1112–1118. https://doi.org/10.1093/bioinformatics/btq099
48. Dooley DM, Griffiths EJ, Gosal GS et al (2018) FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration. NPJ Sci Food 2:23. https://doi.org/10.1038/s41538-018-0032-6
49. Köhler S, Gargano M, Matentzoglu N et al (2021) The human phenotype ontology in 2021. Nucleic Acids Res 49:D1207–D1217. https://doi.org/10.1093/nar/gkaa1043

Author information

Authors and Affiliations

Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Emiley A. Eloe-Fadrosh, Christopher J. Mungall, Mark Andrew Miller, Sujay Sanjeev Patil, Michael B. Thornton & Nomi L. Harris
Pacific Northwest National Laboratory, Richland, WA, USA
Montana Smith & Lee Ann McCue
Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Julia M. Kelliher, Leah Y. D. Johnson, Francisca E. Rodriguez, Patrick S. G. Chain & Bin Hu
Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
Alice Carolyn McHardy
DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
T. B. K. Reddy & Supratim Mukherjee
GigaScience Press, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
Christopher I. Hunter
Critical Path Institute, Tucson, AZ, USA
Ramona Walls
University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA
Lynn M. Schriml

Corresponding author

Correspondence to Emiley A. Eloe-Fadrosh.

Editor information

Editors and Affiliations

Department of Biochemistry, University of Sao Paulo, Sao Paulo, São Paulo, Brazil
João Carlos Setubal
Bioinformatics Group, Universität Leipzig, Leipzig, Sachsen, Germany
Peter F. Stadler
Technische Fakultät, Universität Bielefeld, Bielefeld, Germany
Jens Stoye

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this protocol

Cite this protocol

Eloe-Fadrosh, E.A. et al. (2024). A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics. In: Setubal, J.C., Stadler, P.F., Stoye, J. (eds) Comparative Genomics. Methods in Molecular Biology, vol 2802. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3838-5_20

DOI: https://doi.org/10.1007/978-1-0716-3838-5_20
Published: 01 June 2024
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3837-8
Online ISBN: 978-1-0716-3838-5
eBook Packages: Springer Protocols
