Integration of genomics and metabolomics for prioritization of rare disease variants: a 2018 literature review
Many inborn errors of metabolism (IEMs) are amenable to treatment; therefore, early diagnosis and treatment is imperative. Despite recent advances, the genetic basis of many metabolic phenotypes remains unknown. For discovery purposes, whole exome sequencing (WES) variant prioritization coupled with clinical and bioinformatics expertise is the primary method used to identify novel disease-causing variants; however, causation is often difficult to establish due to the number of plausible variants. Integrated analysis of untargeted metabolomics (UM) and WES or whole genome sequencing (WGS) data is a promising systematic approach for identifying disease-causing variants. In this review, we provide a literature-based overview of UM methods utilizing liquid chromatography mass spectrometry (LC-MS), and assess approaches to integrating WES/WGS and LC-MS UM data for the discovery and prioritization of variants causing IEMs. To embed this integrated -omics approach in the clinic, expansion of gene-metabolite annotations and metabolomic feature-to-metabolite mapping methods are needed.
KeywordsMetabolomics Genomics Omic integration Inborn errors of metabolism Variant prioritization
Inborn errors of metabolism (IEMs) are the largest group of genetic diseases amenable to causal therapy, and are caused by genetic variants that disrupt the function of enzymes or other proteins involved in cellular metabolism, leading to energy deficit and/or accumulation of toxins (Van Bokhoven 2011; del Rosario et al 2012; Rauch et al 2012; Ellison et al 2013). Early diagnosis, enabled by newborn metabolic screening programs and genetics profiling, is pivotal so that treatment can be initiated before the onset of irreversible progressive damage to the central nervous system, which in some cases can result in intellectual disability disorder (IDD) and damage to additional organ systems.
There are currently more than 100 treatable IEMs, but for many phenotypes the genetic basis remains to be discovered (Van Karnebeek and Stockler 2012). Cases for which the causal gene was recently identified have in turn provided insights and opportunities for interventions targeting their downstream molecular or cellular abnormalities (Collins et al 2010; Horvath et al 2016; Karnebeek et al 2016). These efforts have been cataloged in the online resource IEMbase, which provides further information on the etiologies and treatment of over 500 IEM disorders (Blau et al 2014).
Whole exome sequencing (WES) is the primary tool for discovery of the genetic basis of IEMs, and thus establishment of a genetic-based diagnosis that, in some cases, can lead to improved outcomes through targeted interventions. The promise of this approach was illustrated by a recent neurometabolic gene discovery study (Tarailo-Graovac et al 2016), in which deep phenotyping and WES achieved a diagnostic yield of 68% in patients with unexplained phenotypes, identified novel human disease genes, and most importantly enabled targeted intervention for improved outcomes in 44% of the patients. Overall, published studies applying WES coupled with variant prioritization in patients with unexplained phenotypes are successful in identifying the underlying cause in 16 to 68% of patients (Tarailo-Graovac et al 2016).
However, with our current limited knowledge of variant pathogenicity and the impact of rare variants, variant prioritization algorithms that aim to completely automate the process of prioritization fail to identify the causal variant in a substantial number of patients. Further, variants that are identified as plausible often have a low level of supporting evidence, and are thus not adequate to establish a genetic-based diagnosis. Using multiple types of personalized “-omic” data is a promising approach to address the evidence gap in support of an IEM diagnosis. The integration of metabolomics data with WES/WGS data to identify genes causing IEM is a prime example of this approach. For example, a diagnosis of maple syrup urine disease can be supported by 1) pathogenic variants in either DBT, BCKDHB or BCKDHA, 2) high levels of amino acids, such as allo-isoleucine, isoleucine, leucine, and valine, and 3) branched-chain oxoacids (Strauss et al 2013). These biochemical biomarkers can be detected individually (targeted metabolomics), or as part of a broader characterization of the metabolome (untargeted metabolomics). Recently, the unbiased approach afforded by untargeted metabolomics has increased in popularity due to decreasing costs, lack of required parameter tuning, and opportunities for pathway analysis (Johnson et al 2016). In this literature review, we provide an overview (anno 2018) of WES-enabled variant prioritization, untargeted metabolomics methods utilizing liquid chromatography MS (LC-MS), and assess approaches to integrating WES/WGS and LC-MS untargeted metabolomics data for the discovery and prioritization of variants causing IEMs.
Overview of WES-enabled prioritization of causative variants
Bioinformatic-driven variant prioritization involves multiple filtering steps that incorporate prior knowledge about allele population frequency and predicted pathogenicity. Databases, such as ExAC, dbSNP, and gnomAD, provide information about allele frequencies seen in the general population, which are then used to filter out common and likely non-pathogenic variants in the patient (Smigielski et al 2000; Exome Aggregate Consortium 2016). Once identified as pathogenic through use of in silico prediction tools (such as PolyPhen-2 and SIFT), genomic data from the individual’s parents is then used to filter variants according to Mendelian models of inheritance, making the parents the individual’s “controls”(Ng and Henikoff 2003; Adzhubei et al 2013). These different types of controls allow for the isolation of pathogenic variants, and the assignment of mode of inheritance. However, it should be noted that some studies have questioned whether genomic databases may in fact contain individuals with disease-associated genotypes but no clinical presentation of the underlying disease at the time of the inclusion, as more than 2.8% of the ExAC population was found to carry likely/pathogenic genotypes reported in ClinVar, (Tarailo-Graovac et al 2017). Continued efforts to combine clinical and genetic data will play an important role in clarifying the pathogenicity and frequency of variants in genetic backgrounds.
Methods for acquisition and analysis of untargeted metabolomics data
Acquiring untargeted metabolomic data
In general, metabolomics quantifies a subset of small molecules (metabolites) in a tissue or body fluid using either nuclear magnetic resonance (NMR) spectroscopy or mass spectrometry (MS) (Johnson et al 2016). NMR spectroscopy quantifies solution-state molecular structures based on atom-centered nuclear interactions. NMR spectroscopy is inexpensive, capable of high throughput analysis, and highly reproducible; however, it lacks sensitivity and is generally only able to quantify metabolites of medium to high abundance. For this reason, MS based quantification has primarily been used in the context of IEM diagnosis and discovery.
In mass-spectrometry (MS) based quantification, metabolites are first chromatographically separated and quantified in a semi-quantitative manner using high resolution mass spectrometers in detection modes that measure both positive and negatively charged ions produced through electrospray ionization (ESI). MS separation techniques include liquid chromatography, capillary electrophoresis, gas chromatography, and ultra-performance liquid chromatography (Zhang et al 2012). No single chromatographic separation protocol can quantify all metabolites in a sample. Therefore, to completely capture all metabolites, multiple chromatographic methods must be used. For example, reverse-phase LC quantifies non-polar to slightly polar molecules, while hydrophilic interaction LC detects strongly polar to slightly polar molecules (Bajad et al 2006; Roberts et al 2012). This review will focus on liquid chromatography coupled MS (LC-MS), as it quantifies the widest range of metabolite polarity, and is widely used. When coupled with LC, the most common means of separation are reverse phase liquid chromatography (RPLC) for separation of hydrophobic metabolites, and hydrophilic interaction chromatography (HILIC) for the separation of hydrophilic metabolites (Zhou and Yin 2016). MS platforms commonly used for untargeted metabolomics studies include low resolution techniques, such as triple quadrupole (QQQ), quadrupole-ion trap (QIT), and high resolution techniques, such as quadrupole-time of flight (Q-TOF), quadrupole Orbitrap (Q-Orbitrap) and Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS).
Processing LC-MS data
Normalizing LC-MS data
To be biologically informative, raw intensities need to be corrected for a) batch effects, b) missing values, and c) inter-sample variation. This section describes standard approaches used for such normalizations.
As a first step, raw intensities of each feature produced from data processing packages typically need to be corrected for systematic variation due to batch effects. In metabolomics data, a common type of batch effect is “chemical drift”. This drift—caused by changes in signal that occur as metabolites interact with each other while waiting to be analyzed—can be corrected if QC samples are analyzed in between experimental samples (Vaikenborg et al 2009; Shen et al 2016). While these corrections are not always performed, they have been shown to minimize inter-batch variation (Alonso-herranz et al 2015). Dimensionality reduction approaches such as PCA allow for visualizing systematic trends in high-dimensional data, and thus are powerful tools for assessing batch effects.
Missing values can result from a variety of processes, and thus require a nuanced approach. Specifically, a missing value, which is an intensity of zero or infinity, can be created from a metabolite existing in one sample but 1) not existing in another, 2) existing at a concentration below an instrument’s limit of detection or 3) existing at a concentration above an instrument’s limit of detection. The problem of missing values can best be improved by increasing the sensitivity of detection of the MS platform. Numerous strategies have been developed to reduce missing values through a group of analytic techniques called missing value imputation (MVI). The utility of these techniques has empirically been found to depend on whether univariate or multivariate techniques are used to detect differentially abundant features (Karpievitch et al 2012).
Subsequently to above, both sample-wise and feature-wise normalization methods that concurrently consider multiple samples, are typically applied to adjust for technical and biological variation. Sample-wise normalization methods involve constructing scaling factors for each sample, and include quantile, linear baseline, total ion count (TIC), and LOESS normalization (Wu and Li 2016). These methods adjust for technical factors that may have affected the entire sample. Feature-wise normalization methods involve constructing scaling factors for each feature, and include centering, scaling, and transformations (Bolstad et al 2003; van den Berg et al 2006). These approaches minimize the intensity differences between metabolites with low or high abundance, allowing relative perturbations of each metabolite to be compared. Usually, both sample-wise and feature-wise normalization methods are applied during pre-processing; however, the type of each is data-set dependent, as no gold standard approach exists.
Testing for significant features in IEM studies
In a typical experimental design relevant to IEMs, one typically measures metabolomics data for a set of patients only or a set of patients and some controls (i.e., case/control design). Because each case is likely unique (i.e., may represent a unique disease caused by a rare genetic variant), data is usually analyzed for one patient at a time and compared against a) controls or b) other patients. Both parametric (e.g., t-test, ANOVA) and non-parametric (e.g., Mann-Whitney U-test, Wilcoxon-signed rank and Kruskal-Wallis) tests can be used to identify differentially abundant features in a given patient sample. When pursuing parametric tests, which typically have more statistical power compared to non-parametric tests, care must be taken to transform data so that it is distributed according to the expectation of the test (e.g., Gaussian for t-test). Correction for multiple testing must also be performed in a way that balances the generation of false positives and false negatives. In contrast to studies of common disease, with this type of analysis, one seeks to identify “outlier” features, as they highlight abnormal metabolites that may be pathogenic. Thus, availability of biological and technical replicates is important in confirming that a given metabolite value is a “biological” outlier, rather than an artifact of technical variation.
In metabolomic studies, selection of “control” samples (or comparators) that are as similar as possible to the patient being studied is paramount to reducing noise. This is difficult due to the numerous factors that influence the metabolome (i.e., age, sex, ethnicity, food consumption, and time of day). Selection of controls often depends on patient availability, and the type of bio-fluid analyzed; finding suitable controls is much easier for analysis of urine samples, and much more difficult for plasma and CSF samples, due to the relative ease at which these samples can be provided. Because the metabolome has been estimated to have a median heritability of approximately 50%, the trio structure has been suggested as a possible replacement for the classical case-control design, as it may enable the removal of metabolomic features attributable to non-disease related heritable phenotypes, allowing researchers to isolate features related to a specific neurometabolic phenotype. When IEMs are predominantly recessive, parents would likely show an abnormal profile for metabolites related to a heterozygous gene variant, which due to the effects of a suspected knock-down, would be magnified in the bi-allelic patient (Long et al 2017). However, due to inherent uncertainty in quantification, the significant impact of age, gender and diet, and varying heritability of each metabolite, the human metabolome needs to be explored further before the trio structure can be robustly used in this manner. Overall, like any other omics study of dynamic molecular traits, experimental designs that enable robust statistical adjustments for the effect of demographical and environmental factors are of key importance for identifying meaningful disease-associated metabolites. At the least, care should be taken to utilize metabolomic controls that share as many characteristics as possible with the population being studied.
Annotating features: adducts, isotopes, and metabolites
Once features have been identified, they can be annotated as an adduct or isotope of a particular metabolite. An adduct is an ionized metabolite that has become associated with another ion through electrospray ionization (ESI), most commonly H+, Na+, K+, and H20. An isotope is a metabolite that is composed of elements that are not in their most abundant form. A metabolite’s most abundant isotopic form generally corresponds to its most abundant features. The most readily quantifiable adduct depends on the chromatographic separation performed (Keller et al 2008). Annotation of isotopes and adducts corresponding to a particular metabolite reduces the multiple testing burden by enabling the removal of features belonging to the same metabolite. Removal of redundant features is performed at the discretion of the researcher, as no standard filtering approach exists. In Mzmine2 and MAVEN packages, adduct and isotope annotation is performed automatically, whereas if XCMS is utilized, an external package (such as CAMERA) must be used to make these annotations (Kuhl et al 2012).
The putative metabolite mass annotated in the peak-annotation step described earlier is used to map a specific feature to known metabolite(s). Databases that include mass, adduct, spectra, and structure data are then used to match metabolite masses to known metabolites that fall within the specific mass accuracy of the mass spectrometer used. Several such databases exist, such as the Human Metabolome Database (HMDB), Recon2, BioCyc, and METLIN (Petri and Schmidt-Dannert 2004; Smith et al 2005; Wishart et al 2007; Thiele et al 2013). The HMDB in particular contains information on endogenous, food-based and drug-related metabolites found in urine, CSF, and plasma of humans. The fact that only metabolites found in humans are profiled makes this database particularly useful for mapping features identified through untargeted metabolomic methods, as the entire database can be utilized without a priori knowledge of each metabolite’s origin. At the time of writing of this manuscript, it contains over 114,100 metabolites annotated with structure and chemical properties, a portion of which are also associated with specific genes (n = 5701). Of these metabolites, 19.5% have been detected in a biofluid, and 81.5% are predicted or expected. Its sister database, the Small Molecule Pathway Database (SMPDB), annotates a portion of genes and metabolites to specific small molecule pathways. Together, the HMDB and SMPDB facilitate biological interpretation at the gene and pathway level. Limitations of these databases include the relatively small number of detected and quantified metabolites, as well as the relevant paucity of genes annotated to both HMDB and SMPDB.
Confirming the “true” identity of a specific feature is challenging because each neutral mass can be annotated to multiple isobaric metabolites, (i.e., one-to-many mapping). Narrowing down the identity of a given feature is currently an active area of research (Li et al 2013; Pirhaji et al 2016). Public databases that contain metabolite masses and MS/MS spectra can assist in confirming metabolite identities, in cases where mass spectra are available (Wishart et al 2007). Additionally, “internal standards”, or radiolabeled compounds that can be easily identified through isotopic analysis, have been used to aid feature identification in targeted metabolomics as well as untargeted lipidomics (detection of all lipids in the metabolome), as they allow researchers to benchmark when certain ions elute over time (i.e., their retention time) (Sysi-Aho et al 2007; Ejigu et al 2013; Weindl et al 2015). Validation of the mapping between a feature and its assigned metabolite can be achieved by analyzing a purchased chemical standard through identical processing techniques, and comparing its m/z ratio, potential ion-source fragments, and retention time to that of an experimentally-derived feature.
Identifying IEMs through untargeted metabolomic analysis
The creation of processing tools and metabolomic databases has greatly facilitated the use of untargeted metabolomics in diagnosing IEMs. Both univariate and multivariate tests have been used to identify biomarkers of IEMs through untargeted metabolomics (Wikoff et al 2007; Dercksen et al 2013; Venter et al 2014; Najdekr et al 2015; Abella et al, 2016; Kennedy et al 2017; Pappan et al 2017). Many of these identified biomarkers have been added to newborn IEM screenings, enabling the detection of a wider variety of IEMs. Recently, untargeted metabolomics demonstrated its utility as a replacement for traditional newborn dry blood spot screenings, as it was successfully used to identify 20 of 21 IEMs (with each IEM represented by more than two patients) (Miller et al 2015). Challenges with this method include separating noise (unrelated food and environmental influences) from disease signal and identifying isobaric compounds. Because of these challenges, untargeted metabolomics alone is unlikely to usurp traditional genomics-based methods that identify causative genes for novel IEMs. However, cross-omic studies that integrate genomics and metabolomics have been initiated for this purpose. The benefits and drawbacks of integrating genetic and metabolomic data for the purpose of identifying both known and unknown IEMs are addressed in the subsequent section.
Integrating genomics and metabolomics
Development of models and algorithms for integrated analysis of genomics and metabolomics data is an active area of research. To perform an up-to-date review of published articles pertaining to the integration of genomics and metabolomics data, the Pubmed NCBI database and Google Scholar databases were queried for all articles published between Jan 1966 and July 2017 that contained the phrases “metabolomics” and “whole exome sequencing”, “metabolomics” and “whole genome sequencing” or “metabolomics” and “genomics”. Conference abstracts as well as articles utilizing targeted metabolomics were excluded, leaving a total of 17 articles for review.
Integration of genomic and metabolomic data has been performed for two primary purposes: to identify 1) metabolically active loci and 2) genes relevant to a disease phenotype. Most existing methods for combining genomic and metabolomic data conceptually follow from the former purpose. These studies aim to increase our understanding of genetically determined metabolic phenotypes at the population level. Specifically, population-based studies have combined genotyping microarray data (Gieger et al 2008; Hicks et al 2009; Illig et al 2010; Suhre et al 2011; Demirkan et al 2012; Tukiainen et al 2012; Kettunen et al 2012; Shin et al 2014; Draisma et al 2015; Rhee et al 2016; Long et al 2017) or WES/WGS data (Guo et al 2015; Yazdani et al 2016; Yu et al 2016) with metabolomics data, using quantitative trait loci (QTL) analysis, to metabolically identify loci and characterize the impact of common genetic variation (also known as heritability) on metabolite abundance. In so-called metabolite QTL (mQTL) analysis, linear regression is used to associate genetic variants with individual metabolite intensities (or metabolite ratios) for 2000 to 8000 subjects, allowing for the identification of metabolites associated with genetic loci. To reduce spurious associations, most studies restrict their analysis to common variants (e.g., MAF ≥ 5%) in addition to restricting the analysis to pairs of variants and metabolic loci that are located near each other (i.e., “cis” association analysis) (Tukiainen et al 2012). Such mQTL analyses have identified numerous disease biomarkers by associating variants in known disease-causing genes with metabolites. In all studies examined, loci strongly associated with metabolite intensities or ratios—termed “metabotypes”—have predominantly been found to map close to or in genes associated with enzymes, transporters, and regulators of metabolism, facilitating biological interpretation. Associations between variants in genes of unknown function and metabolites are more difficult to interpret, and have required additional experimental investigation.
Consistent with transcriptomic-based QTL studies (e.g., eQTL studies), it has been reported that, on average, genetic variation is a stronger predictor of metabolite variance across individuals, compared to demographic and symptom-based clinical covariates (Rhee et al 2013; Shin et al 2014). Heritability estimates have varied across classes of metabolites. Shin et al found the heritability of amino acids (e.g., carnosine, h2 = 0.86, P = 6.8 × 10−4) to be higher than lipids (e.g., lysophosphatidylcholine, h2 = 0.46, P = 2.0 × 10−7), and that of essential amino acids (mean h2 = 0.29) to be lower than non-essential amino acids (mean h2 = 0.53), suggesting that some metabolites are more influenced by genomic variation than others, as one might expect (Shin et al 2014).
Rare variants (0.5% ≤ minor allele frequency (MAF) ≥ 5%) have been found to have a larger effect size than common variants (MAF > 5%) (Long et al 2017). However, because association analysis (e.g., mQTL analysis) is statistically challenging and underpowered in the rare disease setting (e.g., when a variant has only been observed once), the effects of rare variants have primarily been studied manually by considering their predicted effects. Long et al identified the effect of 17 rare coding variants (SnpEff annotations, such as “stop”, “missense” or “frame”) by first manually identifying an outlier metabolite that based on biological plausibility could be affected by the variant, and then confirming the presence of this putative rare variant and outlier metabolite combination in at least one other sample (Long et al 2017). Guo et al examined the effect of rare coding variants by assessing the overlap of genes in perturbed metabolic pathways (i.e biochemical pathways with at least one outlier metabolite) with rare exon variants (Guo et al 2015). These studies indeed show that rare variants can have a large effect on metabolic variation, but the small number of rare variant-metabolite relationships yet identified suggest that clarifying their role in a systematic manner will likely require a more nuanced approach.
Studies aiming to explore rare disease using smaller sample sizes have used metabolomics in conjunction with curated biochemical knowledge for the second purpose (mentioned above): deriving disease-specific biological insights. In a “pathway based approach”, genes in enriched metabolic pathways were found to harbor variants that explained the patient’s biochemical phenotype (Guo et al 2015). Several studies have also reported untargeted metabolomics’ utility in quantifying gene-associated metabolites to provide evidence a variant is disease-causing (Gauba et al 2012; Abella et al, 2016; Pappan et al 2017). An example of this approach is our group’s recent study, which used untargeted metabolomics to demonstrate that a bi-allelic variant in N-acetylneuraminic acid synthase (NANS) in patients with infantile-onset severe developmental delay and skeletal dysplasia was reflected in high levels of N-acetylneuraminic acid (Van Karnebeek et al, 2016). Confirmation of high levels of this enzymatic substrate of NANS suggested that the clinical phenotype was likely caused by an enzymatic deficiency in NANS. Normalization of skeletal dysplasia in a zebrafish model with knocked-out nansa and nansb (zebrafish orthologs for human NANS) occurred after supplementation with sialic acid, shedding light on a possible treatment. These findings support the idea that metabolomics and genomics (i.e., mircorarrays/WES/WGS) can synergize and an integrated approach can be used to facilitate variant filtering and improve diagnosis and IEM discovery. As of yet, genome-wide integration has mainly been performed for exploratory purposes. Challenges facing the integration of these two -omic technologies must be addressed before use in clinical diagnostics.
Major challenges to the integration of genomics and LC-MS metabolomics
Challenges to streamlining the multi-omic approach for IEMs exist. These can broadly be divided into those concerning the technical aspects of metabolite quantification and identification, and those concerning the biological interpretation of results. On the technical side, it is currently impossible to know the number of unique metabolites in the typical plasma/CSF/urine metabolome, as no LC-MS protocol is capable of identifying all metabolites. This means that for experiments aiming to capture an unbiased snapshot of the metabolome, a combination of chromatography techniques must be used. Comparisons across platforms are difficult to make, as little is known about how results from different analytic techniques can be compared, although some efforts have been made (Büscher et al 2009; Yet et al 2016; Leuthold et al 2017). Further, only approximately 65% of metabolites are quantifiable in all three body fluids (plasma, urine, and CSF), indicating that care must also be taken to select the relevant biofluid (Kennedy et al 2017). Additionally, different pre-processing algorithms may have a large effect on feature detection and adduct annotation. This renders analysis reproducibility difficult. On the biological interpretation side, there is a lack of established methods for mapping a genomic perturbation to its downstream (directly and indirectly) impacted metabolites in the rare disease context. mQTL studies are underpowered, particularly for those caused by rare variants, making it difficult for small studies to form novel IEM insights. Finally, diagnostic analyses are limited by incomplete annotation of gene-metabolite associations in databases such as the HMDB (at time of writing, only 5701 genes are associated with metabolites in the HMDB).
Identified IEM genes, their functions, and number of associated metabolites (as listed in the Human Metabolome Database)
Number of annotated metabolites
CPT1A (carnitine O-palmitoyltransferase 1, liver isoform)
Catalyzes the transfer of the acyl group of long-chain fatty acid-CoA conjugates onto carnitine, an essential step for the mitochondrial uptake of long-chain fatty acids and their subsequent beta-oxidation in the mitochondrion
NANS (N-acetylneuraminate synthase)
Produces phosphorylated and unphosphorylated forms of N-acetylneuraminic acid (Neu5Ac) and 2-keto-3-deoxy-D-glycero-D-galacto-nononic acid (KDN)
DYRK1A (dual specificity tyrosine-phosphorylation-regulated kinase 1A)
SCN2A (sodium channel protein type 2 subunit alpha)
Sodium ion membrane transporter
In order for successful integration of genomics and LC-MS based metabolomics in the clinical diagnostics of IEMs, several areas of improvement must be addressed. First, existing feature detection and adduct/isotope annotation methods must be refined and benchmarked for use in clinical metabolomics. Several publicly available databases with known chemical compositions have been generated for this purpose (Kenar et al 2014). Second, exploration of gene-metabolite associations through mQTL studies followed by biochemical validation is needed to expand annotations in databases such as the HMDB. Due to the magnitude of this task, and relevance to other fields, a large-scale collaboration between geneticists and clinical chemists aiming to map the human genome onto the human metabolome may prove worthwhile. This type of effort may also elucidate which genetic perturbations are most easily detectable through metabolomic analysis; thereby, clarifying innate biases in integrated omics analyses. Third, additional computational methods that integrate genetics and metabolomics data must be explored. Toward this end, network models that take into account the interactions between genes and metabolites have been used to identify metabolic pathways perturbed in disease (Krumsiek et al 2012; Li et al 2013; Pirhaji et al 2016). So far, these methods have primarily been used to identify signatures in relatively common diseases. Building integrated genomic and metabolomic networks in the rare disease context may prove difficult; therefore, further research is needed.
We are grateful to Ms. Dora Pak and Ms. Evelyn Lomba for research management support. CvK is supported by a Michael Smith Foundation for Health Research Scholar Research Award. EG is supported by an NSERC Canada Graduate Student – Masters Award.
Details of funding
EG was supported by an NSERC scholarship. CvK is a Michael Smith Foundation for Health Research Scholar. SM is a CIFAR Child and Brain Development Fellow.
Compliance with ethical standards
Conflict of interest
- Abella L, Steindl K, Simmons L et al (2016) A combined metabolic-genetic approach to early-onset epileptic encephalopathies: results from a Swiss study cohort. Neuropediatrics. doi: 10.1055/s-0036-1583731Google Scholar
- Adzhubei I, Jordan DM, Sunyaev SR (2013) Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet. https://doi.org/10.1002/0471142905.hg0720s76
- Alonso-herranz JGV, Barbas C, Grace E (2015) Controlling the quality of metabolomics data : new strategies to get the best out of the QC sample. Metabolomics 518–528. doi: https://doi.org/10.1007/s11306-014-0712-4
- Blau N, Gibson MK, Marinus D, Dionisi-Vici C (2014) IEMBASE, a knowledgebase of inborn errors of metabolism. Mol Genet Metab 111:296Google Scholar
- Bertier G, Joly Y, Hétu M (2016) Unsolved challenges of clinical whole-exome sequencing: A systematic literature review of end-users’ views. Accepted 07/28/2016. BMC Med Genomics. doi: https://doi.org/10.1186/s12920-016-0213-6
- del Rosario M, Brunner HG, de Ligt J et al (2012) Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 121003140044006. doi: https://doi.org/10.1056/NEJMoa1206524
- Demirkan A, van Duijn CM, Ugocsai P et al (2012) Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations. PLoS Genet. https://doi.org/10.1371/journal.pgen.1002490
- Exome Aggregate Consortium (2016) ExAC browser. http://exac.broadinstitute.org/Google Scholar
- Gauba R, Natarajan TG, Song L et al (2012) Metabolomic and exome sequence analysis reveal novel molecular signatures associated with colorectal cancer relapse. BMC Proc Conference: Beyond the Genome 2012, Boston Google Scholar
- Gieger C, Geistlinger L, Altmaier E et al (2008) Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. https://doi.org/10.1371/journal.pgen.1000282
- Hicks AA, Pramstaller PP, Johansson Å et al (2009) Genetic determinants of circulating sphingolipid concentrations in European populations. PLoS Genet. https://doi.org/10.1371/journal.pgen.1000672
- Karpievitch YV, Dabney AR, Smith RD (2012) Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics 13(Suppl 16):S5. https://doi.org/10.1186/1471-2105-13-S16-S5
- Krumsiek J, Suhre K, Evans AM et al (2012) Mining the. A Systems Approach to Metabolite Identification Combining Genetic and Metabolic Information. PLoS Genet. https://doi.org/10.1371/journal.pgen.1003005
- Mahieu NG, Patti GJ (2017) Systems-level annotation of a metabolomics data set reduces 25 000 features to fewer than 1000 unique metabolites. Anal Chem. doi: https://doi.org/10.1021/acs.analchem.7b02380
- Nicholson G, Rantalainen M, Maher AD et al (2011) Human metabolic profiles are stably controlled by genetic and environmental variation. Mol Syst Biol. https://doi.org/10.1038/msb.2011.57
- Pappan KL, Kennedy AD, Magoulas P et al (2017) Clinical metabolomics to segregate aromatic amino acid decarboxylase deficiency from drug-induced metabolite elevations. Pediatr Neurol. doi: https://doi.org/10.1016/j.pediatrneurol.2017.06.014
- Petri R, Schmidt-Dannert C (2004) BioCyc—genome and metabolism. Angew Chem Int Ed 43:1908. https://doi.org/10.1002/anie.200483067
- Pirhaji L, Milani P, Leidl M et al (2016) Revealing disease-associated pathways by network integration of untargeted metabolomics. doi: https://doi.org/10.1038/nmeth.3940
- Rhee EP, Yang Q, Yu B et al (2016) An exome array study of the plasma metabolome 7:12360Google Scholar
- Roberts LD, Souza AL, Gerszten RE, Clish CB (2012) Targeted metabolomics. Curr Protoc Mol Biol. https://doi.org/10.1002/0471142727.mb3002s98
- Smith A, O’maille G, Want EJ et al (2005) METLIN a metabolite mass spectral database. Proc 9Th Int Congr Ther Drug Monit Clin Toxicol 27:747–751. https://doi.org/10.1097/01.ftd.0000179845.53213.39 CrossRefGoogle Scholar
- Strauss KA, Puffenberger EG, Morton DH (2013) Maple syrup urine disease. GeneReviews, University of Washington, Seattle, pp 93–97Google Scholar
- Tarailo-Graovac M, Shyr C, Ross CJ et al (2016) Exome sequencing and the management of neurometabolic disorders. N Engl J Med NEJMoa 1515792. doi: https://doi.org/10.1056/NEJMoa1515792
- Tarailo-Graovac M, Zhu JYA, Matthews A et al (2017) Assessment of the ExAC data set for the presence of individuals with pathogenic genotypes implicated in severe Mendelian pediatric disorders. Genet Med. https://doi.org/10.1038/gim.2017.50
- Van Bokhoven H (2011) Genetic and epigenetic networks in intellectual disabilities. Annu Rev Genet 45:81–104. https://doi.org/10.1146/annurev-genet-110410-132512 CrossRefPubMedGoogle Scholar
- Wishart DS, Tzur D, Knox C et al (2007) HMDB: the human metabolome database. Nucleic Acids Res. https://doi.org/10.1093/nar/gkl923
- Yet I, Menni C, Shin SY et al (2016) Genetic influences on metabolite levels: a comparison across metabolomic platforms. PLoS One. https://doi.org/10.1371/journal.pone.0153672
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.