Integration of Metabolomics and Transcriptomics to Identify Gene-Metabolite Relationships Specific to Phenotype

  • Andrew Patt
  • Jalal Siddiqui
  • Bofei Zhang
  • Ewy Mathé
Part of the Methods in Molecular Biology book series (MIMB, volume 1928)


Metabolomics plays an increasingly large role in translational research, and metabolomics data are now generated in large cohorts alongside other omics data such as gene expression. With this in mind, we review current approaches that integrate metabolomic and transcriptomic data. We then provide a detailed framework for integrating metabolomic and transcriptomic data using a two-step approach: (1) numerical integration of gene and metabolite levels to identify phenotype-specific (e.g., cancer-specific) gene-metabolite relationships using IntLIM and (2) knowledge-based integration, using pathway overrepresentation analysis through RaMP, a comprehensive database of biological pathways. Each step makes use of publicly available R packages and provides a user-friendly web interface for analysis. These interfaces can be run locally through the packages or accessed through our servers. The goal of this chapter is to provide step-by-step instructions on how to install the software and use the commands within the R framework, without the user interface (which is slower than running the commands from the command line). Both packages are in continuous development, so please refer to their GitHub sites to check for updates.
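To make the second step more concrete: pathway overrepresentation analysis is commonly computed as a one-sided Fisher's exact (hypergeometric) test, which asks whether a pathway contains more of the selected genes or metabolites than chance would predict. The sketch below illustrates that general technique only. It is written in Python rather than R, the function and parameter names are hypothetical, and it is not the RaMP API or necessarily the exact procedure RaMP applies.

```python
from math import comb


def overrepresentation_pvalue(hits_in_pathway, pathway_size, n_selected, background_size):
    """One-sided hypergeometric p-value, P(X >= hits_in_pathway).

    Hypothetical helper for illustration (not part of RaMP or IntLIM).
      hits_in_pathway: selected analytes that belong to the pathway
      pathway_size:    analytes annotated to the pathway in the background
      n_selected:      total analytes in the user's list of interest
      background_size: total analytes in the background set
    """
    # The overlap can never exceed either the pathway or the selected list.
    max_hits = min(pathway_size, n_selected)
    # Number of ways to draw the selected list from the background.
    total = comb(background_size, n_selected)
    # Sum the probability of observing this overlap or a larger one.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, n_selected - k)
        for k in range(hits_in_pathway, max_hits + 1)
    )
    return tail / total


# Toy numbers (not real data): 4 selected analytes, all 4 fall in a
# 5-member pathway, drawn from a background of 20 analytes.
p_value = overrepresentation_pvalue(4, 5, 4, 20)
print(f"overrepresentation p-value: {p_value:.6f}")
```

In practice such p-values are computed for many pathways at once, so they would normally be corrected for multiple testing before interpretation.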

Key words

Metabolomics, Gene expression, Pathway analysis, Network, Omics integration, R packages, Gene, Metabolite



This work was supported by funding from the National Cancer Institute (1R03CA222428-01) and the Ohio State University Translational Data Analytics Institute and startup funds by the Ohio State University to Ewy Mathé, by the Ohio State University Discovery Themes Foods for Health postdoctoral fellowship to Jalal Siddiqui, and by the National Institute of General Medical Sciences of the National Institutes of Health to Andy Patt (T32GM068412). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Andrew Patt (1)
  • Jalal Siddiqui (1)
  • Bofei Zhang (1)
  • Ewy Mathé (1; corresponding author)

  1. The Ohio State University College of Medicine, Columbus, USA
