Metabolomics, Volume 11, Issue 6, pp 1492–1513

Computational and statistical analysis of metabolomics data

  • Sheng Ren
  • Anna A. Hinzman
  • Emily L. Kang
  • Rhonda D. Szczesniak
  • Long Jason Lu
Review Article


Metabolomics is the comprehensive study of small-molecule metabolites in biological systems. By assaying and analyzing thousands of metabolites in biological samples, it provides a global picture of the metabolic status and biochemical events within an organism, and it has become an increasingly powerful tool in disease research. Metabolomics studies typically generate large amounts of data by nuclear magnetic resonance (NMR) spectroscopy and/or mass spectrometry (MS). Moreover, depending on the goals and design of a study, a variety of data analysis methods, or a combination of them, may be needed to obtain accurate and comprehensive results. In this review, we provide an overview of the computational and statistical methods commonly applied to metabolomics data. The review is divided into five sections. The first two sections introduce the background and the databases and resources available for metabolomics research. The third section briefly describes the principles of the two main experimental methods that produce metabolomics data, MS and NMR, and the fourth section describes the preprocessing of data from these two approaches. In the fifth and most important section, we review four main types of analysis that can be performed on metabolomics data, with examples from metabolomics studies: unsupervised learning methods, supervised learning methods, pathway analysis methods and analysis of time course metabolomics data. We conclude with a table summarizing the principles and tools discussed in this review.


Keywords

Computational · Statistical · Unsupervised learning · Supervised learning · Pathway analysis · Time course data

Acknowledgements

We would like to express our great appreciation to Dr. Lilliam Ambroggio and Dr. Lindsey Romick-Rosendale for their valuable and constructive suggestions on this review. Their willingness to give their time so generously has been very much appreciated. This study was funded by NIH Grant R01 HL116226 to RDS and LJL.

Compliance with Ethical Standards

Conflict of interest

Sheng Ren, Anna A. Hinzman, Emily L. Kang, Rhonda D. Szczesniak and L. Jason Lu declare that they have no conflict of interest and have included separately signed conflict of interest forms with this manuscript.

Supplementary material

Supplementary material 1 (11306_2015_823_MOESM1_ESM.docx, DOCX, 1395 kb)



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Sheng Ren (1, 2, 3)
  • Anna A. Hinzman (2, 4)
  • Emily L. Kang (3)
  • Rhonda D. Szczesniak (5, 6)
  • Long Jason Lu (1, 2, 4, 6, 7, 8)

  1. Institute for Systems Biology, Jianghan University, Wuhan, China
  2. Division of Biomedical Informatics, Cincinnati Children’s Hospital Research Foundation, Cincinnati, USA
  3. Department of Mathematical Sciences, McMicken College of Arts & Sciences, University of Cincinnati, Cincinnati, USA
  4. Department of Biomedical Engineering, College of Medicine, University of Cincinnati, Cincinnati, USA
  5. Division of Pulmonary Medicine, Cincinnati Children’s Hospital Research Foundation, Cincinnati, USA
  6. Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Research Foundation, Cincinnati, USA
  7. Department of Environmental Health, College of Medicine, University of Cincinnati, Cincinnati, USA
  8. Department of Computer Science, College of Medicine, University of Cincinnati, Cincinnati, USA
