Statistical Methods in Metabolomics

Evolutionary Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 856))

Abstract

Metabolomics is a relatively new field in bioinformatics that uses measurements of metabolite abundance as a tool for disease diagnosis and other medical purposes. Although it is closely related to proteomics, its statistical analysis is potentially simpler because biochemists have significantly more domain knowledge about metabolites. This chapter reviews the challenges that metabolomics poses in the areas of quality control, statistical metrology, and data mining.

Author information

Corresponding author

Correspondence to David Banks.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Korman, A., Oh, A., Raskind, A., Banks, D. (2012). Statistical Methods in Metabolomics. In: Anisimova, M. (ed.) Evolutionary Genomics. Methods in Molecular Biology, vol 856. Humana Press. https://doi.org/10.1007/978-1-61779-585-5_16

  • DOI: https://doi.org/10.1007/978-1-61779-585-5_16

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-584-8

  • Online ISBN: 978-1-61779-585-5

  • eBook Packages: Springer Protocols
