On the Interpretation of High Throughput MS Based Metabolomics Fingerprints with Random Forest

  • David P. Enot
  • Manfred Beckmann
  • John Draper
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4216)


We discuss the application of a machine learning method, Random Forest (RF), to extracting relevant biological knowledge from metabolomics fingerprinting experiments. RF margins and variable significance are discussed alongside prediction accuracy, as together they provide insight into model generalisability and explanatory power. A method is described for detecting relevant features while conserving the redundant structure of the fingerprint data. The methodology is illustrated using two electrospray ionisation mass spectrometry datasets: one from 27 Arabidopsis genotypes and one from a set of transgenic potato lines.
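The abstract stresses RF margins as a complement to prediction accuracy. As a rough illustration only, the sketch below computes the random-forest margin in Breiman's sense: the fraction of trees voting for a sample's true class minus the largest fraction voting for any other class. It assumes the per-tree votes for each sample (ideally from out-of-bag trees only) are already available as plain lists of class labels; the function names `rf_margin` and `margin_summary` and the genotype labels are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter


def rf_margin(tree_votes, true_label):
    """Margin of one sample from its per-tree class votes (hypothetical helper).

    margin = (fraction of trees voting for the true class)
             - (largest fraction of trees voting for any other class)
    The result lies in [-1, 1]; a positive value means the forest's majority
    vote is correct, and values near 1 indicate a confident prediction.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    counts = Counter(tree_votes)
    n = len(tree_votes)
    p_true = counts.get(true_label, 0) / n
    # Strongest competing class; 0.0 if every tree voted for the true class.
    p_other = max(
        (c / n for label, c in counts.items() if label != true_label),
        default=0.0,
    )
    return p_true - p_other


def margin_summary(samples):
    """Summarise margins over (tree_votes, true_label) pairs (hypothetical helper).

    Returns (mean margin, fraction of samples with margin <= 0), the latter
    counting samples that are misclassified or tied under majority voting.
    """
    margins = [rf_margin(votes, label) for votes, label in samples]
    if not margins:
        raise ValueError("need at least one sample")
    mean = sum(margins) / len(margins)
    frac_nonpositive = sum(m <= 0 for m in margins) / len(margins)
    return mean, frac_nonpositive


if __name__ == "__main__":
    # Hypothetical votes from 10 trees for one sample whose true class is "Col-0".
    votes = ["Col-0"] * 7 + ["Ler"] * 2 + ["Ws"]
    print(rf_margin(votes, "Col-0"))  # about 0.5 (0.7 - 0.2)
```

Under these assumptions, a mean margin close to 1 suggests well-separated classes, whereas many samples with margins near or below zero point to classes the forest cannot reliably distinguish, even when overall accuracy looks acceptable.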


Keywords: Feature Selection, Random Forest, Linear Discriminant Analysis, Area Under Curve, Electrospray Ionisation Mass Spectrometry





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David P. Enot (1)
  • Manfred Beckmann (1)
  • John Draper (1)

  1. Institute of Biological Sciences, University of Wales, Aberystwyth, UK
