Evolutionary Computation for the Interpretation of Metabolomic Data

  • Royston Goodacre
  • Douglas B. Kell


Post-genomic science is producing bounteous data floods, and, as the above quotation indicates, the extraction of the most meaningful parts of these data is key to the generation of useful new knowledge. A typical metabolic fingerprint or metabolomics experiment is expected to generate thousands of data points (samples times variables), of which only a handful might be needed to describe the problem adequately. Evolutionary algorithms are ideal strategies for mining such data to generate useful relationships, rules and predictions. This chapter describes these techniques and highlights their exploitation in metabolomics.







Copyright information

© Springer Science+Business Media New York 2003

Authors and Affiliations

  • Royston Goodacre (1, 2)
  • Douglas B. Kell (2)
  1. Institute of Biological Sciences, University of Wales, Aberystwyth, UK
  2. Department of Chemistry, University of Manchester Institute of Science and Technology, Manchester, UK
