Volume 21, Issue 3, pp 631–636

Finding biomarkers is getting easier

Review Article


Single biomarkers are rarely accurate, and even suites of biomarkers can give conflicting results. Ideally, one isolates potent combinations of variables that accurately identify specific analytes and their level of toxicity. Such combinations can be found by reducing the thousands of candidate variables to the small number needed to classify treatments. When machine learning (ML) is used to recognize the key variables, the results are surprisingly good, given the apparent failure of other search methods to produce good diagnostics. Proteins seem especially useful for portable field tests of a variety of adverse conditions. This review shows how ML, in particular artificial neural networks (ANN), can find potent biomarkers embedded in any type of expression data; the examples here are mainly proteins. Over multiple iterations, a computer produces sets of proteins that systematically identify the treatment classes of interest with near 100% accuracy. Whether these proteins are useful in actual diagnoses is tested by presenting the computer model with samples whose class is unknown to it. Finding biomarkers is getting easier, but the results must still be confirmed by multivariable statistics and by field studies.
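The reduction described above can be illustrated with a minimal sketch. It is not the procedure used in the work reviewed here. The network size, the ranking rule (summed absolute weight products), the halving schedule, the function names and the synthetic data are all assumptions chosen for illustration. A small network is fitted, the inputs are ranked, the weaker half is discarded, and the loop repeats until a handful of variables remains. The surviving set is then checked on held-out samples.

```python
import numpy as np

def train_net(X, y, hidden=4, epochs=300, lr=0.5, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, sigmoid output)
    by plain gradient descent on the mean cross-entropy loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        err = (p - y) / n                        # loss gradient w.r.t. the output logit
        dH = np.outer(err, W2) * (1.0 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum()
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    """Return the predicted class (0 or 1) for each row of X."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return (H @ W2 + b2 > 0).astype(int)

def select_variables(X, y, keep=5, seed=0):
    """Repeatedly fit a network and discard the lower-ranked half of the inputs.
    Assumes the columns of X are on comparable scales."""
    active = np.arange(X.shape[1])
    while len(active) > keep:
        W1, _, W2, _ = train_net(X[:, active], y, seed=seed)
        importance = np.abs(W1 * W2).sum(axis=1)  # assumed ranking rule
        n_next = max(keep, len(active) // 2)
        order = np.argsort(importance)[::-1][:n_next]
        active = np.sort(active[order])
    return active

# Synthetic stand-in for expression data: 200 candidate variables,
# three of which (hypothetical markers) shift in treated samples.
rng = np.random.default_rng(1)
n_samples, n_vars = 120, 200
y = rng.integers(0, 2, n_samples)
X = rng.normal(size=(n_samples, n_vars))
informative = np.array([3, 50, 120])
X[:, informative] += 2.0 * y[:, None]

# Select on the training samples only, then score samples the model has not seen.
train, test = np.arange(0, 80), np.arange(80, 120)
chosen = select_variables(X[train], y[train], keep=5)
model = train_net(X[train][:, chosen], y[train])
accuracy = (predict(model, X[test][:, chosen]) == y[test]).mean()
print("selected variables:", chosen)
print("held-out accuracy:", accuracy)
```

Selection uses only the training samples so that the held-out accuracy is not inflated by the selection step itself. With few samples and many candidate variables, a high score on such a split is still only a first check. It does not replace the statistical and field confirmation called for above.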


Isolating biomarkers, Neural networks, Machine learning, Statistical verification



First to my colleague Mike O’Neill, who suggested ANN some time ago and then fitted the models. Next to all the graduate students who collected the data and the evidence that PES (Protein expression signatures) seemed to exist for everything. To Nagaraj Neerchal and his students, Sverdlov and Siddani for their parametric analysis of the medaka (uv) data. To the questions from attendees at PRIMO and SETAC meetings, the annoying, the amusing and the useful, questions that is. To two reviewers for (somewhat) restraining my verbosity. Not least to Editor-in-Chief Lee Shugart for his patience and his valuable comments. And to Justice Oliver Holmes, Jr., posthumously, for the quote. I am supported currently by TIAA (an NGO).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, USA
  2. Buffalo, USA
