Finding biomarkers is getting easier

  • Review Article
  • Ecotoxicology

Abstract

Single biomarkers are rarely accurate, and even suites of biomarkers can give conflicting results. Ideally, potent combinations of variables are isolated that accurately identify specific analytes and their level of toxicity. The search for such combinations can proceed by reducing the thousands of candidate variables to the small number necessary for treatment classification. When the key variables are recognized by machine learning (ML), the results are quite surprising, given the apparent failure of other search methods to produce good diagnostics. Proteins seem especially useful for portable field tests of a variety of adverse conditions. This review shows how ML, in particular artificial neural networks, can find potent biomarkers embedded in any type of expression data; this article deals mainly with proteins. A computer runs multiple iterations to produce sets of proteins that systematically identify (to near 100% accuracy) the treatment classes of interest. Whether these proteins are useful in actual diagnoses is tested by presenting the computer model with samples of unknown class. Finding the biomarkers is getting easier, but confirmation is still required, both by multivariable statistics and by field studies.
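
The reduction described above can be pictured with a small sketch. It is an illustration under stated assumptions, not the procedure used in the article: the data are synthetic, a single logistic neuron stands in for the artificial neural network, and the pruning rule (keep the half of the variables with the largest absolute weights, then retrain) is an assumption chosen for brevity. All names (`make_samples`, `train_neuron`, `reduce_variables`, the variable indices) are hypothetical.

```python
import math
import random

def make_samples(n, n_vars, informative, rng):
    """Synthetic 'expression' data: only the `informative` variables shift with class."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        for j, sign in informative.items():
            x[j] += sign * (1.5 if y == 1 else -1.5)
        rows.append((x, y))
    return rows

def train_neuron(rows, keep, epochs=30, lr=0.05):
    """Fit one logistic neuron by stochastic gradient descent on the variables in `keep`."""
    w = {j: 0.0 for j in keep}
    b = 0.0
    for _ in range(epochs):
        for x, y in rows:
            z = b + sum(w[j] * x[j] for j in keep)
            z = max(-30.0, min(30.0, z))  # keep exp() well within range
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in keep:
                w[j] -= lr * err * x[j]
            b -= lr * err
    return w, b

def reduce_variables(rows, n_vars, target):
    """Retrain repeatedly, each time keeping the variables with the largest |weight|."""
    keep = list(range(n_vars))
    while len(keep) > target:
        w, _ = train_neuron(rows, keep)
        keep.sort(key=lambda j: abs(w[j]), reverse=True)
        keep = keep[:max(target, len(keep) // 2)]
    return sorted(keep)

def accuracy(rows, w, b):
    """Fraction of samples whose class is predicted correctly."""
    hits = 0
    for x, y in rows:
        z = b + sum(wj * x[j] for j, wj in w.items())
        hits += int((z >= 0.0) == (y == 1))
    return hits / len(rows)

rng = random.Random(0)
informative = {3: 1, 17: -1, 29: 1}  # hypothetical "true" biomarkers among 40 candidates
train = make_samples(300, 40, informative, rng)
unknown = make_samples(200, 40, informative, rng)  # never seen during training

kept = reduce_variables(train, 40, target=3)
w, b = train_neuron(train, kept)
held_out_acc = accuracy(unknown, w, b)
print("kept variables:", kept)
print("held-out accuracy:", round(held_out_acc, 3))
```

The second set of samples plays the role of the unknown classes: the reduced variable set is only credited if it still classifies samples it was not trained on. Real expression data are far noisier than this toy case, which is why confirmation by multivariable statistics and field studies remains necessary.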

Acknowledgments

First to my colleague Mike O’Neill, who suggested ANN some time ago and then fitted the models. Next to all the graduate students who collected the data and the evidence that protein expression signatures (PES) seemed to exist for everything. To Nagaraj Neerchal and his students, Sverdlov and Siddani, for their parametric analysis of the medaka (UV) data. To the questions from attendees at PRIMO and SETAC meetings: the annoying, the amusing and the useful (questions, that is). To two reviewers for (somewhat) restraining my verbosity. Not least to Editor-in-Chief Lee Shugart for his patience and his valuable comments. And to Justice Oliver Holmes, Jr., posthumously, for the quote. I am supported currently by TIAA (an NGO).

Author information

Corresponding author

Correspondence to Brian Patrick Bradley.

Addendum

The principle of ML with an artificial neural network is modeled on how a small human does it, with her standard neural network. She will not identify the relevant variables (we hope), but she will learn the difference between any dog and any cat, for example, even if some pet dogs (viz. the purse-sized ones) barely qualify. Humans (including adults) can learn without actually “knowing” the key indicators, which renders most of us useless at actually identifying the key variables. If that were not enough, there is the sheer weight of the thousands of inputs. What the machine has learned can be tested on different samples. If what it has learned is useful, the classification will be mainly correct. Most important, we can ask what it used to make the classification. (The principle can be applied to pre-med students. Both the machine and they will have learned something from the training set or class, but whether either has learned the truth needs to be tested.)
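
The three steps in this addendum (learn from a training set, test on different samples, then ask what was used) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the models fitted for the article: the data are synthetic, the network is a tiny one-hidden-layer perceptron trained by back propagation, and "asking what it used" is approximated by permutation importance, a technique swapped in here for convenience. The names `TinyNet`, `make_samples` and `permutation_drops` are hypothetical.

```python
import math
import random

def make_samples(n, n_vars, informative, rng):
    """Synthetic data: only the `informative` variables differ between the two classes."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
        for j, sign in informative.items():
            x[j] += sign * (1.5 if y == 1 else -1.5)
        rows.append((x, y))
    return rows

class TinyNet:
    """One hidden tanh layer and a sigmoid output, trained by back propagation."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.3, 0.3) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.3, 0.3) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train(self, rows, epochs=50, lr=0.05):
        for _ in range(epochs):
            for x, y in rows:
                h, p = self.forward(x)
                d_out = p - y  # cross-entropy gradient at the output
                for k, hk in enumerate(h):
                    d_hid = d_out * self.w2[k] * (1.0 - hk * hk)
                    self.w2[k] -= lr * d_out * hk
                    self.b1[k] -= lr * d_hid
                    for j, xj in enumerate(x):
                        self.w1[k][j] -= lr * d_hid * xj
                self.b2 -= lr * d_out

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

def accuracy(net, rows):
    return sum(net.predict(x) == y for x, y in rows) / len(rows)

def permutation_drops(net, rows, rng):
    """Accuracy lost when each variable in turn is shuffled across samples."""
    base = accuracy(net, rows)
    drops = []
    for j in range(len(rows[0][0])):
        col = [x[j] for x, _ in rows]
        rng.shuffle(col)
        shuffled = [(x[:j] + [v] + x[j + 1:], y) for (x, y), v in zip(rows, col)]
        drops.append(base - accuracy(net, shuffled))
    return drops

rng = random.Random(1)
informative = {0: 1, 1: -1}  # hypothetical key variables among 6 inputs
train = make_samples(200, 6, informative, rng)
fresh = make_samples(200, 6, informative, rng)  # "different samples" for testing

net = TinyNet(6, 3, rng)
net.train(train)
test_acc = accuracy(net, fresh)
drops = permutation_drops(net, fresh, rng)
ranked = sorted(range(6), key=lambda j: drops[j], reverse=True)
print("accuracy on fresh samples:", round(test_acc, 3))
print("variables ranked by importance:", ranked)
```

Shuffling a variable the network relies on should cost accuracy, while shuffling one it ignores should cost little or nothing, so the ranking is one way of asking the machine what it used. Whether what it learned is the truth still has to be tested on samples it has not seen.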

About this article

Cite this article

Bradley, B.P. Finding biomarkers is getting easier. Ecotoxicology 21, 631–636 (2012). https://doi.org/10.1007/s10646-011-0848-1

  • DOI: https://doi.org/10.1007/s10646-011-0848-1
