Conducting Molecular Biomarker Discovery Studies in Plants

  • Christian Schudoma
  • Matthias Steinfath
  • Heike Sprenger
  • Joost T. van Dongen
  • Dirk Hincha
  • Ellen Zuther
  • Peter Geigenberger
  • Joachim Kopka
  • Karin Köhl
  • Dirk Walther
Part of the Methods in Molecular Biology book series (MIMB, volume 918)


Molecular biomarkers are molecules whose concentrations in a biological system are informative of the current phenotypic state and, more importantly, may also be predictive of future phenotypic trait endpoints. The identification of biomarkers has gained much attention in targeted plant breeding since technologies have become available that measure many molecules across different levels of molecular organization at decreasing cost. In this chapter, we outline the general strategy and workflow of conducting biomarker discovery studies. We highlight critical aspects of study design as well as statistical data analysis and model building.

Key words: Biomarker, OMICS technologies, Machine learning, Classification, Feature selection, Phenotype, Study design, Breeding, Plants



Support for this work was provided by the BMELV-funded TROST and the BMBF-funded SEPSAPE projects.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Christian Schudoma (1)
  • Matthias Steinfath (1)
  • Heike Sprenger (1)
  • Joost T. van Dongen (1)
  • Dirk Hincha (1)
  • Ellen Zuther (1)
  • Peter Geigenberger (2)
  • Joachim Kopka (1)
  • Karin Köhl (1)
  • Dirk Walther (1)

  1. Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
  2. Department Biologie I, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
