Machine Learning with Special Emphasis on Support Vector Machines (SVMs) in Systems Biology: A Plant Perspective

Abstract

Systems biology has advanced alongside integrative genomics and bioinformatics tools. Recent developments in high-throughput techniques have produced a deluge of biological data. Addressing specific biological questions and extracting biologically meaningful information from these data requires integrating individual components with system-level views. Combining strategies from systems biology and computational biology gave rise to computational systems biology. Machine learning offers state-of-the-art techniques for handling such data, and its application in biology has enabled faster and more accurate solutions to biological problems. This chapter addresses the implications and applications of machine-learning techniques, with special emphasis on support vector machines, in plants and associated research areas.

Keywords

Machine learning; SVM; Bioinformatics; Systems biology

Abbreviations

ANN: Artificial neural network
ATF3: Activating transcription factor 3
IFN-γ: Interferon gamma
LOO: Leave-one-out
MCMV: Murine cytomegalovirus
miRNA: microRNA
SNPs: Single nucleotide polymorphisms
SVMs: Support vector machines
TRN: Transcriptional regulatory network

Copyright information

© Springer India 2014

Authors and Affiliations

  1. Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, India
