Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Machine learning (ML) is concerned with systems that learn from data without being explicitly programmed. It focuses on data-driven prediction and has several applications in bioinformatics, the field that processes biological data using computational and mathematical approaches. Biological data has grown exponentially in recent years, which raises two issues: how to store the information efficiently, and how to mine useful knowledge from it. Machine learning addresses the second issue, since it can generate knowledge from heterogeneous data. Deep learning, a machine learning technique, enables automatic feature learning: depending on the dataset, it constructs new features by combining multiple existing features. This allows algorithms to make complex predictions on large datasets. ML is currently applied in six key subfields of bioinformatics: microarrays, evolution, systems biology, genomics, text mining, and proteomics. This chapter is composed of four sections. The first outlines ML in bioinformatics. The second highlights the different machine learning techniques used in bioinformatics. The third describes two case studies that apply artificial neural networks in bioinformatics. The fourth analyzes research areas related to bioinformatics that academics and researchers can explore. The chapter ends with a conclusion.

Author information

Corresponding author

Correspondence to K. Aditya Shastry.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Shastry, K.A., Sanjay, H.A. (2020). Machine Learning for Bioinformatics. In: Srinivasa, K., Siddesh, G., Manisekhar, S. (eds) Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-2445-5_3
