Abstract
Machine learning (ML) concerns algorithms that learn from data without being explicitly programmed. It focuses on data-driven prediction and has several applications in bioinformatics, the field that processes biological data using computational and mathematical approaches. Biological data has grown exponentially in recent times, raising two issues: how to store the information efficiently, and how to mine useful knowledge from it. Machine learning addresses the second issue, since it can generate knowledge from heterogeneous data. Deep learning, a machine learning technique, enables automatic feature learning: new features are constructed by combining multiple existing features of the dataset, which allows algorithms to perform complex predictions on large datasets. ML is currently applied in six key subfields of bioinformatics: microarrays, evolution, systems biology, genomics, text mining, and proteomics. This chapter is composed of four sections. The first outlines ML in bioinformatics. The second highlights the different machine learning techniques used in bioinformatics. The third describes two case studies that use artificial neural networks in bioinformatics. The fourth analyzes research areas related to bioinformatics that academics and researchers can explore. The chapter ends with a conclusion.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Shastry, K.A., Sanjay, H.A. (2020). Machine Learning for Bioinformatics. In: Srinivasa, K., Siddesh, G., Manisekhar, S. (eds) Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-2445-5_3
DOI: https://doi.org/10.1007/978-981-15-2445-5_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2444-8
Online ISBN: 978-981-15-2445-5
eBook Packages: Intelligent Technologies and Robotics (R0)