Multi-label classifier for protein sequence using heuristic-based deep convolution neural network

Chauhan, Vikas; Tiwari, Aruna; Joshi, Niranjan; Khandelwal, Sahaj

doi:10.1007/s10489-021-02529-6

Published: 23 June 2021

Applied Intelligence, Volume 52, pages 2820–2837 (2022)

Abstract

Deep learning techniques have recently proven very useful for classifying sequential data. Protein sequences belong to functional classes based on the structure of their sequences, and annotating protein sequences with their corresponding functional classes is a multi-label task. A notably larger amount of data is available for the primary structure of proteins than for their secondary, tertiary, and quaternary structures. Clustering-based techniques require expert domain knowledge of the extensive data samples. Traditional methods use the n-gram features of amino acids while ignoring the relationship between motifs and the amino acid sequence. This paper proposes an efficient method to classify proteins into their functional classes using a convolutional neural network based on heuristic rules. The proposed approach works on the primary structure of protein sequences and considers the relationship among motifs and amino acids. It also takes into account the locations of amino acids in the protein sequence and the affinity information between amino acids and motifs. Along with achieving high performance in the classification of protein sequences, we propose a heuristic approach to improve the precision and recall of the individual functional classes. This heuristic approach improves performance and handles the data imbalance problem. The proposed approach is compared with other competitive approaches and provides better performance in terms of precision, recall, AUC, and subset accuracy. The greatest challenge in multi-label classification is handling the data imbalance that arises from the variance in label frequencies in the data. This imbalance is handled by weight modulation in the loss function, which influences the learning process.
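
The abstract does not state how the loss weights are chosen, so the following is only a minimal sketch of weight modulation in a multi-label loss, assuming inverse-frequency positive weights (negatives divided by positives per label) applied to a binary cross-entropy. The function names `label_weights`, `weighted_bce`, and `subset_accuracy`, the fallback weight of 1.0, and the 0.5 threshold are illustrative assumptions, not the authors' method. Subset accuracy is included because it is one of the reported metrics: a sample counts as correct only if every label is predicted correctly.

```python
import math


def label_weights(Y):
    """Per-label positive weight = (#negatives / #positives).

    Assumption: inverse-frequency weighting; the paper's actual scheme
    is not given in the abstract.
    """
    n = len(Y)
    weights = []
    for j in range(len(Y[0])):
        pos = sum(row[j] for row in Y)
        # Hypothetical fallback: a label with no positives gets weight 1.0.
        weights.append((n - pos) / pos if pos > 0 else 1.0)
    return weights


def weighted_bce(Y, P, pos_weight, eps=1e-12):
    """Mean binary cross-entropy where positive terms are scaled per label."""
    total, count = 0.0, 0
    for y_row, p_row in zip(Y, P):
        for y, p, w in zip(y_row, p_row, pos_weight):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


def subset_accuracy(Y, P, threshold=0.5):
    """Fraction of samples whose thresholded label set matches exactly."""
    hits = sum(
        all((p >= threshold) == bool(y) for y, p in zip(y_row, p_row))
        for y_row, p_row in zip(Y, P)
    )
    return hits / len(Y)


# Toy data: 4 sequences, 2 functional classes; class 1 is rare.
Y = [[1, 0], [1, 0], [0, 1], [1, 0]]
P = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.1], [0.9, 0.1]]  # rare positive missed

w = label_weights(Y)
print("weights:", w)
print("unweighted loss:", weighted_bce(Y, P, [1.0, 1.0]))
print("weighted loss:  ", weighted_bce(Y, P, w))
print("subset accuracy:", subset_accuracy(Y, P))
```

In this toy case the missed positive of the rare class is penalised more heavily under the weighted loss than under the unweighted one, which is the effect weight modulation is meant to have on the learning process.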

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology, Indore, India
Vikas Chauhan, Aruna Tiwari, Niranjan Joshi & Sahaj Khandelwal

Corresponding author

Correspondence to Vikas Chauhan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chauhan, V., Tiwari, A., Joshi, N. et al. Multi-label classifier for protein sequence using heuristic-based deep convolution neural network. Appl Intell 52, 2820–2837 (2022). https://doi.org/10.1007/s10489-021-02529-6

Accepted: 12 May 2021
Published: 23 June 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10489-021-02529-6
