Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins

Nauman, Mohammad; Ur Rehman, Hafeez; Politano, Gianfranco; Benso, Alfredo

doi:10.1007/s10723-018-9450-6

Published: 28 July 2018

Journal of Grid Computing, Volume 17, pages 225–237 (2019)

Mohammad Nauman¹, Hafeez Ur Rehman¹ (ORCID: orcid.org/0000-0002-3274-6347), Gianfranco Politano² & Alfredo Benso²

Abstract

Accurate annotation of protein functions is important for a profound understanding of molecular biology. A large number of proteins remain uncharacterized because of the sparsity of available supporting information. For a large set of uncharacterized proteins, the only type of information available is their amino acid sequence. This motivates the need for sequence-based computational techniques that can precisely annotate uncharacterized proteins. In this paper, we propose DeepSeq, a deep learning architecture that uses only a protein's sequence information to predict its associated functions. The prediction process does not require handcrafted features; rather, the architecture automatically extracts representations from the input sequence data. Results of our experiments with DeepSeq indicate significant improvements in prediction accuracy when compared with other sequence-based methods. Our deep learning model achieves an overall validation accuracy of 86.72%, with an F1 score of 71.13%. DeepSeq thus achieves improved results on the protein function prediction problem using sequence-only information. Moreover, using the automatically learned features and without any changes to DeepSeq, we successfully solved a different problem, i.e., protein function localization, with no human intervention. Finally, we discuss how the same architecture can be used to solve even more complicated problems, such as prediction of 2D and 3D structure as well as protein-protein interactions.
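To make the sequence-only setting concrete, the following is a minimal sketch, not the DeepSeq code. It assumes one common way of preparing input for a sequence model (integer-encoding residues and padding to a fixed length) and one common way of scoring multi-label predictions (element-wise accuracy and micro-averaged F1). The names `encode`, `accuracy_and_f1`, `PAD` and `UNKNOWN` are hypothetical, and the abstract does not state how the reported accuracy and F1 were computed.

```python
# Illustrative sketch only; the encoding and metric definitions are assumptions,
# not taken from the DeepSeq paper or repository.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD, UNKNOWN = 0, 1                    # hypothetical reserved ids
INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence, max_len):
    """Map a protein sequence to a fixed-length list of integer ids."""
    # Truncate long sequences; map non-standard residues to UNKNOWN.
    ids = [INDEX.get(aa, UNKNOWN) for aa in sequence.upper()[:max_len]]
    # Right-pad short sequences so every input has the same length.
    return ids + [PAD] * (max_len - len(ids))


def accuracy_and_f1(y_true, y_pred):
    """Element-wise accuracy and micro-averaged F1 over binary label matrices."""
    pairs = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    return accuracy, f1


if __name__ == "__main__":
    print(encode("MKV", 5))
    # Two proteins, four candidate function labels each.
    truth = [[1, 0, 0, 0], [0, 1, 0, 0]]
    preds = [[1, 0, 0, 0], [0, 0, 0, 0]]
    print(accuracy_and_f1(truth, preds))
```

Under these assumed definitions, accuracy counts the many correctly predicted absent labels while F1 only rewards correctly predicted present labels, which is one reason accuracy can sit well above F1 when each protein carries only a few of the candidate functions.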

References

1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool (BLAST). J. Mol. Biol. 215(3), 403–410 (1990)
2. Benso, A., Carlo, S.D., Ur Rehman, H., Politano, G., Savino, A., Suravajhala, P.: A combined approach for genome wide protein function annotation/prediction. Proteome Sci. 11(Suppl. 1), 1–12 (2013)
3. Bork, P., Dandekar, T., Diaz-Lazcoz, Y., Eisenhaber, F., Huynen, M., Yuan, Y.: Predicting function: from genes to genomes and back. J. Mol. Biol. 283(4), 707–725 (1998)
4. Chorowski, J.K., Bahdanau, D., Serdyuk, D., Cho, K., Bengio, Y.: Attention-based models for speech recognition. In: Advances in Neural Information Processing Systems, pp. 577–585 (2015)
5. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555 (2014)
6. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12(Aug), 2493–2537 (2011)
7. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 8609–8613 (2013)
8. Deng, M., Zhang, K., Mehta, S., Chen, T., Sun, F.: Prediction of protein function using protein-protein interaction data. J. Comput. Biol. 10(6), 947–960 (2003)
9. Duvenaud, D., Maclaurin, D., Adams, R.: Early stopping as nonparametric variational inference. In: Artificial Intelligence and Statistics, pp. 1070–1077 (2016)
10. Friedberg, I.: Automated protein function prediction–the genomic challenge. Brief. Bioinform. 7(3), 225–242 (2006)
11. Gaudet, P., Livstone, M.S., Lewis, S.E., Thomas, P.D.: Phylogenetic-based propagation of functional annotations within the gene ontology consortium. Brief. Bioinform. 12(5), 449–462 (2011)
12. The Gene Ontology Consortium: Gene ontology consortium: going forward. Nucleic Acids Res. 43(Database Issue), D1049–D1056 (2015)
13. Graves, A., Jaitly, N.: Towards end-to-end speech recognition with recurrent neural networks. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764–1772 (2014)
14. Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intell. Syst. Appl. 13(4), 18–28 (1998)
15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
16. Huttenhower, C., Hibbs, M., Myers, C., Troyanskaya, O.G.: A scalable method for integration and functional analysis of multiple microarray datasets. Bioinformatics 22(23), 2890–2897 (2006)
17. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 448–456 (2015)
18. Jiang, Y., Oron, T.R., Clark, W.T., Bankapur, A.R., D’Andrea, D., Lepore, R., Funk, C.S., Kahanda, I., Verspoor, K.M., Ben-Hur, A., et al.: An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biology 17(1), 184–203 (2016). https://doi.org/10.1186/s13059-016-1037-6
19. Kanehisa, M., et al.: Kanehisa Laboratories at Institute for Chemical Research (ICR). Kyoto University, Japan (2015). http://www.kanehisa.jp/en/db_growth.html
20. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv:1412.6980 (2014)
21. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
22. Laurent, C., Pereyra, G., Brakel, P., Zhang, Y., Bengio, Y.: Batch normalized recurrent neural networks. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 2657–2661 (2016)
23. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
24. Leifert, G., Strauß, T., Grüning, T., Wustlich, W., Labahn, R.: Cells in multidimensional recurrent neural networks. J. Mach. Learn. Res. 17(1), 3313–3349 (2016)
25. Letovsky, S., Kasif, S.: Predicting protein function from protein-protein interaction data: a probabilistic approach. Bioinformatics 19(suppl. 1), i197–i204 (2003)
26. Lowe, D.G.: Object recognition from local scale-invariant features. In: The Proceedings of the Seventh IEEE International Conference on Computer Vision. IEEE, vol. 2, pp. 1150–1157 (1999)
27. Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O., Eisenberg, D.: Detecting protein function and protein-protein interactions from genome sequences. Science 285(5428), 751–753 (1999)
28. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv:1301.3781 (2013)
29. Mitchell, A., Chang, H.Y., Daugherty, L., Fraser, M., Hunter, S., Lopez, R., McAnulla, C., McMenamin, C., Nuka, G., Pesseat, S., et al.: InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43(Database Issue), D213–21 (2015)
30. Nabieva, E., Jim, K., Agarwal, A., Chazelle, B., Singh, M.: Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21(suppl. 1), i302–i310 (2005)
31. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 807–814 (2010)
32. Oord, A.v.d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., Kavukcuoglu, K.: WaveNet: A generative model for raw audio. arXiv:1609.03499 (2016)
33. Pal, D., Eisenberg, D.: Inference of protein function from protein structure. Structure 13(1), 121–130 (2005)
34. Pazos, F., Sternberg, M.J.: Automated prediction of protein function and detection of functional sites from structure. Proc. Natl. Acad. Sci. USA 101(41), 14754–14759 (2004)
35. Pellegrini, M., Marcotte, E.M., Thompson, M.J., Eisenberg, D., Yeates, T.O.: Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. 96(8), 4285–4288 (1999)
36. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Empirical Methods in Natural Language Processing, vol. 14, pp. 1532–1543 (2014)
37. Piovesan, D., Giollo, M., Leonardi, E., Ferrari, C., Tosatto, S.C.E.: INGA: protein function prediction combining interaction networks, domain assignments and sequence similarity. Nucleic Acids Res. 43(W1), W134–40 (2015). https://doi.org/10.1093/nar/gkv523
38. Radivojac, P., Clark, W.T., Oron, T.R., Schnoes, A.M., et al.: A large-scale evaluation of computational protein function prediction. Nat. Methods 10(3), 221–227 (2013)
39. Shen, L.X., Basilion, J.P., Stanton, V.P.: Single-nucleotide polymorphisms can cause different structural folds of mRNA. Proc. Natl. Acad. Sci. 96(14), 7871–7876 (1999)
40. Shimodaira, H.: Improving predictive inference under covariate shift by weighting the log-likelihood function. J. Stat. Plann. Inference 90(2), 227–244 (2000)
41. Shin, H.C., Roth, H.R., Gao, M., Lu, L., Xu, Z., Nogues, I., Yao, J., Mollura, D., Summers, R.M.: Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016)
42. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891–1898 (2014)
43. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
44. The UniProt Consortium: UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015)
45. Vazquez, A., Flammini, A., Maritan, A., Vespignani, A.: Global protein function prediction from protein-protein interaction networks. Nat. Biotechnol. 21(6), 697–700 (2003)
46. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
47. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
48. Watson, J.D., Laskowski, R.A., Thornton, J.M.: Predicting protein function from sequence and structural data. Curr. Opin. Struct. Biol. 15(3), 275–284 (2005)
49. Xin, F., Radivojac, P.: Computational methods for identification of functional residues in protein structures. Curr. Protein Pept. Sci. 12(6), 456–469 (2011)

Acknowledgements

The execution of our deep learning experiments was made possible by the gracious contribution of a Tesla K40 GPU by NVIDIA Corporation. The contents of this paper are not necessarily endorsed by the funding agencies.

Funding

Hafeez Ur Rehman’s contribution to this work was partially supported by the HEC under Grant Number 21-915/SRGP/R&D/HEC/2016.

Author information

Authors and Affiliations

FAST National University of Computer and Emerging Sciences, Peshawar, 25000, Pakistan
Mohammad Nauman & Hafeez Ur Rehman
Department of Computer & Control Engineering, Politecnico di Torino, I-10129, Torino, Italy
Gianfranco Politano & Alfredo Benso

Contributions

M.N. conceived the idea of using deep learning for bioinformatics. H.R. provided domain knowledge and structured the problem. Both M.N. and H.R. wrote the code for the experiments. G.P. contributed by running experiments and analyzing results. A.B. helped analyze results and formalize the details of the discussion. All authors contributed to manuscript preparation and review.

Corresponding author

Correspondence to Hafeez Ur Rehman.

Ethics declarations

Competing interests

There are no competing financial interests associated with this research work.

Consent to publish

All authors consent to publication of the details presented in this manuscript.

Additional information

Availability of data and materials

We provide the dataset (also freely available from UniProt [44]) and our code for the whole pipeline as open source at http://github.com/recluze/deepseq. Instructions for executing the code are provided in the attached README.

About this article

Cite this article

Nauman, M., Ur Rehman, H., Politano, G. et al. Beyond Homology Transfer: Deep Learning for Automated Annotation of Proteins. J Grid Computing 17, 225–237 (2019). https://doi.org/10.1007/s10723-018-9450-6

Received: 31 January 2018
Accepted: 16 July 2018
Published: 28 July 2018
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10723-018-9450-6
