
Extreme Gradient Boost with CNN: A Deep Learning-Based Approach for Predicting Protein Subcellular Localization

  • Conference paper
Proceedings of the International Conference on Big Data, IoT, and Machine Learning

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 95)


Abstract

Knowing a protein’s subcellular localization provides physiological context for its activity. Traditionally, localization is determined with laboratory approaches, but these methods can be time consuming and tedious. Predicting the location of proteins with machine learning techniques is challenging because the sequence data vary in length. This study proposes a machine learning model that combines convolutional neural networks (CNNs) with the extreme gradient boosting (XGBoost) technique. It takes a deep learning approach to classifying ten types of protein locations using the benchmark DeepLoc data set. The model achieves an accuracy of \(79.30\%\) and an F1 score of \(73.2\%\), which is better than some other state-of-the-art works.
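The abstract names varying sequence length as the main obstacle, but it does not say how the sequences are prepared for the CNN. One common way to give a convolutional model fixed-size input is to one-hot encode each residue and truncate or zero-pad every sequence to a common length. The sketch below illustrates that idea only; the helper name `encode_sequence`, the 20-letter alphabet, and the `max_len` default are assumptions for illustration, not the authors' preprocessing.

```python
# Illustrative sketch: fixed-length one-hot encoding of protein sequences.
# All names and defaults here are assumptions, not taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq, max_len=1000):
    """Return a max_len x 20 one-hot matrix (list of lists) for a sequence.

    Sequences longer than max_len are truncated from the end; shorter ones
    are padded with all-zero rows. Residues outside the 20-letter alphabet
    (for example X) are encoded as all-zero rows.
    """
    rows = []
    for aa in seq.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Zero-pad so that every encoded sequence has the same number of rows.
    rows.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


short = encode_sequence("MKV", max_len=5)        # padded with two zero rows
long_ = encode_sequence("MKVLAAGX", max_len=5)   # truncated to five residues
```

With every sequence mapped to the same shape, a CNN can extract features from the encoded matrix, and those features can then be passed to a gradient-boosted classifier such as XGBoost, which is the combination the abstract describes.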




Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Ismail, M., Islam Mondal, M.N. (2022). Extreme Gradient Boost with CNN: A Deep Learning-Based Approach for Predicting Protein Subcellular Localization. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_16


  • DOI: https://doi.org/10.1007/978-981-16-6636-0_16


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-6635-3

  • Online ISBN: 978-981-16-6636-0

  • eBook Packages: Engineering, Engineering (R0)
