Abstract
Knowing a protein’s subcellular localization provides physiological context for its activity. Traditionally, this information is obtained through laboratory experiments, which can be time-consuming and tedious. Predicting the subcellular location of proteins with machine learning is challenging because protein sequences vary in length. This study proposes a machine learning model that combines convolutional neural networks (CNNs) with the extreme gradient boosting (XGBoost) technique. It takes a deep learning approach to classify proteins into ten subcellular locations using the benchmark DeepLoc dataset. The proposed model achieves an accuracy of \(79.30\%\) and an F1 score of \(73.2\%\), which are better than those reported by some other state-of-the-art works.
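One common way a CNN front end can cope with variable-length protein sequences is to convolve filters along the sequence and apply global max pooling, so that every protein maps to a feature vector of the same size, which a tree-based classifier such as XGBoost can then consume. The dependency-free Python sketch below illustrates only that idea under stated assumptions: one-hot encoding over the 20 standard amino acids, a single convolutional layer, ReLU, and global max pooling. The helper `make_filters` is hypothetical and draws random weights in place of trained ones; none of this is the authors’ actual architecture.

```python
import random

# The 20 standard amino acids; any other symbol (X, U, B, ...) is encoded as zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as one 20-dimensional one-hot row per residue."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:
            row[idx] = 1.0
        rows.append(row)
    return rows


def make_filters(n_filters, width, seed=0):
    """Hypothetical stand-in for trained CNN weights: random Gaussian filters,
    each of shape (width, 20)."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in AMINO_ACIDS] for _ in range(width)]
        for _ in range(n_filters)
    ]


def conv_maxpool_features(seq, filters):
    """Valid 1D convolution of each filter over the one-hot sequence, followed by
    ReLU and global max pooling over positions.

    The output has one value per filter, so sequences of any length map to a
    fixed-size feature vector. Sequences shorter than a filter give 0.0 for it.
    """
    x = one_hot(seq)
    features = []
    for f in filters:
        width = len(f)
        best = 0.0  # starting at 0.0 applies the ReLU: max(0, max over positions)
        for start in range(len(x) - width + 1):
            activation = 0.0
            for k in range(width):
                activation += sum(r * w for r, w in zip(x[start + k], f[k]))
            best = max(best, activation)
        features.append(best)
    return features


if __name__ == "__main__":
    filters = make_filters(n_filters=8, width=3, seed=42)
    for s in ["MKTAYIAKQR", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK"]:
        feats = conv_maxpool_features(s, filters)
        # Both sequences yield 8 features despite their different lengths.
        print(len(s), len(feats), [round(v, 2) for v in feats])
```

In a full pipeline of this kind, the feature vectors for a labelled set such as DeepLoc would be stacked into a matrix and passed to a gradient-boosted classifier, for example `xgboost.XGBClassifier(objective="multi:softprob")` for the ten location classes, with the convolutional filters learned by training the CNN rather than sampled at random.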
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ismail, M., Islam Mondal, M.N. (2022). Extreme Gradient Boost with CNN: A Deep Learning-Based Approach for Predicting Protein Subcellular Localization. In: Arefin, M.S., Kaiser, M.S., Bandyopadhyay, A., Ahad, M.A.R., Ray, K. (eds) Proceedings of the International Conference on Big Data, IoT, and Machine Learning. Lecture Notes on Data Engineering and Communications Technologies, vol 95. Springer, Singapore. https://doi.org/10.1007/978-981-16-6636-0_16
DOI: https://doi.org/10.1007/978-981-16-6636-0_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6635-3
Online ISBN: 978-981-16-6636-0
eBook Packages: Engineering, Engineering (R0)