ICIC 2017: Intelligent Computing Theories and Application pp 748-756
Predicting Multisite Protein Sub-cellular Locations Based on Correlation Coefficient
Abstract
With the development of proteomics and cell biology, protein sub-cellular location has become a hot topic in bioinformatics, and a growing number of researchers are devoting considerable effort to it. However, these studies address only single-site protein sub-cellular location, even though some proteins can reside in two or more sub-cellular locations. Attention should therefore shift to multisite protein sub-cellular location. In this article, we use the Virus-mPLoc data set and choose two effective feature extraction methods: pseudo amino acid composition and correlation coefficient. These features are then fed into a multi-label k-nearest neighbor classifier to predict protein sub-cellular location. The experiment shows that this method is reasonable, with precision reaching 68.65% in the jackknife test.
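The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not define the correlation coefficient feature, so it is taken here to be the Pearson correlation of a hydropathy profile with itself at small lags (using the Kyte-Doolittle scale); the classifier is a plain neighbor vote standing in for the multi-label k-nearest neighbor classifier, not that algorithm itself; and the sequences and labels are toy data, not entries from Virus-mPLoc.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Kyte-Doolittle hydropathy values; the choice of scale is an assumption.
HYDROPATHY = dict(zip(AMINO_ACIDS, [
    1.8, 2.5, -3.5, -3.5, 2.8, -0.4, -3.2, 4.5, -3.9, 3.8,
    1.9, -3.5, -1.6, -3.5, -4.5, -0.8, -0.7, 4.2, -0.9, -1.3]))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(vx * vy)


def features(seq, max_lag=2):
    """20 amino acid frequencies plus max_lag lagged correlations."""
    comp = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    h = [HYDROPATHY[a] for a in seq]
    corr = [pearson(h[:-lag], h[lag:]) for lag in range(1, max_lag + 1)]
    return comp + corr


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(train, query, k):
    """Assign every label carried by more than half of the k nearest
    neighbors (a simplified vote, not the full multi-label kNN)."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = {}
    for _, labels in nearest:
        for label in labels:
            votes[label] = votes.get(label, 0) + 1
    chosen = {label for label, v in votes.items() if 2 * v > k}
    if not chosen:  # fall back to the single most frequent label
        chosen = {max(sorted(votes), key=votes.get)}
    return chosen


def jackknife(data, k=1):
    """Leave-one-out test: predict each sample from all the others."""
    vectors = [(features(seq), labels) for seq, labels in data]
    predictions = []
    for i, (vec, _) in enumerate(vectors):
        rest = vectors[:i] + vectors[i + 1:]
        predictions.append(predict(rest, vec, k))
    return predictions


# Toy sequences and labels, invented for this sketch.
TOY_DATA = [
    ("MKKLLAAGGK", {"host nucleus"}),
    ("MKKLLAAGGR", {"host nucleus"}),
    ("DDEEDDSSTT", {"host cytoplasm", "host membrane"}),
    ("DDEEDDSSTA", {"host cytoplasm", "host membrane"}),
]

if __name__ == "__main__":
    for (seq, true), pred in zip(TOY_DATA, jackknife(TOY_DATA)):
        print(seq, sorted(true), sorted(pred))
```

In the jackknife loop each protein is held out once and predicted from the remaining ones, so a multisite protein counts as fully correct only when its whole predicted label set matches the true set.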
Keywords
Multisite; Pseudo amino acid composition; Correlation coefficient; Multi-label k-nearest neighbor
Notes
Acknowledgment
This research was supported by the National Key Research and Development Program of China (No. 2016YFC0106000), the National Natural Science Foundation of China (Grant No. 61302128, 61573166, 61572230, 61671220, 61640218), the Youth Science and Technology Star Program of Jinan City (201406003), the Natural Science Foundation of Shandong Province (ZR2013FL002), the Shandong Distinguished Middle-aged and Young Scientist Encourage and Reward Foundation, China (Grant No. ZR2016FB14), the Project of Shandong Province Higher Educational Science and Technology Program, China (Grant No. J16LN07), and the Shandong Province Key Research and Development Program, China (Grant No. 2016GGX101022).