Feature Combination Methods for Prediction of Subcellular Locations of Proteins with Both Single and Multiple Sites

  • Luyao Wang
  • Dong Wang
  • Yuehui Chen
  • Shanping Qiao
  • Yaou Zhao
  • Hanhan Cong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9771)


Effective feature extraction methods play a very important role in the prediction of multisite protein subcellular locations. With the progress of many proteome projects, more and more proteins are annotated with more than one subcellular location. However, compared with the single-site case, predicting multiplex protein subcellular localization is far more difficult and complicated. To improve multisite prediction quality, it is necessary to incorporate different feature extraction methods. In this paper, a feature combination method is applied to two different datasets: it uses the 20 dimensions of entropy density in place of the first 20 dimensions of the amphiphilic pseudo amino acid composition (AmPseAAC). This differs from simple additive feature fusion, in which the dimensions of the individual feature vectors are appended to one another. Based on this novel feature combination method, we adopt the multi-label k-nearest neighbors (ML-KNN) algorithm, together with a variant that assigns different weights to different attributes, called wML-KNN, to predict multiplex protein subcellular locations. The best overall accuracy is 61.11 % on dataset S1 from the Virus-mPLoc predictor and 82.03 % on dataset S2 from Gpos-mPLoc.
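The feature combination described above can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the entropy density is taken as s_i = -p_i log2(p_i) / H, with H the Shannon entropy of the amino acid frequencies p_i; the correlation part of AmPseAAC is passed in as a precomputed list (`correlation_tail`, a hypothetical name); and the attribute weighting of wML-KNN is shown only as one plausible reading, a weighted Euclidean distance. The function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def entropy_density(sequence):
    """Return a 20-dim vector s_i = -p_i*log2(p_i) / H (assumed definition)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    freqs = [counts[a] / total for a in AMINO_ACIDS]
    # Per-residue entropy contribution; 0*log(0) is treated as 0.
    terms = [-p * math.log2(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:  # only one residue type present
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]


def combine_features(sequence, correlation_tail):
    """Use entropy density for the first 20 dims and keep the given tail.

    `correlation_tail` stands in for the remaining AmPseAAC dimensions,
    which are assumed to be computed elsewhere.
    """
    return entropy_density(sequence) + list(correlation_tail)


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance, one possible form of attribute weighting."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))
```

The combined vector keeps the same length as the AmPseAAC vector it replaces, which is the point of contrast with additive fusion, where the vector would grow by 20 dimensions. A neighbor search using `weighted_distance` could then feed a standard ML-KNN label estimate; how the weights are actually chosen is not stated in the abstract.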


Multisite protein subcellular localizations; The entropy density; AmPseAAC; Multi-label k-nearest neighbors algorithm; wML-KNN



This research was partially supported by the Science and Technology Foundation of University of Jinan (Grant No. XKY1402), Shandong Provincial Natural Science Foundation, China, under Grant ZR2015JL025, the Youth Project of National Natural Science Fund (Grant No. 61302128), the Youth Science and Technology Star Program of Jinan City (201406003), the Natural Science Foundation of Shandong Province (ZR2011FL022, ZR2013FL002), the Scientific Research Fund of Jinan University (XKY1410, XKY1411), the Program for Scientific research innovation team in Colleges and Universities of Shandong Province (2012–2015), and the Shandong Provincial Key Laboratory of Network Based Intelligent Computing.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Information Science and Engineering, University of Jinan, Jinan, China
  2. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
  3. University of Jinan, Jinan, China
