Feature Combination Methods for Prediction of Subcellular Locations of Proteins with Both Single and Multiple Sites

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9771)

Abstract

Effective feature extraction methods play a very important role in the prediction of multisite protein subcellular locations. With the progress of many proteome projects, more and more proteins are annotated with more than one subcellular location. However, compared with the single-site case, predicting the subcellular localization of multiplex proteins is far more difficult and complicated. To improve multisite prediction quality, it is necessary to incorporate different feature extraction methods. In this paper, a feature combination method that uses the 20 dimensions of entropy density in place of the former 20 dimensions of amphiphilic pseudo amino acid composition (AmPseAAC) is applied to two different datasets. This differs from feature fusion that simply adds dimensions together. On the basis of this novel feature combination method, we adopt the multi-label k-nearest neighbors (ML-KNN) algorithm, together with a variant that assigns different weights to different attributes, called wML-KNN, to predict multiplex protein subcellular locations. The best overall accuracy is 61.11% on dataset S1, taken from the Virus-mPLoc predictor, and 82.03% on dataset S2, taken from Gpos-mPLoc.
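The replacement step described in the abstract can be illustrated with a minimal sketch. The preview does not give the paper's exact formula, so the code assumes the usual entropy density profile form, s_i = -(p_i log p_i) / H, where p_i is the frequency of amino acid i and H is the Shannon entropy of the composition. It also assumes that the 20 composition components come first in the AmPseAAC vector. The function names `entropy_density` and `replace_composition` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def entropy_density(sequence):
    """Return a 20-dimensional entropy density vector (assumed EDP form)."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    freqs = [counts[a] / total for a in AMINO_ACIDS]
    # Per-residue entropy terms; a residue with p = 0 contributes 0.
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:
        # Single residue type: the density is undefined, so this sketch
        # returns zeros (an assumption, not the paper's rule).
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]

def replace_composition(ampseaac_vector, sequence):
    """Swap the first 20 components for entropy density, keep the rest.

    Assumes the composition part occupies positions 0..19 of the vector.
    """
    if len(ampseaac_vector) < 20:
        raise ValueError("expected at least 20 components")
    return entropy_density(sequence) + list(ampseaac_vector[20:])
```

Because the 20 composition components are replaced rather than appended, the combined vector keeps the length of the original AmPseAAC vector, which is how this sketch reads the abstract's contrast with fusion that simply adds dimensions.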

Acknowledgment

This research was partially supported by the Science and Technology Foundation of University of Jinan (Grant No. XKY1402), Shandong Provincial Natural Science Foundation, China, under Grant ZR2015JL025, the Youth Project of National Natural Science Fund (Grant No. 61302128), the Youth Science and Technology Star Program of Jinan City (201406003), the Natural Science Foundation of Shandong Province (ZR2011FL022, ZR2013FL002), the Scientific Research Fund of Jinan University (XKY1410, XKY1411), the Program for Scientific research innovation team in Colleges and Universities of Shandong Province (2012–2015), and the Shandong Provincial Key Laboratory of Network Based Intelligent Computing.

Author information

Corresponding authors

Correspondence to Dong Wang or Yuehui Chen.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, L., Wang, D., Chen, Y., Qiao, S., Zhao, Y., Cong, H. (2016). Feature Combination Methods for Prediction of Subcellular Locations of Proteins with Both Single and Multiple Sites. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_19

  • DOI: https://doi.org/10.1007/978-3-319-42291-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42290-9

  • Online ISBN: 978-3-319-42291-6

  • eBook Packages: Computer Science, Computer Science (R0)
