Feature Combination Methods for Prediction of Subcellular Locations of Proteins with Both Single and Multiple Sites

Wang, Luyao; Wang, Dong; Chen, Yuehui; Qiao, Shanping; Zhao, Yaou; Cong, Hanhan

doi:10.1007/978-3-319-42291-6_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9771)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Effective feature extraction methods play a very important role in predicting the subcellular locations of multisite proteins. As proteome projects progress, more and more proteins are annotated with more than one subcellular location. However, predicting the locations of multiplex (multisite) proteins is far more difficult and complicated than the single-site problem. To improve multisite prediction quality, different feature extraction methods need to be combined. In this paper, we apply a feature combination method to two different datasets: it uses a 20-dimensional entropy density vector in place of the first 20 dimensions of the amphiphilic pseudo amino acid composition (AmPseAAC). This differs from simple additive feature fusion, in which feature vectors are concatenated and their dimensions add up. On the basis of this novel feature combination, we adopt the multi-label k-nearest neighbors (ML-KNN) algorithm, together with a variant that assigns different weights to different attributes, called wML-KNN, to predict multiplex protein subcellular locations. The best overall accuracy rates are 61.11 % on dataset S1, taken from the Virus-mPLoc predictor, and 82.03 % on dataset S2, taken from Gpos-mPLoc.
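The abstract does not spell out how the entropy density is computed. The following is a minimal sketch that assumes the entropy density profile (EDP) definition of Zhu et al.: with f_i the frequency of amino acid i and H = -Σ f_i log f_i the Shannon entropy of the composition, each dimension is s_i = -(1/H) f_i log f_i. The function name `entropy_density`, the dropping of non-standard residues, and the fallback for zero-entropy sequences are illustrative assumptions, not the authors' code.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density(sequence):
    """Return the 20-dim entropy density profile s_i = -(1/H) f_i log f_i."""
    # Non-standard letters (X, B, Z, U, ...) are dropped: an assumption.
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    n = len(residues)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # Per-residue Shannon terms -f log f, with 0 log 0 taken as 0.
    terms = [-f * math.log(f) if f > 0 else 0.0 for f in freqs]
    entropy = sum(terms)  # H, the Shannon entropy of the composition
    if entropy == 0.0:
        # Only one residue type occurs, so H = 0; fall back to the composition
        # (hypothetical convention, not taken from the paper).
        return freqs
    return [t / entropy for t in terms]
```

Because every term is divided by H, the logarithm base cancels and the profile sums to 1; for a sequence such as "ACAC", A and C each receive 0.5 and the other 18 dimensions are 0.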
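To make the combination concrete, here is a sketch that builds an amphiphilic PseAAC vector of length 20 + 2λ in Chou's form (normalized amino acid composition in the first 20 dimensions, then alternating hydrophobicity and hydrophilicity sequence-order correlation factors) and then overwrites the first 20 dimensions with a 20-dimensional entropy density vector. Plain concatenation is included for contrast. The property scales `h1` and `h2` are caller-supplied placeholders (in practice, standardized hydrophobicity and hydrophilicity values); `lam=2`, `w=0.5`, and all function names are assumptions, not the settings used in the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ampseaac(seq, h1, h2, lam=2, w=0.5):
    """Amphiphilic PseAAC (20 + 2*lam dims) in Chou's normalized form.

    h1, h2: dicts of standardized hydrophobicity / hydrophilicity values per
    residue, supplied by the caller (hypothetical inputs in this sketch).
    """
    seq = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    taus = []
    for j in range(1, lam + 1):
        pairs = range(n - j)
        # Tier-j correlation factors for the two amphiphilic properties.
        taus.append(sum(h1[seq[i]] * h1[seq[i + j]] for i in pairs) / (n - j))
        taus.append(sum(h2[seq[i]] * h2[seq[i + j]] for i in pairs) / (n - j))
    denom = sum(freqs) + w * sum(taus)
    if denom <= 0:
        raise ValueError("non-positive normalizer; check property scales")
    return [f / denom for f in freqs] + [w * t / denom for t in taus]


def combine_replace(ampse, edp):
    """Replace the first 20 AmPseAAC dims with a 20-dim entropy density vector."""
    if len(edp) != 20 or len(ampse) < 20:
        raise ValueError("expected a 20-dim EDP and an AmPseAAC of >= 20 dims")
    return list(edp) + list(ampse[20:])


def combine_concat(ampse, edp):
    """Simple additive fusion for comparison: the dimensions add up."""
    return list(ampse) + list(edp)
```

With the replacement scheme the vector keeps its 20 + 2λ dimensions, while concatenation grows it to 40 + 2λ; that size difference is the contrast the abstract draws with simple additive fusion.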
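ML-KNN decides each location label independently: for a query protein it counts how many of its k nearest training proteins carry that label, then applies a Laplace-smoothed maximum a posteriori rule estimated from the training set. The pure-Python sketch below assumes that the attribute weights of wML-KNN enter through a weighted Euclidean distance, and it takes the weights as a caller-supplied list, because the abstract does not say how they are chosen. The names `MLKNN`, `weighted_dist`, `k`, `s`, and `weights` are illustrative.

```python
import math


def weighted_dist(x, y, weights=None):
    """Attribute-weighted Euclidean distance; weights=None gives plain ML-KNN."""
    if weights is None:
        weights = [1.0] * len(x)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


class MLKNN:
    """Minimal ML-KNN with optional per-attribute weights (wML-KNN assumption)."""

    def __init__(self, k=3, s=1.0, weights=None):
        self.k, self.s, self.weights = k, s, weights

    def _neighbors(self, x, exclude=None):
        # Indices of the k closest training instances, optionally skipping one.
        scored = sorted(
            (weighted_dist(x, xi, self.weights), i)
            for i, xi in enumerate(self.X)
            if i != exclude
        )
        return [i for _, i in scored[: self.k]]

    def _likelihood(self, counts):
        # Smoothed P(exactly j neighbours carry the label | hypothesis), j = 0..k.
        k, s, total = self.k, self.s, sum(counts)
        return [(s + c) / (s * (k + 1) + total) for c in counts]

    def fit(self, X, Y):
        """X: feature vectors; Y: 0/1 label vectors, one entry per location."""
        self.X, self.Y = X, Y
        m, q = len(X), len(Y[0])
        # Smoothed prior probability that a protein has label l.
        self.prior = [(self.s + sum(y[l] for y in Y)) / (2 * self.s + m)
                      for l in range(q)]
        # has[l][j] / lacks[l][j]: training proteins with / without label l
        # whose k neighbours include exactly j proteins carrying label l.
        has = [[0] * (self.k + 1) for _ in range(q)]
        lacks = [[0] * (self.k + 1) for _ in range(q)]
        for i in range(m):
            nbrs = self._neighbors(X[i], exclude=i)
            for l in range(q):
                j = sum(Y[n][l] for n in nbrs)
                if Y[i][l]:
                    has[l][j] += 1
                else:
                    lacks[l][j] += 1
        self.like_has = [self._likelihood(c) for c in has]
        self.like_lacks = [self._likelihood(c) for c in lacks]
        return self

    def predict(self, x):
        """Return a 0/1 vector: 1 where the posterior favours the label."""
        nbrs = self._neighbors(x)
        out = []
        for l, p in enumerate(self.prior):
            j = sum(self.Y[n][l] for n in nbrs)
            yes = p * self.like_has[l][j]
            no = (1 - p) * self.like_lacks[l][j]
            out.append(1 if yes > no else 0)
        return out
```

Because every label gets its own decision, a protein can be assigned one location or several. This sketch returns only the 0/1 decisions and omits the label ranking that the original ML-KNN also produces.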

Acknowledgment

This research was partially supported by the Science and Technology Foundation of University of Jinan (Grant No. XKY1402), Shandong Provincial Natural Science Foundation, China, under Grant ZR2015JL025, the Youth Project of National Natural Science Fund (Grant No. 61302128), the Youth Science and Technology Star Program of Jinan City (201406003), the Natural Science Foundation of Shandong Province (ZR2011FL022, ZR2013FL002), the Scientific Research Fund of Jinan University (XKY1410, XKY1411), the Program for Scientific Research Innovation Team in Colleges and Universities of Shandong Province (2012–2015), and the Shandong Provincial Key Laboratory of Network Based Intelligent Computing.

Author information

Authors and Affiliations

School of Information Science and Engineering, University of Jinan, Jinan, 250022, China
Luyao Wang, Dong Wang, Yuehui Chen, Shanping Qiao & Yaou Zhao
Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, 250022, China
Luyao Wang, Dong Wang, Yuehui Chen, Shanping Qiao & Yaou Zhao
University of Jinan, Jinan, 250022, China
Hanhan Cong

Corresponding authors

Correspondence to Dong Wang or Yuehui Chen.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Polytechnic of Bari, Bari, Italy
Vitoantonio Bevilacqua
University of Wollongong, North Wollongong, New South Wales, Australia
Prashan Premaratne

About this paper

Cite this paper

Wang, L., Wang, D., Chen, Y., Qiao, S., Zhao, Y., Cong, H. (2016). Feature Combination Methods for Prediction of Subcellular Locations of Proteins with Both Single and Multiple Sites. In: Huang, DS., Bevilacqua, V., Premaratne, P. (eds) Intelligent Computing Theories and Application. ICIC 2016. Lecture Notes in Computer Science, vol 9771. Springer, Cham. https://doi.org/10.1007/978-3-319-42291-6_19

DOI: https://doi.org/10.1007/978-3-319-42291-6_19
Published: 12 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42290-9
Online ISBN: 978-3-319-42291-6
eBook Packages: Computer Science, Computer Science (R0)
