A New Multi-label Classifier in Identifying the Functional Types of Human Membrane Proteins
- 219 Downloads
- 5 Citations
Abstract
Membrane proteins were found to be involved in various cellular processes performing various important functions, which are mainly associated to their type. Given a membrane protein sequence, how can we identify its type(s)? Particularly, how can we deal with the multi-type problem since one membrane protein may simultaneously belong to two or more different types? To address these problems, which are obviously very important to both basic research and drug development, a new multi-label classifier was developed based on pseudo amino acid composition with multi-label k-nearest neighbor algorithm. The success rate achieved by the new predictor on the benchmark dataset by jackknife test is 73.94 %, indicating that the method is promising and the predictor may become a very useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying functional types of membrane proteins.
Keywords
Membrane proteins Pseudo amino acid composition ML-KNN algorithm Jackknife testSupplementary material
References
- Boutet E, Lieberherr D, Tognolli M, Schneider M, Bairoch A (2007) Uniprotkb/swiss-prot. Springer, Plant Bioinformatics, pp 89–112CrossRefGoogle Scholar
- Chen Y-L, Li Q-Z (2007) Prediction of the subcellular location of apoptosis proteins. J Theor Biol 245:775–783CrossRefPubMedGoogle Scholar
- Chou KC (1995) A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space. Proteins 21:319–344CrossRefPubMedGoogle Scholar
- Chou KC (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 43:246–255CrossRefPubMedGoogle Scholar
- Chou K-C (2011) Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 273:236–247CrossRefPubMedGoogle Scholar
- Chou KC, Elrod DW (1999) Prediction of membrane protein types and subcellular locations. Proteins 34:137–153CrossRefPubMedGoogle Scholar
- Chou K-C, Elrod DW (2003) Prediction of enzyme family classes. J Proteome Res 2:183–190CrossRefPubMedGoogle Scholar
- Chou K-C, Shen H-B (2007) MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem Biophys Res Commun 360:339–345CrossRefPubMedGoogle Scholar
- Chou K-C, Shen H-B (2009) Review: recent advances in developing web-servers for predicting protein attributes. Nat Sci 1:63Google Scholar
- Chou C-H, Shen H-B (2010) Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms. Eng, 2Google Scholar
- Chou K-C, Wu Z-C, Xiao X (2011) iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS One 6:e18258PubMedCentralCrossRefPubMedGoogle Scholar
- Chou K-C, Wu Z-C, Xiao X (2012) iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. Mol BioSyst 8:629–641CrossRefPubMedGoogle Scholar
- Ding C, Yuan L-F, Guo S-H, Lin H, Chen W (2012) Identification of mycobacterial membrane proteins and their types using over-represented tripeptide compositions. J Proteomics 77:321–328CrossRefPubMedGoogle Scholar
- Glory E, Murphy RF (2007) Automated subcellular location determination and high-throughput microscopy. Dev Cell 12:7–16CrossRefPubMedGoogle Scholar
- Huang C, Yuan J-Q (2013a) A multilabel model based on Chou’s pseudo-amino acid composition for identifying membrane proteins with both single and multiple functional types. J Membr Biol 246:327–334CrossRefPubMedGoogle Scholar
- Huang C, Yuan J-Q (2013b) Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou’s pseudo amino acid compositions. J Theor Biol 335:205–212CrossRefPubMedGoogle Scholar
- Jian X, Wei R, Zhan T, Gu Q (2008) Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy. Protein Pept Lett 15:392–396CrossRefGoogle Scholar
- Khosravian M, Kazemi Faramarzi F, Mohammad Beigi M, Behbahani M, Mohabatkar H (2013) Predicting antibacterial peptides by the concept of Chou’s pseudo-amino acid composition and machine learning methods. Protein Pept Lett 20:180–186CrossRefPubMedGoogle Scholar
- Li F-M, Li Q-Z (2008) Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach. Protein Pept Lett 15:612–616CrossRefPubMedGoogle Scholar
- Lin H, Wang H, Ding H, Chen Y-L, Li Q-Z (2009) Prediction of subcellular localization of apoptosis protein using Chou’s pseudo amino acid composition. Acta Biotheor 57:321–330CrossRefPubMedGoogle Scholar
- Lin W-Z, Fang J-A, Xiao X, Chou K-C (2013) iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins. Mol BioSyst 9:634–644CrossRefPubMedGoogle Scholar
- Nakashima H, Nishikawa K, Tatsuo O (1986) The folding type of a protein is relevant to the amino acid composition. J Biochem 99:153–162PubMedGoogle Scholar
- Park K-J, Kanehisa M (2003) Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics 19:1656–1663CrossRefPubMedGoogle Scholar
- Pu X, Guo J, Leung H, Lin Y (2007) Prediction of membrane protein types from sequences and position-specific scoring matrices. J Theor Biol 247:259–265CrossRefPubMedGoogle Scholar
- Saravanan V, Lakshmi P (2013) APSLAP: an adaptive boosting technique for predicting subcellular localization of apoptosis protein. Acta Biotheor 61:481–497CrossRefPubMedGoogle Scholar
- Schäffer AA, Aravind L, Madden TL, Shavirin S, Spouge JL, Wolf YI, Koonin EV, Altschul SF (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 29:2994–3005PubMedCentralCrossRefPubMedGoogle Scholar
- Shen H-B, Chou K-C (2007) Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Commun 355:1006–1011CrossRefPubMedGoogle Scholar
- Shen H-B, Chou K-C (2008) PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. Anal Biochem 373:386–388CrossRefPubMedGoogle Scholar
- Shen H-B, Yang J, Chou K-C (2006) Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. J Theor Biol 240:9–13CrossRefPubMedGoogle Scholar
- Smith C (2008) Subcellular targeting of proteins and drugs. http://www.biocompare.com/Articles/TechnologySpotlight/976/Subcellular-Target-ing-Of-Proteins-An
- UniProt Consortium (2008) The universal protein resource (UniProt). Nucleic Acids Res 36:D190–D195CrossRefGoogle Scholar
- Wang X, Li G-Z (2012) A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins. PLoS One 7:e36317PubMedCentralCrossRefPubMedGoogle Scholar
- Wang M, Yang J, Xu Z-J, Chou K-C (2005) SLLE for predicting membrane protein types. J Theor Biol 232:7–15CrossRefPubMedGoogle Scholar
- Wootton JC, Federhen S (1993) Statistics of local complexity in amino acid sequences and sequence databases. Comput Chem 17:149–163CrossRefGoogle Scholar
- Wu Z-C, Xiao X, Chou K-C (2012) iLoc-Gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex gram-positive bacterial proteins. Protein Pept Lett 19:4–14CrossRefPubMedGoogle Scholar
- Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou K-C (2005) Using complexity measure factor to predict protein subcellular location. Amino Acids 28:57–61CrossRefPubMedGoogle Scholar
- Xiao X, Wang P, Lin W-Z, Jia J-H, Chou K-C (2013) iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem 436:168–177CrossRefPubMedGoogle Scholar
- Zhang M-L, Zhou Z-H (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40:2038–2048CrossRefGoogle Scholar
- Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:17Google Scholar
- Zhou X-B, Chen C, Li Z-C, Zou X-Y (2007) Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol 248:546–551CrossRefPubMedGoogle Scholar
- Zou Q, Wang Z, Guan X, Liu B, Wu Y, Lin Z (2013) An Approach for identifying cytokines based on a novel ensemble classifier. BioMed Res Int 2013Google Scholar
- Zou Q, Chen W, Huang Y, Liu X, Jiang Y (2013b) Identifying multi-functional enzyme by hierarchical multi-label classifier. J Comput Theor Nanosci 10:1038–1043CrossRefGoogle Scholar