Amino Acids, Volume 35, Issue 2, pp 345–353

AAIndexLoc: predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices

  • E. Tantoso
  • Kuo-Bin Li

Summary.

Identifying a protein’s subcellular localization is an important step toward understanding its function. However, the experimental work involved is usually laborious, time-consuming and costly, so computational prediction is a valuable way to reduce that burden. Here we provide a method to predict protein subcellular localization using amino acid composition and physicochemical properties. The method concatenates the information extracted from a protein’s N-terminal region, middle region and full sequence. Each part is represented by amino acid composition, weighted amino acid composition, five-level grouping composition and five-level dipeptide composition. We divided our dataset into a training set and a testing set. The training set is used to determine the best-performing amino acid index by five-fold cross-validation, whereas the testing set serves as an independent dataset for evaluating the performance of our model. With this novel representation, we achieve an accuracy of approximately 75% on the independent dataset. We conclude that the new representation performs well and is able to extract the information contained in the protein sequence. We have also developed a web server for predicting protein subcellular localization, available at http://aaindexloc.bii.a-star.edu.sg.
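To make the representation described in the summary more concrete, the following is a minimal Python sketch of how such a feature vector might be assembled. It is not the authors' implementation, and several details are assumptions because the summary does not specify them: the Kyte-Doolittle hydropathy scale stands in for whichever amino acid index is selected, the five levels are formed by ranking the 20 residues on that index and cutting them into equal bins of four, the weighted composition is taken as frequency times index value, the N-terminal part is the first 30 residues (`n_term`), and the middle part is the sequence with `n_term` residues trimmed from each end. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used only as a stand-in amino acid index
# (assumption: the paper selects its index by cross-validation).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 standard residues (20 values)."""
    residues = [aa for aa in seq if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def weighted_composition(seq, index):
    """Composition scaled by the index value (assumed weighting scheme)."""
    return [f * index[aa] for aa, f in zip(AMINO_ACIDS, composition(seq))]


def five_level_groups(index):
    """Rank residues by index value and cut into five bins of four (assumed)."""
    ranked = sorted(AMINO_ACIDS, key=lambda aa: index[aa])
    return {aa: rank // 4 for rank, aa in enumerate(ranked)}


def group_composition(seq, groups):
    """Fraction of residues falling in each of the five levels (5 values)."""
    levels = [groups[aa] for aa in seq if aa in groups]
    n = len(levels)
    return [levels.count(g) / n if n else 0.0 for g in range(5)]


def group_dipeptide_composition(seq, groups):
    """Fraction of adjacent level pairs (5 x 5 = 25 values)."""
    levels = [groups[aa] for aa in seq if aa in groups]
    pairs = list(zip(levels, levels[1:]))
    n = len(pairs)
    return [pairs.count((a, b)) / n if n else 0.0
            for a in range(5) for b in range(5)]


def encode_part(seq, index, groups):
    """Concatenate the four descriptors for one sequence part (70 values)."""
    return (composition(seq)
            + weighted_composition(seq, index)
            + group_composition(seq, groups)
            + group_dipeptide_composition(seq, groups))


def encode(seq, index=KYTE_DOOLITTLE, n_term=30):
    """Encode N-terminal, middle and full sequence (3 x 70 = 210 values)."""
    seq = seq.upper()
    groups = five_level_groups(index)
    n_part = seq[:n_term]
    # Assumed definition of "middle"; falls back to the full sequence
    # when the protein is too short to trim both ends.
    middle = seq[n_term:-n_term] or seq
    return (encode_part(n_part, index, groups)
            + encode_part(middle, index, groups)
            + encode_part(seq, index, groups))
```

Under these assumptions, `encode(sequence)` returns a 210-dimensional vector that could be passed to a classifier such as a support vector machine. Repeating the encoding with different indices and comparing cross-validated accuracy on the training set would correspond to the index selection step mentioned in the summary.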

Keywords: Subcellular localization – Support vector machine – Amino acid indices 

Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • E. Tantoso (1)
  • Kuo-Bin Li (2)

  1. Bioinformatics Institute, Singapore
  2. Center for Systems and Synthetic Biology, National Yang-Ming University, Taipei, Taiwan
