Molecular Diversity

Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins

Full Length Paper

Abstract

In this paper, the AdaBoost algorithm, a popular and effective boosting method, is applied to predict the subcellular locations of prokaryotic and eukaryotic proteins in a dataset derived from SWISS-PROT 33.0. Its predictive ability was evaluated by the re-substitution (self-consistency) test, leave-one-out cross-validation (LOOCV), and the jackknife test. Comparing its results with those of several widely used predictors, including the discriminant function, neural networks, and support vector machines (SVM), we demonstrate that the AdaBoost predictor outperforms them. We therefore conclude that AdaBoost can be employed as a robust method for predicting protein subcellular location. An online web server for predicting the subcellular location of prokaryotic and eukaryotic proteins is available at http://chemdata.shu.edu.cn/subcell/.
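The abstract does not spell out the boosting procedure itself. As a hedged illustration only, the sketch below implements SAMME, a standard multi-class variant of AdaBoost, with one-feature decision stumps as weak learners, in plain Python. The SAMME variant, the stump learner, the number of rounds, and the toy two-feature vectors and location labels are all assumptions for illustration; the paper's actual weak learner, multi-class strategy, and protein encoding may differ.

```python
import math
from collections import defaultdict


def fit_stump(X, y, w):
    """Return the one-feature threshold split with the lowest weighted error.

    A stump is (feature index, threshold, class if x[j] <= t, class if x[j] > t);
    each side predicts the class holding the most sample weight on that side.
    """
    best_err, best = float("inf"), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: one below the minimum plus midpoints between values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            left_cls = max(left, key=left.get) if left else None
            right_cls = max(right, key=right.get) if right else None
            # An empty side falls back to the other side's majority class.
            left_cls = left_cls if left_cls is not None else right_cls
            right_cls = right_cls if right_cls is not None else left_cls
            # Weighted error = total weight minus the weight each side gets right.
            err = sum(w) - left.get(left_cls, 0.0) - right.get(right_cls, 0.0)
            if err < best_err:
                best_err, best = err, (j, t, left_cls, right_cls)
    return best


def stump_predict(stump, row):
    j, t, left_cls, right_cls = stump
    return left_cls if row[j] <= t else right_cls


def adaboost_samme(X, y, n_rounds=20):
    """Multi-class AdaBoost (SAMME variant) with decision stumps."""
    classes = sorted(set(y))
    K = len(classes)
    w = [1.0 / len(X)] * len(X)  # start from uniform sample weights
    ensemble = []  # list of (alpha, stump) pairs
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = [stump_predict(stump, row) != label for row, label in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        if err >= 1.0 - 1.0 / K:
            break  # no better than random guessing among K classes
        err = max(err, 1e-10)  # guard against log(0) for a perfect stump
        alpha = math.log((1.0 - err) / err) + math.log(K - 1)
        ensemble.append((alpha, stump))
        # Up-weight the misclassified samples, then renormalise to sum to 1.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:
            break  # training set fitted perfectly; further rounds add nothing
    return classes, ensemble


def adaboost_predict(model, row):
    """Weighted vote: each stump adds its alpha to the class it predicts."""
    classes, ensemble = model
    votes = {c: 0.0 for c in classes}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(votes, key=votes.get)


# Hypothetical toy data: two numeric features per "protein", three locations.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
     [0.8, 0.2], [0.9, 0.1], [0.85, 0.3],
     [0.5, 0.9], [0.4, 0.8], [0.6, 0.85]]
y = ["cytoplasmic"] * 3 + ["extracellular"] * 3 + ["nuclear"] * 3

model = adaboost_samme(X, y, n_rounds=20)
train_acc = sum(adaboost_predict(model, r) == t for r, t in zip(X, y)) / len(y)
print(train_acc)  # prints 1.0 on this toy set
```

With two classes the `math.log(K - 1)` term vanishes and the update reduces to the original binary AdaBoost rule; for K classes it only requires each weak learner to beat random guessing (error below 1 - 1/K) rather than 50%.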

Keywords

AdaBoost, Subcellular location, Self-consistency, Jackknife test
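The self-consistency (re-substitution) and jackknife tests named above differ only in which samples the predictor is allowed to see. A minimal sketch of both protocols, assuming a generic train-then-predict interface; the 1-nearest-neighbour classifier and the toy data (including one deliberately mislabelled outlier) are hypothetical stand-ins, not the paper's AdaBoost predictor or dataset. Here the jackknife is implemented as leave-one-out; the abstract does not state how the paper's LOOCV and jackknife runs differ.

```python
def train_1nn(X, y):
    """Hypothetical stand-in classifier: 1-nearest neighbour, squared Euclidean distance."""
    data = list(zip(X, y))

    def predict(x):
        _, label = min(data, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))
        return label

    return predict


def self_consistency(train, X, y):
    """Re-substitution test: fit on every sample, then score those same samples."""
    predict = train(X, y)
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


def jackknife(train, X, y):
    """Jackknife (leave-one-out) test: each sample is predicted by a model
    trained on all of the other samples."""
    correct = 0
    for i in range(len(y)):
        predict = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(X[i]) == y[i]
    return correct / len(y)


# Hypothetical toy data; the last point is an "extracellular" outlier
# sitting among the "cytoplasmic" points.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
     [0.8, 0.2], [0.9, 0.1], [0.85, 0.3],
     [0.5, 0.9], [0.4, 0.8], [0.6, 0.85],
     [0.18, 0.22]]
y = ["cytoplasmic"] * 3 + ["extracellular"] * 3 + ["nuclear"] * 3 + ["extracellular"]

print(self_consistency(train_1nn, X, y), jackknife(train_1nn, X, y))  # prints 1.0 0.7
```

Because 1-NN memorises its training set, it scores perfectly under re-substitution even with a mislabelled point present, while the jackknife exposes three errors around the outlier. This is why the self-consistency rate is best read as an upper bound and the jackknife rate as the more informative estimate.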

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. School of Materials Science and Engineering, Shanghai University, Shanghai, China
  2. Department of Chemistry, College of Sciences, Shanghai University, Shanghai, China
  3. Division of Imaging Science & Biomedical Engineering, The University of Manchester, Manchester, UK
  4. Department of Combinatorics and Geometry, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  5. School of Computer Science & Engineering, Shanghai University, Shanghai, China
