SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor

Abstract

Knowledge of the subcellular location of a protein provides valuable information about, among other things, its function, its possible interactions with other proteins and its drug targetability. Experimental determination of a protein’s location in the cell is expensive, time-consuming and open to human error. Fast and accurate predictors of subcellular location therefore have an important role to play if the abundance of sequence data now available is to be fully exploited. In the post-genomic era, the genomes of many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside the well-studied plant, animal and fungal groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) that assigns eukaryotic proteins to three classes that are particularly important for determining the drug targetability of a protein: secreted proteins, membrane proteins, and proteins that are neither secreted nor membrane proteins. The algorithm powering SCL-Epred is an N-to-1 neural network trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on the training data, achieving a Q of 86% and a generalised correlation of 0.75 in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. When benchmarked on a large, redundancy-reduced independent test set of 562 proteins, the three-class accuracy of SCL-Epred and LocTree2, and in particular of a consensus predictor combining both methods, surpasses that of other widely used predictors. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
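The abstract reports performance as Q and a "generalised correlation" without defining them. The sketch below shows one plausible way to compute both from a three-class confusion matrix, assuming that Q is the overall fraction of correctly classified proteins and that the generalised correlation is the chi-squared-based measure (equivalent to Cramér's V for a K × K table) described by Baldi et al. (2000). These definitions are assumptions to be checked against the full text; the function names `q_accuracy` and `generalised_correlation` are hypothetical, and the toy matrix is invented for illustration, not taken from the paper.

```python
import math
from typing import Sequence


def q_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Fraction of proteins whose predicted class equals the true class (assumed meaning of Q)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def generalised_correlation(confusion: Sequence[Sequence[int]]) -> float:
    """Chi-squared-based generalised correlation for a K x K confusion matrix.

    Assumption: GC = sqrt(chi2 / (N * (K - 1))), i.e. Cramér's V, where
    confusion[i][j] counts proteins of true class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]  # x_i: proteins truly in class i
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # y_j: predicted as j
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n  # e_ij if prediction ignored the truth
            if expected > 0:
                chi2 += (confusion[i][j] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


if __name__ == "__main__":
    # Hypothetical matrix; rows/columns: secreted, membrane, neither.
    toy = [[40, 5, 5],
           [4, 30, 6],
           [3, 2, 5]]
    print(f"Q  = {q_accuracy(toy):.3f}")
    print(f"GC = {generalised_correlation(toy):.3f}")
```

Unlike Q, the generalised correlation is 0 when predictions are statistically independent of the true classes and 1 for a perfect classifier, so it is less flattered by a predictor that simply favours the most populous class.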

Acknowledgments

The work was funded through a Science Foundation Ireland principal investigator grant (08/IN.1/B1864) to D. C. Shields and a Science Foundation Ireland research frontiers grant (10/RFP/GEN2749) to G. Pollastri. The authors wish to acknowledge UCD IT Services, and in particular the Phaeton administrators, for the provision of computational facilities and support. We thank Tatyana Goldberg from the Rost Lab at TU Munich for providing LocTree2 predictions.

Author information

Corresponding author

Correspondence to Gianluca Pollastri.

About this article

Cite this article

Mooney, C., Cessieux, A., Shields, D.C. et al. SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 45, 291–299 (2013). https://doi.org/10.1007/s00726-013-1491-3

Keywords

  • Subcellular localisation prediction
  • Eukaryotes
  • N-to-1 neural network
  • SCL-Epred