Amino Acids, Volume 45, Issue 2, pp 291–299

SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor

  • Catherine Mooney
  • Amélie Cessieux
  • Denis C. Shields
  • Gianluca Pollastri
Original Article

Abstract

Knowledge of the subcellular location of a protein provides valuable information about its function, its possible interactions with other proteins and its drug targetability, among other things. The experimental determination of a protein’s location in the cell is expensive, time-consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data now available is to be fully exploited. In the post-genomic era, genomes of many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside the well-studied plant, animal and fungal groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) which assigns eukaryotic proteins to one of three classes that are particularly important for determining the drug targetability of a protein: secreted proteins, membrane proteins and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on the training data, achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. The three-class accuracy of SCL-Epred and LocTree2, and in particular of a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked on a large redundancy-reduced independent test set of 562 proteins. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
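
The two figures of merit quoted above can be illustrated with a minimal sketch. It assumes that Q is the percentage of proteins assigned to the correct class and that the generalised correlation is the chi-squared-based coefficient described by Baldi et al. (2000); the abstract itself does not spell out either definition. The function name `q_and_gc` and the confusion matrix below are hypothetical and are not taken from the paper.

```python
import math


def q_and_gc(confusion):
    """Return (Q, GC) for a square confusion matrix.

    confusion[i][j] is the number of proteins whose true class is i
    and whose predicted class is j. Assumed definitions:
      Q  = 100 * (correct predictions) / (all predictions)
      GC = sqrt(chi2 / (N * (K - 1))), with chi2 computed against the
           counts expected if predictions were independent of the truth.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    q = 100.0 * sum(confusion[i][i] for i in range(k)) / n

    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows/columns
                chi2 += (confusion[i][j] - expected) ** 2 / expected

    gc = math.sqrt(chi2 / (n * (k - 1)))
    return q, gc


# Made-up counts for the three classes (secreted, membrane, other);
# these are illustrative only and are NOT results from the paper.
toy = [
    [80, 5, 15],
    [6, 70, 24],
    [10, 12, 178],
]
q, gc = q_and_gc(toy)
print(f"Q = {q:.1f}%, GC = {gc:.2f}")
```

Under these definitions a perfect predictor gives Q = 100 and GC = 1, while a predictor whose output is independent of the true class gives GC = 0, which is why GC is a useful complement to Q when the classes are unbalanced.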

Keywords

  • Subcellular localisation prediction
  • Eukaryotes
  • N-to-1 neural network
  • SCL-Epred

Notes

Acknowledgments

The work was funded through a Science Foundation Ireland principal investigator grant (08/IN.1/B1864) to D. C. Shields and a Science Foundation Ireland research frontiers grant (10/RFP/GEN2749) to G. Pollastri. The authors wish to acknowledge UCD IT Services, and in particular the Phaeton administrators, for the provision of computational facilities and support. We thank Tatyana Goldberg from the Rost Lab at TU Munich for providing LocTree2 predictions.

Copyright information

© Springer-Verlag Wien 2013

Authors and Affiliations

  • Catherine Mooney (1)
  • Amélie Cessieux (2)
  • Denis C. Shields (1)
  • Gianluca Pollastri (3)

  1. Complex and Adaptive Systems Laboratory, Conway Institute of Biomolecular and Biomedical Science, School of Medicine and Medical Science, University College Dublin, Belfield, Dublin, Ireland
  2. Département Génie Biologique, Polytech’Nice-Sophia, Biot, France
  3. Complex and Adaptive Systems Laboratory and School of Computer Science and Informatics, University College Dublin, Belfield, Dublin, Ireland
