Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT

Mammalian Genome

Abstract

To increase the coverage of secreted protein prediction, we describe a combination strategy. Instead of relying on a single method, we combine two hidden Markov model (HMM)-based methods, CJ-SPHMM and TMHMM, with PSORT. CJ-SPHMM is an HMM-based signal peptide prediction method, and TMHMM is an HMM-based transmembrane (TM) protein prediction algorithm. Using CJ-SPHMM and TMHMM together, proteins with a predicted signal peptide and no predicted TM regions are taken as putative secreted proteins. This HMM-based approach predicts secreted proteins with an accuracy (Ac) of 0.82 and a correlation coefficient (Cc) of 0.75, similar to PSORT alone (Ac 0.82, Cc 0.76). When the HMM-based method (CJ-SPHMM + TMHMM) is further complemented with PSORT, Ac increases to 0.86 and Cc to 0.81. Applying this combination strategy to the International Protein Index (IPI), maintained at the European Bioinformatics Institute (EBI), we constructed a putative human secretome of 5235 proteins. The prediction system described here can also be applied to predicting secreted proteins from other vertebrate proteomes.
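The decision rule in the abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation. The abstract does not say how PSORT "complements" the HMM-based call, so the sketch assumes a logical OR (a protein is kept if either route calls it secreted), which matches the stated aim of increasing coverage. The abstract also does not define Ac or Cc; the sketch assumes plain accuracy and the Matthews correlation coefficient. The names `Prediction`, `hmm_secreted`, `combined_secreted`, and `accuracy_and_cc` are hypothetical, and the tool outputs are assumed to have been parsed into simple flags already.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Prediction:
    """Per-protein tool outputs (hypothetical field names)."""
    has_signal_peptide: bool  # CJ-SPHMM call
    tm_region_count: int      # number of TM regions predicted by TMHMM
    psort_secreted: bool      # PSORT call, assumed reduced to yes/no


def hmm_secreted(p: Prediction) -> bool:
    # Rule stated in the abstract: signal peptide present, no TM regions.
    return p.has_signal_peptide and p.tm_region_count == 0


def combined_secreted(p: Prediction) -> bool:
    # ASSUMPTION: complementing with PSORT is modelled as a logical OR.
    return hmm_secreted(p) or p.psort_secreted


def accuracy_and_cc(predicted, actual):
    """Return (Ac, Cc) for two equal-length lists of booleans.

    ASSUMPTION: Ac is plain accuracy and Cc is the Matthews
    correlation coefficient; the abstract gives no formulas.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    ac = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cc is undefined when any marginal count is zero; report 0.0 then.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ac, cc
```

Under the OR assumption, a protein that CJ-SPHMM misses but PSORT calls secreted is still retained, which is how the combined set can be larger than either set on its own.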

Acknowledgements

This work was supported by the China High-Tech Program (grant No. 2001AA231011) and the Beijing Municipal Committee of Science and Technology (grant No. H010210360119). We thank Dr. Kenta Nakai and Dr. Anders Krogh for kindly providing PSORT and TMHMM for local use on our server. We also thank Mr. Yanbin Yin for valuable suggestions.

Author information

Corresponding author

Correspondence to Ying Jiang.

About this article

Cite this article

Chen, Y., Yu, P., Luo, J. et al. Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT. Mamm Genome 14, 859–865 (2003). https://doi.org/10.1007/s00335-003-2296-6

  • DOI: https://doi.org/10.1007/s00335-003-2296-6
