Mammalian Genome, Volume 14, Issue 12, pp 859–865

Secreted protein prediction system combining CJ-SPHMM, TMHMM, and PSORT

  • Yunjia Chen
  • Peng Yu
  • Jingchu Luo
  • Ying Jiang


To increase the coverage of secreted protein prediction, we describe a combination strategy. Instead of relying on a single method, we combine the Hidden Markov Model (HMM)-based methods CJ-SPHMM and TMHMM with PSORT. CJ-SPHMM is an HMM-based signal peptide prediction method, while TMHMM is an HMM-based transmembrane (TM) protein prediction algorithm. With CJ-SPHMM and TMHMM, proteins with a predicted signal peptide and without predicted TM regions are taken as putative secreted proteins. This HMM-based approach predicts secreted proteins with an Ac (Accuracy) of 0.82 and a Cc (Correlation coefficient) of 0.75, similar to PSORT with an Ac of 0.82 and a Cc of 0.76. When we further complement the HMM-based method (CJ-SPHMM + TMHMM) with PSORT, the Ac increases to 0.86 and the Cc to 0.81. Applying this combination strategy to the International Protein Index (IPI) maintained at the European Bioinformatics Institute (EBI), we constructed a putative human secretome of 5235 proteins. The prediction system described here can also be applied to predicting secreted proteins from other vertebrate proteomes.
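The decision rule in the abstract can be illustrated with a minimal Python sketch. The HMM rule (predicted signal peptide and no predicted TM region) comes directly from the abstract. The way PSORT "complements" the HMM-based call is not spelled out there, so the logical OR used below is an assumption. The `Prediction` container, its field names, and the protein identifiers are hypothetical and do not reflect the real output formats of CJ-SPHMM, TMHMM, or PSORT.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    """Per-protein calls from the three tools (hypothetical container)."""
    protein_id: str
    has_signal_peptide: bool  # call from a signal peptide predictor such as CJ-SPHMM
    tm_region_count: int      # number of TM regions predicted by a tool such as TMHMM
    psort_secreted: bool      # True if PSORT places the protein among secreted proteins


def hmm_secreted(p: Prediction) -> bool:
    """Rule stated in the abstract: signal peptide present, no TM region."""
    return p.has_signal_peptide and p.tm_region_count == 0


def combined_secreted(p: Prediction) -> bool:
    """ASSUMPTION: PSORT complements the HMM rule as a logical OR (union)."""
    return hmm_secreted(p) or p.psort_secreted


def putative_secretome(predictions: Iterable[Prediction]) -> List[str]:
    """Return the IDs of proteins called secreted by the combined rule."""
    return [p.protein_id for p in predictions if combined_secreted(p)]


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the paper.
    toy = [
        Prediction("prot_A", True, 0, False),   # HMM rule alone calls it secreted
        Prediction("prot_B", True, 2, False),   # signal peptide but TM regions
        Prediction("prot_C", False, 0, True),   # rescued by PSORT under the OR assumption
        Prediction("prot_D", False, 1, False),  # no evidence of secretion
    ]
    print(putative_secretome(toy))
```

Under the union assumption, the combined call can only add proteins to the HMM-based set, which is consistent with the stated goal of increasing coverage. A stricter intersection would instead trade coverage for specificity.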




This work was supported by the China High-Tech Program (grant No. 2001AA231011) and the Beijing Municipal Committee of Science and Technology (grant No. H010210360119). We thank Dr. Kenta Nakai and Dr. Anders Krogh for kindly providing PSORT and TMHMM to run locally on our server. We also thank Mr. Yanbin Yin for valuable suggestions.



Copyright information

© Springer-Verlag New York Inc. 2003

Authors and Affiliations

  1. College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetic Engineering, and Centre of Bioinformatics, Peking University, Beijing 100871, China
