Finding Signal Peptides in Human Protein Sequences Using Recurrent Neural Networks

  • Martin Reczko
  • Petko Fiziev
  • Eike Staub
  • Artemis Hatzigeorgiou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2452)


A new approach called Sigfind for the prediction of signal peptides in human protein sequences is introduced. The method is based on the bidirectional recurrent neural network architecture. The modifications to this architecture and a better learning algorithm result in a very accurate identification of signal peptides (99.5% correct in fivefold cross-validation). The Sigfind system is available on the WWW for predictions ( synaptic/sig.nd.html).
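The abstract names the bidirectional recurrent architecture but does not describe Sigfind's layers, input encoding, or learning algorithm. The following is only a minimal sketch of the general idea: each sequence position receives context from both the left and the right. The function names, the scalar hidden state, and all weight values are hypothetical illustrations, not the authors' implementation.

```python
import math


def bidirectional_states(inputs, w_in=0.5, w_rec=0.8):
    """Return a (forward, backward) hidden-state pair for every position.

    `inputs` is a list of floats (a toy numeric encoding of residues).
    `w_in` and `w_rec` are hypothetical scalar weights chosen for illustration.
    """
    n = len(inputs)
    fwd = [0.0] * n
    bwd = [0.0] * n

    # Forward chain: the state at position t summarises inputs 0..t.
    h = 0.0
    for t in range(n):
        h = math.tanh(w_in * inputs[t] + w_rec * h)
        fwd[t] = h

    # Backward chain: the state at position t summarises inputs t..n-1.
    h = 0.0
    for t in reversed(range(n)):
        h = math.tanh(w_in * inputs[t] + w_rec * h)
        bwd[t] = h

    return list(zip(fwd, bwd))


def position_scores(inputs, w_f=1.0, w_b=1.0, bias=0.0):
    """Combine both contexts at each position into a sigmoid score in (0, 1)."""
    return [
        1.0 / (1.0 + math.exp(-(w_f * f + w_b * b + bias)))
        for f, b in bidirectional_states(inputs)
    ]


if __name__ == "__main__":
    toy_sequence = [1.0, -2.0, 0.5, 3.0]  # made-up values, not real residue features
    for position, score in enumerate(position_scores(toy_sequence)):
        print(position, round(score, 3))
```

In a trained model the weights would be learned and the hidden states would be vectors; the sketch keeps scalars so that the two directions of information flow stay easy to follow.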


Signal Peptide, Recurrent Neural Network, Neural Network Architecture, Positional Weight Matrix, Human Protein Sequence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Martin Reczko (1)
  • Petko Fiziev (2)
  • Eike Staub (2)
  • Artemis Hatzigeorgiou (3)
  1. Synaptic Ltd., Science and Technology Park of Crete, Voutes, Heraklion, Greece
  2. metaGen Pharmaceuticals GmbH, Berlin, Germany
  3. Department of Genetics, University of Pennsylvania, School of Medicine, Philadelphia, USA
