Bidirectional Dynamics for Protein Secondary Structure Prediction

  • Pierre Baldi
  • Søren Brunak
  • Paolo Frasconi
  • Gianluca Pollastri
  • Giovanni Soda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1828)


Connectionist models for learning in sequential domains are typically dynamical systems that use hidden states to store contextual information. In principle, these models can adapt to variable time lags and perform complex sequential mappings. In spite of several successful applications (mostly based on hidden Markov models), the general class of sequence learning problems is still far from being satisfactorily solved. In particular, learning sequential translations is generally a hard task, and current models exhibit a number of limitations. One of these limitations, at least for some application domains, is the causality assumption. A dynamical system is said to be causal if the output at (discrete) time t does not depend on future inputs. Causality is easy to justify in dynamics that attempt to model the behavior of many physical systems: in these cases the response at time t clearly cannot depend on stimuli that the system has not yet received as input. As it turns out, non-causal dynamics over infinite time horizons cannot be realized by any physical or computational device. For certain categories of finite sequences, however, information from both the past and the future can be very useful for analysis and prediction at time t. This is the case, for example, with DNA and protein sequences, where the structure and function of a region in the sequence may strongly depend on events located both upstream and downstream of the region, sometimes at considerable distances. Another good example is provided by the off-line translation from one language into another.
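The bidirectional idea above can be illustrated with a minimal sketch: one recurrent chain propagates hidden state forward over a finite sequence (past context), a second chain propagates backward (future context), and the prediction at position t combines both. This is only an illustrative toy with arbitrary random weights and sizes, not the authors' exact architecture or training procedure.

```python
import numpy as np

# Illustrative bidirectional dynamics over a finite sequence.
# All weight matrices and dimensions below are assumptions for the sketch.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3          # input, hidden, output sizes (arbitrary)

W_f = rng.normal(0, 0.5, (n_hid, n_hid))   # forward recurrence (past -> t)
V_f = rng.normal(0, 0.5, (n_hid, n_in))
W_b = rng.normal(0, 0.5, (n_hid, n_hid))   # backward recurrence (future -> t)
V_b = rng.normal(0, 0.5, (n_hid, n_in))
U_f = rng.normal(0, 0.5, (n_out, n_hid))   # output mixes both chains
U_b = rng.normal(0, 0.5, (n_out, n_hid))
U_x = rng.normal(0, 0.5, (n_out, n_in))

def bidirectional_outputs(X):
    """X: (T, n_in) finite input sequence; returns (T, n_out) outputs."""
    T = len(X)
    F = np.zeros((T + 1, n_hid))       # F[t+1] summarizes inputs X[0..t]
    B = np.zeros((T + 1, n_hid))       # B[t] summarizes inputs X[t..T-1]
    for t in range(T):                 # forward pass over the past
        F[t + 1] = np.tanh(W_f @ F[t] + V_f @ X[t])
    for t in reversed(range(T)):       # backward pass over the future
        B[t] = np.tanh(W_b @ B[t + 1] + V_b @ X[t])
    # Output at t depends on past context, future context, and X[t] itself.
    return np.array([U_f @ F[t + 1] + U_b @ B[t] + U_x @ X[t]
                     for t in range(T)])

X = rng.normal(size=(6, n_in))
Y = bidirectional_outputs(X)

# Non-causality on a finite sequence: perturbing a *future* input
# changes the output at position 0.
X2 = X.copy()
X2[5] += 1.0
Y2 = bidirectional_outputs(X2)
print(Y.shape, bool(np.any(Y2[0] != Y[0])))   # (6, 3) True
```

The final check makes the abstract's point concrete: unlike a causal system, the prediction at position 0 responds to an input change at position 5, which is feasible here only because the sequence is finite and processed off-line.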


Keywords: Hidden Markov Model, Bayesian Network, Recurrent Neural Network, Hidden Unit, Hidden State




Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Pierre Baldi (1)
  • Søren Brunak (2)
  • Paolo Frasconi (3)
  • Gianluca Pollastri (1)
  • Giovanni Soda (3)
  1. University of California at Irvine, Irvine
  2. The Technical University of Denmark, Denmark
  3. University of Florence, Florence