PAC-Learning of Markov Models with Hidden State

  • Ricard Gavaldà
  • Philipp W. Keller
  • Joelle Pineau
  • Doina Precup
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


The standard approach for learning Markov Models with Hidden State uses the Expectation-Maximization framework. While this approach has had a significant impact on several practical applications (e.g., speech recognition and biological sequence alignment), it has two major limitations: it requires a known model topology, and learning is only locally optimal. We propose a new PAC framework for learning both the topology and the parameters in partially observable Markov models. Our algorithm learns a Probabilistic Deterministic Finite Automaton (PDFA) which approximates a Hidden Markov Model (HMM) up to some desired degree of accuracy. We discuss theoretical conditions under which the algorithm produces an optimal solution (in the PAC sense) and demonstrate promising performance on simple dynamical systems.
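To make the target model class concrete, the following is a minimal sketch of a PDFA and of how it assigns a probability to a string. It is not the learning algorithm described in the abstract. The class name `PDFA`, the method `string_probability`, the state names, and all probabilities are hypothetical and chosen only for illustration. The sketch assumes the usual convention that each state has a stopping probability and that each (state, symbol) pair leads to at most one next state.

```python
from dataclasses import dataclass


@dataclass
class PDFA:
    """Toy PDFA (illustrative only, not the paper's construction)."""
    start: str
    # transitions[state][symbol] = (probability, next_state);
    # determinism: at most one next state per (state, symbol) pair.
    transitions: dict
    # stop[state] = probability of ending the string in that state.
    stop: dict

    def string_probability(self, symbols):
        """Probability that the automaton emits exactly `symbols`."""
        state, prob = self.start, 1.0
        for s in symbols:
            outgoing = self.transitions.get(state, {})
            if s not in outgoing:
                return 0.0  # no transition for this symbol
            p, state = outgoing[s]
            prob *= p
        return prob * self.stop.get(state, 0.0)


# Hypothetical two-state example; in each state the outgoing
# probabilities plus the stopping probability sum to 1.
toy = PDFA(
    start="q0",
    transitions={
        "q0": {"a": (0.5, "q0"), "b": (0.25, "q1")},
        "q1": {"a": (0.5, "q0")},
    },
    stop={"q0": 0.25, "q1": 0.5},
)

p_ab = toy.string_probability("ab")  # 0.5 * 0.25 * 0.5
```

Because the next state is fully determined by the current state and the emitted symbol, the probability of a string is a single product along one path. In an HMM, the same computation requires summing over all hidden state sequences.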



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ricard Gavaldà (1)
  • Philipp W. Keller (2)
  • Joelle Pineau (2)
  • Doina Precup (2)

  1. Universitat Politècnica de Catalunya, Barcelona, Spain
  2. School of Computer Science, McGill University, Montreal, Canada
