Learning Hidden Markov Models Using Probabilistic Matrix Factorization

  • Ashutosh Tewari
  • Michael J. Giering
Part of the Studies in Big Data book series (SBD, volume 3)


Hidden Markov Models (HMMs) are a powerful tool for building probabilistic graphical models that describe sequences of observable entities. HMM parameters are typically estimated with the Baum–Welch algorithm, whose cost per iteration scales linearly with the sequence length and quadratically with the number of hidden states. In this chapter, we propose a significantly faster algorithm for HMM parameter estimation. The crux of the algorithm is the probabilistic factorization of a 2D matrix whose \((i,j)\)th element is the number of times the \(j\)th symbol immediately follows the \(i\)th symbol in the observed sequence. We compare the Baum–Welch algorithm with the proposed algorithm in various experimental settings and present empirical evidence of the proposed method's benefits: reduced time complexity and increased robustness.
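As a rough illustration of the idea in the abstract, the Python sketch below builds the symbol-transition count matrix and fits it with EM under an assumed factorization \(P(i,j) \approx \sum_{s,s'} P(s,s')\,P(i \mid s)\,P(j \mid s')\), where \(s\) and \(s'\) are consecutive hidden states. The factorization form, the update rules, and the helper names `bigram_counts` and `factorize_hmm` are assumptions made for illustration. They are not the chapter's published algorithm.

```python
import random


def bigram_counts(seq, num_symbols):
    """Hypothetical helper: F[i][j] = number of times symbol j directly follows symbol i."""
    F = [[0] * num_symbols for _ in range(num_symbols)]
    for a, b in zip(seq, seq[1:]):
        F[a][b] += 1
    return F


def _normalize(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def factorize_hmm(F, num_states, iters=200, seed=0):
    """Hypothetical EM fit of P(i, j) ~= sum_{s,t} C[s][t] * B[s][i] * B[t][j].

    C is a joint distribution over consecutive hidden-state pairs (s, t) and
    B[s] is the emission distribution of state s. Returns (A, B, pi) with
    A[s][t] = P(next state t | state s) and pi[s] = sum_t C[s][t].
    """
    if sum(map(sum, F)) == 0:
        raise ValueError("need at least one symbol transition")
    rng = random.Random(seed)
    M, N = len(F), num_states
    # Random positive initialization of the pair distribution C and emissions B.
    C = [[rng.random() + 0.5 for _ in range(N)] for _ in range(N)]
    total = sum(map(sum, C))
    C = [[c / total for c in row] for row in C]
    B = [_normalize([rng.random() + 0.5 for _ in range(M)]) for _ in range(N)]
    for _ in range(iters):
        C_new = [[0.0] * N for _ in range(N)]
        B_new = [[0.0] * M for _ in range(N)]
        for i in range(M):
            for j in range(M):
                if F[i][j] == 0:
                    continue
                # E-step: posterior over the hidden pair (s, t) that emitted (i, j).
                w = [[C[s][t] * B[s][i] * B[t][j] for t in range(N)] for s in range(N)]
                z = sum(map(sum, w))
                if z == 0:
                    continue
                for s in range(N):
                    for t in range(N):
                        r = F[i][j] * w[s][t] / z
                        C_new[s][t] += r  # expected count of state pair (s, t)
                        B_new[s][i] += r  # state s emitted the first symbol
                        B_new[t][j] += r  # state t emitted the second symbol
        # M-step: renormalize the expected counts into distributions.
        total = sum(map(sum, C_new))
        C = [[c / total for c in row] for row in C_new]
        B = [_normalize(row) for row in B_new]
    A = [_normalize(row) for row in C]
    pi = [sum(row) for row in C]
    return A, B, pi


seq = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
F = bigram_counts(seq, 3)
A, B, pi = factorize_hmm(F, num_states=2, iters=50)
```

The sequence is scanned only once, in \(O(T)\) time, to build `F`; after that, each EM iteration in this sketch costs \(O(M^2 N^2)\) for \(M\) symbols and \(N\) hidden states, independent of the sequence length \(T\), whereas each Baum–Welch iteration costs \(O(T N^2)\). The sketch also treats overlapping symbol pairs as independent draws, which is a simplification.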



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. United Technologies Research Center, East Hartford, USA
