# Hidden Markov models

**DOI:** https://doi.org/10.1007/1-4020-0611-X_417

## INTRODUCTION

Hidden Markov models (HMMs) constitute a family of versatile statistical models that have proven useful in many applications. HMMs were introduced in their full generality in 1966 by Baum and Petrie (Baum and Petrie, 1966; Baum et al., 1970). Baum, Petrie, and colleagues at the Institute for Defense Analyses also developed and analyzed a maximum likelihood (ML) procedure for efficiently estimating the HMM parameters from a training sequence. This procedure turned out to be an instance of the now well-known EM (Expectation-Maximization) algorithm of Dempster, Laird, and Rubin (1977). A form of HMM, referred to as a Markov source, had been introduced as early as 1948 by Shannon in developing a model for the English language (Shannon, 1948).

Baum et al. (1970) referred to HMMs as probabilistic functions of Markov chains. Indeed, an HMM process comprises a Markov chain in which each state is associated with a probability distribution. For example, the Markov states may be...
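The structure described above, a Markov chain in which each state carries its own probability distribution, can be illustrated with a small sampler. This is a minimal sketch under stated assumptions, not a construction taken from the cited papers: the two-state chain, the symbols `"a"` and `"b"`, all probability values, and the function name `sample_hmm` are hypothetical and chosen only for illustration.

```python
import random


def sample_hmm(initial, transition, emission, length, rng):
    """Draw (states, observations) of the given length from a discrete HMM.

    initial    -- list of initial state probabilities
    transition -- transition[i][j] = P(next state j | current state i)
    emission   -- emission[i] maps each symbol to P(symbol | state i)
    """
    states, observations = [], []
    # The first state is drawn from the initial distribution.
    state = rng.choices(range(len(initial)), weights=initial)[0]
    for _ in range(length):
        states.append(state)
        # The observation is drawn from the distribution attached to the current state.
        symbols = list(emission[state].keys())
        weights = list(emission[state].values())
        observations.append(rng.choices(symbols, weights=weights)[0])
        # Markov step: the next state depends only on the current state.
        row = transition[state]
        state = rng.choices(range(len(row)), weights=row)[0]
    return states, observations


# Hypothetical two-state model; all numbers are illustrative assumptions.
initial = [0.6, 0.4]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
emission = [{"a": 0.8, "b": 0.2},
            {"a": 0.1, "b": 0.9}]

states, observations = sample_hmm(initial, transition, emission, 10, random.Random(0))
print(states)        # the state path, which is hidden in practice
print(observations)  # the sequence an observer would actually see
```

In an application only `observations` would be available; the state path is what makes the model "hidden", and the ML procedure mentioned in the introduction estimates quantities such as `initial`, `transition`, and `emission` from observations alone.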

## References

- [1] Baum, L.E. and Petrie, T. (1966). “Statistical inference for probabilistic functions of finite state Markov chains,” Ann. Math. Statist., 37, 1554–1563.
- [2] Baum, L.E., Petrie, T., Soules, G., and Weiss, N. (1970). “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Ann. Math. Statist., 41, 164–171.
- [3] Couvreur, C. (1996). Hidden Markov Models and Their Mixtures, Department of Mathematics, Université Catholique de Louvain, Belgium [http://thor.fpms.ac.be/~couvreur/listpub.html].
- [4] Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). “Maximum likelihood from incomplete data via the EM algorithm,” Jl. Royal Stat. Soc. B, 39, 1–38.
- [5] Ferguson, J.D., editor (1980). Proc. of the Symposium on the applications of hidden Markov models to text and speech. IDA-CRD, Princeton, New Jersey.
- [6] Grimmett, G.R. and Stirzaker, D.R. (1995). Probability and Random Processes. Oxford Science Publications, Oxford, UK.
- [7] Jelinek, F. (1976). “Continuous speech recognition by statistical methods,” Proc. IEEE, 64, 532–556.
- [8] Leroux, B.G. (1992). “Maximum likelihood estimation for hidden Markov models,” Stochastic Processes and Their Applications, 40, 127–143.
- [9] Rabiner, L.R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, 77, 257–286.
- [10] Shannon, C.E. (1948). “A mathematical theory of communication,” Bell Syst. Tech. Jl., 27, 379–423, 623–656.
- [11] Wu, C.F.J. (1983). “On the convergence properties of the *EM* algorithm,” Ann. Statist., 11, 95–103.