Hidden Markov models
Hidden Markov models (HMMs) constitute a family of versatile statistical models that have proven useful in many applications. HMMs were introduced in their full generality in 1966 by Baum and Petrie (Baum and Petrie, 1966; Baum et al., 1970). Baum, Petrie, and colleagues at the Institute for Defense Analyses also developed and analyzed a maximum likelihood (ML) procedure for efficient estimation of the HMM parameters from a training sequence. This procedure turned out to be an instance of the now well-known expectation-maximization (EM) algorithm of Dempster, Laird, and Rubin (1977). A form of HMM, referred to as a Markov source, was introduced as early as 1948 by Shannon in developing a model for the English language (Shannon, 1948).
Baum et al. (1970) referred to HMMs as probabilistic functions of Markov chains. Indeed, an HMM process comprises a Markov chain whose states are associated with some probability distributions. For example, the Markov states may be...
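The description of an HMM as a probabilistic function of a Markov chain can be illustrated with a minimal sampling sketch. It assumes a finite-state chain and Gaussian output distributions; the function name `sample_hmm`, the two-state setup, and all numerical values are hypothetical choices made for illustration and are not taken from the text.

```python
import random

def sample_hmm(initial, transition, emission_params, length, seed=0):
    """Draw (states, observations) from a finite-state HMM.

    Assumption for this sketch: each state emits from a Gaussian
    described by a (mean, std) pair in emission_params.
    """
    rng = random.Random(seed)
    n_states = len(initial)
    states, observations = [], []
    # Draw the first hidden state from the initial distribution.
    state = rng.choices(range(n_states), weights=initial)[0]
    for _ in range(length):
        mean, std = emission_params[state]
        states.append(state)
        # The observation depends only on the current hidden state.
        observations.append(rng.gauss(mean, std))
        # Advance the hidden chain using the current state's transition row.
        state = rng.choices(range(n_states), weights=transition[state])[0]
    return states, observations

# Hypothetical two-state example; every number here is illustrative.
initial = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
emission_params = [(0.0, 1.0), (5.0, 1.0)]  # (mean, std) per state

states, observations = sample_hmm(initial, transition, emission_params, length=10)
print(states)        # hidden Markov chain (not seen in practice)
print(observations)  # the observable process
```

In an application only `observations` would be available; the state sequence is hidden, which is why parameter estimation relies on procedures such as the ML method of Baum et al. (1970).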
- Baum, L.E. and Petrie, T. (1966). “Statistical inference for probabilistic functions of finite state Markov chains,” Ann. Math. Statist., 37, 1554–1563.
- Baum, L.E., Petrie, T., Soules, G., and Weiss, N. (1970). “A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains,” Ann. Math. Statist., 41, 164–171.
- Couvreur, C. (1996). Hidden Markov Models and Their Mixtures, Department of Mathematics, Université Catholique de Louvain, Belgium [http://thor.fpms.ac.be/~couvreur/listpub.html].
- Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). “Maximum likelihood from incomplete data via the EM algorithm,” Jl. Royal Stat. Soc. B, 39, 1–38.
- Ferguson, J.D., editor (1980). Proc. of the Symposium on the applications of hidden Markov models to text and speech. IDA-CRD, Princeton, New Jersey.
- Grimmett, G.R. and Stirzaker, D.R. (1995). Probability and Random Processes. Oxford Science Publications, Oxford, UK.
- Jelinek, F. (1976). “Continuous speech recognition by statistical methods,” Proc. IEEE, 64, 532–556.
- Leroux, B.G. (1992). “Maximum likelihood estimation for hidden Markov models,” Stochastic Processes and Their Applications, 40, 127–143.
- Rabiner, L.R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, 77, 257–286.
- Shannon, C.E. (1948). “A mathematical theory of communication,” Bell Syst. Tech. Jl., 27, 379–423, 623–656.
- Wu, C.F.J. (1983). “On the convergence properties of the EM algorithm,” Ann. Statist., 11, 95–103.