The complete realization problem for hidden Markov models: a survey and some new results

  • Original Article
  • Mathematics of Control, Signals, and Systems

Abstract

Suppose m is a positive integer, and let \({\mathcal{M} = \{1, \ldots ,m\}}\). Suppose \({\{\mathcal{Y}_t \}}\) is a stationary stochastic process assuming values in \({\mathcal{M}}\). In this paper we study the question: When does there exist a hidden Markov model (HMM) that reproduces the statistics of this process? This question is more than forty years old, and as yet no complete solution is available. We begin by surveying several known results, and then present some new results that provide ‘almost’ necessary and sufficient conditions for the existence of an HMM for a mixing and ultra-mixing process (where the notion of ultra-mixing is introduced here).

In the survey part of the paper, consisting of Sects. 2 through 8, we rederive the following known results:

(i) Associate an infinite matrix H with the process, and call it a ‘Hankel’ matrix (because of some superficial similarity to a Hankel matrix). Then the process has an HMM realization only if H has finite rank.

(ii) However, the finite Hankel rank condition is not sufficient in general. There exist processes with finite Hankel rank that do not admit an HMM realization.

(iii) An abstract necessary and sufficient condition states that a frequency distribution has a realization as an HMM if and only if it belongs to a ‘stable polyhedral’ convex set within the set of all frequency distributions on \({\mathcal{M}^{*}}\), the set of all finite strings over \({\mathcal{M}}\). While this condition may be ‘necessary and sufficient,’ it virtually amounts to a restatement of the problem rather than a solution of it, as observed by Anderson (Math Control Signals Syst 12(1):80–120, 1999).

(iv) Suppose a process has finite Hankel rank, say r. Then there always exists a ‘regular quasi-realization’ of the process. That is, there exist a row vector, a column vector, and a set of matrices, each of dimension r or r × r as appropriate, such that the frequency of arbitrary strings is given by a formula that is similar to the corresponding formula for HMMs. Moreover, all regular quasi-realizations of the process can be obtained from one of them via a similarity transformation. Hence, given a finite Hankel-rank process, it is a simple matter to determine whether or not it has a regular HMM in the conventional sense, by testing the feasibility of a linear programming problem.

(v) If in addition the process is α-mixing, every regular quasi-realization has additional features. Specifically, a matrix associated with the quasi-realization (which plays the role of the state transition matrix in an HMM) is ‘quasi-row stochastic’ (in that its rows add up to one, even though the matrix may not be nonnegative), and it also satisfies the ‘quasi-strong Perron property’ (its spectral radius is one, the spectral radius is a simple eigenvalue, and there are no other eigenvalues on the unit circle). A corollary is that if a finite Hankel rank α-mixing process has a regular HMM in the conventional sense, then the associated Markov chain is irreducible and aperiodic. While this last result is not surprising, it does not seem to have been stated explicitly before.

While the above results are all ‘known,’ they are scattered over the literature; the presentation here is unified, and in places the proofs are simpler than those found in the literature.

Next we present some new results. The key is the introduction of a property called ‘ultra-mixing.’ The following results are established:

(a) Suppose a process has finite Hankel rank, is both α-mixing and ultra-mixing, and in addition satisfies a technical condition. Then it has an irreducible HMM realization (and not just a quasi-realization). Moreover, the Markov process underlying the HMM is either aperiodic (and is thus α-mixing), or else satisfies a ‘consistency condition.’

(b) In the other direction, suppose an HMM satisfies the consistency condition plus another technical condition. Then the associated output process has finite Hankel rank, is α-mixing, and is also ultra-mixing.

Moreover, it is shown that under a natural topology on the set of HMMs, both ‘technical’ conditions are satisfied by an open dense set of HMMs. Taken together, these two results show that, modulo two technical conditions, the finite Hankel rank condition, α-mixing, and ultra-mixing are ‘almost’ necessary and sufficient for a process to have an irreducible and aperiodic HMM.
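The objects named in the abstract can be made concrete with a small numerical sketch. It assumes the standard product formula for string frequencies of an HMM, f(u_1 … u_k) = π M(u_1) ⋯ M(u_k) 1, where π is the stationary distribution and the matrices M(u) sum to the state transition matrix. The two-state model, its numbers, and all function names below are hypothetical and are not taken from the paper. The infinite ‘Hankel’ matrix is truncated to strings of bounded length, so the computed rank is only a finite-window estimate.

```python
import itertools
import numpy as np

# Hypothetical two-state, two-symbol HMM; all numbers are illustrative only.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # state transition matrix (row stochastic)
B = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # B[j, u] = P(output u | state j)

# M[u][i, j] = P(next state j, output u | current state i); the M[u] sum to A.
M = [A * B[:, u] for u in range(B.shape[1])]


def stationary_distribution(A):
    """Left eigenvector of A for the eigenvalue closest to 1, scaled to sum to 1."""
    eigenvalues, vectors = np.linalg.eig(A.T)
    k = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(vectors[:, k])
    return pi / pi.sum()


def string_frequency(pi, M, u):
    """Frequency of the string u (a tuple of symbols): pi M[u1] ... M[uk] 1."""
    row = pi
    for symbol in u:
        row = row @ M[symbol]
    return float(row.sum())


def truncated_hankel(pi, M, max_len):
    """Matrix with entry f(uv), for all strings u, v of length <= max_len."""
    symbols = range(len(M))
    strings = [s for k in range(max_len + 1)
               for s in itertools.product(symbols, repeat=k)]
    return np.array([[string_frequency(pi, M, u + v) for v in strings]
                     for u in strings])


def is_quasi_row_stochastic(A, tol=1e-9):
    """Rows add up to one; entries are allowed to be negative."""
    return bool(np.allclose(A.sum(axis=1), 1.0, atol=tol))


def has_quasi_strong_perron(A, tol=1e-9):
    """Spectral radius 1, attained only by a simple eigenvalue equal to 1."""
    eigenvalues = np.linalg.eigvals(A)
    moduli = np.abs(eigenvalues)
    on_circle = eigenvalues[np.abs(moduli - 1.0) < tol]
    return bool(moduli.max() <= 1.0 + tol
                and len(on_circle) == 1
                and abs(on_circle[0] - 1.0) < tol)


pi = stationary_distribution(A)
H = truncated_hankel(pi, M, max_len=2)
print("rank of truncated Hankel matrix:", np.linalg.matrix_rank(H, tol=1e-10))
print("quasi-row stochastic:", is_quasi_row_stochastic(A))
print("quasi-strong Perron property:", has_quasi_strong_perron(A))
```

Because each entry f(uv) factors as (π M(u)) · (M(v) 1), the rank of any such truncation cannot exceed the number of hidden states, which illustrates why finite Hankel rank is necessary for an HMM realization. The sketch says nothing about sufficiency, which the abstract notes fails in general.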

References

  1. Anderson BDO (1999) The realization problem for hidden Markov models. Math Control Signals Syst 12(1): 80–120

  2. Anderson BDO, Deistler M, Farina L, Benvenuti L (1996) Nonnegative realization of a system with a nonnegative impulse response. IEEE Trans Circ Syst I Fundam Theory Appl 43: 134–142

  3. Baldi P, Brunak S (2001) Bioinformatics: a machine learning approach, 2nd edn. MIT Press, Cambridge

  4. Baum LE, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37: 1554–1563

  5. Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1): 164–171

  6. Benvenuti L, Farina L (2004) A tutorial on the positive realization problem. IEEE Trans Autom Control 49: 651–664

  7. Berman A, Plemmons RJ (1979) Nonnegative matrices. Academic Press, New York

  8. Blackwell D, Koopmans L (1957) On the identifiability problem for functions of finite Markov chains. Ann Math Stat 28: 1011–1015

  9. Blondel VD, Canterini V (2003) Undecidable problems for probabilistic automata of fixed dimension. Theory Comput Syst 36: 231–245

  10. Carlyle JW (1967) Identification of state-calculable functions of finite Markov chains. Ann Math Stat 38: 201–205

  11. Carlyle JW (1969) Stochastic finite-state system theory. In: Zadeh L, Polak E (eds) System theory, chap 10. McGraw-Hill, New York

  12. Cawley SE, Wirth AL, Speed TP (2001) Phat—a gene finding program for Plasmodium falciparum. Mol Biochem Parasitol 118: 167–174

  13. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucl Acids Res 27(23): 4636–4641

  14. Dharmadhikari SW (1963) Functions of finite Markov chains. Ann Math Stat 34: 1022–1031

  15. Dharmadhikari SW (1963) Sufficient conditions for a stationary process to be a function of a Markov chain. Ann Math Stat 34: 1033–1041

  16. Dharmadhikari SW (1965) A characterization of a class of functions of finite Markov chains. Ann Math Stat 36: 524–528

  17. Dharmadhikari SW (1969) A note on exchangeable processes with states of finite rank. Ann Math Stat 40(6): 2207–2208

  18. Dharmadhikari SW, Nadkarni MG (1970) Some regular and non-regular functions of finite Markov chains. Ann Math Stat 41(1): 207–213

  19. Erickson RV (1970) Functions of Markov chains. Ann Math Stat 41: 843–850

  20. Fliess M (1975) Séries rationnelles positives et processus stochastiques. Ann Inst Henri Poincaré Sect B XI: 1–21

  21. Fox M, Rubin H (1968) Functions of processes with Markovian states. Ann Math Stat 39: 938–946

  22. Gilbert EJ (1959) The identifiability problem for functions of Markov chains. Ann Math Stat 30: 688–697

  23. Heller A (1965) On stochastic processes derived from Markov chains. Ann Math Stat 36: 1286–1291

  24. Ito H, Amari S, Kobayashi K (1992) Identifiability of hidden Markov information sources and their minimum degrees of freedom. IEEE Trans Inf Theory 38: 324–333

  25. Jelinek F (1997) Statistical methods for speech recognition. MIT Press, Cambridge

  26. Kalikow S (1990) Random Markov processes and uniform martingales. Isr J Math 71(1): 33–54

  27. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D (1994) Hidden Markov models in computational biology: applications to protein modeling. J Mol Biol 235: 1501–1531

  28. Krogh A, Mian IS, Haussler D (1994) A hidden Markov model that finds genes in E. coli DNA. Nucl Acids Res 22(22): 4768–4778

  29. Kronecker L (1881) Zur Theorie der Elimination einer Variablen aus zwei algebraischen Gleichungen. Monatsber Königl Preuss Akad Wiss Berlin, pp 535–600

  30. Majoros WH, Salzberg SL (2004) An empirical analysis of training protocols for probabilistic gene finders. BMC Bioinforma. http://www.biomedcentral.com/1471-2105/5/206

  31. Ornstein DS, Weiss B (1990) How sampling reveals a process. Ann Probab 18(3): 905–930

  32. Picci G (1978) On the internal structure of finite-state stochastic processes. In: Mohler R, Ruberti A (eds) Recent developments in variable structure systems. Lecture notes in economics and mathematical systems, vol 162. Springer, Heidelberg

  33. Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2): 257–285

  34. Rozenberg G, Salomaa A (1994) Cornerstones in undecidability. Prentice-Hall, Englewood Cliffs

  35. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucl Acids Res 26(2): 544–548

  36. Seneta E (1981) Non-negative matrices and Markov chains, 2nd edn. Springer, New York

  37. Sontag ED (1975) On certain questions of rationality and decidability. J Comput Syst Sci 11: 375–381

  38. van den Hof JM (1997) Realization of continuous-time positive linear systems. Syst Control Lett 31: 243–253

  39. van den Hof JM, van Schuppen JH (1994) Realization of positive linear systems using polyhedral cones. In: Proceedings of the 33rd IEEE conference on decision and control, pp 3889–3893

  40. Vidyasagar M (2003) Learning and generalization with applications to neural networks. Springer, London

  41. Vidyasagar M (2003) Nonlinear systems analysis. SIAM Publications, Philadelphia

Author information

Corresponding author

Correspondence to M. Vidyasagar.

About this article

Cite this article

Vidyasagar, M. The complete realization problem for hidden Markov models: a survey and some new results. Math. Control Signals Syst. 23, 1–65 (2011). https://doi.org/10.1007/s00498-011-0066-7

  • DOI: https://doi.org/10.1007/s00498-011-0066-7
