The complete realization problem for hidden Markov models: a survey and some new results

Original Article

Abstract

Suppose m is a positive integer, and let \({\mathcal{M} = \{1, \ldots ,m\}}\). Suppose \({\{\mathcal{Y}_t \}}\) is a stationary stochastic process assuming values in \({\mathcal{M}}\). In this paper we study the question: When does there exist a hidden Markov model (HMM) that reproduces the statistics of this process? This question is more than forty years old, and as yet no complete solution is available. In this paper, we begin by surveying several known results, and then we present some new results that provide ‘almost’ necessary and sufficient conditions for the existence of an HMM for a mixing and ultra-mixing process (where the notion of ultra-mixing is introduced here).

In the survey part of the paper, consisting of Sects. 2 through 8, we rederive the following known results:

(i) Associate an infinite matrix H with the process, and call it a ‘Hankel’ matrix (because of some superficial similarity to a Hankel matrix). Then the process has an HMM realization only if H has finite rank.

(ii) However, the finite Hankel rank condition is not sufficient in general. There exist processes with finite Hankel rank that do not admit an HMM realization.

(iii) An abstract necessary and sufficient condition states that a frequency distribution has a realization as an HMM if and only if it belongs to a ‘stable polyhedral’ convex set within the set of all frequency distributions on \({\mathcal{M}^{*}}\), the set of all finite strings over \({\mathcal{M}}\). While this condition may be ‘necessary and sufficient,’ it virtually amounts to a restatement of the problem rather than a solution of it, as observed by Anderson (Math Control Signals Syst 12(1):80–120, 1999).

(iv) Suppose a process has finite Hankel rank, say r. Then there always exists a ‘regular quasi-realization’ of the process. That is, there exist a row vector, a column vector, and a set of matrices, each of dimension r or r × r as appropriate, such that the frequency of arbitrary strings is given by a formula that is similar to the corresponding formula for HMMs. Moreover, all regular quasi-realizations of the process can be obtained from one of them via a similarity transformation. Hence, given a finite Hankel-rank process, it is a simple matter to determine whether or not it has a regular HMM in the conventional sense, by testing the feasibility of a linear programming problem.

(v) If in addition the process is α-mixing, every regular quasi-realization has additional features. Specifically, a matrix associated with the quasi-realization (which plays the role of the state transition matrix in an HMM) is ‘quasi-row stochastic’ (in that its rows add up to one, even though the matrix may not be nonnegative), and it also satisfies the ‘quasi-strong Perron property’ (its spectral radius is one, the spectral radius is a simple eigenvalue, and there are no other eigenvalues on the unit circle). A corollary is that if a finite Hankel rank α-mixing process has a regular HMM in the conventional sense, then the associated Markov chain is irreducible and aperiodic. While this last result is not surprising, it does not seem to have been stated explicitly.

While the above results are all ‘known,’ they are scattered over the literature; moreover, the presentation here is unified and occasionally offers simpler proofs than those found in the literature.

Next we move on to present some new results. The key is the introduction of a property called ‘ultra-mixing.’ The following results are established:

(a) Suppose a process has finite Hankel rank, is both α-mixing and ‘ultra-mixing,’ and in addition satisfies a technical condition. Then it has an irreducible HMM realization (and not just a quasi-realization). Moreover, the Markov process underlying the HMM is either aperiodic (and is thus α-mixing), or else satisfies a ‘consistency condition.’

(b) In the other direction, suppose an HMM satisfies the consistency condition plus another technical condition. Then the associated output process has finite Hankel rank, is α-mixing and is also ultra-mixing.

Moreover, it is shown that under a natural topology on the set of HMMs, both ‘technical’ conditions are indeed satisfied by an open dense set of HMMs. Taken together, these two results show that, modulo two technical conditions, the finite Hankel rank condition, α-mixing, and ultra-mixing are ‘almost’ necessary and sufficient for a process to have an irreducible and aperiodic HMM.
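
As an illustration of result (i), the following minimal Python sketch builds a finite block of the ‘Hankel’ matrix for a small HMM and checks that its rank does not exceed the number of hidden states. Everything concrete here is an assumption made for the example and is not taken from the paper: the two-state model and its numbers, the 0-based alphabet {0, 1}, the names `A`, `B`, `pi`, `freq`, `strings_up_to` and `rank`, and the convention that the matrix attached to symbol u has entries a_ij b_ju, so that the frequency of a string is a row vector times a product of matrices times a column of ones. Exact rational arithmetic is used so that the rank is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 2-state HMM over the alphabet {0, 1}; the numbers are illustrative only.
A = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]   # state transition matrix, rows sum to one
B = [[Fraction(9, 10), Fraction(1, 10)],
     [Fraction(1, 5), Fraction(4, 5)]]   # B[i][u] = P(output u | state i)
pi = [Fraction(1, 3), Fraction(2, 3)]    # stationary distribution of A (pi A = pi)
n, m = 2, 2                              # number of states, alphabet size

# Assumed convention: M[u][i][j] = P(next state j and output u | current state i).
M = [[[A[i][j] * B[j][u] for j in range(n)] for i in range(n)] for u in range(m)]

def freq(word):
    """Frequency of a string: pi * M[u1] * ... * M[uk] * (column of ones)."""
    row = list(pi)
    for u in word:
        row = [sum(row[i] * M[u][i][j] for i in range(n)) for j in range(n)]
    return sum(row)

def strings_up_to(max_len):
    """All strings over the alphabet of length 0..max_len, as tuples."""
    return [w for k in range(max_len + 1) for w in product(range(m), repeat=k)]

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

# Finite block of the Hankel matrix: rows and columns indexed by strings,
# with the (u, v) entry equal to the frequency of the concatenation uv.
words = strings_up_to(3)
H = [[freq(u + v) for v in words] for u in words]
hankel_rank = rank(H)
print(hankel_rank)  # 2, which does not exceed the number of hidden states
```

A finite block can only bound the rank of the infinite matrix from below, so a check of this kind can refute a candidate number of states but cannot by itself establish that the Hankel rank is finite. By result (ii), a small rank also does not guarantee that an HMM realization exists.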


References

  1. Anderson BDO (1999) The realization problem for hidden Markov models. Math Control Signals Syst 12(1): 80–120
  2. Anderson BDO, Deistler M, Farina L, Benvenuti L (1996) Nonnegative realization of a system with a nonnegative impulse response. IEEE Trans Circ Syst I Fundam Theory Appl 43: 134–142
  3. Baldi P, Brunak S (2001) Bioinformatics: a machine learning approach, 2nd edn. MIT Press, Cambridge
  4. Baum LE, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37: 1554–1563
  5. Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1): 164–171
  6. Benvenuti L, Farina L (2004) A tutorial on the positive realization problem. IEEE Trans Autom Control 49: 651–664
  7. Berman A, Plemmons RJ (1979) Nonnegative matrices. Academic Press, New York
  8. Blackwell D, Koopmans L (1957) On the identifiability problem for functions of finite Markov chains. Ann Math Stat 28: 1011–1015
  9. Blondel V, Catarini V (2003) Undecidable problems for probabilistic automata of fixed dimension. Theory Comput Syst 36: 231–245
  10. Carlyle JW (1967) Identification of state-calculable functions of finite Markov chains. Ann Math Stat 38: 201–205
  11. Carlyle JW (1969) Stochastic finite-state system theory. In: Zadeh L, Polak E (eds) System theory, chap 10. McGraw-Hill, New York
  12. Cawley SE, Wirth AL, Speed TP (2001) Phat—a gene finding program for Plasmodium falciparum. Mol Biochem Parasitol 118: 167–174
  13. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucl Acids Res 27(23): 4636–4641
  14. Dharmadhikari SW (1963) Functions of finite Markov chains. Ann Math Stat 34: 1022–1031
  15. Dharmadhikari SW (1963) Sufficient conditions for a stationary process to be a function of a Markov chain. Ann Math Stat 34: 1033–1041
  16. Dharmadhikari SW (1965) A characterization of a class of functions of finite Markov chains. Ann Math Stat 36: 524–528
  17. Dharmadhikari SW (1969) A note on exchangeable processes with states of finite rank. Ann Math Stat 40(6): 2207–2208
  18. Dharmadhikari SW, Nadkarni MG (1970) Some regular and non-regular functions of finite Markov chains. Ann Math Stat 41(1): 207–213
  19. Erickson RV (1970) Functions of Markov chains. Ann Math Stat 41: 843–850
  20. Fliess M (1975) Series rationelles positives et processus stochastique. Ann Inst Henri Poincaré Sect B XI: 1–21
  21. Fox M, Rubin H (1968) Functions of processes with Markovian states. Ann Math Stat 39: 938–946
  22. Gilbert EJ (1959) The identifiability problem for functions of Markov chains. Ann Math Stat 30: 688–697
  23. Heller A (1965) On stochastic processes derived from Markov chains. Ann Math Stat 36: 1286–1291
  24. Ito H, Amari S, Kobayashi K (1992) Identifiability of hidden Markov information sources and their minimum degrees of freedom. IEEE Trans Inf Theory 38: 324–333
  25. Jelinek F (1997) Statistical methods for speech recognition. MIT Press, Cambridge
  26. Kalikow S (1990) Random Markov processes and uniform martingales. Isr J Math 71(1): 33–54
  27. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D (1994) Hidden Markov models in computational biology: applications to protein modeling. J Mol Biol 235: 1501–1531
  28. Krogh A, Mian IS, Haussler D (1994) A hidden Markov model that finds genes in E. coli DNA. Nucl Acids Res 22(22): 4768–4778
  29. Kronecker L (1881) Zur Theorie der Elimination einer Variablen aus zwei algebraischen Gleichungen. Monatsber Königl Preuss Akad Wiss Berlin, pp 535–600
  30. Majoros WH, Salzberg SL (2004) An empirical analysis of training protocols for probabilistic gene finders. BMC Bioinforma. http://www.biomedcentral.com/1471-2105/5/206
  31. Ornstein DS, Weiss B (1990) How sampling reveals a process. Ann Probab 18(3): 905–930
  32. Picci G (1978) On the internal structure of finite-state stochastic processes. In: Mohler R, Ruberti A (eds) Recent developments in variable structure systems. Lecture notes in economics and mathematical systems, vol 162. Springer, Heidelberg
  33. Rabiner LW (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2): 257–285
  34. Rozenberg G, Salomaa A (1994) Cornerstones in undecidability. Prentice-Hall, Englewood Cliffs
  35. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucl Acids Res 26(2): 544–548
  36. Seneta E (1981) Non-negative matrices and Markov chains, 2nd edn. Springer, New York
  37. Sontag ED (1975) On certain questions of rationality and decidability. J Comput Syst Sci 11: 375–381
  38. van den Hof JM (1997) Realization of continuous-time positive linear systems. Syst Control Lett 31: 243–253
  39. van den Hof JM, van Schuppen JH (1994) Realization of positive linear systems using polyhedral cones. In: Proceedings of the 33rd IEEE conference on decision and control, pp 3889–3893
  40. Vidyasagar M (2003) Learning and generalization with applications to neural networks. Springer, London
  41. Vidyasagar M (2003) Nonlinear systems analysis. SIAM Publications, Philadelphia

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. The University of Texas at Dallas, Richardson, USA
