
Selecting hidden Markov model state number with cross-validated likelihood

  • Original Paper
  • Journal: Computational Statistics

Abstract

This paper considers the problem of estimating the number of hidden states in a hidden Markov model, with emphasis on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require the sampling distribution to belong to one of the competing models. However, computing the cross-validated likelihood for a hidden Markov model when only one training sample is available is difficult, because the data are not independent. Two approaches are proposed to compute it. The first uses a deterministic half-sampling procedure. The second adapts the EM algorithm for hidden Markov models to take into account the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare several versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.
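To make the missing-value idea concrete, here is a minimal sketch and not the authors' implementation. It assumes a discrete-emission hidden Markov model with known parameters and an odd/even split as one possible deterministic half-sampling scheme; the abstract does not specify the split. The function name `masked_loglik` and all numerical values are hypothetical. Held-out positions are marginalised in the scaled forward recursion by giving them an emission factor of 1.

```python
import math

def masked_loglik(obs, observed, pi, A, B):
    """Log-likelihood of the observed entries of a discrete HMM sequence.

    obs[t] is a symbol index; observed[t] says whether obs[t] is used.
    Unobserved positions contribute an emission factor of 1, which
    marginalises them out of the scaled forward recursion.
    """
    K = len(pi)
    loglik = 0.0
    alpha = list(pi)  # state distribution at time 0, before any emission
    for t in range(len(obs)):
        if t > 0:
            # Propagate the normalised state distribution one step forward.
            alpha = [sum(alpha[i] * A[i][j] for i in range(K)) for j in range(K)]
        if observed[t]:
            # Weight by the emission probability, accumulate the log of the
            # normalising constant, then renormalise.
            alpha = [alpha[j] * B[j][obs[t]] for j in range(K)]
            c = sum(alpha)
            loglik += math.log(c)
            alpha = [a / c for a in alpha]
    return loglik

# Hypothetical two-state, two-symbol model; values are for illustration only.
pi = [0.6, 0.4]                    # initial state distribution
A = [[0.9, 0.1], [0.2, 0.8]]       # transition matrix (rows sum to 1)
B = [[0.7, 0.3], [0.1, 0.9]]       # emission probabilities per state
obs = [0, 0, 1, 1, 0, 1, 1, 1]

# Assumed odd/even split: 1st, 3rd, 5th, ... observations form one half.
odd = [t % 2 == 0 for t in range(len(obs))]
even = [not m for m in odd]

ll_odd_half = masked_loglik(obs, odd, pi, A, B)
ll_even_half = masked_loglik(obs, even, pi, A, B)
```

In an actual selection procedure, the parameters would be estimated on one half for each candidate number of states (for instance with an EM variant that handles the masked positions) and the candidates compared on the log-likelihood of the other half. The estimation step is omitted here.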



Author information

Corresponding author

Correspondence to Jean-Baptiste Durand.


About this article

Cite this article

Celeux, G., Durand, JB. Selecting hidden Markov model state number with cross-validated likelihood. Comput Stat 23, 541–564 (2008). https://doi.org/10.1007/s00180-007-0097-1


  • DOI: https://doi.org/10.1007/s00180-007-0097-1
