Computational Statistics, Volume 23, Issue 4, pp 541–564

Selecting hidden Markov model state number with cross-validated likelihood

Original Paper


The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states makes it possible to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require that the sampling distribution belong to one of the competing models. However, computing the cross-validated likelihood for hidden Markov models when only one training sample is available is difficult because the data are not independent. Two approaches are proposed to compute the cross-validated likelihood for a hidden Markov model. The first uses a deterministic half-sampling procedure; the second adapts the EM algorithm for hidden Markov models to take into account the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments highlight the promising behaviour of the deterministic half-sampling criterion.
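To make the role of missing values concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian-emission hidden Markov model and evaluates the likelihood of a sequence in which some observations are withheld: at a missing time step the forward recursion applies the transition matrix but skips the emission term. The function names (`forward_loglik`, `half_sampling_split`), the use of NaN to mark missing values, and the odd/even split are illustrative assumptions; the abstract does not specify how the half-sampling is carried out.

```python
import numpy as np


def gaussian_pdf(x, means, sds):
    """Density of x under each state's Gaussian emission (one value per state)."""
    z = (x - means) / sds
    return np.exp(-0.5 * z ** 2) / (sds * np.sqrt(2.0 * np.pi))


def forward_loglik(obs, pi, A, means, sds):
    """Log-likelihood of the observed entries of obs under a Gaussian HMM.

    NaN entries are treated as missing at random: the hidden chain still
    moves one step, but no emission density is multiplied in.
    """
    alpha = np.asarray(pi, dtype=float).copy()
    loglik = 0.0
    for t, x in enumerate(obs):
        if t > 0:
            alpha = alpha @ A            # propagate the hidden chain
        if not np.isnan(x):
            alpha = alpha * gaussian_pdf(x, means, sds)
        c = alpha.sum()                  # scaling constant (1 if x is missing)
        loglik += np.log(c)
        alpha = alpha / c
    return loglik


def half_sampling_split(obs):
    """Assumed odd/even split: each half keeps every other observation."""
    obs = np.asarray(obs, dtype=float)
    train, test = obs.copy(), obs.copy()
    train[1::2] = np.nan                 # training half keeps indices 0, 2, 4, ...
    test[0::2] = np.nan                  # evaluation half keeps indices 1, 3, 5, ...
    return train, test
```

Under these assumptions, a cross-validated score for a candidate number of states would be obtained by fitting the parameters on `train` (for instance with an EM variant that handles the NaN entries in the same way) and then calling `forward_loglik` on `test` with the fitted parameters. The fitting step is omitted here.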


Hidden Markov models · Model selection · Cross-validation · Missing values at random · EM algorithm





Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Département de Mathématiques, INRIA Futurs, Orsay, Université Paris-Sud, Orsay Cedex, France
  2. Laboratoire Jean Kuntzmann, INRIA Rhône-Alpes, Grenoble Universités, Grenoble Cedex 9, France
