Abstract
The problem of estimating the number of hidden states in a hidden Markov model is considered. Emphasis is placed on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states allows to circumvent the well-documented technical difficulties of the order identification problem in mixture models. Moreover, in a predictive perspective, it does not require that the sampling distribution belongs to one of the models in competition. However, computing cross-validated likelihood for hidden Markov models for which only one training sample is available, involves difficulties since the data are not independent. Two approaches are proposed to compute cross-validated likelihood for a hidden Markov model. The first one consists of using a deterministic half-sampling procedure, and the second one consists of an adaptation of the EM algorithm for hidden Markov models, to take into account randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare different versions of cross-validated likelihood criterion and penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. Those numerical experiments highlight a promising behaviour of the deterministic half-sampling criterion.
Similar content being viewed by others
References
Akaike H (1973). Information theory as an extension of the maximum likelihood theory. In: Petrov, BN and Csaki, F (eds) Second International Symposium on Information Theory, pp 267–281. Akademiai Kiado, Budapest
Baum LE, Petrie T, Soules G and Weiss N (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41(1): 164–171
Bernardo JM and Smith AFM (1994). Bayesian theory. Wiley, Chichester
Biernacki C, Celeux G and Govaert G (2001). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intel 22(7): 719–725
Biernacki C, Celeux G and Govaert G (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4): 561–575
Boucheron S, Gassiat E (2005) Inference in hidden Markov models, chapter order estimation. In: Cappé O, Moulines E, Rydén T (eds) Springer, Heidelberg
Celeux G, Clairambault J (1992) Estimation de chaînes de Markov cachées : méthodes et problèmes. In: Actes des journées thématiques Approches markoviennes en signal et images. GDR signal-images CNRS, pp 5–20
Churchill GA (1989). Stochastic models for heterogeneous DNA sequences. Bull Math Biol 51: 79–94
Clairambault J, Curzi-Dascalova L, Kauffmann F, Médigue C and Leffler C (1992). Heart rate variability in normal sleeping full-term and preterm neonates. Early Human Dev 28: 169–183
Dempster AP, Laird NM and Rubin DB (1977). Maximum likelihood from incomplete data via the EM Algorithm. J R Stat Soc Ser B 39: 1–38
Devijver PA (1985). Baum’s forward–backward Algorithm revisited. Pattern Recogn Lett 3: 369–373
Durand J-B (2003) Modèles à structure cachée : inférence, s諥ction de modèles et applications (in French). Ph.D. thesis, Université Grenoble 1 - Joseph Fourier
Ephraim Y and Merhav N (2002). Hidden Markov processes. IEEE Trans Inform Theory 48: 1518–1569
Fraley C and Raftery AE (2002). Model-based clustering, discriminant Analysis and density estimation. J Am Stat Assoc 97: 611–631
Gassiat E (2002). Likelihood ratio inequalities with application to various mixtures. Ann Inst Henri Poincaré 38: 897–906
Gassiat E and Kéribin C (2000). The likelihood ratio test for the number of components in a mixture with Markov regime. ESAIM P S 4: 25–52
Kass RE and Raftery AE (1995). Bayes factors. J Am Stat Assoc 90(430): 773–795
Kéribin C (2000). Consistent estimation of the order of mixture models. Sankhya Ser A 62: 49–66
McLachlan GJ and Peel D (1997). On a resampling approach to choosing the number of components in normal mixture models. In: Billard, L and Fisher, NI (eds) Computing science and statistics, vol 28, pp 260–266. Interface Foundation of North America, Fairfax Station
McLachlan GJ and Peel D (2000). Finite mixture models. Wiley Series in probability and statistics. Wiley, London
Rabiner LR (1989). A tutorial on hidden Markov models and selected Applications in speech recognition. Proc IEEE 77: 257–286 (February)
Redner RA and Walker HF (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26(2): 195–239
Ripley BD (1996). Pattern recognition and neural networks. Cambridge University Press, London
Robert CP, Celeux G and Diebolt J (1993). Bayesian estimation of hidden Markov chains: A stochastic implementation. Stat Probab Lett 16(1): 77–83
Robertson AW, Kirshner S and Smyth P (2004). Downscaling of daily rainfall occurence over Northeast Brazil using a hidden Markov model. J Clim 17(7): 4407–4424
Roeder K and Wasserman L (1997). Practical Bayesian density estimation using mixtures of normals. J Am Stat Assoc 92(439): 894–902
Schwarz G (1978). Estimating the dimension of a model. Ann Stat 6: 461–464
Smyth P (2000). Model selection for probabilistic clustering using cross-validated likelihood. Stat Comput 10(1): 63–72
Spiegelhalter DJ, Best NG and Carlin BP (2000). Bayesian measures of model complexity and fit (with discussion). J R Stat Soc Ser B 64(4): 583–639
Yang Y (2005). Can the strengths of AIC and BIC be shared? A confict between model identification and regression estimation. Biometrika 92: 937–950
Zhang P (1993). Model selection via multifold cross validation. Ann Stat 21(1): 299–313
Zhang NR and Siegmund DO (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63(1): 22–32
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Celeux, G., Durand, JB. Selecting hidden Markov model state number with cross-validated likelihood. Comput Stat 23, 541–564 (2008). https://doi.org/10.1007/s00180-007-0097-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00180-007-0097-1