Selecting hidden Markov model state number with cross-validated likelihood
The problem of estimating the number of hidden states in a hidden Markov model is considered, with emphasis on cross-validated likelihood criteria. Using cross-validation to assess the number of hidden states circumvents the well-documented technical difficulties of the order identification problem in mixture models. Moreover, from a predictive perspective, it does not require the sampling distribution to belong to one of the competing models. However, computing cross-validated likelihood for hidden Markov models when only one training sample is available is difficult, because the data are not independent. Two approaches are proposed to compute cross-validated likelihood for a hidden Markov model. The first uses a deterministic half-sampling procedure. The second adapts the EM algorithm for hidden Markov models to take into account the randomly missing values induced by cross-validation. Numerical experiments on both simulated and real data sets compare several versions of the cross-validated likelihood criterion with penalised likelihood criteria, including BIC and a penalised marginal likelihood criterion. These experiments show promising behaviour of the deterministic half-sampling criterion.
Keywords: Hidden Markov models; Model selection; Cross-validation; Missing values at random; EM algorithm
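Both approaches in the abstract depend on evaluating an HMM likelihood when some observations are withheld. The following is a minimal sketch of that ingredient only, not the authors' implementation. It assumes scalar Gaussian emissions, marks withheld observations with NaN, and uses hypothetical names (`hmm_loglik`, `half_split`) and an odd/even split as one plausible reading of "deterministic half-sampling". Parameter fitting by EM is not shown.

```python
import numpy as np

def gaussian_pdf(x, means, sds):
    """Density of scalar x under each state's Gaussian emission."""
    z = (x - means) / sds
    return np.exp(-0.5 * z ** 2) / (sds * np.sqrt(2.0 * np.pi))

def hmm_loglik(obs, pi, A, means, sds):
    """Log-likelihood of the observed (non-NaN) entries of obs.

    Scaled forward recursion. A missing entry contributes no emission
    term, so the state distribution is only propagated through A,
    which marginalises the missing value out.
    """
    A = np.asarray(A, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    alpha = np.asarray(pi, dtype=float)  # state distribution at time 0
    loglik = 0.0
    for t, x in enumerate(obs):
        if t > 0:
            alpha = alpha @ A            # one transition step
        if not np.isnan(x):
            alpha = alpha * gaussian_pdf(x, means, sds)
            c = alpha.sum()              # predictive density of x
            loglik += np.log(c)
            alpha = alpha / c            # renormalise to avoid underflow
    return loglik

def half_split(obs):
    """Assumed split: positions 0, 2, 4, ... for training, the rest for testing."""
    obs = np.asarray(obs, dtype=float)
    train, test = obs.copy(), obs.copy()
    train[1::2] = np.nan
    test[0::2] = np.nan
    return train, test
```

Under these assumptions, a candidate number of states would be scored by fitting parameters on `train` with an EM variant that handles the NaN entries, then evaluating `hmm_loglik(test, ...)` with the fitted parameters and comparing across candidates.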