Abstract
For a general class of hidden Markov models that may include time-varying covariates, we illustrate how to compute the observed information matrix, which may be used to obtain standard errors for the parameter estimates and to check model identifiability. The proposed method is based on Oakes' identity and, as such, allows for the exact computation of the information matrix on the basis of the output of the expectation-maximization (EM) algorithm for maximum likelihood estimation. In addition to this output, the method requires the first derivatives of the posterior probabilities computed by the forward-backward recursions introduced by Baum and Welch. Alternative methods for the exact computation of the observed information matrix require, instead, differentiating twice the forward recursion used to compute the model likelihood, at a greater additional cost relative to the EM algorithm. The proposed method is illustrated by a series of simulations and by an application based on a longitudinal dataset in health economics.
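The forward recursion mentioned above, on which the likelihood computation rests, can be sketched in a few lines. The following is a minimal illustration in generic notation (the names `pi`, `Pi`, and `phi` are assumptions of this sketch, not the paper's symbols), using per-step scaling to avoid numerical underflow.

```python
import numpy as np

def hmm_loglik(pi, Pi, phi):
    """Log-likelihood of one observed sequence via the scaled forward recursion.

    pi  : (k,)   initial state probabilities
    Pi  : (k, k) transition matrix, Pi[ub, u] = P(U_t = u | U_{t-1} = ub)
    phi : (T, k) emission probabilities, phi[t, u] = p(y_t | U_t = u)
    """
    T, _ = phi.shape
    alpha = pi * phi[0]                 # forward variables at t = 1
    c = alpha.sum()
    loglik = np.log(c)
    alpha /= c                          # rescale to prevent underflow
    for t in range(1, T):
        alpha = (alpha @ Pi) * phi[t]   # propagate and weight by emission
        c = alpha.sum()
        loglik += np.log(c)             # log-likelihood is the sum of log scales
        alpha /= c
    return loglik
```

The scaled recursion is standard (see, e.g., Zucchini and MacDonald 2009); the sum of the logarithms of the scaling constants recovers the log-likelihood exactly.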
References
Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken (2002)
Bartolucci, F., Bacci, S., Pennoni, F.: Longitudinal analysis of self-reported health status by mixture latent autoregressive models. J. R. Stat. Soc. Ser. C, in press (2013a)
Bartolucci, F., Farcomeni, A.: A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. J. Am. Stat. Assoc. 104, 816–831 (2009)
Bartolucci, F., Farcomeni, A., Pennoni, F.: Latent Markov Models for Longitudinal Data. Chapman and Hall/CRC Press, Boca Raton (2013b)
Bartolucci, F., Pandolfi, S.: A new constant memory recursion for hidden Markov models. J. Comput. Biol. 21, 99–117 (2014)
Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37, 1554–1563 (1966)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41, 164–171 (1970)
Berchtold, A.: Optimization of mixture models: comparison of different strategies. Comput. Stat. 19, 385–406 (2004)
Cappé, O., Moulines, E., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Farcomeni, A.: Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat. Comput. 22, 141–152 (2012)
Goodman, L.A.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 215–231 (1974)
Hughes, J.: Computing the observed information in the hidden Markov model using the EM algorithm. Stat. Probab. Lett. 32, 107–114 (1997)
Khan, R.N.: Statistical Modelling and Analysis of Ion Channel Data Based on Hidden Markov Models and the EM Algorithm. PhD thesis, University of Western Australia, Crawley (2003)
Khreich, W., Granger, E., Miri, A., Sabourin, R.: On the memory complexity of the forward-backward algorithm. Pattern Recognit. Lett. 31, 91–99 (2010)
Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44, 226–233 (1982)
Lystig, T.C., Hughes, J.: Exact computation of the observed information matrix for hidden Markov models. J. Comput. Gr. Stat. 11, 678–689 (2002)
McCullagh, P.: Regression models for ordinal data (with discussion). J. R. Stat. Soc. Ser. B 42, 109–142 (1980)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and Hall/CRC Press, London (1989)
McHugh, R.B.: Efficient estimation and local identification in latent class analysis. Psychometrika 21, 331–347 (1956)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, New Jersey (2008)
Oakes, D.: Direct calculation of the information matrix via the EM algorithm. J. R. Stat. Soc. Ser. B 61, 479–482 (1999)
Orchard, T., Woodbury, M.A.: A missing information principle: theory and applications. In: Le Cam, L.M., Neyman, J., Scott, E.L. (eds.) Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 697–715. University of California Press, Berkeley (1972)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Scott, S.L.: Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc. 97, 337–351 (2002)
Turner, R.: Direct maximization of the likelihood of a hidden Markov model. Comput. Stat. Data Anal. 52, 4147–4160 (2008)
Turner, T.R., Cameron, M.A., Thomson, P.J.: Hidden Markov chains in generalized linear models. Can. J. Stat. 26, 107–125 (1998)
Welch, L.R.: Hidden Markov models and the Baum–Welch algorithm. IEEE Inf. Theory Soc. Newsl. 53, 1–13 (2003)
Zucchini, W., MacDonald, I.L.: Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall/CRC Press, Boca Raton (2009)
Acknowledgments
The authors are grateful to an Associate Editor and two Referees for useful comments that helped us to improve the presentation. Francesco Bartolucci acknowledges the financial support from the grant RBFR12SHVV of the Italian Government (FIRB-Futuro in Ricerca-project “Mixture and latent variable models for causal inference and analysis of socio-economic data”).
Appendix
1.1 Appendix 1: derivatives of the forward-backward recursions
First of all, we have that
and
Now, considering Eq. (8), we have that
which may be computed recursively for \(t=2,\ldots ,T\), also taking into account the results in Appendix 2 and the fact that
In the end we obtain
In a similar way we have that
and
for \(t=2,\ldots ,T-1\), where
These derivatives may then be computed by a backward recursion.
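The backbone of the computations above, the forward-backward recursions producing the posterior probabilities whose derivatives are taken, can be sketched as follows. This is an illustrative implementation in generic notation (scaled recursions, assumed variable names), not the paper's exact formulation; the analytic derivatives of the posteriors can then be validated against finite differences of this routine.

```python
import numpy as np

def posteriors(pi, Pi, phi):
    """Smoothed state probabilities P(U_t = u | y_1,...,y_T) by forward-backward.

    pi  : (k,)   initial probabilities; Pi : (k, k) transition matrix;
    phi : (T, k) emission probabilities phi[t, u] = p(y_t | U_t = u).
    """
    T, k = phi.shape
    alpha = np.empty((T, k))
    beta = np.empty((T, k))
    # forward pass (rescaled at each step for numerical stability)
    alpha[0] = pi * phi[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ Pi) * phi[t]
        alpha[t] = a / a.sum()
    # backward pass
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        b = Pi @ (phi[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()
    # posterior is proportional to alpha * beta at each time point
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

Because each row is renormalized, the arbitrary scaling introduced in the two passes cancels in the posterior.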
1.2 Appendix 2: derivatives of the density and probability mass functions
In the case of a canonical GLM parametrization, and considering the general situation of multivariate outcomes, for the measurement component we have
where \(\tau \) denotes the dispersion parameter and \(g(\tau )\) denotes the function involving this parameter in the typical expression for an exponential family distribution (McCullagh and Nelder 1989). In the case of categorical data where a multinomial logit parametrization is adopted, we have
where \(\varvec{e}_c(y+1)\) is a vector of \(c\) zeros with element \(y+1\) equal to 1 (because the first category is labelled as 0) and
where, for a generic probability vector \(\varvec{f}\), we have \(\varvec{\Omega }(\varvec{f})=\mathrm{diag}(\varvec{f})-\varvec{f}\varvec{f}^{\prime }\).
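The matrix \(\varvec{\Omega }(\varvec{f})=\mathrm{diag}(\varvec{f})-\varvec{f}\varvec{f}^{\prime }\) is precisely the Jacobian of the softmax (multinomial logit) map evaluated at the probability vector \(\varvec{f}\), since \(\partial f_i/\partial \eta _j = f_i(\delta _{ij}-f_j)\). This can be checked numerically; the helper name below is an assumption of this sketch.

```python
import numpy as np

def omega(f):
    """Omega(f) = diag(f) - f f', the Jacobian of the softmax map at f."""
    return np.diag(f) - np.outer(f, f)
```

A finite-difference check against the softmax transformation confirms the identity entry by entry.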
Regarding the other derivatives, we have
and, finally,
where \(\varvec{\pi }(\varvec{x})\) is the column vector of the initial probabilities \(\pi (u|\varvec{x})\) and \(\varvec{\pi }^{(t)}(\bar{u},\varvec{x})\) is that of the transition probabilities \(\pi ^{(t)}(u|\bar{u},\varvec{x})\), with \(u=1,\ldots ,k\).
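As an illustration of the covariate-dependent case, suppose the initial probabilities are parametrized as \(\varvec{\pi }(\varvec{x})=\mathrm{softmax}(\varvec{B}\varvec{x})\) for a coefficient matrix \(\varvec{B}\); this is an assumed parametrization for the sketch, which ignores the identifiability constraints a real model would impose. By the chain rule and the softmax Jacobian, the derivative with respect to \(\mathrm{vec}(\varvec{B}^{\prime })\) has the Kronecker structure \(\varvec{\Omega }(\varvec{\pi }(\varvec{x}))\otimes \varvec{x}^{\prime }\):

```python
import numpy as np

def dpi_dbeta(x, Beta):
    """Jacobian of pi(x) = softmax(Beta @ x) with respect to the entries of Beta.

    x    : (p,)   covariate vector
    Beta : (k, p) coefficient matrix, one row per latent state (illustrative)
    Returns a (k, k*p) matrix whose (i, u*p + j) entry is d pi_i / d Beta[u, j].
    """
    eta = Beta @ x
    f = np.exp(eta - eta.max())      # subtract the max for numerical stability
    f /= f.sum()
    Omega = np.diag(f) - np.outer(f, f)
    # chain rule: d pi_i / d Beta[u, j] = Omega[i, u] * x[j]
    return np.kron(Omega, x.reshape(1, -1))
```

The same Kronecker structure applies row by row to the transition probabilities when they are given an analogous multinomial logit parametrization in the covariates.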
Bartolucci, F., Farcomeni, A. Information matrix for hidden Markov models with covariates. Stat Comput 25, 515–526 (2015). https://doi.org/10.1007/s11222-014-9450-8