Abstract
For a general class of hidden Markov models that may include time-varying covariates, we illustrate how to compute the observed information matrix, which may be used to obtain standard errors for the parameter estimates and to check model identifiability. The proposed method is based on Oakes' identity and, as such, allows for the exact computation of the information matrix on the basis of the output of the expectation-maximization (EM) algorithm for maximum likelihood estimation. In addition to this output, the method requires the first derivatives of the posterior probabilities computed by the forward-backward recursions introduced by Baum and Welch. Alternative methods for the exact computation of the observed information matrix require, instead, differentiating twice the forward recursion used to compute the model likelihood, at a greater additional cost relative to the EM algorithm. The proposed method is illustrated by a series of simulations and by an application based on a longitudinal dataset in health economics.
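The forward recursion mentioned above, on which the likelihood computation rests, can be sketched in a few lines. The following is a minimal illustration in generic notation (the names `pi`, `Pi`, and `phi` are assumptions of this sketch, not the paper's symbols), using per-step scaling to avoid numerical underflow.

```python
import numpy as np

def hmm_loglik(pi, Pi, phi):
    """Log-likelihood of one observed sequence via the scaled forward recursion.

    pi  : (k,)   initial state probabilities
    Pi  : (k, k) transition matrix, Pi[ub, u] = P(U_t = u | U_{t-1} = ub)
    phi : (T, k) emission probabilities, phi[t, u] = p(y_t | U_t = u)
    """
    T, _ = phi.shape
    alpha = pi * phi[0]                 # forward variables at t = 1
    c = alpha.sum()
    loglik = np.log(c)
    alpha /= c                          # rescale to prevent underflow
    for t in range(1, T):
        alpha = (alpha @ Pi) * phi[t]   # propagate and weight by emission
        c = alpha.sum()
        loglik += np.log(c)             # log-likelihood is the sum of log scales
        alpha /= c
    return loglik
```

The scaled recursion is standard (see, e.g., Zucchini and MacDonald 2009); the sum of the logarithms of the scaling constants recovers the log-likelihood exactly.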
References
Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken (2002)
Bartolucci, F., Bacci, S., Pennoni, F.: Longitudinal analysis of self-reported health status by mixture latent autoregressive models. J. R. Stat. Soc. Ser. C, in press (2013a)
Bartolucci, F., Farcomeni, A.: A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. J. Am. Stat. Assoc. 104, 816–831 (2009)
Bartolucci, F., Farcomeni, A., Pennoni, F.: Latent Markov Models for Longitudinal Data. Chapman and Hall/CRC Press, Boca Raton (2013b)
Bartolucci, F., Pandolfi, S.: A new constant memory recursion for hidden Markov models. J. Comput. Biol. 21, 99–117 (2014)
Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37, 1554–1563 (1966)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41, 164–171 (1970)
Berchtold, A.: Optimization of mixture models: comparison of different strategies. Comput. Stat. 19, 385–406 (2004)
Cappé, O., Moulines, E., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
Farcomeni, A.: Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat. Comput. 22, 141–152 (2012)
Goodman, L.A.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 215–231 (1974)
Hughes, J.: Computing the observed information in the hidden Markov model using the EM algorithm. Stat. Probab. Lett. 32, 107–114 (1997)
Khan, R.N.: Statistical Modelling and Analysis of Ion Channel Data Based on Hidden Markov Models and the EM Algorithm. PhD thesis, University of Western Australia, Crawley (2003)
Khreich, W., Granger, E., Miri, A., Sabourin, R.: On the memory complexity of the forward-backward algorithm. Pattern Recognit. Lett. 31, 91–99 (2010)
Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44, 226–233 (1982)
Lystig, T.C., Hughes, J.: Exact computation of the observed information matrix for hidden Markov models. J. Comput. Gr. Stat. 11, 678–689 (2002)
McCullagh, P.: Regression models for ordinal data (with discussion). J. R. Stat. Soc. Ser. B 42, 109–142 (1980)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and Hall/CRC Press, London (1989)
McHugh, R.B.: Efficient estimation and local identification in latent class analysis. Psychometrika 21, 331–347 (1956)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, New Jersey (2008)
Oakes, D.: Direct calculation of the information matrix via the EM algorithm. J. R. Stat. Soc. Ser. B 61, 479–482 (1999)
Orchard, T., Woodbury, M.A.: A missing information principle: theory and applications. In: Le Cam, L.M., Neyman, J., Scott, E.L. (eds.) Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 697–715. University of California Press, Berkeley (1972)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Scott, S.L.: Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc. 97, 337–351 (2002)
Turner, R.: Direct maximization of the likelihood of a hidden Markov model. Comput. Stat. Data Anal. 52, 4147–4160 (2008)
Turner, T.R., Cameron, M.A., Thomson, P.J.: Hidden Markov chains in generalized linear models. Can. J. Stat. 26, 107–125 (1998)
Welch, L.R.: Hidden Markov models and the Baum–Welch algorithm. IEEE Inf. Theory Soc. Newsl. 53, 1–13 (2003)
Zucchini, W., MacDonald, I.L.: Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall/CRC Press, Boca Raton (2009)
Acknowledgments
The authors are grateful to an Associate Editor and two Referees for useful comments that helped us to improve the presentation. Francesco Bartolucci acknowledges the financial support from the grant RBFR12SHVV of the Italian Government (FIRB-Futuro in Ricerca-project “Mixture and latent variable models for causal inference and analysis of socio-economic data”).
Appendix
1.1 Appendix 1: derivatives of the forward-backward recursions
First of all, we have that
and
Now, considering Eq. (8), we have that
which may be computed recursively for \(t=2,\ldots ,T\), also taking into account the results in Appendix 2 and the fact that
In the end we obtain
In a similar way we have that
and
for \(t=2,\ldots ,T-1\), where
These derivatives may then be computed by a backward recursion.
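The backbone of the computations above, the forward-backward recursions producing the posterior probabilities whose derivatives are taken, can be sketched as follows. This is an illustrative implementation in generic notation (scaled recursions, assumed variable names), not the paper's exact formulation; the analytic derivatives of the posteriors can then be validated against finite differences of this routine.

```python
import numpy as np

def posteriors(pi, Pi, phi):
    """Smoothed state probabilities P(U_t = u | y_1,...,y_T) by forward-backward.

    pi  : (k,)   initial probabilities; Pi : (k, k) transition matrix;
    phi : (T, k) emission probabilities phi[t, u] = p(y_t | U_t = u).
    """
    T, k = phi.shape
    alpha = np.empty((T, k))
    beta = np.empty((T, k))
    # forward pass (rescaled at each step for numerical stability)
    alpha[0] = pi * phi[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ Pi) * phi[t]
        alpha[t] = a / a.sum()
    # backward pass
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        b = Pi @ (phi[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()
    # posterior is proportional to alpha * beta at each time point
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```

Because each row is renormalized, the arbitrary scaling introduced in the two passes cancels in the posterior.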
1.2 Appendix 2: derivatives of the density and probability mass functions
In the case of a canonical GLM parametrization, and considering the general situation of multivariate outcomes, for the measurement component we have
where \(\tau \) denotes the dispersion parameter and \(g(\tau )\) denotes the function involving this parameter in the typical expression for an exponential family distribution (McCullagh and Nelder 1989). In the case of categorical data where a multinomial logit parametrization is adopted, we have
where \(\varvec{e}_c(y+1)\) is a vector of \(c\) zeros with element \(y+1\) equal to 1 (because the first category is labelled as 0) and
where, for a generic probability vector \(\varvec{f}\), we have \(\varvec{\Omega }(\varvec{f})=\mathrm{diag}(\varvec{f})-\varvec{f}\varvec{f}^{\prime }\).
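The matrix \(\varvec{\Omega }(\varvec{f})=\mathrm{diag}(\varvec{f})-\varvec{f}\varvec{f}^{\prime }\) is precisely the Jacobian of the softmax (multinomial logit) map evaluated at the probability vector \(\varvec{f}\), since \(\partial f_i/\partial \eta _j = f_i(\delta _{ij}-f_j)\). This can be checked numerically; the helper name below is an assumption of this sketch.

```python
import numpy as np

def omega(f):
    """Omega(f) = diag(f) - f f', the Jacobian of the softmax map at f."""
    return np.diag(f) - np.outer(f, f)
```

A finite-difference check against the softmax transformation confirms the identity entry by entry.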
Regarding the other derivatives, we have
and, finally,
where \(\varvec{\pi }(\varvec{x})\) is the column vector of the initial probabilities \(\pi (u|\varvec{x})\) and \(\varvec{\pi }^{(t)}(\bar{u},\varvec{x})\) is that of the transition probabilities \(\pi ^{(t)}(u|\bar{u},\varvec{x})\), with \(u=1,\ldots ,k\).
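As an illustration of the covariate-dependent case, suppose the initial probabilities are parametrized as \(\varvec{\pi }(\varvec{x})=\mathrm{softmax}(\varvec{B}\varvec{x})\) for a coefficient matrix \(\varvec{B}\); this is an assumed parametrization for the sketch, which ignores the identifiability constraints a real model would impose. By the chain rule and the softmax Jacobian, the derivative with respect to \(\mathrm{vec}(\varvec{B}^{\prime })\) has the Kronecker structure \(\varvec{\Omega }(\varvec{\pi }(\varvec{x}))\otimes \varvec{x}^{\prime }\):

```python
import numpy as np

def dpi_dbeta(x, Beta):
    """Jacobian of pi(x) = softmax(Beta @ x) with respect to the entries of Beta.

    x    : (p,)   covariate vector
    Beta : (k, p) coefficient matrix, one row per latent state (illustrative)
    Returns a (k, k*p) matrix whose (i, u*p + j) entry is d pi_i / d Beta[u, j].
    """
    eta = Beta @ x
    f = np.exp(eta - eta.max())      # subtract the max for numerical stability
    f /= f.sum()
    Omega = np.diag(f) - np.outer(f, f)
    # chain rule: d pi_i / d Beta[u, j] = Omega[i, u] * x[j]
    return np.kron(Omega, x.reshape(1, -1))
```

The same Kronecker structure applies row by row to the transition probabilities when they are given an analogous multinomial logit parametrization in the covariates.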
Bartolucci, F., Farcomeni, A. Information matrix for hidden Markov models with covariates. Stat Comput 25, 515–526 (2015). https://doi.org/10.1007/s11222-014-9450-8