
Information matrix for hidden Markov models with covariates

Abstract

For a general class of hidden Markov models that may include time-varying covariates, we illustrate how to compute the observed information matrix, which may be used to obtain standard errors for the parameter estimates and to check model identifiability. The proposed method is based on Oakes' identity and, as such, allows the information matrix to be computed exactly from the output of the expectation-maximization (EM) algorithm used for maximum likelihood estimation. In addition to this output, the method requires the first derivative of the posterior probabilities computed by the forward-backward recursions introduced by Baum and Welch. Alternative methods for computing the observed information matrix exactly require, instead, that the forward recursion used to compute the model likelihood be differentiated twice, at a greater additional cost beyond that of the EM algorithm. The proposed method is illustrated by a series of simulations and by an application based on a longitudinal dataset in health economics.
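
In generic notation, Oakes' identity (Oakes 1999), on which the method is based, may be written as

$$\begin{aligned} \frac{\partial ^2\ell (\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^{\prime }}= \left. \left[ \frac{\partial ^2 Q(\bar{\varvec{\theta }}|\varvec{\theta })}{\partial \bar{\varvec{\theta }}\partial \bar{\varvec{\theta }}^{\prime }}+ \frac{\partial ^2 Q(\bar{\varvec{\theta }}|\varvec{\theta })}{\partial \bar{\varvec{\theta }}\partial \varvec{\theta }^{\prime }}\right] \right| _{\bar{\varvec{\theta }}=\varvec{\theta }}, \end{aligned}$$

where \(\ell (\varvec{\theta })\) is the observed-data log-likelihood and \(Q(\bar{\varvec{\theta }}|\varvec{\theta })\) is the expected complete-data log-likelihood maximized at each M-step of the EM algorithm; the mixed second derivative is the term that involves the first derivative of the posterior probabilities mentioned above.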


References

  • Agresti, A.: Categorical Data Analysis, 2nd edn. Wiley, Hoboken (2002)
  • Bartolucci, F., Bacci, S., Pennoni, F.: Longitudinal analysis of self-reported health status by mixture latent autoregressive models. J. R. Stat. Soc. Ser. C, in press (2013a)
  • Bartolucci, F., Farcomeni, A.: A multivariate extension of the dynamic logit model for longitudinal data based on a latent Markov heterogeneity structure. J. Am. Stat. Assoc. 104, 816–831 (2009)
  • Bartolucci, F., Farcomeni, A., Pennoni, F.: Latent Markov Models for Longitudinal Data. Chapman and Hall/CRC Press, Boca Raton (2013b)
  • Bartolucci, F., Pandolfi, S.: A new constant memory recursion for hidden Markov models. J. Comput. Biol. 21, 99–117 (2014)
  • Baum, L.E., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Stat. 37, 1554–1563 (1966)
  • Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann. Math. Stat. 41, 164–171 (1970)
  • Berchtold, A.: Optimization of mixture models: comparison of different strategies. Comput. Stat. 19, 385–406 (2004)
  • Cappé, O., Moulines, E., Rydén, T.: Inference in Hidden Markov Models. Springer, New York (2005)
  • Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)
  • Farcomeni, A.: Quantile regression for longitudinal data based on latent Markov subject-specific parameters. Stat. Comput. 22, 141–152 (2012)
  • Goodman, L.A.: Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 215–231 (1974)
  • Hughes, J.: Computing the observed information in the hidden Markov model using the EM algorithm. Stat. Probab. Lett. 32, 107–114 (1997)
  • Khan, R.N.: Statistical Modelling and Analysis of Ion Channel Data Based on Hidden Markov Models and the EM Algorithm. PhD thesis, University of Western Australia, Crawley (2003)
  • Khreich, W., Granger, E., Miri, A., Sabourin, R.: On the memory complexity of the forward-backward algorithm. Pattern Recognit. Lett. 31, 91–99 (2010)
  • Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44, 226–233 (1982)
  • Lystig, T.C., Hughes, J.: Exact computation of the observed information matrix for hidden Markov models. J. Comput. Graph. Stat. 11, 678–689 (2002)
  • McCullagh, P.: Regression models for ordinal data (with discussion). J. R. Stat. Soc. Ser. B 42, 109–142 (1980)
  • McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and Hall/CRC Press, London (1989)
  • McHugh, R.B.: Efficient estimation and local identification in latent class analysis. Psychometrika 21, 331–347 (1956)
  • McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley, Hoboken (2008)
  • Oakes, D.: Direct calculation of the information matrix via the EM algorithm. J. R. Stat. Soc. Ser. B 61, 479–482 (1999)
  • Orchard, T., Woodbury, M.A.: A missing information principle: theory and applications. In: Le Cam, L.M., Neyman, J., Scott, E.L. (eds.) Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 697–715. University of California Press, Berkeley (1972)
  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
  • Scott, S.L.: Bayesian methods for hidden Markov models: recursive computing in the 21st century. J. Am. Stat. Assoc. 97, 337–351 (2002)
  • Turner, R.: Direct maximization of the likelihood of a hidden Markov model. Comput. Stat. Data Anal. 52, 4147–4160 (2008)
  • Turner, T.R., Cameron, M.A., Thomson, P.J.: Hidden Markov chains in generalized linear models. Can. J. Stat. 26, 107–125 (1998)
  • Welch, L.R.: Hidden Markov models and the Baum–Welch algorithm. IEEE Inf. Theory Soc. Newsl. 53, 1–13 (2003)
  • Zucchini, W., MacDonald, I.L.: Hidden Markov Models for Time Series: An Introduction Using R. Chapman & Hall/CRC Press, Boca Raton (2009)


Acknowledgments

The authors are grateful to an Associate Editor and two Referees for useful comments that helped us to improve the presentation. Francesco Bartolucci acknowledges the financial support from the grant RBFR12SHVV of the Italian Government (FIRB-Futuro in Ricerca-project “Mixture and latent variable models for causal inference and analysis of socio-economic data”).

Author information

Correspondence to Francesco Bartolucci.

Appendix

1.1 Appendix 1: derivatives of the forward-backward recursions

First of all we have that

$$\begin{aligned} \frac{\partial \log l^{(1)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}&= \frac{\partial \log \pi (u|\varvec{x}^{(1)})}{\partial \varvec{\theta }}\\&+\,\,\frac{\partial \log \phi ^{(1)}(\varvec{y}^{(1)}|u,\varvec{w}^{(1)})}{\partial \varvec{\theta }} \end{aligned}$$

and

$$\begin{aligned}&\frac{\partial \log l^{(t)}({\bar{u}},u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}= \frac{\partial \log l^{(t-1)}({\bar{u}},\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}\\&\quad +\,\,\frac{\partial \log \pi ^{(t)}(u|{\bar{u}},\varvec{x}^{(t)})}{\partial \varvec{\theta }} +\frac{\partial \log \phi ^{(t)}(\varvec{y}^{(t)}|u,\varvec{w}^{(t)})}{\partial \varvec{\theta }}. \end{aligned}$$

Now, considering Eq. (8), we have that

$$\begin{aligned} \frac{\partial \log l^{(t)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }} = \sum _{{\bar{u}}=1}^k\frac{ l^{(t)}({\bar{u}},u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{l^{(t)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}\frac{\partial \log l^{(t)}({\bar{u}},u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}, \end{aligned}$$

which may be computed recursively for \(t=2,\ldots ,T\), also taking into account the results in Appendix 2 and the fact that

$$\begin{aligned} \frac{\partial \log \phi ^{(t)}(\varvec{y}|u,\varvec{w})}{\partial \varvec{\alpha }}= \sum _{j=1}^r\frac{\partial \log \phi ^{(t)}_j(y_j|u,\varvec{w})}{\partial \varvec{\alpha }}. \end{aligned}$$

Finally, we obtain

$$\begin{aligned} \frac{\partial \log f(\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}= \sum _{u=1}^k \frac{l^{(T)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{f(\tilde{\varvec{y}}|\tilde{\varvec{z}})} \frac{\partial \log l^{(T)}(u,\tilde{\varvec{y}}|\tilde{\varvec{z}})}{\partial \varvec{\theta }}. \end{aligned}$$
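
As an implementation note, the forward derivative recursion above is straightforward to code. The following is a minimal sketch (not the authors' code) for a basic hidden Markov model with categorical outcomes and no covariates; for brevity the parameter vector is restricted to the initial-probability logits \(\varvec{\beta }\), so the transition and emission terms contribute zero derivatives, and the resulting score is checked against a finite-difference approximation of the log-likelihood.

```python
# Minimal sketch (not the authors' code) of the forward derivative recursion for a
# basic hidden Markov model with categorical outcomes and no covariates. The parameter
# vector is only the vector beta of initial-probability logits (baseline state 1), so
# transition and emission terms have zero derivative; extending the parameter vector
# only changes the gradient terms added at each step.
import numpy as np

def softmax_with_baseline(beta):
    """pi(u) = exp(eta_u) / sum_v exp(eta_v), with eta_1 = 0 and eta_{2..k} = beta."""
    eta = np.concatenate(([0.0], beta))
    p = np.exp(eta - eta.max())
    return p / p.sum()

def dlog_pi_dbeta(pi):
    """d log pi(u) / d beta, one row per state u (see Appendix 2)."""
    k = len(pi)
    return np.eye(k)[:, 1:] - pi[1:]                       # shape (k, k-1)

def score_initial_logits(beta, Pi, Phi, y):
    """d log f(y) / d beta computed by the forward derivative recursion."""
    pi = softmax_with_baseline(beta)
    l = pi * Phi[:, y[0]]                                  # l^(1)(u)
    dlog = dlog_pi_dbeta(pi)                               # d log l^(1)(u) / d beta
    for t in range(1, len(y)):
        l_joint = l[:, None] * Pi * Phi[:, y[t]][None, :]  # l^(t)(ubar, u)
        l_new = l_joint.sum(axis=0)                        # l^(t)(u)
        w = l_joint / l_new[None, :]                       # weights l^(t)(ubar, u) / l^(t)(u)
        # d log l^(t)(ubar, u)/d beta = d log l^(t-1)(ubar)/d beta (other terms are 0 here)
        dlog = w.T @ dlog                                  # d log l^(t)(u) / d beta
        l = l_new
    f = l.sum()                                            # f(y)
    return (l / f) @ dlog                                  # d log f(y) / d beta

# toy check against a finite-difference approximation of the log-likelihood
rng = np.random.default_rng(0)
k, T = 3, 6
Pi = rng.dirichlet(np.ones(k), size=k)                     # fixed transition matrix
Phi = rng.dirichlet(np.ones(4), size=k)                    # fixed emission probabilities
y = rng.integers(0, 4, size=T)
beta = np.array([0.3, -0.5])

def loglik(b):
    l = softmax_with_baseline(b) * Phi[:, y[0]]
    for t in range(1, T):
        l = (l[:, None] * Pi * Phi[:, y[t]][None, :]).sum(axis=0)
    return np.log(l.sum())

eps = 1e-6
num = np.array([(loglik(beta + eps * np.eye(2)[j]) - loglik(beta - eps * np.eye(2)[j])) / (2 * eps)
                for j in range(2)])
print(np.allclose(score_initial_logits(beta, Pi, Phi, y), num, atol=1e-5))  # True if they agree
```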

In a similar way we have that

$$\begin{aligned} \frac{\partial \log m^{(T)}(\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }}=0 \end{aligned}$$

and

$$\begin{aligned} \frac{\partial \log m^{(t)}(\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }} \!=\! \sum _{u=1}^k \frac{m^{(t)}(u,\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{m^{(t)}(\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})} \frac{\partial \log m^{(t)}(u,\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }} \end{aligned}$$

for \(t=2,\ldots ,T-1\), where

$$\begin{aligned} \frac{\partial \log m^{(t)}(u,\tilde{\varvec{y}}|{\bar{u}},\tilde{\varvec{z}})}{\partial \varvec{\theta }}&= \frac{\partial \log m^{(t+1)}(\tilde{\varvec{y}}|u,\tilde{\varvec{z}})}{\partial \varvec{\theta }}\\&+\,\, \frac{\partial \log \pi ^{(t+1)}(u|{\bar{u}},\varvec{x}^{(t+1)})}{\partial \varvec{\theta }} \\&+\,\, \frac{\partial \log \phi ^{(t+1)}(\varvec{y}^{(t+1)}|u,\varvec{w}^{(t+1)})}{\partial \varvec{\theta }}. \end{aligned}$$

These derivatives may then be computed by a backward recursion.
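
The backward derivative recursion may be sketched in the same spirit (again, not the authors' code). To make the backward derivatives non-trivial, the parameter vector is here taken to be the matrix \(\varvec{\gamma }\) of transition logits of a homogeneous chain without covariates, with the initial and emission parameters held fixed; the score obtained by combining the backward quantities at \(t=1\) is checked against finite differences.

```python
# Minimal sketch (not the authors' code) of the backward derivative recursion for a
# basic homogeneous hidden Markov model without covariates. Here the parameter vector
# is the matrix gamma of transition logits (row ubar of Pi is softmax of (0, gamma[ubar, :])),
# so only the transition term contributes a nonzero derivative.
import numpy as np

def transition_matrix(gamma):
    """Row ubar of Pi is softmax([0, gamma[ubar, :]]); the baseline is state 1."""
    k = gamma.shape[0]
    eta = np.hstack([np.zeros((k, 1)), gamma])
    p = np.exp(eta - eta.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def dlog_pi_dgamma(Pi, ubar, u):
    """d log pi(u|ubar) / d gamma as a k x (k-1) array (only row ubar is nonzero)."""
    k = Pi.shape[0]
    d = np.zeros((k, k - 1))
    d[ubar] = (np.eye(k)[u] - Pi[ubar])[1:]
    return d

def backward_dlog(gamma, Phi, y):
    """m^(1)(y|ubar) and d log m^(1)(y|ubar) / d gamma via the backward recursion."""
    k = gamma.shape[0]
    Pi = transition_matrix(gamma)
    T = len(y)
    m = np.ones(k)                                   # m^(T)(y|ubar) = 1
    dlog = np.zeros((k, k, k - 1))                   # d log m^(T)(y|ubar) / d gamma = 0
    for t in range(T - 2, -1, -1):                   # t = T-1, ..., 1 in the paper's indexing
        m_joint = Pi * Phi[:, y[t + 1]][None, :] * m[None, :]        # m^(t)(u, y|ubar)
        m_new = m_joint.sum(axis=1)                                  # m^(t)(y|ubar)
        dlog_new = np.zeros_like(dlog)
        for ubar in range(k):
            for u in range(k):                       # emission term has zero derivative here
                d_joint = dlog[u] + dlog_pi_dgamma(Pi, ubar, u)
                dlog_new[ubar] += m_joint[ubar, u] / m_new[ubar] * d_joint
        m, dlog = m_new, dlog_new
    return Pi, m, dlog

# check the score d log f(y) / d gamma implied by the backward pass against finite differences
rng = np.random.default_rng(2)
k, T = 3, 5
gamma = rng.normal(size=(k, k - 1))
pi0 = rng.dirichlet(np.ones(k))                      # fixed initial probabilities
Phi = rng.dirichlet(np.ones(4), size=k)              # fixed emission probabilities
y = rng.integers(0, 4, size=T)

def loglik(g):
    Pi = transition_matrix(g)
    l = pi0 * Phi[:, y[0]]
    for t in range(1, T):
        l = (l[:, None] * Pi * Phi[:, y[t]][None, :]).sum(axis=0)
    return np.log(l.sum())

Pi, m1, dlog_m1 = backward_dlog(gamma, Phi, y)
w = pi0 * Phi[:, y[0]] * m1                          # f(y) is the sum of these terms
score = np.tensordot(w / w.sum(), dlog_m1, axes=1)   # d log f(y) / d gamma
num = np.zeros_like(gamma)
eps = 1e-6
for i in range(k):
    for j in range(k - 1):
        e = np.zeros_like(gamma)
        e[i, j] = eps
        num[i, j] = (loglik(gamma + e) - loglik(gamma - e)) / (2 * eps)
print(np.allclose(score, num, atol=1e-5))            # True if the recursion is right
```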

1.2 Appendix 2: derivatives of the density and probability mass functions

In the case of a canonical GLM parametrization, and considering the general situation of multivariate outcomes, for the measurement component we have

$$\begin{aligned}&\frac{\partial \log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }}=\frac{y-\mu ^{(t)}(u,\varvec{w})}{g(\tau )} \varvec{a}^{(t)}_{ju\varvec{w}},\\&\frac{\partial ^2\log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }\partial \varvec{\alpha }^{\prime }}= -V(Y^{(t)}|U^{(t)}=u, \varvec{W}^{(t)}=\varvec{w}) \varvec{a}^{(t)}_{ju\varvec{w}}(\varvec{a}^{(t)}_{ju\varvec{w}})^{\prime }, \end{aligned}$$

where \(\tau \) denotes the dispersion parameter and \(g(\tau )\) the function of this parameter appearing in the usual exponential family expression (McCullagh and Nelder 1989). In the case of categorical data, for which a multinomial logit parametrization is adopted, we have

$$\begin{aligned} \frac{\partial \log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }}= (\varvec{A}^{(t)}_{ju\varvec{w}})^{\prime }\varvec{G}_{1c_j}^{\prime }(\varvec{e}_j(y\!+\!1)-\varvec{\phi }_j^{(t)}(u,\varvec{w})), \end{aligned}$$

where \(\varvec{e}_c(y+1)\) is a vector of \(c\) zeros with element \(y+1\) equal to 1 (because the first category is labelled as 0) and

$$\begin{aligned} \frac{\partial ^2\log \phi _j^{(t)}(y|u,\varvec{w})}{\partial \varvec{\alpha }\partial \varvec{\alpha }^{\prime }}= -(\varvec{A}^{(t)}_{ju\varvec{w}})^{\prime }\varvec{G}_{1c_j}^{\prime }\varvec{\Omega }\left( \varvec{\phi }_j^{(t)}(u,\varvec{w})\right) \varvec{G}_{1c_j}\varvec{A}^{(t)}_{ju\varvec{w}}, \end{aligned}$$

where, for a generic probability vector \(\varvec{f}\), we have \(\varvec{\Omega }(\varvec{f})=\mathrm{diag}(\varvec{f})-\varvec{f}\varvec{f}^{\prime }\).
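
As a numerical illustration (a minimal sketch under assumed notation: \(\varvec{G}_{1c}\) is taken to be the \(c\times (c-1)\) matrix that pads the free logits with a zero for the reference category 0, and \(\varvec{A}\) is a hypothetical design matrix mapping \(\varvec{\alpha }\) to the free logits), the multinomial logit score and Hessian blocks above may be evaluated as follows, with the score checked against finite differences.

```python
# Minimal sketch of the multinomial logit score and Hessian blocks, under the assumption
# that G_{1c} pads the free logits with a zero for the reference category 0 and that A is
# a hypothetical (c-1) x dim(alpha) design matrix.
import numpy as np

def omega(f):
    """Omega(f) = diag(f) - f f' for a probability vector f."""
    return np.diag(f) - np.outer(f, f)

def probs(alpha, A, G):
    """phi = softmax(G A alpha); category 0 is the reference with logit 0."""
    eta = G @ (A @ alpha)
    p = np.exp(eta - eta.max())
    return p / p.sum()

def score_and_hessian(y, alpha, A, G):
    phi = probs(alpha, A, G)
    e = np.eye(len(phi))[y]                  # e_c(y+1) in the text (labels start at 0)
    score = A.T @ G.T @ (e - phi)            # A' G' (e_c(y+1) - phi)
    hess = -A.T @ G.T @ omega(phi) @ G @ A   # -A' G' Omega(phi) G A
    return score, hess

# toy example with c = 4 categories and a 3-dimensional alpha
rng = np.random.default_rng(1)
c = 4
G = np.vstack([np.zeros((1, c - 1)), np.eye(c - 1)])   # pads a zero logit for category 0
A = rng.normal(size=(c - 1, 3))
alpha = rng.normal(size=3)
y = 2                                                   # observed category (0-based label)

score, hess = score_and_hessian(y, alpha, A, G)
eps = 1e-6
num = np.array([(np.log(probs(alpha + eps * np.eye(3)[j], A, G)[y])
                 - np.log(probs(alpha - eps * np.eye(3)[j], A, G)[y])) / (2 * eps)
                for j in range(3)])
print(np.allclose(score, num, atol=1e-6))               # True if the formulas match
```

The same pattern applies, with \(\varvec{A}\) replaced by \(\varvec{B}_{\varvec{x}}\) or \(\varvec{C}^{(t)}_{\bar{u}\varvec{x}}\), to the derivatives of the initial and transition probabilities given below.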

Regarding the other derivatives, we have

$$\begin{aligned}&\frac{\partial \log \pi (u|\varvec{x})}{\partial \varvec{\beta }}=\varvec{B}^{\prime }_{\varvec{x}}\varvec{G}_{1k}^{\prime }(\varvec{e}_k(u)-\varvec{\pi }(\varvec{x})),\\&\frac{\partial ^2\log \pi (u|\varvec{x})}{\partial \varvec{\beta }\partial \varvec{\beta }^{\prime }}= -\varvec{B}^{\prime }_{\varvec{x}}\varvec{G}_{1k}^{\prime }\varvec{\Omega }(\varvec{\pi }(\varvec{x}))\varvec{G}_{1k}\varvec{B}_{\varvec{x}}, \end{aligned}$$

and, finally,

$$\begin{aligned}&\frac{\partial \log \pi ^{(t)}(u|{\bar{u}},\varvec{x})}{\partial \varvec{\gamma }}=\big (\varvec{C}^{(t)}_{{\bar{u}}\varvec{x}}\big )^{\prime } \varvec{G}_{{\bar{u}}k}^{\prime }\left( \varvec{e}_k(u)-\varvec{\pi }^{(t)}({\bar{u}},\varvec{x})\right) ,\\&\frac{\partial ^2\log \pi ^{(t)}(u|{\bar{u}},\varvec{x})}{\partial \varvec{\gamma }\partial \varvec{\gamma }^{\prime }}= -\big (\varvec{C}^{(t)}_{{\bar{u}}\varvec{x}}\big )^{\prime } \varvec{G}_{{\bar{u}}k}^{\prime }\varvec{\Omega }\left( \varvec{\pi }^{(t)}({\bar{u}},\varvec{x})\right) \varvec{G}_{{\bar{u}}k}\varvec{C}^{(t)}_{{\bar{u}}\varvec{x}}, \end{aligned}$$

where \(\varvec{\pi }(\varvec{x})\) is the column vector of the initial probabilities \(\pi (u|\varvec{x})\) and \(\varvec{\pi }^{(t)}(\bar{u},\varvec{x})\) is that of the transition probabilities \(\pi ^{(t)}(u|\bar{u},\varvec{x})\), with \(u=1,\ldots ,k\).

Cite this article

Bartolucci, F., Farcomeni, A. Information matrix for hidden Markov models with covariates. Stat Comput 25, 515–526 (2015). https://doi.org/10.1007/s11222-014-9450-8
