Abstract
We propose a novel approach to modeling longitudinal data within the generalized linear models family when a covariate of interest is affected by measurement error. We jointly model the response (outcome model), the covariate observed with error (measurement model), and the underlying unobserved, time-varying, error-free covariate (true score), assuming a first-order latent Markov chain for the true score. Estimating the full joint model is hardly feasible when the number of covariates is large, as is typical in real-data applications: available algorithms are severely affected by numerical underflow and multiple local maxima. To overcome these problems, we propose an efficient two-step approach. In an extensive simulation study, we show that the two-step approach produces point estimates and standard errors that are almost identical to those obtained by the more time-consuming simultaneous (one-step) approach. The proposal is also illustrated by analyzing data from the Chinese Longitudinal Healthy Longevity Survey.
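The numerical underflow mentioned in the abstract can be made concrete with a small sketch. The code below is not the authors' two-step estimator; it is a generic forward recursion for a first-order latent Markov chain, rescaled at every time point so that the likelihood is accumulated on the log scale. The function name `forward_loglik` and its arguments are hypothetical, and the emission densities are assumed to be precomputed for each time point and latent state.

```python
import numpy as np

def forward_loglik(init, trans, emis):
    """Log-likelihood of one sequence under a first-order latent Markov chain.

    init  : (S,) initial state probabilities (assumed input)
    trans : (S, S) transition matrix, rows summing to one (assumed input)
    emis  : (T, S) density of the observation at time t given each state
    """
    alpha = init * emis[0]              # joint of first observation and state
    loglik = 0.0
    for t in range(emis.shape[0]):
        if t > 0:
            # propagate one step through the chain, then weight by the emission
            alpha = (alpha @ trans) * emis[t]
        scale = alpha.sum()             # conditional density of observation t
        loglik += np.log(scale)         # accumulate on the log scale
        alpha = alpha / scale           # rescale so alpha sums to one
    return loglik
```

Without the rescaling step, the unnormalized forward probabilities shrink geometrically with the number of time points and eventually round to zero in floating point; with it, only the logarithms of the per-period scale factors are summed.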
Notes
An overview of further approaches to estimation in the presence of measurement error can be found in Fuller (2009).
Standard errors are computed as the square roots of the diagonal elements of the inverse of minus the Hessian from the second-step model. To check the robustness of our conclusions, we also tried sandwich SEs (White 1980). The results are in line with those reported in Table 4; the only difference we observed is in the p value for sex (“female”), which increased from <0.01 to 0.02. An extended version of Table 4, including sandwich SEs, is available from the corresponding author upon request.
To rule out sample-selection issues, we also fitted our model to a subset of the data that excludes individuals with a history of a CVD event at the first measurement occasion: the results were qualitatively the same.
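The two kinds of standard errors mentioned in the notes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `hessian` is assumed to be the Hessian of the log-likelihood at the estimate, and `scores` is assumed to hold one row of score contributions per independent unit. Both function names are hypothetical.

```python
import numpy as np

def model_based_se(hessian):
    """Square roots of the diagonal of the inverse of minus the Hessian."""
    cov = np.linalg.inv(-hessian)
    return np.sqrt(np.diag(cov))

def sandwich_se(hessian, scores):
    """Sandwich SEs: bread @ meat @ bread.

    bread = inverse of minus the Hessian
    meat  = sum over units of the outer product of their score vectors
    """
    bread = np.linalg.inv(-hessian)
    meat = scores.T @ scores
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))
```

As a small check, consider a Gaussian mean model with unit variance and observations 1, 2, 3, 4: the Hessian is the 1×1 matrix with entry −4 and the scores at the sample mean are −1.5, −0.5, 0.5, 1.5. The model-based SE is then 0.5 and the sandwich SE is √5/4. With longitudinal data, each row of `scores` would typically aggregate the contributions of one individual over time.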
References
Agresti A, Booth JG, Hobert JP, Caffo B (2000) Random-effects modeling of categorical response data. Sociol Methodol 30:27–80
Agresti A, Caffo B, Ohman-Strickland P (2004) Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 47:639–653
Aitkin M, Alfò M (1998) Regression models for binary longitudinal responses. Stat Comput 8:289–307
Aitkin M, Rocci R (2002) A general maximum likelihood analysis of measurement error in generalized linear models. Stat Comput 12:163–174
Alexandrovich G, Holzmann H, Leister A (2016) Nonparametric identification and maximum likelihood estimation for hidden Markov models. Biometrika 103:423–434
Allman ES, Matias C, Rhodes JA et al (2009) Identifiability of parameters in latent structure models with many observed variables. Ann Stat 37:3099–3132
Bakk Z, Kuha J (2018) Two-step estimation of models between latent classes and external variables. Psychometrika 83:871–892
Bartolucci F, Bacci S, Pennoni F (2014) Longitudinal analysis of self-reported health status by mixture latent auto-regressive models. J Roy Stat Soc Ser C (Appl Stat) 63:267–288
Bartolucci F, Farcomeni A, Pennoni F (2012) Latent Markov models for longitudinal data. Chapman and Hall, London
Bartolucci F, Farcomeni A, Pennoni F (2014) Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates. TEST 23:433–465
Bartolucci F, Montanari GE, Pandolfi S (2015) Three-step estimation of latent Markov models with covariates. Comput Stat Data Anal 83:287–301
Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41:164–171
Buonaccorsi JP (1996) Measurement error in the response in the general linear model. J Am Stat Assoc 91:633–642
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. CRC Press, Boca Raton
Cook JR, Stefanski LA (1994) Simulation–extrapolation estimation in parametric measurement error models. J Am Stat Assoc 89:1314–1328
Di Mari R, Bakk Z (2018) Mostly harmless direct effects: a comparison of different latent Markov modeling approaches. Struct Equ Model 25:467–483
Di Mari R, Oberski DL, Vermunt JK (2016) Bias-adjusted three-step latent Markov modeling with covariates. Struct Equ Model 23:649–660
Fuller W (2009) Measurement error models. Wiley, New York
Gassiat É, Cleynen A, Robin S (2016) Inference in finite state space non parametric hidden Markov models and applications. Stat Comput 26:61–71
Gong G, Samaniego FJ (1981) Pseudo maximum likelihood estimation: theory and applications. Ann Stat, 861–869
Gourieroux C, Monfort A (1995) Statistics and econometric models, vol 1. Cambridge University Press, Cambridge
Heiss F (2008) Sequential numerical integration in nonlinear state space models for microeconometric panel data. J Appl Econom 23:373–389
Küchenhoff H, Mwalili SM, Lesaffre E (2006) A general method for dealing with misclassification in regression: the misclassification simex. Biometrics 62:85–96
Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. J Am Stat Assoc 73:805–811
Lederer W, Küchenhoff H (2006) A short introduction to the simex and mcsimex. Newslett R Project Volume 6/4 6:26
Li M, Ma Y, Li R (2019) Semiparametric regression for measurement error model with heteroscedastic error. J Multivar Anal 171:320–338
Maruotti A (2011) Mixed hidden Markov models for longitudinal data: an overview. Int Stat Rev 79:427–454
Maruotti A (2015) Handling non-ignorable dropouts in longitudinal data: a conditional model based on a latent Markov heterogeneity structure. TEST 24:84–109
Maruotti A, Punzo A (2021) Initialization of hidden Markov and semi-Markov models: a critical evaluation of several strategies. Int Stat Rev 89:447–480
Sánchez BN, Budtz-Jørgensen E, Ryan LM (2009) An estimating equations approach to fitting latent exposure models with longitudinal health outcomes. Ann Appl Stat, 830–856
Skrondal A, Kuha J (2012) Improved regression calibration. Psychometrika 77:649–669
Skrondal A, Rabe-Hesketh S (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman and Hall, London
Tsiatis AA, Ma Y (2004) Locally efficient semiparametric estimators for functional measurement error models. Biometrika 91:835–848
Tsuji H, Venditti FJ, Manders ES, Evans JC, Larson MG, Feldman CL, Levy D (1994) Reduced heart rate variability and mortality risk in an elderly cohort. The Framingham heart study. Circulation 90:878–883
Uhrig SN, Watson N (2020) The impact of measurement error on wage decompositions: evidence from the British Household Panel Survey and the Household, Income and Labour Dynamics in Australia Survey. Sociol Methods Res 49:43–78
Vermunt JK (2010) Latent class modeling with covariates: two improved three-step approaches. Polit Anal 18:450–469
Vermunt JK, Magidson J (2016) Technical guide for latent gold 5.1: basic, advanced, and syntax. Statistical Innovations Inc, Belmont, MA
White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 817–838
Zhang C, Qin Y-Y, Chen Q, Jiang H, Chen X-Z, Xu C-L, Mao P-J, He J, Zhou Y-H (2014) Alcohol intake and risk of stroke: a dose-response meta-analysis of prospective studies. Int J Cardiol 174:669–677
Zucchini W, MacDonald IL, Langrock R (2016) Hidden Markov models for time series: an introduction using R. Chapman and Hall, London
About this article
Cite this article
Di Mari, R., Maruotti, A. A two-step estimator for generalized linear models for longitudinal data with time-varying measurement error. Adv Data Anal Classif 16, 273–300 (2022). https://doi.org/10.1007/s11634-021-00473-4
DOI: https://doi.org/10.1007/s11634-021-00473-4
Keywords
- Covariate measurement error
- Generalized linear models for longitudinal data
- Latent Markov models
- Two-step estimator