A two-step estimator for generalized linear models for longitudinal data with time-varying measurement error

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose a novel approach to modeling longitudinal data within the family of generalized linear models when a covariate of interest is affected by measurement error. We jointly model the response (outcome model), the covariate observed with error (measurement model), and the underlying unobserved, time-varying, error-free covariate (true score), assuming a first-order latent Markov chain for the true score. Estimating the full joint model is hardly feasible when the number of covariates is large, as is typical in real-data applications: available algorithms are severely affected by numerical underflow and multiple local maxima. To overcome these problems, we propose an efficient two-step approach. An extensive simulation study shows that the two-step approach produces point estimates and standard errors that are almost identical to those obtained by the more time-consuming simultaneous (one-step) approach. We also illustrate the proposal by analyzing data from the Chinese Longitudinal Healthy Longevity Survey.
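The numerical underflow mentioned in the abstract typically arises because the likelihood of a latent Markov model is a product of many small probabilities. As a minimal sketch, and not necessarily the remedy used by the authors, the forward recursion can be carried out in log space. The function names `logsumexp` and `forward_loglik` and the argument layout are assumptions made for illustration only.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(log_init, log_trans, log_emis):
    """Log-likelihood of one unit's sequence under a first-order latent Markov chain.

    Hypothetical argument layout (all entries are natural logs):
      log_init[k]     : log P(state at time 1 = k)
      log_trans[j][k] : log P(state at time t = k | state at time t-1 = j)
      log_emis[t][k]  : log density of the data observed at time t given state k
    """
    n_states = len(log_init)
    # Initial step: joint log-probability of the first observation and each state.
    log_alpha = [log_init[k] + log_emis[0][k] for k in range(n_states)]
    # Forward recursion, summing over the previous state in log space.
    for t in range(1, len(log_emis)):
        log_alpha = [
            logsumexp([log_alpha[j] + log_trans[j][k] for j in range(n_states)])
            + log_emis[t][k]
            for k in range(n_states)
        ]
    # Marginalize over the final state.
    return logsumexp(log_alpha)
```

With long sequences or many indicators, multiplying the raw probabilities would round to zero in double precision, whereas the log-space recursion returns a finite log-likelihood that an optimizer can still work with.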

Notes

  1. An overview of further approaches to estimation in the presence of measurement error can be found in Fuller (2009).

  2. Standard errors are taken as the square roots of the diagonal elements of the inverse of minus the Hessian from the second-step model. To check the robustness of our conclusions, we also tried sandwich SEs (White 1980). The results are in line with those reported in Table 4. The only difference we observed is in the p value for sex (“female”), which increased from <0.01 to 0.02. An extended version of Table 4, including sandwich SEs, is available from the corresponding author upon request.

  3. To rule out issues of sample selection, we also fitted our model on a subset of the data that excluded individuals with a history of a CVD event at the first measurement occasion; the results were qualitatively the same.
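The two kinds of standard errors described in Note 2 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: `hessian` is taken to be the Hessian of the log-likelihood at the estimate, `scores` is taken to be a list of per-observation score vectors, the sandwich form used is the generic \(A^{-1} B A^{-1}\) with \(A = -H\) and \(B = \sum_i s_i s_i^\top\) without any small-sample correction, and all function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def model_based_se(hessian):
    """Square roots of the diagonal of the inverse of minus the Hessian."""
    cov = invert([[-h for h in row] for row in hessian])
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


def sandwich_se(hessian, scores):
    """Square roots of the diagonal of A^{-1} B A^{-1} (A = -H, B = sum of s s')."""
    p = len(hessian)
    bread = invert([[-h for h in row] for row in hessian])
    meat = [[sum(s[i] * s[j] for s in scores) for j in range(p)] for i in range(p)]
    cov = matmul(matmul(bread, meat), bread)
    return [math.sqrt(cov[i][i]) for i in range(p)]
```

When the outer product of the scores matches minus the Hessian, the two functions return the same values; a gap between them, such as the one reported for sex in Note 2, suggests that the model-based variance may be sensitive to misspecification.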

References

  • Agresti A, Booth JG, Hobert JP, Caffo B (2000) Random-effects modeling of categorical response data. Sociol Methodol 30:27–80

  • Agresti A, Caffo B, Ohman-Strickland P (2004) Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 47:639–653

  • Aitkin M, Alfó M (1998) Regression models for binary longitudinal responses. Stat Comput 8:289–307

  • Aitkin M, Rocci R (2002) A general maximum likelihood analysis of measurement error in generalized linear models. Stat Comput 12:163–174

  • Alexandrovich G, Holzmann H, Leister A (2016) Nonparametric identification and maximum likelihood estimation for hidden Markov models. Biometrika 103:423–434

  • Allman ES, Matias C, Rhodes JA (2009) Identifiability of parameters in latent structure models with many observed variables. Ann Stat 37:3099–3132

  • Bakk Z, Kuha J (2018) Two-step estimation of models between latent classes and external variables. Psychometrika 83:871–892

  • Bartolucci F, Bacci S, Pennoni F (2014) Longitudinal analysis of self-reported health status by mixture latent auto-regressive models. J Roy Stat Soc Ser C (Appl Stat) 63:267–288

  • Bartolucci F, Farcomeni A, Pennoni F (2012) Latent Markov models for longitudinal data. Chapman and Hall, London

  • Bartolucci F, Farcomeni A, Pennoni F (2014) Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates. TEST 23:433–465

  • Bartolucci F, Montanari GE, Pandolfi S (2015) Three-step estimation of latent Markov models with covariates. Comput Stat Data Anal 83:287–301

  • Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41:164–171

  • Buonaccorsi JP (1996) Measurement error in the response in the general linear model. J Am Stat Assoc 91:633–642

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. CRC Press, Boca Raton

  • Cook JR, Stefanski LA (1994) Simulation–extrapolation estimation in parametric measurement error models. J Am Stat Assoc 89:1314–1328

  • Di Mari R, Bakk Z (2018) Mostly harmless direct effects: a comparison of different latent Markov modeling approaches. Struct Equ Model A Multidiscip J 25(3):467–483

  • Di Mari R, Oberski DL, Vermunt JK (2016) Bias-adjusted three-step latent Markov modeling with covariates. Struct Equ Model 23:649–660

  • Fuller W (2009) Measurement error models. Wiley, New York

  • Gassiat É, Cleynen A, Robin S (2016) Inference in finite state space non parametric hidden Markov models and applications. Stat Comput 26:61–71

  • Gong G, Samaniego FJ (1981) Pseudo maximum likelihood estimation: theory and applications. Ann Stat 9:861–869

  • Gourieroux C, Monfort A (1995) Statistics and econometric models, vol 1. Cambridge University Press, Cambridge

  • Heiss F (2008) Sequential numerical integration in nonlinear state space models for microeconometric panel data. J Appl Econom 23:373–389

  • Küchenhoff H, Mwalili SM, Lesaffre E (2006) A general method for dealing with misclassification in regression: the misclassification simex. Biometrics 62:85–96

  • Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. J Am Stat Assoc 73:805–811

  • Lederer W, Küchenhoff H (2006) A short introduction to the simex and mcsimex. Newslett R Project Volume 6/4 6:26

  • Li M, Ma Y, Li R (2019) Semiparametric regression for measurement error model with heteroscedastic error. J Multivar Anal 171:320–338

  • Maruotti A (2011) Mixed hidden Markov models for longitudinal data: an overview. Int Stat Rev 79:427–454

  • Maruotti A (2015) Handling non-ignorable dropouts in longitudinal data: a conditional model based on a latent Markov heterogeneity structure. TEST 24:84–109

  • Maruotti A, Punzo A (2021) Initialization of hidden Markov and semi-Markov models: a critical evaluation of several strategies. Int Stat Rev 89(3):447–480

  • Sánchez BN, Budtz-Jørgensen E, Ryan LM (2009) An estimating equations approach to fitting latent exposure models with longitudinal health outcomes. Ann Appl Stat 3:830–856

  • Skrondal A, Kuha J (2012) Improved regression calibration. Psychometrika 77:649–669

  • Skrondal A, Rabe-Hesketh S (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman and Hall, London

  • Tsiatis AA, Ma Y (2004) Locally efficient semiparametric estimators for functional measurement error models. Biometrika 91:835–848

  • Tsuji H, Venditti FJ, Manders ES, Evans JC, Larson MG, Feldman CL, Levy D (1994) Reduced heart rate variability and mortality risk in an elderly cohort. The Framingham heart study. Circulation 90:878–883

  • Uhrig SN, Watson N (2020) The impact of measurement error on wage decompositions: evidence from the British Household Panel Survey and the Household, Income and Labour Dynamics in Australia Survey. Sociol Methods Res 49(1):43–78

  • Vermunt JK (2010) Latent class modeling with covariates: two improved three-step approaches. Polit Anal 18:450–469

  • Vermunt JK, Magidson J (2016) Technical guide for latent gold 5.1: basic, advanced, and syntax. Statistical Innovations Inc, Belmont, MA

  • White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48:817–838

  • Zhang C, Qin Y-Y, Chen Q, Jiang H, Chen X-Z, Xu C-L, Mao P-J, He J, Zhou Y-H (2014) Alcohol intake and risk of stroke: a dose-response meta-analysis of prospective studies. Int J Cardiol 174:669–677

  • Zucchini W, MacDonald IL, Langrock R (2016) Hidden Markov models for time series: an introduction using R. Chapman and Hall, London

Author information

Corresponding author

Correspondence to Roberto Di Mari.

Additional figures for the simulation study

1.1 Continuous outcome

See Fig. 6.

Fig. 6. Box plots of bias, computed at each simulation round for the 27 crossed simulation conditions (sample size \(\times \) error-prone covariate variance \(\times \) true-covariate effect) and averaged across covariate effects, for the one-step approach (“one-step”), the two-step approach (“two-step”), and the approach in which true-score dynamics are not modeled (“one-step_pool”). Results obtained if the true score were known (“known_meas”) and if measurement error were not taken into account (“no_meas”) are included for comparison. Continuous outcome.

1.2 Continuous outcome: model selection

See Fig. 7.

Fig. 7. Bias of the regression coefficients of the exogenous covariates in the outcome model, for joint models with 2 to 10 components. Values are reported for the 9 crossed simulation conditions (sample size \(\times \) error-prone covariate variance). Continuous outcome.

1.3 Dichotomous outcome

See Fig. 8.

Fig. 8. Box plots of bias, computed at each simulation round for the 9 crossed simulation conditions (sample size \(\times \) error-prone covariate variance) with \(\beta = 1.5\) and averaged across covariate effects, for the one-step approach (“one-step”), the two-step approach (“two-step”), and the approach in which true-score dynamics are not modeled (“one-step_pool”). Results obtained if the true score were known (“known_meas”) and if measurement error were not taken into account (“no_meas”) are included for comparison. Dichotomous outcome.

Cite this article

Di Mari, R., Maruotti, A. A two-step estimator for generalized linear models for longitudinal data with time-varying measurement error. Adv Data Anal Classif 16, 273–300 (2022). https://doi.org/10.1007/s11634-021-00473-4
