A two-step estimator for generalized linear models for longitudinal data with time-varying measurement error

Di Mari, Roberto; Maruotti, Antonello

doi:10.1007/s11634-021-00473-4

Regular Article
Published: 22 November 2021

Volume 16, pages 273–300, (2022)

Advances in Data Analysis and Classification

Abstract

We propose a novel approach to longitudinal data modeling within the Generalized Linear Models family for settings in which a covariate of interest is affected by measurement error. We jointly model the response (outcome model), the covariate observed with error (measurement model) and the underlying unobserved, time-varying, error-free covariate (true score), assuming a first-order latent Markov chain for the true score. Estimating the full joint model is hardly feasible when the number of covariates is large, as is typical in real-data applications: available algorithms are severely affected by numerical underflow and multiple local maxima. To overcome these problems, we propose an efficient two-step approach. In an extensive simulation study, we show that the two-step approach produces point estimates and standard errors that are almost identical to those obtained by the more time-consuming simultaneous (one-step) approach. The proposal is also illustrated by analyzing data from the Chinese Longitudinal Healthy Longevity Survey.
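The numerical underflow mentioned in the abstract can be illustrated with a generic sketch, which is not the authors' implementation. It assumes a two-state latent Markov chain for the true score and a Gaussian measurement model for the error-prone covariate; the function names, parameter values and sequence length are all hypothetical. The likelihood is computed by a forward recursion, once on the probability scale and once on the log scale.

```python
import numpy as np

def gaussian_log_emis(w, means, sd):
    """(T, K) log densities of observed w_t given latent state k,
    assuming w_t ~ N(means[k], sd^2) (an illustrative measurement model)."""
    w = np.asarray(w, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)[None, :]
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((w - means) / sd) ** 2

def log_forward(log_init, log_trans, log_emis):
    """Log-likelihood of one sequence via the forward recursion in log space."""
    log_alpha = log_init + log_emis[0]
    for t in range(1, log_emis.shape[0]):
        # log of sum_j alpha_{t-1}(j) * P(j -> k), computed stably
        log_alpha = np.logaddexp.reduce(log_alpha[:, None] + log_trans, axis=0)
        log_alpha = log_alpha + log_emis[t]
    return np.logaddexp.reduce(log_alpha)

def naive_forward(init, trans, log_emis):
    """Same recursion on the probability scale; prone to underflow."""
    alpha = init * np.exp(log_emis[0])
    for t in range(1, log_emis.shape[0]):
        alpha = (alpha @ trans) * np.exp(log_emis[t])
    return alpha.sum()

# Hypothetical parameter values, chosen only for the demonstration.
init = np.array([0.6, 0.4])
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
rng = np.random.default_rng(0)
w_long = rng.normal(loc=1.0, size=5000)
log_emis = gaussian_log_emis(w_long, means=[0.0, 2.0], sd=1.0)

print(naive_forward(init, trans, log_emis))                 # underflows to 0.0
print(log_forward(np.log(init), np.log(trans), log_emis))   # finite log-likelihood
```

Working on the log scale is one common remedy for underflow; it does not address the multiple local maxima that the abstract also mentions.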

Notes

An overview of further approaches to estimation in the presence of measurement error can be found in Fuller (2009).
Standard errors are taken as the square root of the diagonal elements of the inverse of minus the Hessian from the second step model. To check the robustness of our conclusions, we have also tried sandwich SEs (White 1980). Results are in line with those reported in Table 4. We mention that the only difference we observed is in the p value for sex (“female”), which dropped from <0.01 to 0.02. An extended version of Table 4, including sandwich SEs, is available from the corresponding author upon request.
We report that, in order to completely exclude issues of sample selection, we have fitted our model also on a subset of the data where individuals with a history of CVD event at the first measurement occasion were excluded: the results were qualitatively the same.
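The two kinds of standard errors mentioned in the notes above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sandwich form shown is the usual "bread, meat, bread" estimator in the spirit of White (1980), and the toy model (a Gaussian mean with unit variance) stands in for the second-step model.

```python
import numpy as np

def model_based_se(hessian):
    """Square roots of the diagonal of the inverse of minus the Hessian
    of the log-likelihood, evaluated at the estimate."""
    cov = np.linalg.inv(-hessian)
    return np.sqrt(np.diag(cov))

def sandwich_se(hessian, scores):
    """Sandwich SEs: A^{-1} B A^{-1}, with A = -Hessian and
    B = sum of outer products of per-observation scores.

    scores: (n, p) array of per-observation log-likelihood gradients.
    """
    bread = np.linalg.inv(-hessian)
    meat = scores.T @ scores
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))

# Toy example (assumption): y_i ~ N(mu, 1), so the score of observation i
# is y_i - mu and the Hessian of the log-likelihood is -n.
y = np.array([1.0, 2.0, 3.0, 4.0])
mu_hat = y.mean()
scores = (y - mu_hat)[:, None]
hessian = np.array([[-float(len(y))]])

print(model_based_se(hessian))
print(sandwich_se(hessian, scores))
```

With longitudinal data one would typically sum the scores within each individual before forming the outer products; which variant was used for the robustness check is not stated in the notes.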

References

Agresti A, Booth JG, Hobert JP, Caffo B (2000) Random-effects modeling of categorical response data. Sociol Methodol 30:27–80
Agresti A, Caffo B, Ohman-Strickland P (2004) Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal 47:639–653
Aitkin M, Alfó M (1998) Regression models for binary longitudinal responses. Stat Comput 8:289–307
Aitkin M, Rocci R (2002) A general maximum likelihood analysis of measurement error in generalized linear models. Stat Comput 12:163–174
Alexandrovich G, Holzmann H, Leister A (2016) Nonparametric identification and maximum likelihood estimation for hidden Markov models. Biometrika 103:423–434
Allman ES, Matias C, Rhodes JA et al (2009) Identifiability of parameters in latent structure models with many observed variables. Ann Stat 37:3099–3132
Bakk Z, Kuha J (2018) Two-step estimation of models between latent classes and external variables. Psychometrika 83:871–892
Bartolucci F, Bacci S, Pennoni F (2014) Longitudinal analysis of self-reported health status by mixture latent auto-regressive models. J Roy Stat Soc Ser C (Appl Stat) 63:267–288
Bartolucci F, Farcomeni A, Pennoni F (2012) Latent Markov models for longitudinal data. Chapman and Hall, London
Bartolucci F, Farcomeni A, Pennoni F (2014) Latent Markov models: a review of a general framework for the analysis of longitudinal data with covariates. TEST 23:433–465
Bartolucci F, Montanari GE, Pandolfi S (2015) Three-step estimation of latent Markov models with covariates. Comput Stat Data Anal 83:287–301
Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41:164–171
Buonaccorsi JP (1996) Measurement error in the response in the general linear model. J Am Stat Assoc 91:633–642
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. CRC Press, Boca Raton
Cook JR, Stefanski LA (1994) Simulation–extrapolation estimation in parametric measurement error models. J Am Stat Assoc 89:1314–1328
Di Mari R, Bakk Z (2018) Mostly harmless direct effects: a comparison of different latent Markov modeling approaches. Struct Equ Model A Multidiscip J 25(3):467–483
Di Mari R, Oberski DL, Vermunt JK (2016) Bias-adjusted three-step latent Markov modeling with covariates. Struct Equ Model 23:649–660
Fuller W (2009) Measurement error models. Wiley, New York
Gassiat É, Cleynen A, Robin S (2016) Inference in finite state space non parametric hidden Markov models and applications. Stat Comput 26:61–71
Gong G, Samaniego FJ (1981) Pseudo maximum likelihood estimation: theory and applications. Ann Stat, 861–869
Gourieroux C, Monfort A (1995) Statistics and econometric models, vol 1. Cambridge University Press, Cambridge
Heiss F (2008) Sequential numerical integration in nonlinear state space models for microeconometric panel data. J Appl Econom 23:373–389
Küchenhoff H, Mwalili SM, Lesaffre E (2006) A general method for dealing with misclassification in regression: the misclassification simex. Biometrics 62:85–96
Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. J Am Stat Assoc 73:805–811
Lederer W, Küchenhoff H (2006) A short introduction to the simex and mcsimex. Newslett R Project Volume 6/4 6:26
Li M, Ma Y, Li R (2019) Semiparametric regression for measurement error model with heteroscedastic error. J Multivar Anal 171:320–338
Maruotti A (2011) Mixed hidden Markov models for longitudinal data: an overview. Int Stat Rev 79:427–454
Maruotti A (2015) Handling non-ignorable dropouts in longitudinal data: a conditional model based on a latent Markov heterogeneity structure. TEST 24:84–109
Maruotti A, Punzo A (2021) Initialization of hidden Markov and semi-Markov models: a critical evaluation of several strategies. Int Stat Rev 89(3):447–480
Sánchez BN, Budtz-Jørgensen E, Ryan LM (2009) An estimating equations approach to fitting latent exposure models with longitudinal health outcomes. Ann Appl Stat, 830–856
Skrondal A, Kuha J (2012) Improved regression calibration. Psychometrika 77:649–669
Skrondal A, Rabe-Hesketh S (2004) Generalized latent variable modeling: multilevel, longitudinal, and structural equation models. Chapman and Hall, London
Tsiatis AA, Ma Y (2004) Locally efficient semiparametric estimators for functional measurement error models. Biometrika 91:835–848
Tsuji H, Venditti FJ, Manders ES, Evans JC, Larson MG, Feldman CL, Levy D (1994) Reduced heart rate variability and mortality risk in an elderly cohort. The Framingham heart study. Circulation 90:878–883
Uhrig SN, Watson N (2020) The impact of measurement error on wage decompositions: evidence from the British Household Panel Survey and the Household, Income and Labour Dynamics in Australia Survey. Sociol Methods Res 49(1):43–78
Vermunt JK (2010) Latent class modeling with covariates: two improved three-step approaches. Polit Anal 18:450–469
Vermunt JK, Magidson J (2016) Technical guide for latent gold 5.1: basic, advanced, and syntax. Statistical Innovations Inc, Belmont, MA
White H (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 817–838
Zhang C, Qin Y-Y, Chen Q, Jiang H, Chen X-Z, Xu C-L, Mao P-J, He J, Zhou Y-H (2014) Alcohol intake and risk of stroke: a dose-response meta-analysis of prospective studies. Int J Cardiol 174:669–677
Zucchini W, MacDonald IL, Langrock R (2016) Hidden Markov models for time series: an introduction using R. Chapman and Hall, London

Author information

Authors and Affiliations

Department of Economics and Business, University of Catania, Catania, Italy
Roberto Di Mari
Department of Law, Economics, Politics and Modern Languages, LUMSA University, Rome, Italy
Antonello Maruotti
Department of Mathematics, University of Bergen, Bergen, Norway
Antonello Maruotti

Corresponding author

Correspondence to Roberto Di Mari.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional figures for the simulation study

1.1 Continuous outcome

See Fig. 6.

1.2 Continuous outcome: model selection

See Fig. 7.

1.3 Dichotomous outcome

See Fig. 8.

About this article

Cite this article

Di Mari, R., Maruotti, A. A two-step estimator for generalized linear models for longitudinal data with time-varying measurement error. Adv Data Anal Classif 16, 273–300 (2022). https://doi.org/10.1007/s11634-021-00473-4

Received: 01 February 2021
Revised: 17 September 2021
Accepted: 03 October 2021
Published: 22 November 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s11634-021-00473-4
