Abstract
The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration approach, a general pseudo maximum likelihood estimation method based on a conveniently decomposed form of the likelihood. It is both consistent and computationally efficient, and produces point estimates and estimated standard errors which are practically identical to those obtained by maximum likelihood. Simulations suggest that improved regression calibration, which is easy to implement in standard software, works well in a range of situations.
References
Albert, P.S., & Follmann, D.A. (2000). Modeling repeated count data subject to informative dropout. Biometrics, 56, 667–677.
Armstrong, B. (1985). Measurement error in generalized linear models. Communications in Statistics. Series B, 16, 529–544.
Bentler, P.M. (1983). Some contributions to efficient statistics in structural models: specification and estimation of moment structures. Psychometrika, 48, 493–517.
Blackburn, M., & Neumark, D. (1992). Unobserved ability, efficiency wages, and interindustry wage differentials. Quarterly Journal of Economics, 107, 1421–1436.
Buonaccorsi, J., Demidenko, E., & Tosteson, T. (2000). Estimation in longitudinal random effects models with measurement error. Statistica Sinica, 10, 885–903.
Buonaccorsi, J. (2010). Measurement error: models, methods and applications. Boca Raton: Chapman & Hall/CRC.
Burr, D. (1988). On errors-in-variables in binary regression—Berkson case. Journal of the American Statistical Association, 83, 739–743.
Buzas, J.S., & Stefanski, L.A. (1996). Instrumental variable estimation in generalized linear measurement error models. Journal of the American Statistical Association, 91, 999–1006.
Carroll, R.J., Ruppert, D., Stefanski, L.A., & Crainiceanu, C.M. (2006). Measurement error in nonlinear models (2nd ed.). Boca Raton: Chapman & Hall/CRC.
Carroll, R.J., Spiegelman, C.H., Lan, K.G., Bailey, K.T., & Abbott, R.D. (1984). On errors-in-variables for binary regression models. Biometrika, 71, 19–25.
Carroll, R.J., & Stefanski, L.A. (1990). Approximate quasi-likelihood estimation in models with surrogate predictors. Journal of the American Statistical Association, 85, 652–663.
Clayton, D.G. (1992). Models for the analysis of cohort and case-control studies with inaccurately measured exposures. In J.H. Dwyer, M. Feinlieb, P. Lippert, & H. Hoffmeister (Eds.), Statistical models for longitudinal studies on health (pp. 301–331). New York: Oxford University Press.
Davis, P.J., & Rabinowitz, P. (1984). Methods of numerical integration (2nd ed.). New York: Academic Press.
Gleser, L.J. (1990). Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. In P.J. Brown & W.A. Fuller (Eds.), Statistical analysis of measurement error models and applications (pp. 99–114). Providence: American Mathematical Society.
Gong, G., & Samaniego, F.J. (1981). Pseudo maximum likelihood estimation: theory and applications. Annals of Statistics, 9, 861–869.
Gourieroux, C., & Monfort, A. (1995). Statistics and econometric models (Vol. 2). Cambridge: Cambridge University Press.
Griliches, Z. (1976). Wages of very young men. Journal of Political Economy, 85, S69–S86.
Gustafson, P. (2004). Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. Boca Raton: Chapman & Hall/CRC.
Higdon, R., & Schafer, D.W. (2001). Maximum likelihood computations for regression with measurement error. Computational Statistics & Data Analysis, 35, 283–299.
Jöreskog, K.G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Jöreskog, K.G., & Goldberger, A.S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631–639.
Kuha, J. (1997). Estimation by data augmentation in regression models with continuous and discrete covariates measured with error. Statistics in Medicine, 16, 189–202.
Lesaffre, E., & Spiessens, B. (2001). On the effect of the number of quadrature points in a logistic random-effects model: an example. Journal of the Royal Statistical Society. Series C, 50, 325–335.
Liang, K.-Y., & Liu, X.-H. (1991). Estimating equations in generalized linear models with measurement error. In V.P. Godambe (Ed.), Estimating functions (pp. 47–63). Oxford: Oxford University Press.
Lütkepohl, H. (1996). Handbook of matrices. Chichester: Wiley.
McCullagh, P., & Nelder, J.A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
McDonald, R.P. (1967). Nonlinear factor analysis (Psychometric Monograph No. 15). Richmond: Psychometric Corporation.
Parke, W.R. (1986). Pseudo maximum likelihood estimation: the asymptotic distribution. Annals of Statistics, 14, 355–357.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2003). Maximum likelihood estimation of generalized linear models with covariate measurement error. The Stata Journal, 3, 385–410.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004a). Generalized multilevel structural equation modeling. Psychometrika, 69, 167–190.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004b). Gllamm manual (Technical report 160). U.C. Berkeley Division of Biostatistics. Downloadable from http://www.bepress.com/ucbbiostat/paper160/.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128, 301–323.
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata, vol. II: categorical responses, counts, and survival (3rd ed.). College Station: Stata Press.
Richardson, S., & Gilks, W.S. (1993). Conditional independence models for epidemiological studies with covariate measurement error. Statistics in Medicine, 12, 1703–1722.
Robinson, G.K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science, 6, 15–51.
Robinson, P.M. (1974). Identification, estimation, and large sample theory for regressions containing unobservable variables. International Economic Review, 15, 680–692.
Rosner, B., Spiegelman, D., & Willett, W.C. (1990). Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology, 132, 734–745.
Rosner, B., Willett, W.C., & Spiegelman, D. (1989). Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine, 8, 1031–1040.
Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581–592.
Schafer, D.W. (1987). Covariate measurement error in generalized linear models. Biometrika, 74, 385–391.
Schafer, D.W. (1993). Likelihood analysis for probit regression with measurement error. Biometrika, 80, 899–904.
Schafer, D.W., & Purdy, K.G. (1996). Likelihood analysis for errors-in-variables regression with replicate measurements. Biometrika, 83, 813–824.
Shapiro, A. (2007). Statistical inference of moment structures. In S.Y. Lee (Ed.), Handbook of latent variable and related models (pp. 229–259). Amsterdam: Elsevier.
Skrondal, A., & Laake, P. (2001). Regression among factor scores. Psychometrika, 66, 563–575.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton: Chapman & Hall/CRC.
Skrondal, A., & Rabe-Hesketh, S. (2007). Latent variable modelling: a survey. Scandinavian Journal of Statistics, 34, 712–745.
Skrondal, A., & Rabe-Hesketh, S. (2009). Prediction in multilevel generalized linear mixed models. Journal of the Royal Statistical Society. Series A, 172, 659–687.
Stephens, D.A., & Dellaportas, P. (1992). Bayesian analysis of generalised linear models with covariate measurement error. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (Vol. 4, pp. 813–820). Oxford: Oxford University Press.
Thisted, R.A. (1988). Elements of statistical computing. London: Chapman & Hall.
Acknowledgements
We are grateful to H.K. Gjessing for helpful discussions and three anonymous reviewers for constructive comments.
Appendix: Obtaining \(\widehat{\boldsymbol{\mathcal{I}}}_{ \mathsf{ME},\mathsf{O}}\) in (18)
Here we describe the calculation of the estimate (18) of the matrix \(\boldsymbol{\mathcal{I}}_{\mathsf{ME}, \mathsf{O}}\), which is used in the calculation of the variance matrix (17) of \(\widehat{\boldsymbol {\vartheta }}_{\mathsf{O}}^{\mathsf{IRC}}\). Let us first introduce some convenient shorthand notation for the logarithm of the likelihood contribution (6):
\[
\ell_{i} = \log g_{1i} + \log g_{2i}, \qquad g_{1i} = \int g_{yi}\, g_{xi}\, d\mathbf{x}_{i},
\]
where \(g_{yi}\) denotes the conditional density of \(y_{i}\) given \((\mathbf{x}_{i}, \mathbf{z}_{i})\).
Here \(g_{xi}\) and \(g_{2i}\) are multivariate normal density functions with parameters \(\boldsymbol{\theta}_{1i}=(\boldsymbol{\xi}_{i}',\text{vec}(\boldsymbol{\Omega}_{i})')'\) and \(\boldsymbol{\theta}_{2i}=(\boldsymbol{\mu}_{i}',\text{vec}(\boldsymbol{\Sigma}_{i})')'\), respectively, as defined by (11)–(12) and (9)–(10). These in turn are functions of the parameters \(\boldsymbol{\chi}=(\boldsymbol{\nu}',\text{vec}(\boldsymbol{\Lambda})',\text{vec}(\boldsymbol{\Theta})',\text{vec}(\boldsymbol{\Gamma})',\text{vec}(\boldsymbol{\Psi})')'\), and \(\boldsymbol{\vartheta}_{\mathsf{ME}}\) collects the distinct, unknown elements of \(\boldsymbol{\chi}\).
The required gradients for (18) are
\[
\frac{\partial \ell_{i}}{\partial\boldsymbol{\vartheta}_{\mathsf{O}}} = \frac{1}{g_{1i}}\,\frac{\partial g_{1i}}{\partial\boldsymbol{\vartheta}_{\mathsf{O}}},
\tag{A.1}
\]
\[
\frac{\partial \ell_{i}}{\partial\boldsymbol{\vartheta}_{\mathsf{ME}}} = \biggl(\frac{\partial\boldsymbol{\chi}}{\partial\boldsymbol{\vartheta}_{\mathsf{ME}}'}\biggr)' \biggl[\biggl(\frac{\partial\boldsymbol{\theta}_{1i}}{\partial\boldsymbol{\chi}'}\biggr)' \frac{1}{g_{1i}}\,\frac{\partial g_{1i}}{\partial\boldsymbol{\theta}_{1i}} + \biggl(\frac{\partial\boldsymbol{\theta}_{2i}}{\partial\boldsymbol{\chi}'}\biggr)' \frac{\partial\log g_{2i}}{\partial\boldsymbol{\theta}_{2i}}\biggr],
\tag{A.2}
\]
where
\[
g_{1i}=\int g_{yi}\,g_{xi}\,d\mathbf{x}_{i}, \qquad
\frac{\partial g_{1i}}{\partial\boldsymbol{\vartheta}_{\mathsf{O}}} = \int \frac{\partial g_{yi}}{\partial\boldsymbol{\vartheta}_{\mathsf{O}}}\, g_{xi}\,d\mathbf{x}_{i}, \qquad
\frac{\partial g_{1i}}{\partial\boldsymbol{\theta}_{1i}} = \int g_{yi}\,\frac{\partial g_{xi}}{\partial\boldsymbol{\theta}_{1i}}\,d\mathbf{x}_{i}.
\tag{A.3–A.5}
\]
Estimated values for these quantities, and thus for the estimated matrix \(\widehat{\boldsymbol{\mathcal{I}}}_{\mathsf{ME},\mathsf{O}}\) given by (18), are obtained by substituting estimates \(\widehat{\boldsymbol {\vartheta }}^{\mathsf{IRC}}\) of the parameters.
Starting with (A.2), we note that each element of \(\boldsymbol{\chi}\) is either a known constant or equal to a single element of \(\boldsymbol{\vartheta}_{\mathsf{ME}}\); for illustration, consider \(\boldsymbol{\Lambda}\) as shown in (2). Suppose that \(\boldsymbol{\chi}\) is of length t and \(\boldsymbol{\vartheta}_{\mathsf{ME}}\) of length u. Then \(\partial\boldsymbol{\chi}/\partial\boldsymbol{\vartheta}_{\mathsf{ME}}'\) is a t×u matrix whose (i,j)th element is 1 if the ith element of \(\boldsymbol{\chi}\) is equal to the jth element of \(\boldsymbol{\vartheta}_{\mathsf{ME}}\), and 0 otherwise.
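As a concrete illustration of this construction (a minimal numpy sketch, not part of the original derivation; the function name `dchi_dtheta` and the label-based encoding of \(\boldsymbol{\chi}\) are ours), the 0/1 matrix can be built by matching each element of \(\boldsymbol{\chi}\) against the list of free parameters:

```python
import numpy as np

def dchi_dtheta(chi_labels, free_names):
    """Build the t x u matrix dchi/dtheta_ME': element (i, j) is 1 if
    the ith element of chi equals the jth free parameter, 0 otherwise.

    chi_labels: length-t list; a free-parameter name (str) for elements
    of chi that are unknown, or None for known constants.
    free_names: length-u list of the distinct free-parameter names.
    """
    D = np.zeros((len(chi_labels), len(free_names)))
    for i, label in enumerate(chi_labels):
        if label is not None:
            D[i, free_names.index(label)] = 1.0
    return D

# Example: chi = (nu1, 1, lambda21, 0, psi11)' with fixed elements 1 and 0,
# so theta_ME = (nu1, lambda21, psi11)' and the matrix is 5 x 3
D = dchi_dtheta(["nu1", None, "lambda21", None, "psi11"],
                ["nu1", "lambda21", "psi11"])
```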
Next, the elements of \(\partial\boldsymbol{\theta}_{2i}/\partial\boldsymbol{\chi}'\) in (A.2) are
and the elements of \(\partial\boldsymbol{\theta}_{1i}/\partial\boldsymbol{\chi}'\) are
where
and vec(⋅) denotes the column-by-column vectorization operator, ⊗ the Kronecker product, \(\mathbf{I}_{m}\) an m×m identity matrix, and \(\mathbf{K}_{rm}\) an rm×rm commutation matrix. The formulas are obtained through repeated application of rules of matrix differentiation (see, e.g., Lütkepohl 1996).
In the second term of (A.2), the elements of \(\partial\log g_{2i}/\partial\boldsymbol{\theta}_{2i}'\) are
\[
\frac{\partial\log g_{2i}}{\partial\boldsymbol{\mu}_{i}'} = (\mathbf{w}_{i}-\boldsymbol{\mu}_{i})'\boldsymbol{\Sigma}_{i}^{-1}
\quad\text{and}\quad
\frac{\partial\log g_{2i}}{\partial\,\text{vec}(\boldsymbol{\Sigma}_{i})'} = \frac{1}{2}\,\text{vec}\bigl[\boldsymbol{\Sigma}_{i}^{-1}(\mathbf{w}_{i}-\boldsymbol{\mu}_{i})(\mathbf{w}_{i}-\boldsymbol{\mu}_{i})'\boldsymbol{\Sigma}_{i}^{-1}-\boldsymbol{\Sigma}_{i}^{-1}\bigr]'.
\]
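These two gradient formulas are easy to verify numerically. The following numpy sketch (ours, not from the paper; the dimension and parameter values are arbitrary) evaluates them and checks the \(\boldsymbol{\mu}_{i}\)-gradient against central finite differences of the log-density:

```python
import numpy as np

def mvn_logpdf(w, mu, Sigma):
    """Log-density of N(mu, Sigma) evaluated at w."""
    r = w - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet
                   + r @ np.linalg.solve(Sigma, r))

def logdensity_grads(w, mu, Sigma):
    """Gradients of log g w.r.t. mu and vec(Sigma), per the formulas above."""
    Si = np.linalg.inv(Sigma)
    r = w - mu
    g_mu = Si @ r                                   # (w - mu)' Sigma^{-1}
    g_Sigma = 0.5 * (Si @ np.outer(r, r) @ Si - Si)
    return g_mu, g_Sigma.reshape(-1, order="F")     # column-by-column vec

w = np.array([0.3, -1.2])
mu = np.array([0.1, 0.4])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
g_mu, g_vecSigma = logdensity_grads(w, mu, Sigma)

# central finite differences for the mu-gradient
eps = 1e-6
fd_mu = np.array([(mvn_logpdf(w, mu + eps * np.eye(2)[k], Sigma)
                   - mvn_logpdf(w, mu - eps * np.eye(2)[k], Sigma)) / (2 * eps)
                  for k in range(2)])
```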
The remaining elements of (A.1) and (A.2) also depend on the outcome model for \(y_{i}\). For the logistic model, which is predominant in applications of generalized linear models with covariate measurement error, and which is also used in our simulations and example, \(g_{yi}=\pi_{i}^{y_{i}}(1-\pi_{i})^{1-y_{i}}\), where \(\pi_{i}=\exp(\eta_{i})/[1+\exp(\eta_{i})]\) and \(\eta_{i}=\mathbf{z}_{i}'\boldsymbol{\beta}_{z}+\mathbf{x}_{i}'\boldsymbol{\beta}_{x}\). For this model we employ the well-known closed-form approximation \(g_{1i}\approx(\pi_{i}^{*})^{y_{i}}(1-\pi_{i}^{*})^{1-y_{i}}\), where \(\pi_{i}^{*}=\exp(\eta^{*}_{i})/[1+\exp(\eta^{*}_{i})]\), \(\eta^{*}_{i}=\eta_{1i}\eta_{2i}^{-1/2}\), \(\eta_{1i}=\mathbf{z}_{i}'\boldsymbol{\beta}_{z}+\boldsymbol{\xi}_{i}'\boldsymbol{\beta}_{x}\), \(\eta_{2i}=1+d\,\boldsymbol{\beta}_{x}'\boldsymbol{\Omega}_{i}\boldsymbol{\beta}_{x}\), and \(d=1/1.7^{2}\) (e.g., Liang & Liu 1991). For this approximation,
where \(\boldsymbol{\xi}_{i}^{*}=\boldsymbol{\xi}_{i}-\eta_{1i}\eta_{2i}^{-1}\, d\, \boldsymbol{\Omega}_{i}\boldsymbol{\beta}_{x}\). These formulas complete the explicit expressions for (A.1) and (A.2).
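The accuracy of the closed-form approximation above is easy to probe numerically. The sketch below (ours; the parameter values are arbitrary) compares it with an ordinary Gauss–Hermite evaluation of \(E[\pi(\eta)]\) for \(\eta\sim N(\eta_{1i}, \boldsymbol{\beta}_{x}'\boldsymbol{\Omega}_{i}\boldsymbol{\beta}_{x})\):

```python
import numpy as np

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

def prob_approx(eta1, v, d=1 / 1.7**2):
    # closed-form approximation: logistic(eta1 / sqrt(1 + d * v))
    return logistic(eta1 / np.sqrt(1.0 + d * v))

def prob_ghq(eta1, v, n=40):
    # E[logistic(eta)] for eta ~ N(eta1, v), by Gauss-Hermite quadrature
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    return np.sum(weights * logistic(eta1 + np.sqrt(2.0 * v) * nodes)) / np.sqrt(np.pi)

# illustrative values: eta1 = z'beta_z + xi'beta_x, v = beta_x' Omega beta_x
p_approx = prob_approx(0.8, 0.5)
p_exact = prob_ghq(0.8, 0.5)
```

With moderate values of the prediction variance v, the two probabilities typically agree to two or three decimal places, which is what makes the approximation attractive in practice.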
In our data analysis, we also apply a similar idea for the conventional regression calibration estimate of \(\boldsymbol{\vartheta}_{\mathsf{O}}\), which uses the first-order approximation \(g_{1i}\approx(\pi_{i}^{\mathrm{RC}})^{y_{i}}(1-\pi_{i}^{\mathrm{RC}})^{1-y_{i}}\), where \(\pi_{i}^{\mathrm{RC}}=\exp(\eta_{1i})/[1+\exp(\eta_{1i})]\). We estimate its variance matrix analogously to (17)–(18), using in (A.1) and (A.2) \(\partial g_{1i}/\partial\boldsymbol{\vartheta}_{\mathsf{O}}=(\partial g_{1i}/\partial\eta_{1i})\,(\mathbf{z}_{i}',\boldsymbol{\xi}_{i}')'\) and \(\partial g_{1i}/\partial\boldsymbol{\theta}_{1i}'=(\partial g_{1i}/\partial\eta_{1i})\,[\boldsymbol{\beta}_{x}',\mathbf{0}']\), where \(\partial g_{1i}/\partial\eta_{1i}=(-1)^{1-y_{i}}\,\pi_{i}^{\mathrm{RC}}(1-\pi_{i}^{\mathrm{RC}})\).
For other, less commonly used models, we must evaluate the integrals involved in (A.3)–(A.5). Note first that the partial derivatives \(\partial g_{xi}/\partial\boldsymbol{\theta}_{1i}'\) are given by
\[
\frac{\partial g_{xi}}{\partial\boldsymbol{\xi}_{i}'} = g_{xi}\,(\mathbf{x}_{i}-\boldsymbol{\xi}_{i})'\boldsymbol{\Omega}_{i}^{-1}
\quad\text{and}\quad
\frac{\partial g_{xi}}{\partial\,\text{vec}(\boldsymbol{\Omega}_{i})'} = \frac{g_{xi}}{2}\,\text{vec}\bigl[\boldsymbol{\Omega}_{i}^{-1}(\mathbf{x}_{i}-\boldsymbol{\xi}_{i})(\mathbf{x}_{i}-\boldsymbol{\xi}_{i})'\boldsymbol{\Omega}_{i}^{-1}-\boldsymbol{\Omega}_{i}^{-1}\bigr]'.
\]
Substituting these into (A.5), we see that each of the integrals there, and also those in (A.3) and (A.4), is of the form \(\int h_{i}(\mathbf{x}_{i})\, g_{xi}\, d\mathbf{x}_{i}\) for some function \(h_{i}(\mathbf{x}_{i})\) of \(\mathbf{x}_{i}\), integrated over the multivariate normal density \(g_{xi}=g(\mathbf{x}_{i}|\mathbf{w}_{i},\mathbf{z}_{i};\boldsymbol{\vartheta}_{\mathsf{ME}})\). The integrals can therefore be evaluated by Monte Carlo integration: first generate M independent draws \(\mathbf{x}_{ij}\), j=1,…,M, from \(g(\mathbf{x}_{i}|\mathbf{w}_{i},\mathbf{z}_{i};\widehat{\boldsymbol{\vartheta}}_{\mathsf{ME}})\), and then approximate the integrals by the averages \(M^{-1}\sum_{j=1}^{M} h_{i}(\mathbf{x}_{ij})\) for each of the \(h_{i}(\cdot)\). Only one set of random draws is needed for all observations i if we first generate M uncorrelated m-vectors \(\mathbf{u}_{j}\) of standard normal variates and then calculate \(\mathbf{x}_{ij}=\widetilde{\boldsymbol{\xi}}_{i}+\mathbf{B}_{i}\mathbf{u}_{j}\), where \(\mathbf{B}_{i}\) is, for example, the Cholesky factor satisfying \(\widehat{\boldsymbol{\Omega}}_{i}=\mathbf{B}_{i}\mathbf{B}_{i}'\).
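The common-draws scheme can be sketched as follows (a numpy illustration of the idea, not code from the paper; the choice of \(h_{i}\) as the first and second moments, and the input values, are ours for checking purposes):

```python
import numpy as np

rng = np.random.default_rng(2012)  # fixed seed for reproducibility

def mc_mean(h, xi, Omega, u):
    """Monte Carlo estimate of E[h(x)] for x ~ N(xi, Omega).

    u: (M, m) matrix of standard normal draws, shared across all
    observations i; x_j = xi + B u_j with Omega = B B' (Cholesky).
    """
    B = np.linalg.cholesky(Omega)
    x = xi + u @ B.T            # all M draws at once, one row per draw
    return h(x).mean(axis=0)

M = 100_000
u = rng.standard_normal((M, 2))  # one common set of draws for all i
xi = np.array([1.0, -0.5])
Omega = np.array([[1.0, 0.4], [0.4, 0.8]])

est_first = mc_mean(lambda x: x, xi, Omega, u)       # should approach xi
est_second = mc_mean(lambda x: x**2, xi, Omega, u)   # approaches xi^2 + diag(Omega)
```

Reusing the same u across observations both saves computation and makes the estimated integrals smooth functions of the parameters.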
Skrondal, A., Kuha, J. Improved Regression Calibration. Psychometrika 77, 649–669 (2012). https://doi.org/10.1007/s11336-012-9285-1