
Improved Regression Calibration


Abstract

The likelihood for generalized linear models with covariate measurement error cannot in general be expressed in closed form, which makes maximum likelihood estimation taxing. A popular alternative is regression calibration, which is computationally efficient at the cost of inconsistent estimation. We propose an improved regression calibration approach, a general pseudo maximum likelihood estimation method based on a conveniently decomposed form of the likelihood. It is both consistent and computationally efficient, and produces point estimates and estimated standard errors which are practically identical to those obtained by maximum likelihood. Simulations suggest that improved regression calibration, which is easy to implement in standard software, works well in a range of situations.


References

  • Albert, P.S., & Follmann, D.A. (2000). Modeling repeated count data subject to informative dropout. Biometrics, 56, 667–677.
  • Armstrong, B. (1985). Measurement error in generalized linear models. Communications in Statistics. Series B, 16, 529–544.
  • Bentler, P.M. (1983). Some contributions to efficient statistics in structural models: specification and estimation of moment structures. Psychometrika, 48, 493–517.
  • Blackburn, M., & Neumark, D. (1992). Unobserved ability, efficiency wages, and interindustry wage differentials. Quarterly Journal of Economics, 107, 1421–1436.
  • Buonaccorsi, J., Demidenko, E., & Tosteson, T. (2000). Estimation in longitudinal random effects models with measurement error. Statistica Sinica, 10, 885–903.
  • Buonaccorsi, J. (2010). Measurement error: models, methods and applications. Boca Raton: Chapman & Hall/CRC.
  • Burr, D. (1988). On errors-in-variables in binary regression—Berkson case. Journal of the American Statistical Association, 83, 739–743.
  • Buzas, J.S., & Stefanski, L.A. (1995). Instrumental variable estimation in generalized linear measurement error models. Journal of the American Statistical Association, 91, 999–1006.
  • Carroll, R.J., Ruppert, D., Stefanski, L.A., & Crainiceanu, C.M. (2006). Measurement error in nonlinear models (2nd ed.). Boca Raton: Chapman & Hall/CRC.
  • Carroll, R.J., Spiegelman, C.H., Lan, K.G., Bailey, K.T., & Abbott, R.D. (1984). On errors-in-variables for binary regression models. Biometrika, 71, 19–25.
  • Carroll, R.J., & Stefanski, L.A. (1990). Approximate quasi-likelihood estimation in models with surrogate predictors. Journal of the American Statistical Association, 85, 652–663.
  • Clayton, D.G. (1992). Models for the analysis of cohort and case-control studies with inaccurately measured exposures. In J.H. Dwyer, M. Feinlieb, P. Lippert, & H. Hoffmeister (Eds.), Statistical models for longitudinal studies on health (pp. 301–331). New York: Oxford University Press.
  • Davis, P.J., & Rabinowitz, P. (1984). Methods of numerical integration (2nd ed.). New York: Academic Press.
  • Gleser, L.J. (1990). Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. In P.J. Brown & W.A. Fuller (Eds.), Statistical analysis of measurement error models and applications (pp. 99–114). Providence: American Mathematical Society.
  • Gong, G., & Samaniego, F.J. (1981). Pseudo maximum likelihood estimation: theory and applications. Annals of Statistics, 9, 861–869.
  • Gourieroux, C., & Monfort, A. (1995). Statistics and econometric models (Vol. 2). Cambridge: Cambridge University Press.
  • Griliches, Z. (1976). Wages of very young men. Journal of Political Economy, 85, S69–S86.
  • Gustafson, P. (2004). Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. Boca Raton: Chapman & Hall/CRC.
  • Higdon, R., & Schafer, D.W. (2001). Maximum likelihood computations for regression with measurement error. Computational Statistics & Data Analysis, 35, 283–299.
  • Jöreskog, K.G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
  • Jöreskog, K.G., & Goldberger, A.S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. Journal of the American Statistical Association, 70, 631–639.
  • Kuha, J. (1997). Estimation by data augmentation in regression models with continuous and discrete covariates measured with error. Statistics in Medicine, 16, 189–202.
  • Lesaffre, E., & Spiessens, B. (2001). On the effect of the number of quadrature points in a logistic random-effects model: an example. Journal of the Royal Statistical Society. Series C, 50, 325–335.
  • Liang, K.-Y., & Liu, X.-H. (1991). Estimating equations in generalized linear models with measurement error. In V.P. Godambe (Ed.), Estimating functions (pp. 47–63). Oxford: Oxford University Press.
  • Lütkepohl, H. (1996). Handbook of matrices. Chichester: Wiley.
  • McCullagh, P., & Nelder, J.A. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall.
  • McDonald, R.P. (1967). Nonlinear factor analysis (Psychometric Monograph No. 15). Richmond: Psychometric Corporation.
  • Parke, W.R. (1986). Pseudo maximum likelihood estimation: the asymptotic distribution. Annals of Statistics, 14, 355–357.
  • Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2003). Maximum likelihood estimation of generalized linear models with covariate measurement error. The Stata Journal, 3, 385–410.
  • Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004a). Generalized multilevel structural equation modeling. Psychometrika, 69, 167–190.
  • Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004b). Gllamm manual (Technical Report 160). U.C. Berkeley Division of Biostatistics. Downloadable from http://www.bepress.com/ucbbiostat/paper160/.
  • Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128, 301–323.
  • Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata, vol. II: categorical responses, counts, and survival (3rd ed.). College Station: Stata Press.
  • Richardson, S., & Gilks, W.S. (1993). Conditional independence models for epidemiological studies with covariate measurement error. Statistics in Medicine, 12, 1703–1722.
  • Robinson, G.K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science, 6, 15–51.
  • Robinson, P.M. (1974). Identification, estimation, and large sample theory for regressions containing unobservable variables. International Economic Review, 15, 680–692.
  • Rosner, B., Spiegelman, D., & Willett, W.C. (1990). Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. American Journal of Epidemiology, 132, 734–745.
  • Rosner, B., Willett, W.C., & Spiegelman, D. (1989). Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Statistics in Medicine, 8, 1031–1040.
  • Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581–592.
  • Schafer, D.W. (1987). Covariate measurement error in generalized linear models. Biometrika, 74, 385–391.
  • Schafer, D.W. (1993). Likelihood analysis for probit regression with measurement error. Biometrika, 80, 899–904.
  • Schafer, D.W., & Purdy, K.G. (1986). Likelihood analysis for errors-in-variables regression with replicate measurements. Biometrika, 83, 813–824.
  • Shapiro, A. (2007). Statistical inference of moment structures. In S.Y. Lee (Ed.), Handbook of latent variable and related models (pp. 229–259). Amsterdam: Elsevier.
  • Skrondal, A., & Laake, P. (2001). Regression among factor scores. Psychometrika, 66, 563–575.
  • Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling. Boca Raton: Chapman & Hall/CRC.
  • Skrondal, A., & Rabe-Hesketh, S. (2007). Latent variable modelling: a survey. Scandinavian Journal of Statistics, 34, 712–745.
  • Skrondal, A., & Rabe-Hesketh, S. (2009). Prediction in multilevel generalized linear mixed models. Journal of the Royal Statistical Society. Series A, 172, 659–687.
  • Stephens, D.A., & Dellaportas, P. (1992). Bayesian analysis of generalised linear models with covariate measurement error. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (Vol. 4, pp. 813–820). Oxford: Oxford University Press.
  • Thisted, R.A. (1988). Elements of statistical computing. London: Chapman & Hall.


Acknowledgements

We are grateful to H.K. Gjessing for helpful discussions and three anonymous reviewers for constructive comments.

Author information

Correspondence to Anders Skrondal.

Appendix: Obtaining \(\widehat{\boldsymbol{\mathcal{I}}}_{ \mathsf{ME},\mathsf{O}}\) in (18)


Here we describe the calculation of the estimate (18) of the matrix \(\boldsymbol{\mathcal{I}}_{\mathsf{ME},\mathsf{O}}\), which is used in the calculation of the variance matrix (17) of \(\widehat{\boldsymbol{\vartheta}}_{\mathsf{O}}^{\mathsf{IRC}}\). Let us first introduce some convenient shorthand notation for the logarithm of the likelihood contribution (6):

\[\ell_{i} \;=\; \log g_{1i} + \log g_{2i}, \qquad g_{1i} \;=\; \int g_{yi}\, g_{xi}\, d\mathbf{x}_{i}.\]

Here \(g_{yi}\) is the density of \(y_{i}\) given \(\mathbf{x}_{i}\) and \(\mathbf{z}_{i}\) implied by the outcome model, and \(g_{xi}\) and \(g_{2i}\) are multivariate normal density functions with parameters \(\boldsymbol{\theta}_{1i}=(\boldsymbol{\xi}_{i}',\text{vec}(\boldsymbol{\Omega}_{i})')'\) and \(\boldsymbol{\theta}_{2i}=(\boldsymbol{\mu}_{i}',\text{vec}(\boldsymbol{\Sigma}_{i})')'\), respectively, as defined by (11)–(12) and (9)–(10). These in turn are functions of the parameters \(\boldsymbol{\chi}=(\boldsymbol{\nu}',\text{vec}(\boldsymbol{\Lambda})',\text{vec}(\boldsymbol{\Theta})',\text{vec}(\boldsymbol{\Gamma})',\text{vec}(\boldsymbol{\Psi})')'\), and \(\boldsymbol{\vartheta}_{\mathsf{ME}}\) collects the distinct, unknown elements of \(\boldsymbol{\chi}\).

The required gradients for (18) are

(A.1)
(A.2)

where

(A.3)
(A.4)
(A.5)

Estimated values for these quantities, and thus for the estimated matrix \(\widehat{\boldsymbol{\mathcal{I}}}_{\mathsf{ME},\mathsf{O}}\) given by (18), are obtained by substituting estimates \(\widehat{\boldsymbol {\vartheta }}^{\mathsf{IRC}}\) of the parameters.

Starting with (A.2), we note that each element of \(\boldsymbol{\chi}\) is either a known constant or equal to a single element of \(\boldsymbol{\vartheta}_{\mathsf{ME}}\); for illustration, consider \(\boldsymbol{\Lambda}\) as shown in (2). Suppose that \(\boldsymbol{\chi}\) is of length t and \(\boldsymbol{\vartheta}_{\mathsf{ME}}\) of length u. Then \(\partial\boldsymbol{\chi}/\partial\boldsymbol{\vartheta}_{\mathsf{ME}}'\) is a t×u matrix whose (i,j)th element is 1 if the ith element of \(\boldsymbol{\chi}\) is equal to the jth element of \(\boldsymbol{\vartheta}_{\mathsf{ME}}\), and 0 otherwise.
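To make this bookkeeping concrete, the following short Python sketch builds such a 0/1 selection matrix; the function selection_matrix and the index map chi_map are our own illustrative devices, not part of the paper or its software implementation.

import numpy as np

def selection_matrix(chi_map, u):
    """Return the t x u matrix d(chi)/d(vartheta_ME)' described above.

    chi_map[i] is None when the i-th element of chi is a known constant,
    or the (0-based) index j of the element of vartheta_ME that it equals.
    """
    J = np.zeros((len(chi_map), u))
    for i, j in enumerate(chi_map):
        if j is not None:
            J[i, j] = 1.0  # 1 exactly where chi_i coincides with the j-th free parameter
    return J

# Illustration: chi of length t = 5, with elements 0, 2, 3 equal to the free
# parameters 0, 1, 2 of vartheta_ME, and elements 1, 4 fixed constants.
J = selection_matrix([0, None, 1, 2, None], u=3)   # shape (5, 3)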

Next, the elements of \(\partial\boldsymbol{\theta}_{2i}/\partial\boldsymbol{\chi}'\) in (A.2) are

and the elements of \(\partial\boldsymbol{\theta}_{1i}/\partial\boldsymbol{\chi}'\) are

where

and \(\text{vec}(\cdot)\) denotes the column-by-column vectorization operator, \(\otimes\) the Kronecker product, \(\mathbf{I}_{m}\) an m×m identity matrix, and \(\mathbf{K}_{rm}\) an rm×rm commutation matrix. The formulas are obtained through repeated application of rules of matrix differentiation (see, e.g., Lütkepohl, 1996).

In the second term of (A.2), the elements of \(\partial\log g_{2i}/\partial\boldsymbol{\theta}_{2i}'\) are \(\partial\log g_{2i}/\partial\boldsymbol{\mu}_{i}' = (\mathbf{w}_{i}-\boldsymbol{\mu}_{i})'\boldsymbol{\Sigma}_{i}^{-1}\) and \(\partial\log g_{2i}/\partial\,\text{vec}(\boldsymbol{\Sigma}_{i})' = \text{vec}[\boldsymbol{\Sigma}_{i}^{-1}(\mathbf{w}_{i}-\boldsymbol{\mu}_{i})(\mathbf{w}_{i}-\boldsymbol{\mu}_{i})'\boldsymbol{\Sigma}_{i}^{-1}-\boldsymbol{\Sigma}_{i}^{-1}]'/2\).
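These two score expressions are easy to verify numerically. The sketch below, which assumes NumPy and SciPy and uses our own illustrative names and values, evaluates them and checks the mean score against a finite-difference approximation.

import numpy as np
from scipy.stats import multivariate_normal

def mvn_scores(w, mu, Sigma):
    """Scores of log g_2i = log N(w; mu, Sigma) as given in the text,
    returned as column vectors rather than row vectors."""
    Sinv = np.linalg.inv(Sigma)
    r = w - mu
    d_mu = Sinv @ r                                        # (w - mu)' Sigma^{-1}, transposed
    d_Sigma = 0.5 * (Sinv @ np.outer(r, r) @ Sinv - Sinv)  # derivative w.r.t. vec(Sigma), in matrix form
    return d_mu, d_Sigma.ravel(order="F")                  # column-by-column vec(.)

# Finite-difference check of the mean score (illustrative values only)
w = np.array([0.3, -1.2]); mu = np.array([0.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
eps = 1e-6
numeric = np.array([(multivariate_normal.logpdf(w, mu + eps * np.eye(2)[k], Sigma)
                     - multivariate_normal.logpdf(w, mu, Sigma)) / eps for k in range(2)])
analytic, _ = mvn_scores(w, mu, Sigma)   # numeric and analytic agree up to finite-difference error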

The remaining elements of (A.1) and (A.2) depend also on the outcome model for \(y_{i}\). For the logistic model, which is predominant in applications of generalized linear models with covariate measurement error, and which is also used in our simulations and example, \(g_{yi}=\pi_{i}^{y_{i}}(1-\pi_{i})^{1-y_{i}}\) where \(\pi_{i}=\exp(\eta_{i})/[1+\exp(\eta_{i})]\) and \(\eta_{i}=\mathbf{z}_{i}'\boldsymbol{\beta}_{z}+\mathbf{x}_{i}'\boldsymbol{\beta}_{x}\). For this model we employ the well-known closed-form approximation \(g_{1i}\approx(\pi_{i}^{*})^{y_{i}}(1-\pi_{i}^{*})^{1-y_{i}}\), where \(\pi_{i}^{*}=\exp(\eta^{*}_{i})/[1+\exp(\eta^{*}_{i})]\), \(\eta^{*}_{i}=\eta_{1i}\eta_{2i}^{-1/2}\), \(\eta_{1i}=\mathbf{z}_{i}'\boldsymbol{\beta}_{z}+\boldsymbol{\xi}_{i}'\boldsymbol{\beta}_{x}\), \(\eta_{2i}=1+d\,\boldsymbol{\beta}_{x}'\boldsymbol{\Omega}_{i}\boldsymbol{\beta}_{x}\), and \(d=1/1.7^{2}\) (e.g., Liang & Liu, 1991). For this approximation,

where \(\boldsymbol{\xi}_{i}^{*}=\boldsymbol{\xi}_{i}-\eta_{1i}\eta_{2i}^{-1}\, d\, \boldsymbol{\Omega}_{i}\boldsymbol{\beta}_{x}\). These formulas complete the explicit expressions for (A.1) and (A.2).
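As a concrete illustration of the closed-form approximation for \(\pi_{i}^{*}\), here is a small Python sketch; the values and the function name pi_star are our own illustrative choices, d = 1/1.7² is the constant quoted above, and a brute-force Monte Carlo average is included only as a check.

import numpy as np
from scipy.special import expit

def pi_star(z, xi, Omega, beta_z, beta_x, d=1 / 1.7**2):
    """Closed-form approximation to E[expit(z'beta_z + x'beta_x)] when
    x ~ N(xi, Omega): expit(eta_1i / sqrt(1 + d * beta_x' Omega beta_x))."""
    eta1 = z @ beta_z + xi @ beta_x
    eta2 = 1.0 + d * beta_x @ Omega @ beta_x
    return expit(eta1 / np.sqrt(eta2))

# Brute-force simulation check (illustrative values, not from the paper)
rng = np.random.default_rng(1)
z = np.array([1.0, 0.5]); beta_z = np.array([-0.2, 0.4])
xi = np.array([0.8]);     beta_x = np.array([1.1]); Omega = np.array([[0.6]])
x = rng.multivariate_normal(xi, Omega, size=200_000)
monte_carlo = expit(z @ beta_z + x @ beta_x).mean()   # close to the closed-form value
approximate = pi_star(z, xi, Omega, beta_z, beta_x)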

In our data analysis, we also apply a similar idea to the conventional regression calibration estimate of \(\boldsymbol{\vartheta}_{\mathsf{O}}\), which uses the first-order approximation \(g_{1i}\approx(\pi_{i}^{\mathrm{RC}})^{y_{i}}(1-\pi_{i}^{\mathrm{RC}})^{1-y_{i}}\) where \(\pi_{i}^{\mathrm{RC}}=\exp(\eta_{1i})/[1+\exp(\eta_{1i})]\). We estimate its variance matrix analogously to (17)–(18), using in (A.1) and (A.2) \(\partial g_{1i}/\partial\boldsymbol{\vartheta}_{\mathsf{O}}=(\partial g_{1i}/\partial\eta_{1i})\,(\mathbf{z}_{i}',\boldsymbol{\xi}_{i}')'\) and \(\partial g_{1i}/\partial\boldsymbol{\theta}_{1i}'=(\partial g_{1i}/\partial\eta_{1i})\,[\boldsymbol{\beta}_{x}',\mathbf{0}']\), where \(\partial g_{1i}/\partial\eta_{1i}=(-1)^{1-y_{i}}\,\pi_{i}^{\mathrm{RC}}(1-\pi_{i}^{\mathrm{RC}})\).

For other, less popular models, we must evaluate the integrals involved in (A.3)–(A.5). Note first that the partial derivatives \(\partial g_{xi}/\partial \boldsymbol{\theta}_{1i}'\) are given by

Substituting these into (A.5), we see that each of the integrals there, and also in (A.3) and (A.4), is of the form \(\int h_{i}(\mathbf{x}_{i})\, g_{xi}\, d\mathbf{x}_{i}\) for some function \(h_{i}(\mathbf{x}_{i})\) of \(\mathbf{x}_{i}\), integrated over the multivariate normal density \(g_{xi}=g(\mathbf{x}_{i}\mid\mathbf{w}_{i},\mathbf{z}_{i};\boldsymbol{\vartheta}_{\mathsf{ME}})\). This suggests that the integrals can be evaluated through Monte Carlo integration, by first generating M independent draws \(\mathbf{x}_{ij}\), j=1,…,M, from \(g(\mathbf{x}_{i}\mid\mathbf{w}_{i},\mathbf{z}_{i};\widehat{\boldsymbol{\vartheta}}_{\mathsf{ME}})\), and then approximating the integrals by the averages \(M^{-1}\sum_{j=1}^{M} h_{i}(\mathbf{x}_{ij})\) for each of the \(h_{i}(\cdot)\). Only one set of random draws is needed for all the observations i if we first generate M uncorrelated m-vectors \(\mathbf{u}_{j}\) of standard normal random variates and then calculate \(\mathbf{x}_{ij}=\widetilde{\boldsymbol{\xi}}_{i}+\mathbf{B}_{i}\mathbf{u}_{j}\), where \(\widehat{\boldsymbol{\Omega}}_{i}=\mathbf{B}_{i}\mathbf{B}_{i}'\).
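A minimal sketch of this Monte Carlo scheme follows, assuming NumPy; the names mc_integrals and h are ours, and \(\mathbf{B}_{i}\) is obtained here by a Cholesky factorization, which is one convenient choice of square root of \(\widehat{\boldsymbol{\Omega}}_{i}\).

import numpy as np

def mc_integrals(h, xi_hat, Omega_hat, u_draws):
    """Approximate the integral of h(x) g(x | w_i, z_i) dx by Monte Carlo, where
    g is the N(xi_hat, Omega_hat) density for observation i and u_draws is an
    M x m matrix of standard normal draws shared across observations."""
    B = np.linalg.cholesky(Omega_hat)      # Omega_hat = B B'
    x_draws = xi_hat + u_draws @ B.T       # x_ij = xi_hat + B u_j
    return np.mean([h(x) for x in x_draws], axis=0)

# One common set of draws, reused for every observation i
M, m = 2000, 2
rng = np.random.default_rng(123)
U = rng.standard_normal((M, m))
# Example h: h(x) = x recovers the posterior mean E(x_i | w_i, z_i) by simulation
est = mc_integrals(lambda x: x, np.array([0.1, -0.4]),
                   np.array([[0.5, 0.1], [0.1, 0.3]]), U)   # approximately [0.1, -0.4]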

Cite this article

Skrondal, A., Kuha, J. Improved Regression Calibration. Psychometrika 77, 649–669 (2012). https://doi.org/10.1007/s11336-012-9285-1
