Improved wrong-model inference for generalized linear models for binary responses in the presence of link misspecification

Huang, Xianzheng

doi:10.1007/s10260-020-00529-3

Original Paper
Published: 06 June 2020

Statistical Methods & Applications, Volume 30, pages 437–459 (2021)

Xianzheng Huang ORCID: orcid.org/0000-0001-7077-0869

Abstract

In the framework of generalized linear models for binary responses, we develop parametric methods that yield estimators for regression coefficients that are less compromised by an inadequate posited link function. The improved inferences are obtained without correcting a misspecified model, and are thus referred to as wrong-model inference. A byproduct of the proposed methods is a simple test for link misspecification in this class of models. Simulation studies demonstrate substantial bias reduction in the estimators for the regression coefficients and promising power of the proposed test to detect link misspecification. We also apply these methods to a classic data example frequently analyzed in the existing literature concerning this class of models.

Acknowledgements

I would like to thank the Associate Editor and the two anonymous referees for their helpful suggestions and insightful comments, which greatly improved the manuscript.

Author information

Authors and Affiliations

University of South Carolina, Columbia, SC, USA
Xianzheng Huang

Corresponding author

Correspondence to Xianzheng Huang.

Appendix: Proof of equation (2)

Because the assumed GLM is specified by $P(Y=1|X)=H(\eta )$, where $\eta =\beta _0+\beta _1 X$, and the reclassified response is generated according to $P(Y^*=Y|Y, X)=\pi $, one has

$$\begin{aligned} P(Y^*=1|X)&= P(Y^*=1, Y=0|X)+P(Y^*=1, Y=1|X) \\&= P(Y=0|X)P(Y^*\ne Y|Y, X)+P(Y=1|X)P(Y^*=Y|Y, X) \\&= \{1-H(\eta )\}(1-\pi )+H(\eta )\pi \\&= (2\pi -1)H(\eta )+1-\pi . \end{aligned} \tag{15}$$
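As a rough numerical illustration of (15), the sketch below simulates the reclassification mechanism at a fixed covariate value and compares the empirical frequency of $Y^*=1$ with the closed form. The logistic choice of $H$, the function names, and the parameter values are assumptions made for illustration only; none of them come from the paper.

```python
import math
import random


def logistic(eta):
    """Illustrative assumed link: H(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def reclassified_mean(pi, beta0, beta1, x, H=logistic):
    """Closed form in (15): P(Y*=1|X=x) = (2*pi - 1) * H(eta) + 1 - pi."""
    eta = beta0 + beta1 * x
    return (2.0 * pi - 1.0) * H(eta) + 1.0 - pi


def simulate_reclassified_frequency(pi, beta0, beta1, x, n, seed=0, H=logistic):
    """Draw Y from the assumed GLM, keep it with probability pi and flip it
    otherwise, and return the observed fraction of Y* = 1."""
    rng = random.Random(seed)
    p_y = H(beta0 + beta1 * x)
    ones = 0
    for _ in range(n):
        y = 1 if rng.random() < p_y else 0
        y_star = y if rng.random() < pi else 1 - y
        ones += y_star
    return ones / n
```

With a large number of draws, `simulate_reclassified_frequency(0.8, -0.5, 1.2, 0.3, 100000)` should land close to `reclassified_mean(0.8, -0.5, 1.2, 0.3)`, up to Monte Carlo error.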

It follows that the likelihood function based on the assumed primary model for $Y^*$ evaluated at one data point $(Y^*, X)$ is $L(\pi , \varvec{\beta })=P(Y^*=1|X)^{Y^*}\{1-P(Y^*=1|X)\}^{1-Y^*}$, and the log-likelihood function is $\ell (\pi , \varvec{\beta })=Y^*\log P(Y^*=1|X)+(1-Y^*)\log \{1-P(Y^*=1|X)\}$.

Differentiating (15) with respect to each element in $(\pi , \varvec{\beta })$ gives

$$\begin{aligned} \frac{\partial P(Y^*=1|X)}{\partial \pi }&=2H(\eta )-1, \\ \frac{\partial P(Y^*=1|X)}{\partial \beta _0}&=(2\pi -1)H'(\eta ), \\ \frac{\partial P(Y^*=1|X)}{\partial \beta _1}&=(2\pi -1)H'(\eta )X. \end{aligned} \tag{16}$$
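The derivatives in (16) can be sanity-checked against central finite differences. The sketch below does this under the assumption of a logistic $H$, for which $H'(\eta )=H(\eta )\{1-H(\eta )\}$. All function names are hypothetical helpers introduced here.

```python
import math


def H(eta):
    """Illustrative assumed link (logistic)."""
    return 1.0 / (1.0 + math.exp(-eta))


def H_prime(eta):
    """Derivative of the logistic link: H(eta) * (1 - H(eta))."""
    h = H(eta)
    return h * (1.0 - h)


def mu(pi, beta0, beta1, x):
    """P(Y*=1|X=x) = (2*pi - 1) * H(eta) + 1 - pi, as in (15)."""
    return (2.0 * pi - 1.0) * H(beta0 + beta1 * x) + 1.0 - pi


def mu_gradient(pi, beta0, beta1, x):
    """Analytic partial derivatives with respect to (pi, beta0, beta1), as in (16)."""
    eta = beta0 + beta1 * x
    return (
        2.0 * H(eta) - 1.0,
        (2.0 * pi - 1.0) * H_prime(eta),
        (2.0 * pi - 1.0) * H_prime(eta) * x,
    )


def mu_gradient_numeric(pi, beta0, beta1, x, step=1e-6):
    """Central finite differences of mu in each of (pi, beta0, beta1)."""
    theta = [pi, beta0, beta1]
    grads = []
    for i in range(3):
        up, down = list(theta), list(theta)
        up[i] += step
        down[i] -= step
        grads.append((mu(*up, x) - mu(*down, x)) / (2.0 * step))
    return tuple(grads)
```

The two gradients should agree to within finite-difference error at any interior parameter value.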

Using (16), one can show that the three normal score functions associated with $\ell (\pi , \varvec{\beta })$ are given by

$$\begin{aligned} \frac{\partial \ell (\pi , \varvec{\beta })}{\partial \pi }&= Y^*\frac{2H(\eta )-1}{P(Y^*=1|X)}-(1-Y^*)\frac{2H(\eta )-1}{1-P(Y^*=1|X)},\\ \frac{\partial \ell (\pi , \varvec{\beta })}{\partial \beta _0}&= Y^*\frac{(2\pi -1)H'(\eta )}{P(Y^*=1|X)}-(1-Y^*)\frac{(2\pi -1)H'(\eta )}{1-P(Y^*=1|X)}, \\ \frac{\partial \ell (\pi , \varvec{\beta })}{\partial \beta _1}&= Y^*\frac{(2\pi -1)H'(\eta )X}{P(Y^*=1|X)}-(1-Y^*)\frac{(2\pi -1)H'(\eta )X}{1-P(Y^*=1|X)}. \end{aligned}$$

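To simplify notation, let $\mu =P(Y^*=1|X)$. Because $Y^*$ is binary, the two terms in each score combine over the common denominator $\mu (1-\mu )$, and the three score functions above can be re-expressed as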

$$\begin{aligned} \frac{\partial \ell (\pi , \varvec{\beta })}{\partial \pi }&= \frac{Y^*-\mu }{\mu (1-\mu )}\{2H(\eta )-1\}, \\ \frac{\partial \ell (\pi , \varvec{\beta })}{\partial \beta _0}&= \frac{Y^*-\mu }{\mu (1-\mu )}(2\pi -1)H'(\eta ), \\ \frac{\partial \ell (\pi , \varvec{\beta })}{\partial \beta _1}&= \frac{Y^*-\mu }{\mu (1-\mu )}(2\pi -1)H'(\eta )X. \end{aligned} \tag{17}$$
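The compact scores in (17) can be checked numerically against the log-likelihood $\ell (\pi , \varvec{\beta })$ for a single data point. The sketch below assumes a logistic $H$ purely for illustration, and the function names are hypothetical.

```python
import math


def H(eta):
    """Illustrative assumed link (logistic)."""
    return 1.0 / (1.0 + math.exp(-eta))


def H_prime(eta):
    """Derivative of the logistic link."""
    h = H(eta)
    return h * (1.0 - h)


def log_lik(y_star, x, pi, beta0, beta1):
    """Log-likelihood of one observation (Y*, X) under the assumed model."""
    m = (2.0 * pi - 1.0) * H(beta0 + beta1 * x) + 1.0 - pi
    return y_star * math.log(m) + (1 - y_star) * math.log(1.0 - m)


def scores(y_star, x, pi, beta0, beta1):
    """Scores with respect to (pi, beta0, beta1) in the compact form (17)."""
    eta = beta0 + beta1 * x
    m = (2.0 * pi - 1.0) * H(eta) + 1.0 - pi
    weight = (y_star - m) / (m * (1.0 - m))
    return (
        weight * (2.0 * H(eta) - 1.0),
        weight * (2.0 * pi - 1.0) * H_prime(eta),
        weight * (2.0 * pi - 1.0) * H_prime(eta) * x,
    )


def scores_numeric(y_star, x, pi, beta0, beta1, step=1e-6):
    """Central finite differences of log_lik in each parameter."""
    theta = [pi, beta0, beta1]
    out = []
    for i in range(3):
        up, down = list(theta), list(theta)
        up[i] += step
        down[i] -= step
        out.append((log_lik(y_star, x, *up) - log_lik(y_star, x, *down)) / (2.0 * step))
    return tuple(out)
```

For either value of $Y^*$, `scores` and `scores_numeric` should agree up to finite-difference error, which is a quick way to catch a sign or factor slip when coding (17) for a different link.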

The expectation of the first score in (17) with respect to the true distribution of $(Y^*, X)$ is

$$\begin{aligned} E\left[ \frac{Y^*-\mu }{\mu (1-\mu )}\{2H(\eta )-1\}\right]&= E\left( E\left[ \left. \frac{Y^*-\mu }{\mu (1-\mu )}\{2H(\eta )-1\}\right| X\right] \right) \\&= E\left[ \frac{\mu _0-\mu }{\mu (1-\mu )}\{2H(\eta )-1\}\right] , \end{aligned}$$

where $\eta _0$ is $\eta $ evaluated at the true value of $\varvec{\beta }$, and $\mu _0 =(2\pi -1)G(\eta _0)+1-\pi $, as defined in (4), is the mean of $Y^*$ given $X$ under the correct model evaluated at the true parameter values. The second equality holds because $\mu $ and $H(\eta )$ depend on the data only through $X$, so the inner conditional expectation replaces $Y^*$ by $E(Y^*|X)=\mu _0$. Setting this expectation equal to zero gives the first estimating equation in (2). Similarly, the expectations of the second and the third score functions in (17) with respect to the true distribution of $(Y^*, X)$ are given by

$$\begin{aligned} E\left[ \frac{Y^*-\mu }{\mu (1-\mu )}(2\pi -1)H'(\eta )\right]&= E\left[ \frac{\mu _0-\mu }{\mu (1-\mu )}H'(\eta )\right] (2\pi -1), \\ E\left[ \frac{Y^*-\mu }{\mu (1-\mu )}(2\pi -1)H'(\eta )X\right]&= E\left[ \frac{\mu _0-\mu }{\mu (1-\mu )}H'(\eta )X\right] (2\pi -1), \end{aligned}$$

respectively. Setting these two expectations equal to zero gives the second and the third equations in (2).
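To make the three expected scores concrete, the sketch below evaluates them for a covariate with a small discrete support. Everything specific here is an assumption for illustration: the logistic form for the assumed link $H$, the complementary log-log form used as a stand-in for $G$, the support points and weights, and all function names.

```python
import math


def logistic(eta):
    """Illustrative assumed link H."""
    return 1.0 / (1.0 + math.exp(-eta))


def logistic_prime(eta):
    """Derivative of the logistic link."""
    h = logistic(eta)
    return h * (1.0 - h)


def cloglog_inverse(eta):
    """Illustrative stand-in for G: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def expected_scores(pi, beta, beta_true, G, H, H_prime, xs, weights):
    """Expected scores with respect to (pi, beta0, beta1), averaging over a
    discrete X with support xs and probabilities weights.

    mu  uses the assumed link H at the candidate beta;
    mu0 uses G at the true beta, with the same pi.
    """
    total = [0.0, 0.0, 0.0]
    for x, w in zip(xs, weights):
        eta = beta[0] + beta[1] * x
        eta0 = beta_true[0] + beta_true[1] * x
        mu = (2.0 * pi - 1.0) * H(eta) + 1.0 - pi
        mu0 = (2.0 * pi - 1.0) * G(eta0) + 1.0 - pi
        ratio = (mu0 - mu) / (mu * (1.0 - mu))
        total[0] += w * ratio * (2.0 * H(eta) - 1.0)
        total[1] += w * ratio * (2.0 * pi - 1.0) * H_prime(eta)
        total[2] += w * ratio * (2.0 * pi - 1.0) * H_prime(eta) * x
    return tuple(total)
```

When `G` and `H` coincide and `beta` equals `beta_true`, every term has $\mu _0=\mu $ and all three expectations vanish. With `G=cloglog_inverse` and `H=logistic`, the expectations at the true $\varvec{\beta }$ are generally nonzero in this toy setup, so a root of the estimating equations need not sit at the true parameter value.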

About this article

Cite this article

Huang, X. Improved wrong-model inference for generalized linear models for binary responses in the presence of link misspecification. Stat Methods Appl 30, 437–459 (2021). https://doi.org/10.1007/s10260-020-00529-3

Accepted: 18 May 2020
Published: 06 June 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10260-020-00529-3
