Abstract
We study maximum likelihood estimation of regression parameters in generalized linear models for a binary response with error-prone covariates when the distribution of the error-prone covariate or the link function is misspecified. We revisit the remeasurement method proposed by Huang et al. (Biometrika 93:53–64, 2006) for detecting latent-variable model misspecification and examine its operating characteristics in the presence of link misspecification. Furthermore, we propose a new diagnostic method for assessing assumptions on the link function. Combining these two methods yields informative diagnostic procedures that can identify which model assumption is violated and also reveal the direction in which the true latent-variable distribution or the true link function deviates from the assumed one.
References
Alonso, A., Litière, S., & Laenen, A. (2010). A note on the indeterminacy of the random-effects distribution in hierarchical models. The American Statistician, 64, 318–324.
Brown, C. C. (1982). On a goodness-of-fit test for the logistic model based on score statistics. Communications in Statistics, 11, 1087–1105.
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement error in non-linear models: A modern perspective (2nd ed.). Boca Raton: Chapman & Hall/CRC.
Chambers, E. A., & Cox, D. R. (1967). Discrimination between alternative binary response models. Biometrika, 54, 573–578.
Czado, C., & Santner, T. J. (1992). The effect of link misspecification on binary regression inference. Journal of Statistical Planning and Inference, 33, 213–231.
Fowlkes, E. B. (1987). Some diagnostics for binary regression via smoothing. Biometrika, 74, 503–515.
Hosmer, D. W., Hosmer, T., Le Cessie, S., & Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 16, 965–980.
Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.
Huang, X. (2009). An improved test of latent-variable model misspecification in structural measurement error models for group testing data. Statistics in Medicine, 28, 3316–3327.
Huang, X., Stefanski, L. A., & Davidian, M. (2006). Latent-model robustness in structural measurement error models. Biometrika, 93, 53–64.
Huang, X., Stefanski, L. A., & Davidian, M. (2009). Latent-model robustness in joint modeling for a primary endpoint and a longitudinal process. Biometrics, 65, 719–727.
Kannel, W. B., Neaton, J. D., Wentworth, D., Thomas, H. E., Stamler, J., Hulley, S. B., et al. (1986). Overall and coronary heart disease mortality rates in relation to major risk factors in 325,348 men screened for MRFIT. American Heart Journal, 112, 825–836.
Le Cessie, S., & van Houwelingen, J. C. (1991). A goodness-of-fit test for binary data based on smoothing residuals. Biometrics, 47, 1267–1282.
Li, K., & Duan, N. (1989). Regression analysis under link violation. The Annals of Statistics, 17, 1009–1052.
Ma, Y., Hart, J. D., Janicki, R., & Carroll, R. J. (2011). Local and omnibus goodness-of-fit tests in classical measurement error models. Journal of the Royal Statistical Society: Series B, 73, 81–98.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). Boca Raton: Chapman & Hall/CRC.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society: Series A, 135, 370–384.
Pregibon, D. (1980). Goodness of link tests for generalized linear models. Journal of the Royal Statistical Society: Series C, 29, 15–24.
Stefanski, L. A., & Carroll, R. J. (1990). Deconvoluting kernel density estimators. Statistics, 21, 169–184.
Stukel, T. A. (1988). Generalized logistic models. Journal of the American Statistical Association, 83, 426–431.
Tsiatis, A. A. (1980). A note on a goodness-of-fit test for the logistic regression model. Biometrika, 67, 250–251.
Verbeke, G., & Molenberghs, G. (2010). Arbitrariness of models for augmented and coarse data, with emphasis on incomplete-data and random-effects models. Statistical Modelling, 10, 391–419.
Wang, X., & Wang, B. (2011). Deconvolution estimation in measurement error models: The R package decon. Journal of Statistical Software, 39, 1–24.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
Appendices
Appendix 1: Likelihood and Score Functions Referenced in Sect. 3.2
1.1 Likelihood and Score Functions Under the Assumed Model
If one posits a probit link in the primary model and assumes X ∼ N(μ_x, σ_x²), the observed-data likelihood for subject i is
\[
f(Y_i, W_i;\beta) = \left[\Phi\{h_i(\beta)\}\right]^{Y_i}\left[1-\Phi\{h_i(\beta)\}\right]^{1-Y_i} f_W(W_i), \qquad (6)
\]
where Φ(⋅) is the cumulative distribution function (cdf) of N(0, 1), f_W(⋅) is the N(μ_x, σ_x² + σ_u²) density of W_i, and
\[
h_i(\beta) = \frac{\beta_0 + \beta_1\mu_{x|w,i}}{(1+\beta_1^2\sigma_{x|w}^2)^{1/2}}, \quad
\mu_{x|w,i} = \mu_x + \frac{\sigma_x^2(W_i-\mu_x)}{\sigma_x^2+\sigma_u^2}, \quad
\sigma_{x|w}^2 = \frac{\sigma_x^2\sigma_u^2}{\sigma_x^2+\sigma_u^2}. \qquad (8)
\]
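As a numerical companion, the probit-normal computation above can be sketched in a few lines, using the standard structural-model result P(Y_i = 1 | W_i) = Φ{h_i(β)} with the posterior moments of X given W (the helper names below are illustrative, not from the paper):

```python
from math import erf, sqrt

def Phi(t):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def h(beta0, beta1, w, mu_x, s2_x, s2_u):
    """h_i(beta) under the assumed probit-normal model."""
    lam = s2_x / (s2_x + s2_u)            # reliability ratio
    mu_xw = mu_x + lam * (w - mu_x)       # E(X | W = w)
    s2_xw = s2_x * s2_u / (s2_x + s2_u)   # var(X | W = w)
    return (beta0 + beta1 * mu_xw) / sqrt(1.0 + beta1 ** 2 * s2_xw)

def lik_contrib(y, w, beta0, beta1, mu_x, s2_x, s2_u):
    """Bernoulli factor of the observed-data likelihood for (Y_i, W_i)."""
    p = Phi(h(beta0, beta1, w, mu_x, s2_x, s2_u))
    return p if y == 1 else 1.0 - p
```

The f_W(W_i) factor is omitted since it does not involve β and drops out of the score for β.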
If the reclassification model is
\[
P(Y_i^* = Y_i \mid W_i) = \pi_i \equiv \pi(W_i), \qquad (9)
\]
the likelihood of the ith reclassified observation, (Y_i^*, W_i), under the assumed model is
\[
f(Y_i^*, W_i;\beta) = \left[\pi_i\Phi\{h_i(\beta)\} + (1-\pi_i)[1-\Phi\{h_i(\beta)\}]\right]^{Y_i^*}
\left[(1-\pi_i)\Phi\{h_i(\beta)\} + \pi_i[1-\Phi\{h_i(\beta)\}]\right]^{1-Y_i^*} f_W(W_i). \qquad (10)
\]
Differentiating the logarithm of (6) with respect to β yields the normal scores associated with β based on the raw data with measurement error only in X; similarly, differentiating the logarithm of (10) with respect to β gives the counterpart normal scores for the reclassified data with measurement error in both X and Y. These two sets of scores are, respectively,
\[
\sum_{i=1}^n \frac{[Y_i - \Phi\{h_i(\beta)\}]\,\phi\{h_i(\beta)\}}{\Phi\{h_i(\beta)\}[1-\Phi\{h_i(\beta)\}]}\, h_i'(\beta), \qquad (11)
\]
\[
\sum_{i=1}^n \frac{(1-2\pi_i)[1 - d_i(\beta) - Y_i^*]\,\phi\{h_i(\beta)\}}{d_i(\beta)\{1 - d_i(\beta)\}}\, h_i'(\beta), \qquad (12)
\]
where φ(⋅) is the N(0, 1) density,
\[
d_i(\beta) = (1-\pi_i)\Phi\{h_i(\beta)\} + \pi_i[1-\Phi\{h_i(\beta)\}], \qquad (13)
\]
and h_i′(β) = (∂∕∂β)h_i(β) consists of the following two elements,
\[
\frac{\partial h_i(\beta)}{\partial \beta_0} = \frac{1}{(1+\beta_1^2\sigma_{x|w}^2)^{1/2}}, \qquad
\frac{\partial h_i(\beta)}{\partial \beta_1} = \frac{\mu_{x|w,i} - \beta_0\beta_1\sigma_{x|w}^2}{(1+\beta_1^2\sigma_{x|w}^2)^{3/2}}.
\]
A close inspection of the scores in (11) and (12) reveals some values of π_i that one should avoid when specifying the reclassification model in (9). First, note that the score function in (12) is identically zero if π_i = 0.5 for all i = 1, …, n. Consequently, β is non-estimable from reclassified data generated according to P(Y_i^* = Y_i | W_i) = 0.5 for all i = 1, …, n. This is not surprising: with all π_i's equal to 0.5, \(\{Y_i^*\}_{i=1}^n\) contains virtually no information about the true responses. Second, the two sets of scores are equal when π_i = 0 for i = 1, …, n, or when π_i = 1 for i = 1, …, n. This is also expected, as in these cases \(\{Y_i^*\}_{i=1}^n\) contains exactly the same information as \(\{Y_i\}_{i=1}^n\), and hence the MLEs of β from the two data sets are identical whether or not the assumed model is correct. Therefore, for the purpose of model diagnosis, we avoid setting π_i in (9) identically equal to 0.5, 0, or 1 for all i = 1, …, n.
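The degenerate choices of π_i discussed above can be checked numerically. Writing the assumed-model probability P(Y_i^* = 0 | W_i) as d = (1 − π)Φ(h) + π{1 − Φ(h)}, d is constant in h when π = 0.5, so the reclassified-data likelihood is flat in β; and with π = 0 it reduces to Φ(h), reproducing the raw-data information. A minimal sketch:

```python
from math import erf, sqrt

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def d(pi, h):
    """Assumed-model P(Y* = 0 | W) when P(Y = 1 | W) = Phi(h)
    and Y* keeps the true label with probability pi."""
    return (1.0 - pi) * Phi(h) + pi * (1.0 - Phi(h))

# pi = 0.5: d is 0.5 whatever h (hence beta) is -- a flat likelihood.
vals_half = [d(0.5, h) for h in (-2.0, -0.5, 0.0, 1.0, 3.0)]

# pi = 0: d reduces to Phi(h); the reclassified data carry
# exactly the information in the raw data.
vals_zero = [d(0.0, h) - Phi(h) for h in (-2.0, 0.0, 3.0)]
```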
1.2 Score Estimating Equations
Under regularity conditions, the limiting MLEs of β based on the raw data and on the reclassified data as n → ∞, denoted β_m and β_c respectively, uniquely satisfy the following score equations:
where the subscripts attached to E{⋅ } signify that the expectations are defined with respect to the relevant true model.
Using iterated expectations, one can show that (14) boils down to the following set of two equations,
where p_i is the mean of Y_i given W_i under the true model, that is, p_i = P^{(t)}(Y_i = 1 | W_i) evaluated at the true parameter value β, for i = 1, …, n. Similarly, one can deduce that (15) is equivalent to the following system of equations,
where q_i is the mean of Y_i^* given W_i under the true model, that is,
\[
q_i = P^{(t)}(Y_i^* = 1 \mid W_i) = \pi_i p_i + (1 - \pi_i)(1 - p_i). \qquad (20)
\]
1.3 Likelihood Function Under the True Model
Under the mixture-probit-normal model specified in Sect. 3.2, the likelihood of (Y i , W i ) is
where, for ℓ = 1, 2,
It follows that, as the true mean of Y i given W i ,
Evaluating (20) at this p i , one obtains the true mean of Y i ∗ given W i , that is, q i = P (t)(Y i ∗ = 1 | W i ), and further deduces that the true-model likelihood of the reclassified data (Y i ∗, W i ) is, for i = 1, …, n,
Appendix 2: Limiting Maximum Likelihood Estimators When β 1 = 0
When β 1 = 0, the limiting MLEs of β are given in the following proposition.
Proposition 1.
Suppose that the true primary model is a GLM with a mixture probit link and β_1 = 0. Under the assumed probit-normal model, β_c = β_m = (β_{0m}, 0)^t, where
\[
\beta_{0m} = \Phi^{-1}(p_i), \qquad (23)
\]
with p_i = P^{(t)}(Y_i = 1 | W_i), which is free of i and W_i when β_1 = 0.
The proof, given next, does not depend on the true X-model or the reclassification model. Proposition 1 indicates that, if β_1 = 0, β_m does not depend on σ_u², so RM cannot detect either misspecification. Also, β_c does not depend on π_i, which defeats the purpose of creating reclassified data; hence RC does not help in model diagnosis either. This implication should not raise much concern because, after all, β_{1m} = β_{1c} = β_1 (= 0) in this case, so the MLEs of β_1 remain consistent despite model misspecification.
Proof.
By the uniqueness of the solution to (14), it suffices to check that β_m = (β_{0m}, 0)^t solves (16)–(17), where β_{0m} is given in (23).
Because β 1 = 0,
Suppose for now that β_{1m} = 0; then, by (8), h_i(β_m) = β_{0m}. With both h_i(β_m) and p_i in (24) free of W_i, (16) reduces to p_i − Φ{h_i(β_m)} = 0, that is, Φ(β_{0m}) = p_i. Therefore β_{0m} = Φ^{−1}(p_i), which proves (23). And with p_i − Φ{h_i(β_m)} = 0, (17) holds automatically. This completes the proof of the result regarding β_m.
Next we show that the β_m established above also solves (18)–(19), that is, β_c = β_m. Suppose β_{1c} = 0; then h_i(β_c) = β_{0c} and d_i(β_c) = (1 − π_i)Φ(β_{0c}) + π_iΦ(−β_{0c}). Note that, inside (18), with q_i = π_i p_i + (1 − π_i)(1 − p_i) and this d_i(β_c), one has 1 − d_i(β_c) − q_i = (1 − 2π_i){p_i − Φ(β_{0c})}. Therefore, if β_{0c} = Φ^{−1}(p_i), then 1 − d_i(β_c) − q_i = 0 and (18) holds for all π_i. Furthermore, 1 − d_i(β_c) − q_i = 0 immediately makes (19) hold. This shows that β_c = β_m.
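The pivotal identity in this step, 1 − d_i(β_c) − q_i = (1 − 2π_i){p_i − Φ(β_{0c})}, can be spot-checked numerically over a grid (a minimal sketch; the function name is made up):

```python
from math import erf, sqrt

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def identity_gap(pi, p, beta0c):
    """Difference between 1 - d - q and (1 - 2 pi)(p - Phi(beta0c)),
    with d and q written out exactly as in the proof."""
    d = (1.0 - pi) * Phi(beta0c) + pi * (1.0 - Phi(beta0c))
    q = pi * p + (1.0 - pi) * (1.0 - p)
    return (1.0 - d - q) - (1.0 - 2.0 * pi) * (p - Phi(beta0c))

# The gap should vanish for every (pi, p, beta0c) combination.
gaps = [identity_gap(pi, p, b0)
        for pi in (0.0, 0.2, 0.5, 0.9)
        for p in (0.1, 0.5, 0.8)
        for b0 in (-1.0, 0.0, 2.0)]
```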
This completes the proof for Proposition 1. □
Appendix 3: Proof of Proposition 3.1
The following four results are crucial for proving Proposition 3.1. For clarity, we make explicit the dependence of h_i(β) in (8) on W_i by re-expressing this function as h(β_0, β_1, w), with the subscript i suppressed.
- (R1) If μ_x = 0, then h(−β_{0m}, β_{1m}, −w) = −h(β_{0m}, β_{1m}, w).
- (R2) If μ_x = 0, then \(\phi \left \{h(-\beta _{0m},\beta _{1m},-w)\right \} = C\phi \left \{h(\beta _{0m},\beta _{1m},w)\right \}\), where C does not depend on w.
- (R3) If f_1(x) = f_2(−x) and f_U(u) = f_U(−u), then f_W^{(1)}(w) = f_W^{(2)}(−w), where f_U(u) is the pdf of the measurement error U, and f_W^{(1)}(w) and f_W^{(2)}(w) are the pdfs of W when the pdf of X is f_1(x) and f_2(x), respectively.
- (R4) If f_1(x) = f_2(−x), f_U(u) = f_U(−u), H_1(s) = 1 − H_2(−s), μ_x = 0, and β_0 = 0, then p^{(22)}(−w) = 1 − p^{(11)}(w), where p^{(jk)}(w) denotes the conditional mean of Y_i given W_i = w under the true model \(f_{j}(x) \curlywedge H_{k}(s)\), for j, k = 1, 2.
The first two results, (R1) and (R2), follow directly from the definition of h_i(β) in (8); (R3) can be proved easily using the convolution formula based on the error model given in Eq. (2) of the main article. The proof of (R4) is given next.
Proof.
By the definition of p (jk)(w), one has, with β 0 = 0,
Similarly, p (22)(−w) is equal to
This completes the proof of (R4).
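Results such as (R1) lend themselves to quick numerical spot checks. The sketch below assumes the standard probit-normal form of h(β_0, β_1, w) (an assumption made here for illustration):

```python
from math import sqrt

def h(beta0, beta1, w, mu_x, s2_x, s2_u):
    """Standard probit-normal h(beta0, beta1, w) (assumed form)."""
    lam = s2_x / (s2_x + s2_u)            # reliability ratio
    mu_xw = mu_x + lam * (w - mu_x)       # E(X | W = w)
    s2_xw = s2_x * s2_u / (s2_x + s2_u)   # var(X | W = w)
    return (beta0 + beta1 * mu_xw) / sqrt(1.0 + beta1 ** 2 * s2_xw)

# (R1): with mu_x = 0, h(-b0, b1, -w) = -h(b0, b1, w);
# each sum below should therefore vanish.
checks = [h(-b0, b1, -w, 0.0, 1.0, 0.5) + h(b0, b1, w, 0.0, 1.0, 0.5)
          for b0 in (-1.0, 0.3)
          for b1 in (0.0, 1.7)
          for w in (-2.0, 0.4)]
```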
Now we are ready to prove Proposition 3.1. In essence, we will show that, if (β_{0m}, β_{1m}) solves (16)–(17) when the true model is \(f_{1}(x) \curlywedge H_{1}(s)\), then (−β_{0m}, β_{1m}) solves (16)–(17) when the true model is \(f_{2}(x) \curlywedge H_{2}(s)\). More specifically, evaluating (16) and (17) at their solution under the true model \(f_{1}(x) \curlywedge H_{1}(s)\), we will show that the following two equations,
imply the following two identities,
Take (28) as an example: by (R1)–(R4) and Φ(−t) = 1 − Φ(t), its left-hand side is equal to
Following similar derivations, one can show that the left-hand side of (27) is equal to
which is also equal to 0 according to (25). Therefore, β_{0m}^{(11)} = −β_{0m}^{(22)} and β_{1m}^{(11)} = β_{1m}^{(22)}. This completes the proof of Proposition 3.1.
Appendix 4: A Counterpart Proposition of Proposition 3.1 for β c
Proposition 2.
Let f_1(x) and f_2(x) be two pdfs specifying true X-distributions that are symmetric to each other, f_1(x) = f_2(−x), and let H_1(s) and H_2(s) be two true links that are symmetric to each other, H_1(s) = 1 − H_2(−s). Denote by β_c^{(jk)} the limiting MLE of β based on reclassified data generated according to P(Y_i^* = Y_i | W_i) = π(W_i) when the true model is \(f_{j}(x) \curlywedge H_{k}(s)\), for j, k = 1, 2. If μ_x = β_0 = 0 and π(t) is an even function or satisfies π(−t) = 1 − π(t), then β_{0c}^{(11)} = −β_{0c}^{(22)} and β_{1c}^{(11)} = β_{1c}^{(22)}.
In this appendix we elaborate the proof for the case where π(t) is an even function. Two lemmas are needed: one concerning d_i(β) defined in (13), the other concerning q_i defined in (20). To make explicit the dependence of d_i(β) in (13) on W_i, we re-express this function as d(β_0, β_1, w), with the subscript i suppressed.
Lemma 1.
If μ x = 0 and π(t) is an even function, then d(−β 0c ,β 1c ,−w) = 1 − d(β 0c ,β 1c ,w).
Proof.
By (13),
This completes the proof of Lemma 1. □
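Lemma 1 can likewise be verified numerically for an even π(t), again assuming the standard probit-normal form of h with μ_x = 0 (helper names and parameter values are illustrative):

```python
from math import erf, exp, sqrt

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def h(beta0, beta1, w, s2_x=1.0, s2_u=0.5):
    """Standard probit-normal h with mu_x = 0 (assumed form)."""
    lam = s2_x / (s2_x + s2_u)
    s2_xw = s2_x * s2_u / (s2_x + s2_u)
    return (beta0 + beta1 * lam * w) / sqrt(1.0 + beta1 ** 2 * s2_xw)

def pi_even(t):
    """An even reclassification probability pi(t) taking values in (0, 1)."""
    return 0.2 + 0.5 * exp(-t * t)

def d(beta0, beta1, w, pi):
    """d(beta0, beta1, w) = (1 - pi(w)) Phi(h) + pi(w) {1 - Phi(h)}."""
    ph = Phi(h(beta0, beta1, w))
    return (1.0 - pi(w)) * ph + pi(w) * (1.0 - ph)

# Lemma 1: d(-b0, b1, -w) = 1 - d(b0, b1, w); each gap should vanish.
gaps = [d(-b0, b1, -w, pi_even) - (1.0 - d(b0, b1, w, pi_even))
        for b0 in (-0.7, 1.2)
        for b1 in (0.0, 0.9)
        for w in (-1.5, 0.3)]
```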
Lemma 2.
If f_1(x) = f_2(−x), f_U(u) = f_U(−u), H_1(s) = 1 − H_2(−s), μ_x = 0, β_0 = 0, and π(t) is an even function, then q^{(22)}(−w) = 1 − q^{(11)}(w), where q^{(jk)}(w) denotes the conditional mean of Y_i^* given W_i = w under the true model \(f_{j}(x) \curlywedge H_{k}(s)\), for j, k = 1, 2.
Proof.
By (20),
This completes the proof of Lemma 2. Following similar derivations, one can show that q^{(12)}(−w) = 1 − q^{(21)}(w).
If, instead of being an even function, π(t) satisfies π(−t) = 1 − π(t), then the conclusion of Lemma 1 becomes d(−β_{0c}, β_{1c}, −w) = d(β_{0c}, β_{1c}, w), and the conclusion of Lemma 2 becomes q^{(22)}(−w) = q^{(11)}(w).
Now we are ready to show that, if (β_{0c}, β_{1c}) solves (18)–(19) under the true model \(f_{1}(x) \curlywedge H_{1}(s)\), then (−β_{0c}, β_{1c}) solves (18)–(19) under the true model \(f_{2}(x) \curlywedge H_{2}(s)\). Given that (β_{0c}, β_{1c}) solves (18) and (19) under \(f_{1}(x) \curlywedge H_{1}(s)\), elaborating these two equations yields
Now we check whether (−β_{0c}, β_{1c}) solves (18)–(19) under the true model \(f_{2}(x) \curlywedge H_{2}(s)\). Plugging (−β_{0c}, β_{1c}) into (18) gives, setting v = −w in the first equality,
Similarly, one can show that (30) implies
Hence, (−β_{0c}, β_{1c}) does solve (18)–(19) under the true model \(f_{2}(x) \curlywedge H_{2}(s)\); in other words, β_{0c}^{(11)} = −β_{0c}^{(22)} and β_{1c}^{(11)} = β_{1c}^{(22)}. Following parallel arguments, one can show that β_{0c}^{(12)} = −β_{0c}^{(21)} and β_{1c}^{(12)} = β_{1c}^{(21)}. This completes the proof of Proposition 2. □
Appendix 5: Additional Simulation Results from Sect. 4
When the assumed model is not probit-normal or the true model is not in the mixture-probit-normal class, analytic exploration of the limiting MLEs β_m and β_c, as elaborated in Appendices 1–4, becomes infeasible. To provide empirical support for those results, such as Proposition 1 and (M1) in Sect. 3.3, both under and outside that assumed/true-model configuration, Table 3 presents Monte Carlo averages of \(\hat{\beta }_{m}\) and \(\hat{\beta }_{c}\) obtained under some simulation settings considered or mentioned in Sect. 4. When computing \(\hat{\beta }_{c}\), we consider two forms of π(t) in the reclassification model P(Y_{b,i}^* = Y_i | W_i) = π(W_i). One is that used in Sect. 4, P(Y_{b,i}^* = Y_i | W_i) = Φ(W_i); the other is P(Y_{b,i}^* = Y_i | W_i) = 0.2. The former π(t) satisfies π(−t) = 1 − π(t), and the latter is an even function, so the two choices exemplify the two conditions on π(t) in Proposition 2.
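To illustrate how reclassified data might be generated under the two forms of π(t), here is a hedged sketch with made-up parameter values (not the paper's exact simulation design):

```python
import random
from math import erf, sqrt

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

random.seed(1)
n = 2000
beta0, beta1 = 0.3, 1.0      # illustrative regression parameters
s2_x, s2_u = 1.0, 0.5        # illustrative variances of X and U

ystar_probit, ystar_const = [], []
for _ in range(n):
    x = random.gauss(0.0, sqrt(s2_x))
    w = x + random.gauss(0.0, sqrt(s2_u))            # W = X + U
    y = 1 if random.random() < Phi(beta0 + beta1 * x) else 0
    # Reclassify: keep Y with probability pi(W), flip it otherwise.
    for store, pi in ((ystar_probit, Phi(w)), (ystar_const, 0.2)):
        keep = random.random() < pi
        store.append(y if keep else 1 - y)
```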
Table 4 provides rejection rates across 1000 Monte Carlo replicates when data are generated from four true models in the generalized-logit-normal class and the assumed model is logit-normal. Overall, the operating characteristics of all considered tests are very similar to those obtained when the assumed model is probit-normal (see the lower half of Table 1). Indeed, from a practical point of view, when it comes to model diagnosis it should matter little whether one assumes probit-normal or logit-normal. If one concludes that the model is misspecified under one assumed model, one certainly should not believe the other; if one finds insufficient evidence of misspecification under one assumed model, the other is equally plausible. After all, the logit and probit links are virtually indistinguishable in most inference contexts (Chambers and Cox, 1967).
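The near-indistinguishability of the probit and logit links can be quantified directly: with the classic scaling constant 1.702, the logistic cdf stays within roughly 0.01 of Φ everywhere, a quick check of which is:

```python
from math import erf, exp, sqrt

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def logistic(t):
    """Logistic cdf."""
    return 1.0 / (1.0 + exp(-t))

# Maximum discrepancy between the probit link and the rescaled
# logit link over a fine grid covering [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(Phi(t) - logistic(1.702 * t)) for t in grid)
```

The scaling 1.702 is the standard choice minimizing the maximum deviation; the resulting gap is below 0.01.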
© 2016 Springer International Publishing Switzerland
Huang, X. (2016). Dual Model Misspecification in Generalized Linear Models with Error in Variables. In: Jin, Z., Liu, M., Luo, X. (eds) New Developments in Statistical Modeling, Inference and Application. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-42571-9_1
DOI: https://doi.org/10.1007/978-3-319-42571-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42570-2
Online ISBN: 978-3-319-42571-9
eBook Packages: Mathematics and Statistics (R0)