Abstract
In this paper, we consider the problem of censored Gamma regression when the censoring status is missing at random. Three estimation methods are investigated. They consist of solving a censored maximum likelihood estimating equation in which the missing data are replaced by values adjusted using regression calibration, multiple imputation, or inverse probability weighting. We show that the resulting estimates are consistent and asymptotically normal. Moreover, while asymptotic variances in missing data problems are generally estimated empirically (using Rubin’s rules, for example), we propose closed-form consistent variance estimates based on explicit formulas for the asymptotic variances of the proposed estimates. A simulation study is conducted to assess the finite-sample properties of the proposed parameter and asymptotic variance estimates.
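To fix ideas, the complete-data baseline of the paper (censored Gamma regression with fully observed censoring indicators) can be sketched as a censored maximum likelihood fit. The sketch below is illustrative only: the data-generating choices, the censoring distribution, and the parametrization (shape \(\nu\), mean \(e^{\beta^{\top}x}\)) are assumptions made for the example, not the paper's simulation design.

```python
# Minimal sketch (illustrative, not the paper's code): censored Gamma
# regression MLE when every censoring indicator delta_i is observed.
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, nu0, beta0 = 5000, 2.0, np.array([0.5, 1.0])
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
mu = np.exp(X @ beta0)                          # mean e^{beta' x}
Y = rng.gamma(shape=nu0, scale=mu / nu0)        # latent Gamma response
C = rng.exponential(scale=3.0 * mu.mean(), size=n)  # assumed censoring times
Yt = np.minimum(Y, C)                           # observed response
delta = (Y <= C).astype(float)                  # 1 = uncensored

def negloglik(par):
    nu, beta = np.exp(par[0]), par[1:]          # log-parametrize nu > 0
    scale = np.exp(X @ beta) / nu               # Gamma(nu, scale) has mean e^{beta'x}
    logf = gamma.logpdf(Yt, a=nu, scale=scale)  # density contribution (uncensored)
    logS = gamma.logsf(Yt, a=nu, scale=scale)   # survival contribution (censored)
    return -np.sum(np.where(delta == 1.0, logf, logS))

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
nu_hat, beta_hat = np.exp(fit.x[0]), fit.x[1:]
```

The three methods studied in the paper modify this estimating equation when some \(\delta_i\) are missing, replacing them by calibrated, imputed, or inverse-probability-weighted values.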
REFERENCES
P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding, Statistical Models Based on Counting Processes (Springer Series in Statistics, Springer, New York, 1993).
P. W. Bernhardt, H. J. Wang, and D. Zhang, ‘‘Flexible modeling of survival data with covariates subject to detection limits via multiple imputation,’’ Computational Statistics and Data Analysis 69, 81–91 (2014).
B. Bousselmi, J.-F. Dupuy, and A. Karoui, ‘‘Censored count data regression with missing censoring information,’’ Electronic Journal of Statistics 15 (2), 4343–4383 (2021).
E. Brunel, F. Comte, and A. Guilloux, ‘‘Nonparametric estimation for survival data with censoring indicators missing at random,’’ Journal of Statistical Planning and Inference 143 (10), 1653–1671 (2013).
W. Checkley, J. Guzman-Cottrill, L. Epstein, N. Innocentini, J. Patz, and S. Shulman, ‘‘Short-term weather variability in Chicago and hospitalizations for Kawasaki disease,’’ Epidemiology 20 (2), 194–201 (2009).
T. A. Dignam, J. Lojo, P. A. Meyer, E. Norman, A. Sayre, and W. D. Flanders, ‘‘Reduction of elevated blood lead levels in children in North Carolina and Vermont, 1996–1999,’’ Environmental Health Perspectives 116 (7), 981–985 (2008).
R. V. Foutz, ‘‘On the unique consistent solution to the likelihood equations,’’ Journal of the American Statistical Association 72, 147–148 (1977).
X. Guo, C. Niu, Y. Yang, and W. Xu, ‘‘Empirical likelihood for single index model with missing covariates at random,’’ Statistics 49 (3), 588–601 (2015).
D. G. Horvitz and D. J. Thompson, ‘‘A generalization of sampling without replacement from a finite universe,’’ Journal of the American Statistical Association 47 (260), 663–685 (1952).
P. de Jong and G. Z. Heller, Generalized Linear Models for Insurance Data. International Series on Actuarial Science (Cambridge University Press, 2008).
S. Mandal, R. Arabi Belaghi, M. Akram, and M. Aminnejad, ‘‘Stein-type shrinkage estimators in gamma regression model with application to prostate cancer data,’’ Statistics in Medicine 38 (22), 4310–4322 (2019).
P. McCullagh and J. A. Nelder, Generalized Linear Models (Chapman and Hall/CRC Monographs on Statistics and Applied Probability, Taylor and Francis, 1989).
I. W. McKeague and S. Subramanian, ‘‘Product-limit estimators and Cox regression with missing censoring information,’’ Scandinavian Journal of Statistics 25 (4), 589–601 (1998).
R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing (Vienna, Austria, 2020). https://www.R-project.org/
J. M. Robins, A. Rotnitzky, and L. P. Zhao, ‘‘Estimation of regression coefficients when some regressors are not always observed,’’ Journal of the American Statistical Association 89 (427), 846–866 (1994).
D. B. Rubin, Multiple Imputation for Nonresponse in Surveys (Wiley Series in Probability and Statistics, John Wiley and Sons Inc., New York, 1987).
F. Sigrist and W. Stahel, ‘‘Using the Censored Gamma Distribution for Modeling Fractional Response Variables with an Application to Loss Given Default,’’ ASTIN Bulletin 41 (2), 673–710 (2011).
S. Subramanian, ‘‘Survival analysis for the missing censoring indicator model using kernel density estimation techniques,’’ Statistical Methodology 3 (2), 125–136 (2006).
S. Subramanian, ‘‘Multiple imputations and the missing censoring indicator model,’’ Journal of Multivariate Analysis 102 (1), 105–117 (2011).
J. A. Steingrimsson and R. L. Strawderman, ‘‘Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation,’’ Journal of the American Statistical Association 112 (519), 1221–1235 (2017).
Y. Sun, X. Qian, Q. Shou, and P. B. Gilbert, ‘‘Analysis of two-phase sampling data with semiparametric additive hazards models,’’ Lifetime Data Analysis 23 (3), 377–399 (2017).
J. V. Terza, ‘‘A Tobit-type estimator for the censored Poisson regression model,’’ Economics Letters 18 (4), 361–365 (1985).
A. A. Tsiatis, Semiparametric Theory and Missing Data (Springer Series in Statistics, Springer New York, 2007).
A. W. van der Vaart, Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2000).
I. W. Verburg, N. F. de Keizer, E. de Jonge, and N. Peek, ‘‘Comparison of regression methods for modeling intensive care length of stay,’’ PLoS ONE 9 (10) (2014).
C. Y. Wang and H. Y. Chen, ‘‘Augmented inverse probability weighted estimator for Cox missing covariate regression,’’ Biometrics 57 (2), 414–419 (2001).
Q. Wang, G. E. Dinse, and C. Liu, ‘‘Hazard function estimation with cause-of-death data missing at random,’’ Annals of the Institute of Statistical Mathematics 64 (2), 415–438 (2012).
Q. Wang and J. Shen, ‘‘Estimation and confidence bands of a conditional survival function with censoring indicators missing at random,’’ Journal of Multivariate Analysis 99 (5), 928–948 (2008).
H. White, ‘‘Maximum Likelihood Estimation of Misspecified Models,’’ Econometrica 50 (1), 1–25 (1982).
S. N. Wood, Generalized Additive Models: An Introduction with R (CRC Press, 2017).
J. Yang, M. Jun, C. Schumacher, and R. Saravanan, ‘‘Predictive statistical representations of observed and simulated rainfall using generalized linear models,’’ Journal of Climate 32 (11), 3409–3427 (2019).
ACKNOWLEDGMENTS
The author is grateful to the referee and Associate Editor for their comments and suggestions, which led to substantial improvements of this paper.
Appendices
PROOF OF THEOREM 4.1 (CONSISTENCY)
First, let
and \(\psi(\nu):=\partial\log\Gamma(\nu)/\partial\nu\). We have:
with
The proof is based on Foutz’s [17] consistency theorem for maximum likelihood type estimates. The following conditions must be established.
- C1. \(\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) exists and is continuous in an open neighborhood of \((\nu_{0},\beta_{0})\).
- C2. \(n^{-1}\dot{\ell}^{1}_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})\) converges in probability to \(0\) as \(n\rightarrow\infty\).
- C3. \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) converges in probability to \(-\Omega_{1}(\nu,\beta)\) uniformly in an open neighborhood of \((\nu_{0},\beta_{0})\).
Condition C1. The map \((\nu,\beta)\mapsto\ell^{1}_{n}(\nu,\beta,\theta)\) is twice differentiable with respect to \(\nu\) and \(\beta\) and its second-order derivative is the \((p+1)\times(p+1)\) matrix
where
(\(\partial g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})/\partial\nu\) and \(\partial h_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})/\partial\nu\) are given in Section 4.1). These terms are continuous in \(\nu\) and \(\beta\), for every \(\theta\).
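The score and Hessian expressions above involve the digamma function \(\psi(\nu)=\partial\log\Gamma(\nu)/\partial\nu\). As a quick illustrative check (not part of the paper), \(\psi\) can be validated numerically against a central finite difference of \(\log\Gamma\):

```python
# Illustrative check that psi(nu) = d log Gamma(nu) / d nu matches
# scipy.special.digamma, the function used throughout the score equations.
import numpy as np
from scipy.special import digamma, gammaln

nu = 2.3            # arbitrary test point, nu > 0
h = 1e-6
finite_diff = (gammaln(nu + h) - gammaln(nu - h)) / (2.0 * h)
```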
Condition C2. We decompose
We show that
(proof for the second term proceeds similarly). First, the weak law of large numbers implies that \(\frac{1}{n}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\) converges in probability to
Under MAR,
Hence,
which is the expectation of the score (with respect to \(\nu\)) in the censored Gamma regression model with no missing data, and is equal to 0. Next, we turn to the term \(\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}\). We have:
by condition H3. Moreover, under H1 and H2, there exists a constant \(0<c_{1}<\infty\) such that \(|\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})|\leq c_{1}\). It follows that
Finally, the consistency of \(\hat{\theta}_{n}\) implies that \(\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}=o_{p}(1)\), which concludes the proof of (6.10).
Condition C3. The pointwise convergence in probability of \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) to \(-\Omega_{1}(\nu,\beta)\) is proved in Lemma 2 below. Then, under H1 and H2, the derivative of \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) with respect to \((\nu,\beta^{\top})\) is bounded, for every \(n\). Hence, the sequence \((n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top}))\) is equicontinuous, and it follows from the Arzelà–Ascoli theorem that the convergence of \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) to \(-\Omega_{1}(\nu,\beta)\) is uniform.
Having proved the conditions of Foutz’s [17] consistency theorem, we conclude that \((\hat{\nu}^{1}_{n},\hat{\beta}^{1}_{n})\) converges in probability to \((\nu_{0},\beta_{0})\). \(\Box\)
TWO LEMMAS
Let \(\mathbb{G}_{n}\) denote the empirical process [24]. With this notation, we have
Lemma 1. Assume that H1, H2, H3 hold. Then
Proof. First, we prove that the classes of functions \(\{L_{1,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) and \(\{L_{2,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) are Donsker (we refer the reader to [24] for a detailed account of Donsker classes and related notions, such as \(\varepsilon\)-brackets and bracketing entropy). For illustrative purposes, we decompose \(L_{1,\nu_{0},\beta_{0},\theta}\) as
where \(L_{1,\nu_{0},\beta_{0},\theta}^{I}(\mathcal{D}_{i})=-\delta_{i}^{1}(\theta)\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}+h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\) and \(L_{1,\nu_{0},\beta_{0},\theta}^{II}(\mathcal{D}_{i})=\delta_{i}^{1}(\theta)(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i}))\), and we show that the class \(\{L_{1,\nu_{0},\beta_{0},\theta}^{I}:\theta\in\mathbb{R}\}\) is Donsker (the arguments for \(\{L_{1,\nu_{0},\beta_{0},\theta}^{II}:\theta\in\mathbb{R}\}\) and \(\{L_{2,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) are similar).
Let \(d(\Theta)\) denote the diameter of \(\Theta\). Since the size of \(\Theta\) in every direction is at most \(d(\Theta)\), we can cover \(\Theta\) with fewer than \((d(\Theta)/\ell)^{p+1}\) cubes of side length \(\ell\). The circumscribed balls have radius \(\tilde{\ell}:=\lambda\ell\), a fixed multiple of \(\ell\) (with \(\lambda>0\)), and these balls also cover \(\Theta\). Now, let \(\theta\in\Theta\) and consider the class of functions
where \(\mathcal{B}(\theta,\tilde{\ell})\) is the ball of center \(\theta\) and radius \(\tilde{\ell}\). If \(\tilde{\theta}\in\mathcal{B}(\theta,\tilde{\ell})\), the condition H3 implies:
and thus
It follows that \(L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}\) can be bracketed between two functions \(L_{\theta}\) and \(U_{\theta}\) defined by:
and
That is, \(L_{\theta}(\mathcal{D}_{i})\leq L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}(\mathcal{D}_{i})\leq U_{\theta}(\mathcal{D}_{i})\). Moreover, under conditions H1 and H2, there exists a constant \(0<c_{2}<\infty\) such that \(\mathbb{E}(U_{\theta}(\mathcal{D}_{i})-L_{\theta}(\mathcal{D}_{i}))^{2}<c_{2}\tilde{\ell}^{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))\). Hence \([L_{\theta},U_{\theta}]\) is an \(\varepsilon\)-bracket for the class \(\{L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}:\tilde{\theta}\in\Theta\cap\mathcal{B}(\theta,\tilde{\ell})\}\) with \(\varepsilon=\sqrt{c_{2}\tilde{\ell}^{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))}\).
Since we can cover \(\Theta\) with fewer than \((d(\Theta)/\ell)^{p+1}\) balls of radius \(\tilde{\ell}\), we can cover \(\{L_{1,\nu_{0},\beta_{0},\theta}^{I}:\theta\in\mathbb{R}\}\) with fewer than \((d(\Theta)/\ell)^{p+1}\) \(\varepsilon\)-brackets \([L_{\theta},U_{\theta}]\), with \(\varepsilon=\sqrt{c_{2}\tilde{\ell}^{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))}\). The number of such \(\varepsilon\)-brackets is thus bounded by \((\lambda\,d(\Theta)\sqrt{c_{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))}/\varepsilon)^{p+1}\), which is of order \(\varepsilon^{-(p+1)}\). Hence, the bracketing entropy integral is of the order of \(\int\limits_{0}^{1}\sqrt{-(p+1)\log\varepsilon}\,d\varepsilon\), which is finite. Therefore, the class of functions \(\{L_{1,\nu_{0},\beta_{0},\theta}^{I}:\theta\in\mathbb{R}\}\) is Donsker, by [24, Theorem 19.5].
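The finiteness of this entropy integral can be checked numerically; for instance, \(\int_{0}^{1}\sqrt{-\log\varepsilon}\,d\varepsilon=\Gamma(3/2)=\sqrt{\pi}/2\) (by the substitution \(\varepsilon=e^{-t}\)), so the integral above equals \(\sqrt{p+1}\cdot\sqrt{\pi}/2\). The snippet below (illustrative only, with an arbitrary choice of \(p\)) verifies this:

```python
# Numeric check that the bracketing entropy integral
# int_0^1 sqrt(-(p+1) log(eps)) d eps is finite and equals
# sqrt(p+1) * Gamma(3/2) = sqrt(p+1) * sqrt(pi)/2.
import numpy as np
from scipy.integrate import quad

p = 1  # illustrative covariate dimension (assumption for the example)
val, _ = quad(lambda eps: np.sqrt(-(p + 1) * np.log(eps)), 0.0, 1.0)
closed_form = np.sqrt(p + 1) * np.sqrt(np.pi) / 2.0
```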
Using similar arguments, we can prove that \(\{L_{1,\nu_{0},\beta_{0},\theta}^{II}:\theta\in\mathbb{R}\}\) is also Donsker and since sums of Donsker classes are Donsker, the class \(\{L_{1,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) is Donsker. Similarly, we can prove that \(\{L_{2,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) is Donsker. It follows that the sequence of processes \(\{\mathbb{G}_{n}L_{\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) converges weakly to a tight limit process, and as such, is stochastically equicontinuous. Thus, [23, Lemma 14.3] and the consistency of \(\hat{\theta}_{n}\) imply that \(\mathbb{G}_{n}(L_{\nu_{0},\beta_{0},\hat{\theta}_{n}}-L_{\nu_{0},\beta_{0},\theta_{0}})=o_{p}(1)\), which concludes the proof. \(\Box\)
Lemma 2. Assume that H1, H2, H3 hold. Then
Proof. We have
Note first that
Then, taking the iterated conditional expectation with respect to \(\mathbf{U}_{1}\) in \(\partial\mathbb{E}(\dot{\ell}_{1}^{1}(\nu,\beta,\theta))/\partial\theta^{\top}\) and differentiating with respect to \(\theta\), we obtain, after some algebra, that
We turn to the second statement. We note
where
and
are given by (A.2). Then, decompose \(n^{-1}\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) as
We show that the first term in the sum (B.2) converges to 0 and that the second term converges to \(-\Omega_{1}(\nu_{0},\beta_{0})\). For illustration purposes, we show that \(C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})=o_{p}(1)\) and \(C_{n}(\nu_{0},\beta_{0},\theta_{0})=-\Omega_{1}^{\beta,\beta}(\nu_{0},\beta_{0})+o_{p}(1)\). The other terms can be dealt with similarly. We have:
where
Under H1 and H2, there exists a finite constant \(c_{3}>0\) such that \(||\mathcal{W}_{i}||\leq c_{3}\). Therefore,
Consistency of \(\hat{\theta}_{n}\) implies that \(C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})\) converges in probability to 0. Next, by the weak law of large numbers, \(C_{n}(\nu_{0},\beta_{0},\theta_{0})\) converges in probability to
Iterating this expectation with respect to \(\mathbf{U}_{i}\) and using the fact that \(\mathbb{E}(\delta_{i}^{1}(\theta_{0})|\mathbf{U}_{i})=m_{\theta_{0}}(\mathbf{U}_{i})\) (see (B.1)), it is immediate to see that (B.3) coincides with \(-\Omega_{1}^{\beta,\beta}(\nu_{0},\beta_{0})\).
Finally, using similar arguments on the other terms of (B.2) concludes the proof. \(\Box\)
PROOF OF THEOREM 4.1 (ASYMPTOTIC NORMALITY)
A first-order Taylor expansion of \(\dot{\ell}^{1}_{n}(\hat{\nu}^{1}_{n},\hat{\beta}^{1}_{n},\hat{\theta}_{n})\) around \((\nu_{0},\beta_{0})\) yields
From Lemma 1, we get
where the second line follows from a Taylor expansion of \(\mathbb{E}(\dot{\ell}_{1}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n}))\) around \(\theta_{0}\). Combining this expression with (C.1) and Lemma 2, we obtain
where
First, note that
Next, under the MAR assumption, it is easy to see that \(\mathbb{E}(K_{i})=0\) and that
Therefore,
Next, we calculate \(\textrm{cov}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i}),K_{i})=\mathbb{E}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})K_{i}^{\top})\). Note first that
where
where the passage from the third to the fourth equality follows from
Hence
Similar calculations yield
(details are omitted) and thus
Using (C.3)–(C.5), we obtain
Finally, Theorem 4.1 follows by applying the multivariate central limit theorem (combined with Slutsky’s theorem and Lemma 2) to (C.2). \(\Box\)
A SKETCH OF THE PROOF OF THEOREM 4.2
The proofs are similar to those for the regression calibration estimate and we omit them. We only mention that, following a Taylor expansion of \(\partial\ell_{n,j}^{2}(\hat{\nu}^{2}_{n},\hat{\beta}^{2}_{n},\hat{\theta}_{n})/\partial(\nu,\beta^{\top})^{\top}\) around \((\nu_{0},\beta_{0})\) and using arguments similar to those in Appendix C, we obtain
where \(L_{1,\nu_{0},\beta_{0},\theta_{0}}^{(j)}\) and \(L_{2,\nu_{0},\beta_{0},\theta_{0}}^{(j)}\) are given by (4.6). Finally, the multiple imputation estimator
satisfies
Asymptotic normality follows from the multivariate central limit theorem and similar covariance calculations as in the proof of Theorem 4.1. \(\Box\)
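The multiple imputation estimator above averages the per-imputation solutions of the estimating equation. For contrast with the paper's closed-form variance formulas, the snippet below sketches the conventional Rubin's rules combination mentioned in the abstract; all numbers are made up for illustration.

```python
# Sketch: combining m per-imputation estimates. Point estimate is the
# average; Rubin's rules total variance = within + (1 + 1/m) * between.
# All numeric values here are hypothetical.
import numpy as np

est = np.array([[2.01, 0.48],      # (nu_hat, beta1_hat), imputation 1
                [1.97, 0.52],      # imputation 2
                [2.05, 0.49]])     # imputation 3
var = np.array([[0.040, 0.010],    # per-imputation variance estimates
                [0.050, 0.011],
                [0.042, 0.009]])

m = est.shape[0]
point = est.mean(axis=0)               # MI point estimate
W = var.mean(axis=0)                   # within-imputation variance
B = est.var(axis=0, ddof=1)            # between-imputation variance
total = W + (1.0 + 1.0 / m) * B        # Rubin's rules total variance
```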
A SKETCH OF THE PROOF OF THEOREM 4.3
The proof of consistency, when either \(m_{\theta}(\mathbf{U}_{i})\) or \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified, proceeds similarly to that for the regression calibration estimate. Nonetheless, we outline the main steps of the proof so that the reader can see where the assumption of a correct model for either \(m_{\theta}(\mathbf{U}_{i})\) or \(\pi_{\gamma}(\mathbf{U}_{i})\) is used.
We consider the case where \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled (the case where \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified works similarly and is omitted). Let \(\dot{\ell}_{n}^{3}(\nu,\beta,\theta,\gamma)=\partial\ell_{n}^{3}(\nu,\beta,\theta,\gamma)/\partial(\nu,\beta^{\top})^{\top}\). We have:
where \(L_{1,\nu,\beta,\theta,\gamma}\) and \(L_{2,\nu,\beta,\theta,\gamma}\) are similar to \(L_{1,\nu,\beta,\theta}\) and \(L_{2,\nu,\beta,\theta}\) in (A.1), with \(\delta^{1}_{i}(\theta)\) replaced by \(\delta^{3}_{i}(\theta,\gamma)\).
First, one can easily check that the map \((\nu,\beta)\mapsto\partial\dot{\ell}_{n}^{3}(\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n})/\partial(\nu,\beta^{\top})\) is continuous in an open neighborhood of \((\nu_{0},\beta_{0})\) (condition (i)). Next, we prove that \(n^{-1}\dot{\ell}_{n}^{3}(\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n})=o_{p}(1)\) (condition (ii)). As an illustration, we show that
To see this, decompose
Let \(\mathcal{A}_{i}:=\mathbf{X}_{i}\xi_{i}(\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i}))\). Then
Under H1 and H2, there exists a constant \(0<c_{4}<\infty\) such that \(||\mathcal{A}_{i}||<c_{4}\). Therefore,
where the second inequality follows from H4 and H5. Finally, the convergence of \(\hat{\gamma}_{n}\) to \(\gamma_{\ast}\) implies that \(A_{n,1}=o_{p}(1)\). Using similar arguments, we verify that
If the model \(m_{\theta}(\mathbf{U}_{i})\) is correctly specified, then \(\hat{\theta}_{n}\stackrel{{\scriptstyle p}}{{\longrightarrow}}\theta_{0}\) and \(A_{n,3}\) is also an \(o_{p}(1)\). Finally, under the MAR hypothesis, \(A_{n,2}\) converges in probability to
If \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled, then \(\mathbb{E}(\delta_{i}|\mathbf{U}_{i})-m_{\theta_{0}}(\mathbf{U}_{i})=0\) and thus \(A_{n,2}=o_{p}(1)\). It follows that \(A_{n}(\hat{\theta}_{n},\hat{\gamma}_{n})=o_{p}(1)\) and (E.1) becomes
Decompose \(B_{n}(\hat{\theta}_{n})\) as \(B_{n}(\hat{\theta}_{n})=B_{n}(\hat{\theta}_{n})-B_{n}(\theta_{0})+B_{n}(\theta_{0})\). Acting as for \(A_{n,1}\) and \(A_{n,3}\), one can easily check that
converges to 0. Moreover, by the law of large numbers, \(B_{n}(\theta_{0})\) converges in probability to
which is the expectation of the score (with respect to \(\beta\)) in the censored Gamma regression model with no missing data, and is equal to 0. Finally, \(n^{-1}\sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=o_{p}(1)\). Using similar arguments, we can show that \(n^{-1}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=o_{p}(1)\), and condition (ii) is verified.
Finally, we show that \(n^{-1}\partial\dot{\ell}_{n}^{3}(\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n})/\partial(\nu,\beta^{\top})\) converges to \(-\Omega_{1}(\nu,\beta)\) uniformly in a neighborhood of \((\nu_{0},\beta_{0})\) (condition (iii)). We have
where the partial derivatives are given by formulas (A.2) with \(\delta_{i}^{1}(\theta)\) replaced by \(\delta_{i}^{3}(\theta,\gamma)\). For illustrative purposes, we show that \(n^{-1}\sum_{i=1}^{n}\partial L_{2,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})/\partial\beta^{\top}\) converges to \(-\Omega_{1}^{\beta,\beta}(\nu,\beta)\). Let
Then
Decompose
Using arguments similar to those used for \(A_{n,3}\), \(A_{n}(\hat{\theta}_{n},\hat{\gamma}_{n})\), and \(A_{n,2}\), respectively, one can see that \(C_{n,1}\), \(C_{n,2}\), and \(C_{n,3}\) converge in probability to 0. Next, \(n^{-1}\sum_{i=1}^{n}\partial L_{2,\nu,\beta,\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})/\partial\beta^{\top}\) converges to
If \(m_{\theta}(\mathbf{U}_{i})\) is correct (that is, if \(\mathbb{E}(\delta_{i}|\mathbf{U}_{i})=m_{\theta_{0}}(\mathbf{U}_{i})\)), then
Thus, by taking the iterated conditional expectation with respect to \(\mathbf{U}_{i}\) in (E.2), we see that (E.2) becomes \(-\Omega_{1}^{\beta,\beta}(\nu,\beta)\), which concludes the proof. Uniformity follows by the same arguments as in the proof of Theorem 4.1.
Finally, having proved conditions (i)–(iii), we apply Foutz’s [17] consistency theorem and we conclude that \((\hat{\nu}^{3}_{n},\hat{\beta}^{3}_{n})\) converges in probability to \((\nu_{0},\beta_{0})\) if \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled. Now, we turn to asymptotic normality. Straightforward calculations yield
from which we easily deduce that
as \(n\rightarrow\infty\). If \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly modeled (i.e., \(\gamma_{\ast}=\gamma_{0}\)), then
Thus, by iterating the conditional expectation with respect to \(\mathbf{U}_{i}\) in \(\Omega_{5}(\nu_{0},\beta_{0},\theta,\gamma)\), we easily see that \(\Omega_{5}(\nu_{0},\beta_{0},\theta_{\ast},\gamma_{\ast})=0\) if \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified. Similarly, \(\Omega_{6}(\nu_{0},\beta_{0},\theta_{\ast},\gamma_{\ast})=0\) if \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled.
Now, taking a Taylor expansion of \(\dot{\ell}_{n}^{3}(\hat{\nu}^{3}_{n},\hat{\beta}^{3}_{n},\hat{\theta}_{n},\hat{\gamma}_{n})\) and proceeding as in Appendix C, we obtain:
if \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled, and
if \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified. In both cases, asymptotic normality follows by applying the multivariate central limit theorem. Tedious though not difficult calculations yield the asymptotic variance formulas. If both \(m_{\theta}(\mathbf{U}_{i})\) and \(\pi_{\gamma}(\mathbf{U}_{i})\) are correctly modeled, \(\Omega_{5}(\nu_{0},\beta_{0},\theta_{0},\gamma_{0})=\Omega_{6}(\nu_{0},\beta_{0},\theta_{0},\gamma_{0})=0\) and the asymptotic variance of \((\hat{\nu}^{3}_{n},\hat{\beta}^{3}_{n})\) reduces to \(\Omega_{1,0}^{-1}\). \(\Box\)
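The double robustness established above can be illustrated numerically. Assuming \(\delta^{3}_{i}(\theta,\gamma)\) takes the standard augmented inverse probability weighted form \(\xi_{i}\delta_{i}/\pi_{\gamma}(\mathbf{U}_{i})+(1-\xi_{i}/\pi_{\gamma}(\mathbf{U}_{i}))\,m_{\theta}(\mathbf{U}_{i})\), its conditional mean equals \(\mathbb{E}(\delta_{i}|\mathbf{U}_{i})\) as soon as either \(\pi_{\gamma}\) or \(m_{\theta}\) is correct. The simulation below is a hedged sketch: the specific models for \(\delta\), \(\pi\), and \(m\) are arbitrary choices for the example, not the paper's.

```python
# Double robustness of the AIPW-completed censoring indicator: deliberately
# misspecify one of (pi, m) at a time and check the mean is still recovered.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
U = rng.uniform(0.0, 1.0, n)
p_delta = 1.0 / (1.0 + np.exp(-(U - 0.5)))   # true E[delta | U] (assumed model)
delta = rng.binomial(1, p_delta)             # censoring status
pi_true = 0.3 + 0.5 * U                      # true P(xi = 1 | U): MAR missingness
xi = rng.binomial(1, pi_true)                # 1 = status observed

def completed(pi, m):
    # AIPW-type completed indicator: xi*delta/pi + (1 - xi/pi)*m
    return xi * delta / pi + (1.0 - xi / pi) * m

truth = p_delta.mean()
est_pi_ok = completed(pi_true, 0.9 * np.ones(n)).mean()  # pi correct, m wrong
est_m_ok = completed(0.6 * np.ones(n), p_delta).mean()   # pi wrong, m correct
```

Both estimates recover the target mean despite one misspecified nuisance model, mirroring the two branches of the proof above.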
Dupuy, JF. Censored Gamma Regression with Uncertain Censoring Status. Math. Meth. Stat. 29, 172–196 (2020). https://doi.org/10.3103/S106653072004002X