Censored Gamma Regression with Uncertain Censoring Status

Abstract

In this paper, we consider the problem of censored Gamma regression when the censoring status is missing at random. Three estimation methods are investigated. They consist in solving a censored maximum likelihood estimating equation in which the missing data are replaced by values adjusted using either regression calibration, multiple imputation, or inverse probability weighting. We show that the resulting estimates are consistent and asymptotically normal. Moreover, while asymptotic variances in missing data problems are generally estimated empirically (using Rubin’s rules, for example), we propose closed-form consistent variance estimates based on explicit formulas for the asymptotic variances of the proposed estimates. A simulation study is conducted to assess the finite-sample properties of the proposed parameter and asymptotic variance estimates.
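
To fix ideas, the data structure considered in this paper can be simulated as follows. This is a minimal Python sketch, assuming right censoring of the Gamma response by an independent censoring time; the parameter values, censoring distribution, and missingness model below are illustrative assumptions, not the settings of the simulation study.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 500, 2
    beta0 = np.array([0.5, -0.3])   # illustrative regression coefficients
    nu0 = 2.0                       # illustrative Gamma shape parameter

    X = rng.normal(size=(n, p))
    mu = np.exp(X @ beta0)                    # log-link mean E(Y | X)
    Y = rng.gamma(shape=nu0, scale=mu / nu0)  # Gamma response with mean mu
    C = rng.exponential(scale=float(np.quantile(mu, 0.8)), size=n)  # censoring times

    Y_tilde = np.minimum(Y, C)       # observed (possibly censored) response
    delta = (Y <= C).astype(float)   # censoring status (1 = uncensored)

    # Missing-at-random censoring status: xi = 1 means delta is observed,
    # with probability depending only on always-observed quantities.
    pi = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * np.log1p(Y_tilde))))
    xi = rng.binomial(1, pi)
    delta_obs = np.where(xi == 1, delta, np.nan)  # delta is missing when xi = 0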

REFERENCES

  1. P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding, Statistical Models Based on Counting Processes (Springer Series in Statistics, Springer, New York, 1993).

  2. P. W. Bernhardt, H. J. Wang, and D. Zhang, ‘‘Flexible modeling of survival data with covariates subject to detection limits via multiple imputation,’’ Computational Statistics and Data Analysis 69, 81–91 (2014).

  3. B. Bousselmi, J.-F. Dupuy, and A. Karoui, ‘‘Censored count data regression with missing censoring information,’’ Electronic Journal of Statistics 15 (2), 4343–4383 (2021).

  4. E. Brunel, F. Comte, and A. Guilloux, ‘‘Nonparametric estimation for survival data with censoring indicators missing at random,’’ Journal of Statistical Planning and Inference 143 (10), 1653–1671 (2013).

  5. W. Checkley, J. Guzman-Cottrill, L. Epstein, N. Innocentini, J. Patz, and S. Shulman, ‘‘Short-term weather variability in Chicago and hospitalizations for Kawasaki disease,’’ Epidemiology 20 (2), 194–201 (2009).

  6. T. A. Dignam, J. Lojo, P. A. Meyer, E. Norman, A. Sayre, and W. D. Flanders, ‘‘Reduction of elevated blood lead levels in children in North Carolina and Vermont, 1996–1999,’’ Environmental Health Perspectives 116 (7), 981–985 (2008).

  7. R. V. Foutz, ‘‘On the unique consistent solution to the likelihood equations,’’ Journal of the American Statistical Association 72, 147–148 (1977).

  8. X. Guo, C. Niu, Y. Yang, and W. Xu, ‘‘Empirical likelihood for single index model with missing covariates at random,’’ Statistics 49 (3), 588–601 (2015).

  9. D. G. Horvitz and D. J. Thompson, ‘‘A generalization of sampling without replacement from a finite universe,’’ Journal of the American Statistical Association 47 (260), 663–685 (1952).

  10. P. de Jong and G. Z. Heller, Generalized Linear Models for Insurance Data. International Series on Actuarial Science (Cambridge University Press, 2008).

  11. S. Mandal, R. Arabi Belaghi, M. Akram, and M. Aminnejad, ‘‘Stein-type shrinkage estimators in gamma regression model with application to prostate cancer data,’’ Statistics in Medicine 38 (22), 4310–4322 (2019).

  12. P. McCullagh and J. A. Nelder, Generalized Linear Models (Chapman and Hall/CRC Monographs on Statistics and Applied Probability, Taylor and Francis, 1989).

  13. I. W. McKeague and S. Subramanian, ‘‘Product-limit estimators and Cox regression with missing censoring information,’’ Scandinavian Journal of Statistics 25 (4), 589–601 (1998).

  14. R Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing (Vienna, Austria, 2020). https://www.R-project.org/

  15. J. M. Robins, A. Rotnitzky, and L. P. Zhao, ‘‘Estimation of regression coefficients when some regressors are not always observed,’’ Journal of the American Statistical Association 89 (427), 846–866 (1994).

  16. D. B. Rubin, Multiple Imputation for Nonresponse in Surveys (Wiley Series in Probability and Statistics, John Wiley and Sons Inc., New York, 1987).

  17. F. Sigrist and W. Stahel, ‘‘Using the Censored Gamma Distribution for Modeling Fractional Response Variables with an Application to Loss Given Default,’’ ASTIN Bulletin 41 (2), 673–710 (2011).

  18. S. Subramanian, ‘‘Survival analysis for the missing censoring indicator model using kernel density estimation techniques,’’ Statistical Methodology 3 (2), 125–136 (2006).

  19. S. Subramanian, ‘‘Multiple imputations and the missing censoring indicator model,’’ Journal of Multivariate Analysis 102 (1), 105–117 (2011).

  20. J. A. Steingrimsson and R. L. Strawderman, ‘‘Estimation in the semiparametric accelerated failure time model with missing covariates: improving efficiency through augmentation,’’ Journal of the American Statistical Association 112 (519), 1221–1235 (2017).

  21. Y. Sun, X. Qian, Q. Shou, and P. B. Gilbert, ‘‘Analysis of two-phase sampling data with semiparametric additive hazards models,’’ Lifetime Data Analysis 23 (3), 377–399 (2017).

  22. J. V. Terza, ‘‘A Tobit-type estimator for the censored Poisson regression model,’’ Economics Letters 18 (4), 361–365 (1985).

  23. A. A. Tsiatis, Semiparametric Theory and Missing Data (Springer Series in Statistics, Springer New York, 2007).

  24. A. W. van der Vaart, Asymptotic Statistics (Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, 2000).

  25. I. W. Verburg, N. F. de Keizer, E. de Jonge, and N. Peek, ‘‘Comparison of regression methods for modeling intensive care length of stay,’’ PLoS ONE 9 (10) (2014).

  26. C. Y. Wang and H. Y. Chen, ‘‘Augmented inverse probability weighted estimator for Cox missing covariate regression,’’ Biometrics 57 (2), 414–419 (2001).

  27. Q. Wang, G. E. Dinse, and C. Liu, ‘‘Hazard function estimation with cause-of-death data missing at random,’’ Annals of the Institute of Statistical Mathematics 64 (2), 415–438 (2012).

  28. Q. Wang and J. Shen, ‘‘Estimation and confidence bands of a conditional survival function with censoring indicators missing at random,’’ Journal of Multivariate Analysis 99 (5), 928–948 (2008).

  29. H. White, ‘‘Maximum Likelihood Estimation of Misspecified Models,’’ Econometrica 50 (1), 1–25 (1982).

  30. S. N. Wood, Generalized Additive Models: An Introduction with R (CRC Press, 2017).

  31. J. Yang, M. Jun, C. Schumacher, and R. Saravanan, ‘‘Predictive statistical representations of observed and simulated rainfall using generalized linear models,’’ Journal of Climate 32 (11), 3409–3427 (2019).

ACKNOWLEDGMENTS

The author is grateful to the referee and Associate Editor for their comments and suggestions, which led to substantial improvements of this paper.

Author information

Corresponding author

Correspondence to Jean-François Dupuy.

Appendices

PROOF OF THEOREM 4.1 (CONSISTENCY)

First, let

$$\dot{\ell}^{1}_{n}(\nu,\beta,\theta):=\frac{\partial\ell^{1}_{n}(\nu,\beta,\theta)}{\partial(\nu,\beta^{\top})^{\top}}$$

and \(\psi(\nu):=\partial\log\Gamma(\nu)/\partial\nu\). We have:

$$\dot{\ell}^{1}_{n}(\nu,\beta,\theta)=\sum_{i=1}^{n}L_{\nu,\beta,\theta}(\mathcal{D}_{i})$$

with

$$L_{\nu,\beta,\theta}(\mathcal{D}_{i})=\begin{pmatrix}\delta^{1}_{i}(\theta)\left(\log\widetilde{Y}_{i}+\log\nu+1-\beta^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta^{\top}\mathbf{X}_{i}}-h_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)+h_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu)\\ \mathbf{X}_{i}\left(\delta^{1}_{i}(\theta)\left[\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}-\nu-g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})\right]+g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\end{pmatrix}$$
$${}:=\begin{pmatrix}L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})\\ L_{2,\nu,\beta,\theta}(\mathcal{D}_{i})\end{pmatrix}.$$
(A.1)
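
As a complement, solving \(\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})=0\) numerically is straightforward once \(h_{\nu,\beta}\) and \(g_{\nu,\beta}\) (defined in Section 4.1 and not reproduced here) are available. The following minimal Python sketch passes them as user-supplied callables; delta1 stands for the vector of adjusted indicators \(\delta^{1}_{i}(\hat{\theta}_{n})\), and all names are illustrative.

    import numpy as np
    from scipy.optimize import root
    from scipy.special import digamma

    def score(params, Y, X, delta1, h, g):
        # Estimating function (A.1), summed over the sample.
        nu, beta = params[0], params[1:]
        xb = X @ beta
        hv = h(nu, beta, Y, X)  # h_{nu,beta}(Y_i, X_i) from Section 4.1
        gv = g(nu, beta, Y, X)  # g_{nu,beta}(Y_i, X_i) from Section 4.1
        s_nu = (delta1 * (np.log(Y) + np.log(nu) + 1.0 - xb
                          - Y * np.exp(-xb) - hv) + hv - digamma(nu))
        s_beta = X * (delta1 * (Y * nu * np.exp(-xb) - nu - gv) + gv)[:, None]
        return np.concatenate(([s_nu.sum()], s_beta.sum(axis=0)))

    # Hypothetical usage, with h_fn and g_fn implementing the Section 4.1 formulas:
    # sol = root(score, x0=np.concatenate(([1.0], np.zeros(X.shape[1]))),
    #            args=(Y_tilde, X, delta1, h_fn, g_fn))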

The proof is based on Foutz’s [7] consistency theorem for maximum-likelihood-type estimates. The following conditions must be established.

  • C1. \(\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) exists and is continuous in an open neighborhood of \((\nu_{0},\beta_{0})\).

  • C2. \(n^{-1}\dot{\ell}^{1}_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})\) converges in probability to \(0\) as \(n\rightarrow\infty\).

  • C3. \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) converges in probability to \(-\Omega_{1}(\nu,\beta)\) uniformly in an open neighborhood of \((\nu_{0},\beta_{0})\).

Condition C1. The map \((\nu,\beta)\mapsto\ell^{1}_{n}(\nu,\beta,\theta)\) is twice differentiable with respect to \(\nu\) and \(\beta\) and its second-order derivative is the \((p+1)\times(p+1)\) matrix

$$\frac{\partial\dot{\ell}^{1}_{n}(\nu,\beta,\theta)}{\partial(\nu,\beta^{\top})}=\sum_{i=1}^{n}\begin{pmatrix}\displaystyle\frac{\partial L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\nu}&\displaystyle\frac{\partial L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\beta^{\top}}\\ \displaystyle\frac{\partial L_{2,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\nu}&\displaystyle\frac{\partial L_{2,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\beta^{\top}}\end{pmatrix},$$

where

$$\frac{\partial L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\nu}=\frac{\delta^{1}_{i}(\theta)}{\nu}+(1-\delta^{1}_{i}(\theta))\frac{\partial}{\partial\nu}h_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})-\frac{\partial^{2}}{\partial\nu^{2}}\log\Gamma(\nu),$$
$$\frac{\partial L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\beta^{\top}}=\left(\frac{\partial L_{2,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\nu}\right)^{\top}=\mathbf{X}_{i}^{\top}\left(\delta^{1}_{i}(\theta)(\widetilde{Y}_{i}e^{-\beta^{\top}\mathbf{X}_{i}}-1)+(1-\delta^{1}_{i}(\theta))\frac{\partial}{\partial\nu}g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})\right),$$
$$\frac{\partial L_{2,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\beta^{\top}}$$
$${}=\mathbf{X}_{i}\mathbf{X}_{i}^{\top}\left(-\delta^{1}_{i}(\theta)\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}+(1-\delta^{1}_{i}(\theta))g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})\left(\nu\widetilde{Y}_{i}e^{-\beta^{\top}\mathbf{X}_{i}}-\nu-g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right)$$
(A.2)

(\(\partial g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})/\partial\nu\) and \(\partial h_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i})/\partial\nu\) are given in Section 4.1). These terms are continuous in \(\nu\) and \(\beta\), for every \(\theta\).

Condition C2. We decompose

$$\frac{1}{n}\dot{\ell}^{1}_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})=\frac{1}{n}\left(\dot{\ell}^{1}_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-\dot{\ell}^{1}_{n}(\nu_{0},\beta_{0},\theta_{0})\right)+\frac{1}{n}\dot{\ell}^{1}_{n}(\nu_{0},\beta_{0},\theta_{0})$$
$${}=\frac{1}{n}\begin{pmatrix}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}\\ \sum_{i=1}^{n}\{L_{2,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{2,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}\end{pmatrix}+\frac{1}{n}\begin{pmatrix}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\\ \sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\end{pmatrix}.$$

We show that

$$\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}+\frac{1}{n}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})=o_{p}(1)$$
(A.3)

(proof for the second term proceeds similarly). First, the weak law of large numbers implies that \(\frac{1}{n}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\) converges in probability to

$$\mathbb{E}(L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i}))=\mathbb{E}\left(\mathbb{E}(L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})|\mathbf{U}_{i})\right)$$
$${}=\mathbb{E}\left(\mathbb{E}(\delta^{1}_{i}(\theta_{0})|\mathbf{U}_{i})\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right.$$
$${}+\left.h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\right).$$

Under MAR,

$$\mathbb{E}(\delta^{1}_{i}(\theta_{0})|\mathbf{U}_{i})=\mathbb{E}(\xi_{i}\delta_{i}+(1-\xi_{i})\mathbb{E}(\delta_{i}|\mathbf{U}_{i})|\mathbf{U}_{i})$$
$${}=\mathbb{E}(\xi_{i}|\mathbf{U}_{i})\mathbb{E}(\delta_{i}|\mathbf{U}_{i})+(1-\mathbb{E}(\xi_{i}|\mathbf{U}_{i}))\mathbb{E}(\delta_{i}|\mathbf{U}_{i})$$
$${}=\mathbb{E}(\delta_{i}|\mathbf{U}_{i}).$$

Hence,

$$\mathbb{E}(L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i}))=\mathbb{E}\left(\delta_{i}\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right.$$
$${}+\left.h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\right),$$

which is the expectation of the score (with respect to \(\nu\)) in the censored Gamma regression model with no missing data, and is equal to 0. Next, we turn to the term \(\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}\). We have:

$$\left|\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}\right|$$
$${}\leq\frac{1}{n}\sum_{i=1}^{n}|m_{\hat{\theta}_{n}}(\mathbf{U}_{i})-m_{\theta_{0}}(\mathbf{U}_{i})|\times|\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})|$$
$${}\leq\frac{1}{n}\sum_{i=1}^{n}||\hat{\theta}_{n}-\theta_{0}||\times h(\mathbf{U}_{i})\times|\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})|$$

by condition H3. Moreover, under H1 and H2, there exists a constant \(0<c_{1}<\infty\) such that \(|\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})|\leq c_{1}\). It follows that

$$\left|\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}\right|\leq c_{1}||\hat{\theta}_{n}-\theta_{0}||(\mathbb{E}(h(\mathbf{U}_{i}))+o_{p}(1)).$$

Finally, the consistency of \(\hat{\theta}_{n}\) implies that \(\frac{1}{n}\sum_{i=1}^{n}\{L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n}}(\mathcal{D}_{i})-L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})\}=o_{p}(1)\), which concludes the proof of (A.3).

Condition C3. The pointwise convergence in probability of \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) to \(-\Omega_{1}(\nu,\beta)\) is proved in Lemma 2 below. Then, under H1 and H2, the derivative of \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) with respect to \((\nu,\beta^{\top})\) is bounded, for every \(n\). Hence, the sequence \((n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top}))\) is equicontinuous, and it follows from the Arzelà–Ascoli theorem that the convergence of \(n^{-1}\partial\dot{\ell}^{1}_{n}(\nu,\beta,\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) to \(-\Omega_{1}(\nu,\beta)\) is uniform.

Having established the conditions of Foutz’s [7] consistency theorem, we conclude that \((\hat{\nu}^{1}_{n},\hat{\beta}^{1}_{n})\) converges in probability to \((\nu_{0},\beta_{0})\). \(\Box\)

TWO LEMMAS

Let \(\mathbb{G}_{n}\) denote the empirical process [24]. With this notation, we have

$$\mathbb{G}_{n}L_{\nu_{0},\beta_{0},\theta}=\sqrt{n}\begin{pmatrix}\displaystyle\frac{1}{n}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\theta}(\mathcal{D}_{i})-\mathbb{E}(L_{1,\nu_{0},\beta_{0},\theta}(\mathcal{D}_{1}))\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\theta}(\mathcal{D}_{i})-\mathbb{E}(L_{2,\nu_{0},\beta_{0},\theta}(\mathcal{D}_{1}))\end{pmatrix}.$$

Lemma 1. Assume that H1, H2, H3 hold. Then

$$\mathbb{G}_{n}(L_{\nu_{0},\beta_{0},\hat{\theta}_{n}}-L_{\nu_{0},\beta_{0},\theta_{0}})=o_{p}(1).$$

Proof. First, we prove that the classes of functions \(\{L_{1,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) and \(\{L_{2,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) are Donsker (we refer the reader to [24] for a detailed account of Donsker classes and related notions, such as \(\varepsilon\)-brackets and bracketing entropy). For illustrative purposes, we decompose \(L_{1,\nu_{0},\beta_{0},\theta}\) as

$$L_{1,\nu_{0},\beta_{0},\theta}=L_{1,\nu_{0},\beta_{0},\theta}^{I}+L_{1,\nu_{0},\beta_{0},\theta}^{II},$$

where \(L_{1,\nu_{0},\beta_{0},\theta}^{I}(\mathcal{D}_{i})=-\delta_{i}^{1}(\theta)\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}+h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\) and \(L_{1,\nu_{0},\beta_{0},\theta}^{II}(\mathcal{D}_{i})=\delta_{i}^{1}(\theta)(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i}))\), and we show that the class \(\{L_{1,\nu_{0},\beta_{0},\theta}^{I}:\theta\in\Theta\}\) is Donsker (the arguments for \(\{L_{1,\nu_{0},\beta_{0},\theta}^{II}:\theta\in\Theta\}\) and \(\{L_{2,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) are similar).

Let \(d(\Theta)\) denote the diameter of \(\Theta\). Since the extent of \(\Theta\) in every direction is at most \(d(\Theta)\), we can cover \(\Theta\) with fewer than \((d(\Theta)/\ell)^{p+1}\) cubes of side length \(\ell\). The balls circumscribing these cubes have radius \(\tilde{\ell}:=\lambda\ell\) for some constant \(\lambda>0\), and they also cover \(\Theta\). Now, let \(\theta\in\Theta\) and consider the class of functions

$$\left\{L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}:\tilde{\theta}\in\Theta\cap\mathcal{B}(\theta,\tilde{\ell})\right\},$$

where \(\mathcal{B}(\theta,\tilde{\ell})\) is the ball of center \(\theta\) and radius \(\tilde{\ell}\). If \(\tilde{\theta}\in\mathcal{B}(\theta,\tilde{\ell})\), the condition H3 implies:

$$|m_{\theta}(\mathbf{u})-m_{\tilde{\theta}}(\mathbf{u})|\leq h(\mathbf{u})||\theta-\tilde{\theta}||\leq h(\mathbf{u})\tilde{\ell}$$

and thus

$$m_{\theta}(\mathbf{u})-h(\mathbf{u})\tilde{\ell}\leq m_{\tilde{\theta}}(\mathbf{u})\leq m_{\theta}(\mathbf{u})+h(\mathbf{u})\tilde{\ell}.$$

It follows that \(L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}\) can be bracketed between two functions \(L_{\theta}\) and \(U_{\theta}\) defined by:

$$L_{\theta}(\mathcal{D}_{i})=-\left(\xi_{i}\delta_{i}+(1-\xi_{i})(m_{\theta}(\mathbf{U}_{i})+h(\mathbf{U}_{i})\tilde{\ell})\right)\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}+h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})$$

and

$$U_{\theta}(\mathcal{D}_{i})=-\left(\xi_{i}\delta_{i}+(1-\xi_{i})(m_{\theta}(\mathbf{U}_{i})-h(\mathbf{U}_{i})\tilde{\ell})\right)\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}+h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0}).$$

That is, \(L_{\theta}(\mathcal{D}_{i})\leq L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}(\mathcal{D}_{i})\leq U_{\theta}(\mathcal{D}_{i})\). Moreover, under conditions H1 and H2, there exists a constant \(0<c_{2}<\infty\) such that \(\mathbb{E}(U_{\theta}(\mathcal{D}_{i})-L_{\theta}(\mathcal{D}_{i}))^{2}<c_{2}\tilde{\ell}^{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))\). Hence \([L_{\theta},U_{\theta}]\) is an \(\varepsilon\)-bracket for the class \(\{L_{1,\nu_{0},\beta_{0},\tilde{\theta}}^{I}:\tilde{\theta}\in\Theta\cap\mathcal{B}(\theta,\tilde{\ell})\}\) with \(\varepsilon=\sqrt{c_{2}\tilde{\ell}^{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))}\).

Since we can cover \(\Theta\) with fewer than \((d(\Theta)/\ell)^{p+1}\) balls of radius \(\tilde{\ell}\), we can cover \(\{L_{1,\nu_{0},\beta_{0},\theta}^{I}:\theta\in\Theta\}\) with fewer than \((d(\Theta)/\ell)^{p+1}\) \(\varepsilon\)-brackets \([L_{\theta},U_{\theta}]\), with \(\varepsilon=\sqrt{c_{2}\tilde{\ell}^{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))}\). The number of such \(\varepsilon\)-brackets is thus bounded by \((\lambda\,d(\Theta)\sqrt{c_{2}\,\mathbb{E}(h^{2}(\mathbf{U}_{i}))}/\varepsilon)^{p+1}\), which is of order \(\varepsilon^{-(p+1)}\). Hence, the bracketing entropy integral is of order \(\int\limits_{0}^{1}\sqrt{-(p+1)\log\varepsilon}\,d\varepsilon\), which is finite. Therefore, the class of functions \(\{L_{1,\nu_{0},\beta_{0},\theta}^{I}:\theta\in\Theta\}\) is Donsker, by [24, Theorem 19.5].
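
For completeness, the finiteness of this integral can be checked in closed form: the substitution \(\varepsilon=e^{-u}\) gives

$$\int\limits_{0}^{1}\sqrt{-(p+1)\log\varepsilon}\,d\varepsilon=\sqrt{p+1}\int\limits_{0}^{\infty}\sqrt{u}\,e^{-u}\,du=\sqrt{p+1}\,\Gamma(3/2)=\frac{\sqrt{(p+1)\pi}}{2}<\infty.$$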

Using similar arguments, we can prove that \(\{L_{1,\nu_{0},\beta_{0},\theta}^{II}:\theta\in\Theta\}\) is also Donsker and, since sums of Donsker classes are Donsker, the class \(\{L_{1,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) is Donsker. Similarly, we can prove that \(\{L_{2,\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) is Donsker. It follows that the sequence of processes \(\{\mathbb{G}_{n}L_{\nu_{0},\beta_{0},\theta}:\theta\in\Theta\}\) converges weakly to a tight limit process and, as such, is stochastically equicontinuous. Thus, [23, Lemma 14.3] and the consistency of \(\hat{\theta}_{n}\) imply that \(\mathbb{G}_{n}(L_{\nu_{0},\beta_{0},\hat{\theta}_{n}}-L_{\nu_{0},\beta_{0},\theta_{0}})=o_{p}(1)\), which concludes the proof. \(\Box\)

Lemma 2. Assume that H1, H2, H3 hold. Then

$$\frac{\partial\mathbb{E}(\dot{\ell}_{1}^{1}(\nu,\beta,\theta))}{\partial\theta^{\top}}=\Omega_{2}(\nu,\beta,\theta)\quad\textrm{and}\quad\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})}{\partial(\nu,\beta^{\top})}=-\Omega_{1,0}+o_{p}(1).$$

Proof. We have

$$\frac{\partial\mathbb{E}(\dot{\ell}_{1}^{1}(\nu,\beta,\theta))}{\partial\theta^{\top}}$$
$${}=\frac{\partial}{\partial\theta^{\top}}\begin{pmatrix}\mathbb{E}\left(\delta_{1}^{1}(\theta)\left(\log\widetilde{Y}_{1}+\log\nu+1-\beta^{\top}\mathbf{X}_{1}-\widetilde{Y}_{1}e^{-\beta^{\top}\mathbf{X}_{1}}-h_{\nu,\beta}(\widetilde{Y}_{1},\mathbf{X}_{1})\right)+h_{\nu,\beta}(\widetilde{Y}_{1},\mathbf{X}_{1})-\psi(\nu)\right)\\ \mathbb{E}\left(\mathbf{X}_{1}\left(\delta_{1}^{1}(\theta)\left[\widetilde{Y}_{1}\nu e^{-\beta^{\top}\mathbf{X}_{1}}-\nu-g_{\nu,\beta}(\widetilde{Y}_{1},\mathbf{X}_{1})\right]+g_{\nu,\beta}(\widetilde{Y}_{1},\mathbf{X}_{1})\right)\right)\end{pmatrix}.$$

Note first that

$$\mathbb{E}\left(\delta_{i}^{1}(\theta)|\mathbf{U}_{i}\right)=\mathbb{E}\left(\xi_{i}\delta_{i}+(1-\xi_{i})m_{\theta}(\mathbf{U}_{i})|\mathbf{U}_{i}\right)$$
$${}=\pi(\mathbf{U}_{i})m_{\theta_{0}}(\mathbf{U}_{i})+(1-\pi(\mathbf{U}_{i}))m_{\theta}(\mathbf{U}_{i}).$$
(B.1)

Then, taking the iterated conditional expectation with respect to \(\mathbf{U}_{1}\) in \(\mathbb{E}(\dot{\ell}_{1}^{1}(\nu,\beta,\theta))\) and differentiating with respect to \(\theta\), we obtain, after some algebra, that

$$\frac{\partial\mathbb{E}(\dot{\ell}_{1}^{1}(\nu,\beta,\theta))}{\partial\theta^{\top}}=\Omega_{2}(\nu,\beta,\theta).$$

We turn to the second statement. We note

$$\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu,\beta,\theta)}{\partial(\nu,\beta^{\top})}=\begin{pmatrix}A_{n}(\nu,\beta,\theta)&B_{n}(\nu,\beta,\theta)\\ B_{n}(\nu,\beta,\theta)^{\top}&C_{n}(\nu,\beta,\theta)\end{pmatrix},$$

where

$$A_{n}(\nu,\beta,\theta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\nu},\quad B_{n}(\nu,\beta,\theta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{1,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\beta^{\top}}$$

and

$$C_{n}(\nu,\beta,\theta)=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{2,\nu,\beta,\theta}(\mathcal{D}_{i})}{\partial\beta^{\top}}$$

are given by (A.2). Then, decompose \(n^{-1}\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})/\partial(\nu,\beta^{\top})\) as

$$\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})}{\partial(\nu,\beta^{\top})}=\left(\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})}{\partial(\nu,\beta^{\top})}-\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\theta_{0})}{\partial(\nu,\beta^{\top})}\right)+\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\theta_{0})}{\partial(\nu,\beta^{\top})}$$
$${}=\begin{pmatrix}A_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-A_{n}(\nu_{0},\beta_{0},\theta_{0})&B_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-B_{n}(\nu_{0},\beta_{0},\theta_{0})\\ (B_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-B_{n}(\nu_{0},\beta_{0},\theta_{0}))^{\top}&C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})\end{pmatrix}$$
$${}+\begin{pmatrix}A_{n}(\nu_{0},\beta_{0},\theta_{0})&B_{n}(\nu_{0},\beta_{0},\theta_{0})\\ B_{n}(\nu_{0},\beta_{0},\theta_{0})^{\top}&C_{n}(\nu_{0},\beta_{0},\theta_{0})\end{pmatrix}.$$
(B.2)

We show that the first term in the sum (B.2) converges to 0 and that the second term converges to \(-\Omega_{1}(\nu_{0},\beta_{0})\). For illustrative purposes, we show that \(C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})=o_{p}(1)\) and \(C_{n}(\nu_{0},\beta_{0},\theta_{0})=-\Omega_{1}^{\beta,\beta}(\nu_{0},\beta_{0})+o_{p}(1)\). The other terms can be dealt with similarly. We have:

$$C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}(1-\xi_{i})(m_{\theta_{0}}(\mathbf{U}_{i})-m_{\hat{\theta}_{n}}(\mathbf{U}_{i}))$$
$${}\times\left\{\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}+g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\left(\nu_{0}\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right\}$$
$${}:=\frac{1}{n}\sum_{i=1}^{n}\mathcal{W}_{i}\times(m_{\theta_{0}}(\mathbf{U}_{i})-m_{\hat{\theta}_{n}}(\mathbf{U}_{i})),$$

where

$$\mathcal{W}_{i}:=\mathbf{X}_{i}\mathbf{X}_{i}^{\top}(1-\xi_{i})\left\{\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}+g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\left(\nu_{0}\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right\}.$$

Under H1 and H2, there exists a finite constant \(c_{3}>0\) such that \(||\mathcal{W}_{i}||\leq c_{3}\). Therefore,

$$||C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})||\leq\frac{c_{3}}{n}\sum_{i=1}^{n}|m_{\theta_{0}}(\mathbf{U}_{i})-m_{\hat{\theta}_{n}}(\mathbf{U}_{i})|$$
$${}\leq c_{3}||\hat{\theta}_{n}-\theta_{0}||(\mathbb{E}(h(\mathbf{U}_{i}))+o_{p}(1)).$$

Consistency of \(\hat{\theta}_{n}\) implies that \(C_{n}(\nu_{0},\beta_{0},\hat{\theta}_{n})-C_{n}(\nu_{0},\beta_{0},\theta_{0})\) converges in probability to 0. Next, by the weak law of large numbers, \(C_{n}(\nu_{0},\beta_{0},\theta_{0})\) converges in probability to

$$\mathbb{E}\left(\mathbf{X}_{1}\mathbf{X}_{1}^{\top}\left(-\delta_{1}^{1}(\theta_{0})\widetilde{Y}_{1}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{1}}\right.\right.$$
$${}+\left.\left.(1-\delta_{1}^{1}(\theta_{0}))g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{1},\mathbf{X}_{1})\left(\nu_{0}\widetilde{Y}_{1}e^{-\beta_{0}^{\top}\mathbf{X}_{1}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{1},\mathbf{X}_{1})\right)\right)\right).$$
(B.3)

Iterating this expectation with respect to \(\mathbf{U}_{i}\) and using the fact that \(\mathbb{E}(\delta_{i}^{1}(\theta_{0})|\mathbf{U}_{i})=m_{\theta_{0}}(\mathbf{U}_{i})\) (see (B.1)), we immediately see that (B.3) coincides with \(-\Omega_{1}^{\beta,\beta}(\nu_{0},\beta_{0})\).

Finally, applying similar arguments to the other terms of (B.2) concludes the proof. \(\Box\)

PROOF OF THEOREM 4.1 (ASYMPTOTIC NORMALITY)

A first-order Taylor expansion of \(\dot{\ell}^{1}_{n}(\hat{\nu}^{1}_{n},\hat{\beta}^{1}_{n},\hat{\theta}_{n})\) around \((\nu_{0},\beta_{0})\) yields

$$0=\frac{1}{\sqrt{n}}\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})+\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})}{\partial(\nu,\beta^{\top})}\sqrt{n}\begin{pmatrix}\hat{\nu}^{1}_{n}-\nu_{0}\\ \hat{\beta}^{1}_{n}-\beta_{0}\end{pmatrix}+o_{p}(1).$$
(C.1)

From Lemma 1, we get

$$\frac{1}{\sqrt{n}}\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})=\frac{1}{\sqrt{n}}\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\theta_{0})+\sqrt{n}\mathbb{E}(\dot{\ell}_{1}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n}))-\sqrt{n}\mathbb{E}(\dot{\ell}_{1}^{1}(\nu_{0},\beta_{0},\theta_{0}))+o_{p}(1),$$
$${}=\frac{1}{\sqrt{n}}\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\theta_{0})+\sqrt{n}\left(\frac{\partial\mathbb{E}(\dot{\ell}_{1}^{1}(\nu_{0},\beta_{0},\theta_{0}))}{\partial\theta^{\top}}(\hat{\theta}_{n}-\theta_{0})+o_{p}(||\hat{\theta}_{n}-\theta_{0}||)\right)+o_{p}(1),$$

where the second line follows from a Taylor expansion of \(\mathbb{E}(\dot{\ell}_{1}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n}))\) around \(\theta_{0}\). Combining this expression with (C.1) and Lemma 2, we obtain

$$\sqrt{n}\begin{pmatrix}\hat{\nu}^{1}_{n}-\nu_{0}\\ \hat{\beta}^{1}_{n}-\beta_{0}\end{pmatrix}=\left(-\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})}{\partial(\nu,\beta^{\top})}\right)^{-1}\left(\frac{1}{\sqrt{n}}\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\theta_{0})+\Omega_{2,0}\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\right)+o_{p}(1)$$
$${}=\left(-\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{1}(\nu_{0},\beta_{0},\hat{\theta}_{n})}{\partial(\nu,\beta^{\top})}\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})+K_{i}\right\}+o_{p}(1),$$
(C.2)

where

$$K_{i}:=\Omega_{2,0}\Theta^{-1}_{\theta_{0}}\frac{\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\dot{m}_{\theta_{0}}(\mathbf{U}_{i})}{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))}.$$
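
This expression for \(K_{i}\) arises from the influence-function expansion of \(\hat{\theta}_{n}\): assuming, as in the main text, that \(\hat{\theta}_{n}\) is the maximum likelihood estimate of \(\theta\) in the binary regression model \(m_{\theta}\) fitted to the complete cases, with limiting information matrix \(\Theta_{\theta_{0}}\), we have

$$\sqrt{n}(\hat{\theta}_{n}-\theta_{0})=\Theta^{-1}_{\theta_{0}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\dot{m}_{\theta_{0}}(\mathbf{U}_{i})}{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))}+o_{p}(1),$$

so that \(\Omega_{2,0}\sqrt{n}(\hat{\theta}_{n}-\theta_{0})=n^{-1/2}\sum_{i=1}^{n}K_{i}+o_{p}(1)\), which is the substitution made in (C.2).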

First, note that

$$\textrm{var}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i}))=\Omega_{1,0}.$$
(C.3)

Next, under the MAR assumption, it is easy to see that \(\mathbb{E}(K_{i})=0\) and that

$$\mathbb{E}\left(\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))^{2}|\mathbf{U}_{i}\right)=\mathbb{E}\left(\xi_{i}|\mathbf{U}_{i}\right)\mathbb{E}\left((\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))^{2}|\mathbf{U}_{i}\right)$$
$${}=\pi(\mathbf{U}_{i})\mathbb{E}\left(\delta_{i}-2\delta_{i}m_{\theta_{0}}(\mathbf{U}_{i})+m_{\theta_{0}}(\mathbf{U}_{i})^{2}|\mathbf{U}_{i}\right)=\pi(\mathbf{U}_{i})m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i})).$$

Therefore,

$$\textrm{var}(K_{i})=\Omega_{2,0}\Theta^{-1}_{\theta_{0}}\mathbb{E}\left\{\left(\frac{\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\dot{m}_{\theta_{0}}(\mathbf{U}_{i})}{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))}\right)^{\otimes 2}\right\}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}$$
$${}=\Omega_{2,0}\Theta^{-1}_{\theta_{0}}\mathbb{E}\left\{\frac{\dot{m}_{\theta_{0}}^{\otimes 2}(\mathbf{U}_{i})}{\{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))\}^{2}}\mathbb{E}\left(\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))^{2}|\mathbf{U}_{i}\right)\right\}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}$$
$${}=\Omega_{2,0}\Theta^{-1}_{\theta_{0}}\mathbb{E}\left\{\frac{\dot{m}_{\theta_{0}}^{\otimes 2}(\mathbf{U}_{i})}{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))}\pi(\mathbf{U}_{i})\right\}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}$$
$${}=\Omega_{2,0}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}.$$
(C.4)

Next, we calculate \(\textrm{cov}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i}),K_{i})=\mathbb{E}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})K_{i}^{\top})\). Note first that

$$\mathbb{E}(L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})K_{i}^{\top})=\mathbb{E}\left(\frac{\dot{m}_{\theta_{0}}^{\top}(\mathbf{U}_{i})}{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))}A_{i}\right)\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top},$$

where

$$A_{i}=\mathbb{E}\left(\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\left[\delta^{1}_{i}(\theta_{0})\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right.\right.$$
$${}+\left.\left.\left.h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\right]\right|\mathbf{U}_{i}\right)$$
$${}=\mathbb{E}\left((\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\left[\xi_{i}\delta_{i}\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right.\right.$$
$${}+\left.\left.\left.\xi_{i}\left(h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\right)\right]\right|\mathbf{U}_{i}\right)$$
$${}=\mathbb{E}\left(\xi_{i}\delta_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right.$$
$${}+\left.\left.\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\left(h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\right)\right|\mathbf{U}_{i}\right)$$
$${}=\mathbb{E}\left(\left.\xi_{i}\delta_{i}(1-m_{\theta_{0}}(\mathbf{U}_{i}))\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right|\mathbf{U}_{i}\right)$$
$${}=\pi(\mathbf{U}_{i})m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right),$$

where the passage from the third to the fourth equality follows from

$$\mathbb{E}\left(\left.\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\left(h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0})\right)\right|\mathbf{U}_{i}\right)$$
$${}=\pi(\mathbf{U}_{i})(h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0}))\mathbb{E}\left(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i})|\mathbf{U}_{i}\right)$$
$${}=\pi(\mathbf{U}_{i})(h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})-\psi(\nu_{0}))\times 0=0.$$

Hence

$$\mathbb{E}(L_{1,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})K_{i}^{\top})$$
$${}=\mathbb{E}\left(\dot{m}^{\top}_{\theta_{0}}(\mathbf{U}_{i})\pi(\mathbf{U}_{i})\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right)\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}.$$

Similar calculations yield

$$\mathbb{E}(L_{2,\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})K_{i}^{\top})=\mathbb{E}\left(\mathbf{X}_{i}\dot{m}_{\theta_{0}}^{\top}(\mathbf{U}_{i})\pi(\mathbf{U}_{i})\left(\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right)\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}$$

(details are omitted) and thus

$$\textrm{cov}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i}),K_{i})$$
$${}=\begin{pmatrix}\mathbb{E}\left(\dot{m}_{\theta_{0}}^{\top}(\mathbf{U}_{i})\pi(\mathbf{U}_{i})\left(\log\widetilde{Y}_{i}+\log\nu_{0}+1-\beta_{0}^{\top}\mathbf{X}_{i}-\widetilde{Y}_{i}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-h_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right)\\ \mathbb{E}\left(\mathbf{X}_{i}\dot{m}_{\theta_{0}}^{\top}(\mathbf{U}_{i})\pi(\mathbf{U}_{i})\left(\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right)\end{pmatrix}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}.$$
(C.5)

Using (C.3)–(C.5), we obtain

$$\textrm{var}(L_{\nu_{0},\beta_{0},\theta_{0}}(\mathcal{D}_{i})+K_{i})=\Omega_{1,0}+\Omega_{3,0}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}.$$

Finally, Theorem 4.1 follows by applying the multivariate central limit theorem (combined with Slutsky’s theorem and Lemma 2) to (C.2). \(\Box\)
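
In summary, with \(\Omega_{3,0}\) as defined in the main text (so that the contributions (C.4) and (C.5) combine into \(\Omega_{3,0}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}\)), the conclusion of Theorem 4.1 takes the usual sandwich form

$$\sqrt{n}\begin{pmatrix}\hat{\nu}^{1}_{n}-\nu_{0}\\ \hat{\beta}^{1}_{n}-\beta_{0}\end{pmatrix}\stackrel{{\scriptstyle d}}{{\longrightarrow}}\mathcal{N}\left(0,\Omega_{1,0}^{-1}\left(\Omega_{1,0}+\Omega_{3,0}\Theta^{-1}_{\theta_{0}}\Omega_{2,0}^{\top}\right)\Omega_{1,0}^{-1}\right).$$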

A SKETCH OF THE PROOF OF THEOREM 4.2

Proofs are similar to those for the regression calibration estimate and we omit them. We only mention that, following a first-order Taylor expansion of \(\partial\ell_{n,j}^{2}(\hat{\nu}^{2}_{n},\hat{\beta}^{2}_{n},\hat{\theta}_{n})/\partial(\nu,\beta^{\top})^{\top}\) around \((\nu_{0},\beta_{0})\) and using arguments similar to those of Appendix C, we obtain

$$\sqrt{n}\begin{pmatrix}\hat{\nu}^{2}_{n,j}-\nu_{0}\\ \hat{\beta}^{2}_{n,j}-\beta_{0}\end{pmatrix}=\Omega_{1,0}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\begin{pmatrix}L_{1,\nu_{0},\beta_{0},\theta_{0}}^{(j)}(\mathcal{D}_{i})\\ L_{2,\nu_{0},\beta_{0},\theta_{0}}^{(j)}(\mathcal{D}_{i})\end{pmatrix}+K_{i}\right\}+o_{p}(1),\quad j=1,\ldots,J,$$

where \(L_{1,\nu_{0},\beta_{0},\theta_{0}}^{(j)}\) and \(L_{2,\nu_{0},\beta_{0},\theta_{0}}^{(j)}\) are given by (4.6). Finally, the multiple imputation estimator

$$\begin{pmatrix}\hat{\nu}^{2}_{n}\\ \hat{\beta}^{2}_{n}\end{pmatrix}=\frac{1}{J}\sum_{j=1}^{J}\begin{pmatrix}\hat{\nu}^{2}_{n,j}\\ \hat{\beta}^{2}_{n,j}\end{pmatrix}$$

satisfies

$$\sqrt{n}\begin{pmatrix}\hat{\nu}^{2}_{n}-\nu_{0}\\ \hat{\beta}^{2}_{n}-\beta_{0}\end{pmatrix}=\Omega_{1,0}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\frac{1}{J}\sum_{j=1}^{J}\begin{pmatrix}L_{1,\nu_{0},\beta_{0},\theta_{0}}^{(j)}(\mathcal{D}_{i})\\ L_{2,\nu_{0},\beta_{0},\theta_{0}}^{(j)}(\mathcal{D}_{i})\end{pmatrix}+K_{i}\right\}+o_{p}(1).$$

Asymptotic normality follows from the multivariate central limit theorem and similar covariance calculations as in the proof of Theorem 4.1. \(\Box\)
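
As a complement, the multiple imputation scheme admits the following minimal Python sketch; solve_score stands for a solver of the complete-data estimating equation (for instance built on the score function sketched after (A.1)), m_hat for the fitted probabilities \(m_{\hat{\theta}_{n}}(\mathbf{U}_{i})\), and each missing \(\delta_{i}\) is assumed to be imputed by a draw from a Bernoulli distribution with success probability \(m_{\hat{\theta}_{n}}(\mathbf{U}_{i})\). All names are illustrative.

    import numpy as np

    def mi_estimate(Y, X, delta_obs, xi, m_hat, J, solve_score, rng):
        # Multiple imputation: average the J imputation-specific solutions.
        estimates = []
        for _ in range(J):
            # Impute each missing delta_i by a Bernoulli(m_hat_i) draw.
            delta_j = np.where(xi == 1, delta_obs, rng.binomial(1, m_hat))
            estimates.append(solve_score(Y, X, delta_j))  # (nu_hat_j, beta_hat_j)
        return np.mean(estimates, axis=0)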

A SKETCH OF THE PROOF OF THEOREM 4.3

The proof of consistency, when either \(m_{\theta}(\mathbf{U}_{i})\) or \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified, proceeds similarly to that for the regression calibration estimate. Nonetheless, we describe the main steps of the proof so that the reader can see where the assumption of a correct model for either \(m_{\theta}(\mathbf{U}_{i})\) or \(\pi_{\gamma}(\mathbf{U}_{i})\) is used.

We consider the case where \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled (the case where \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified works similarly and is omitted). Let \(\dot{\ell}_{n}^{3}(\nu,\beta,\theta,\gamma)=\partial\ell_{n}^{3}(\nu,\beta,\theta,\gamma)/\partial(\nu,\beta^{\top})^{\top}\). We have:

$$\dot{\ell}_{n}^{3}(\nu,\beta,\theta,\gamma)=\sum_{i=1}^{n}L_{\nu,\beta,\theta,\gamma}(\mathcal{D}_{i})=\begin{pmatrix}\sum_{i=1}^{n}L_{1,\nu,\beta,\theta,\gamma}(\mathcal{D}_{i})\\ \sum_{i=1}^{n}L_{2,\nu,\beta,\theta,\gamma}(\mathcal{D}_{i})\end{pmatrix},$$

where \(L_{1,\nu,\beta,\theta,\gamma}\) and \(L_{2,\nu,\beta,\theta,\gamma}\) are similar to \(L_{1,\nu,\beta,\theta}\) and \(L_{2,\nu,\beta,\theta}\) in (A.1), with \(\delta^{1}_{i}(\theta)\) replaced by \(\delta^{3}_{i}(\theta,\gamma)\).
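
For concreteness, the regression calibration and inverse-probability-weighted adjustments of the censoring indicator can be written side by side (a minimal Python sketch, consistent with (B.1) and with the decomposition (E.1) below; where \(\xi_{i}=0\), \(\delta_{i}\) should be set to an arbitrary finite value, since it is then multiplied by \(\xi_{i}\)):

    def delta_rc(xi, delta, m):
        # Regression calibration: delta_i^1(theta) = xi*delta + (1 - xi)*m_theta(U_i)
        return xi * delta + (1.0 - xi) * m

    def delta_ipw(xi, delta, m, pi):
        # Inverse-probability-weighted adjustment:
        # delta_i^3(theta, gamma) = xi*delta/pi_gamma(U_i) + (1 - xi/pi_gamma(U_i))*m_theta(U_i)
        return xi * delta / pi + (1.0 - xi / pi) * m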

First, one can easily check that the map \((\nu,\beta)\mapsto\partial\dot{\ell}_{n}^{3}(\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n})/\partial(\nu,\beta^{\top})\) is continuous in an open neighborhood of \((\nu_{0},\beta_{0})\) (condition (i)). Next, we prove that \(n^{-1}\dot{\ell}_{n}^{3}(\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n})=o_{p}(1)\) (condition (ii)). As an illustration, we show that

$$\frac{1}{n}\sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=o_{p}(1).$$

To see this, decompose

$$\frac{1}{n}\sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\frac{\xi_{i}}{\pi_{\hat{\gamma}_{n}}(\mathbf{U}_{i})}(\delta_{i}-m_{\hat{\theta}_{n}}(\mathbf{U}_{i}))\left(\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)$$
$${}+\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\left(m_{\hat{\theta}_{n}}(\mathbf{U}_{i})\left[\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right]+g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)$$
$${}:=A_{n}(\hat{\theta}_{n},\hat{\gamma}_{n})+B_{n}(\hat{\theta}_{n}).$$
(E.1)

Let \(\mathcal{A}_{i}:=\mathbf{X}_{i}\xi_{i}(\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i}))\). Then

$$A_{n}(\hat{\theta}_{n},\hat{\gamma}_{n})=\frac{1}{n}\sum_{i=1}^{n}\mathcal{A}_{i}\frac{\delta_{i}-m_{\hat{\theta}_{n}}(\mathbf{U}_{i})}{\pi_{\hat{\gamma}_{n}}(\mathbf{U}_{i})}$$
$${}=\frac{1}{n}\sum_{i=1}^{n}\mathcal{A}_{i}(\delta_{i}-m_{\hat{\theta}_{n}}(\mathbf{U}_{i}))\left(\frac{1}{\pi_{\hat{\gamma}_{n}}(\mathbf{U}_{i})}-\frac{1}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}\right)$$
$${}+\frac{1}{n}\sum_{i=1}^{n}\mathcal{A}_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\frac{1}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}+\frac{1}{n}\sum_{i=1}^{n}\mathcal{A}_{i}(m_{\theta_{0}}(\mathbf{U}_{i})-m_{\hat{\theta}_{n}}(\mathbf{U}_{i}))\frac{1}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}$$
$${}:=A_{n,1}+A_{n,2}+A_{n,3}.$$

Under H1 and H2, there exists a constant \(0<c_{4}<\infty\) such that \(||\mathcal{A}_{i}||<c_{4}\). Therefore,

$$||A_{n,1}||\leq 2c_{4}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{|\pi_{\gamma_{\ast}}(\mathbf{U}_{i})-\pi_{\hat{\gamma}_{n}}(\mathbf{U}_{i})|}{\pi_{\hat{\gamma}_{n}}(\mathbf{U}_{i})\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}$$
$${}\leq\frac{2c_{4}}{c_{0}^{2}}||\hat{\gamma}_{n}-\gamma_{\ast}||(\mathbb{E}(k(\mathbf{U}_{i}))+o_{p}(1)),$$

where the second inequality follows from H4 and H5. Finally, the convergence of \(\hat{\gamma}_{n}\) to \(\gamma_{\ast}\) implies that \(A_{n,1}=o_{p}(1)\). Using similar arguments, we verify that

$$||A_{n,3}||\leq\frac{c_{4}}{c_{0}}||\hat{\theta}_{n}-\theta_{0}||(\mathbb{E}(h(\mathbf{U}_{i}))+o_{p}(1)).$$

If the model \(m_{\theta}(\mathbf{U}_{i})\) is correctly specified, then \(\hat{\theta}_{n}\stackrel{{\scriptstyle p}}{{\longrightarrow}}\theta_{0}\) and \(A_{n,3}\) is also an \(o_{p}(1)\). Finally, under the MAR hypothesis, \(A_{n,2}\) converges in probability to

$$\mathbb{E}\left(\mathcal{A}_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\frac{1}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}\right)$$
$${}=\mathbb{E}\left(\mathbf{X}_{i}\frac{\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}\mathbb{E}(\xi_{i}|\mathbf{U}_{i})(\mathbb{E}(\delta_{i}|\mathbf{U}_{i})-m_{\theta_{0}}(\mathbf{U}_{i}))\right).$$

If \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled, then \(\mathbb{E}(\delta_{i}|\mathbf{U}_{i})-m_{\theta_{0}}(\mathbf{U}_{i})=0\) and thus \(A_{n,2}=o_{p}(1)\). It follows that \(A_{n}(\hat{\theta}_{n},\hat{\gamma}_{n})=o_{p}(1)\) and (E.1) becomes

$$\frac{1}{n}\sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=B_{n}(\hat{\theta}_{n})+o_{p}(1).$$

Decompose \(B_{n}(\hat{\theta}_{n})\) as \(B_{n}(\hat{\theta}_{n})=B_{n}(\hat{\theta}_{n})-B_{n}(\theta_{0})+B_{n}(\theta_{0})\). Arguing as for \(A_{n,1}\) and \(A_{n,3}\), one can easily check that

$$B_{n}(\hat{\theta}_{n})-B_{n}(\theta_{0})=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}(m_{\hat{\theta}_{n}}(\mathbf{U}_{i})-m_{\theta_{0}}(\mathbf{U}_{i}))\left(\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)$$

converges to 0. Moreover, by the law of large numbers, \(B_{n}(\theta_{0})\) converges in probability to

$$\mathbb{E}\left(\mathbf{X}_{i}\left(m_{\theta_{0}}(\mathbf{U}_{i})\left[\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right]+g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right)$$
$${}=\mathbb{E}\left(\mathbf{X}_{i}\left(\delta_{i}\left[\widetilde{Y}_{i}\nu_{0}e^{-\beta_{0}^{\top}\mathbf{X}_{i}}-\nu_{0}-g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right]+g_{\nu_{0},\beta_{0}}(\widetilde{Y}_{i},\mathbf{X}_{i})\right)\right),$$

which is the expectation of the score (with respect to \(\beta\)) in the censored Gamma regression model with no missing data, and is equal to 0. Finally, \(n^{-1}\sum_{i=1}^{n}L_{2,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=o_{p}(1)\). Using similar arguments, we can show that \(n^{-1}\sum_{i=1}^{n}L_{1,\nu_{0},\beta_{0},\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})=o_{p}(1)\), and condition (ii) is verified.

Finally, we show that \(n^{-1}\partial\dot{\ell}_{n}^{3}(\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n})/\partial(\nu,\beta^{\top})\) converges to \(-\Omega_{1}(\nu,\beta)\) uniformly in a neighborhood of \((\nu_{0},\beta_{0})\) (condition (iii)). We have

$$\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{3}(\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n})}{\partial(\nu,\beta^{\top})}=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}\displaystyle\frac{\partial L_{1,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})}{\partial\nu}&\displaystyle\frac{\partial L_{1,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})}{\partial\beta^{\top}}\\ \displaystyle\frac{\partial L_{2,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})}{\partial\nu}&\displaystyle\frac{\partial L_{2,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})}{\partial\beta^{\top}}\end{pmatrix},$$

where the partial derivatives are given by formulas (A.2) with \(\delta_{i}^{1}(\theta)\) replaced by \(\delta_{i}^{3}(\theta,\gamma)\). For illustrative purposes, we show that \(n^{-1}\sum_{i=1}^{n}\partial L_{2,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})/\partial\beta^{\top}\) converges to \(-\Omega_{1}^{\beta,\beta}(\nu,\beta)\). Let

$$\mathcal{C}_{i}:=\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}+(\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}-\nu-g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i}))g_{\nu,\beta}(\widetilde{Y}_{i},\mathbf{X}_{i}).$$

Then

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{2,\nu,\beta,\theta,\gamma}(\mathcal{D}_{i})}{\partial\beta^{\top}}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}\left(-\delta_{i}^{3}(\theta,\gamma)\mathcal{C}_{i}+\mathcal{C}_{i}-\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}\right)$$
$${}=-\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}\frac{\xi_{i}}{\pi_{\gamma}(\mathbf{U}_{i})}(\delta_{i}-m_{\theta}(\mathbf{U}_{i}))\mathcal{C}_{i}-\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}m_{\theta}(\mathbf{U}_{i})\mathcal{C}_{i}$$
$${}+\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}(\mathcal{C}_{i}-\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}).$$

Decompose

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{2,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})}{\partial\beta^{\top}}=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\partial L_{2,\nu,\beta,\hat{\theta}_{n},\hat{\gamma}_{n}}(\mathcal{D}_{i})}{\partial\beta^{\top}}-\frac{\partial L_{2,\nu,\beta,\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})}{\partial\beta^{\top}}\right)+\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{2,\nu,\beta,\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})}{\partial\beta^{\top}}$$
$${}=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}(m_{\theta_{0}}(\mathbf{U}_{i})-m_{\hat{\theta}_{n}}(\mathbf{U}_{i}))\mathcal{C}_{i}+\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}\frac{\xi_{i}}{\pi_{\hat{\gamma}_{n}}(\mathbf{U}_{i})}(m_{\hat{\theta}_{n}}(\mathbf{U}_{i})-\delta_{i})\mathcal{C}_{i}$$
$${}-\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{i}\mathbf{X}_{i}^{\top}\frac{\xi_{i}}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}(m_{\theta_{0}}(\mathbf{U}_{i})-\delta_{i})\mathcal{C}_{i}+\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{2,\nu,\beta,\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})}{\partial\beta^{\top}}$$
$${}=C_{n,1}+C_{n,2}+C_{n,3}+\frac{1}{n}\sum_{i=1}^{n}\frac{\partial L_{2,\nu,\beta,\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})}{\partial\beta^{\top}}.$$

Using arguments similar to those used for \(A_{n,3}\), \(A_{n}(\hat{\theta}_{n},\hat{\gamma}_{n})\), and \(A_{n,2}\), respectively, one can see that \(C_{n,1}\), \(C_{n,2}\), and \(C_{n,3}\) converge in probability to 0. Next, \(n^{-1}\sum_{i=1}^{n}\partial L_{2,\nu,\beta,\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})/\partial\beta^{\top}\) converges to

$$\mathbb{E}\left(\mathbf{X}_{i}\mathbf{X}_{i}^{\top}\left(-\delta_{i}^{3}(\theta_{0},\gamma_{\ast})\mathcal{C}_{i}+\mathcal{C}_{i}-\widetilde{Y}_{i}\nu e^{-\beta^{\top}\mathbf{X}_{i}}\right)\right).$$
(E.2)

If \(m_{\theta}(\mathbf{U}_{i})\) is correct (that is, if \(\mathbb{E}(\delta_{i}|\mathbf{U}_{i})=m_{\theta_{0}}(\mathbf{U}_{i})\)), then

$$\mathbb{E}\left(\delta_{i}^{3}(\theta_{0},\gamma_{\ast})|\mathbf{U}_{i}\right)=\frac{\mathbb{E}(\xi_{i}|\mathbf{U}_{i})m_{\theta_{0}}(\mathbf{U}_{i})}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}+\left(1-\frac{\mathbb{E}(\xi_{i}|\mathbf{U}_{i})}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}\right)m_{\theta_{0}}(\mathbf{U}_{i})$$
$${}=m_{\theta_{0}}(\mathbf{U}_{i}).$$

Thus, by taking the iterated conditional expectation with respect to \(\mathbf{U}_{i}\) in (E.2), we see that (E.2) equals \(-\Omega_{1}^{\beta,\beta}(\nu,\beta)\), which establishes the pointwise convergence. Uniformity follows by the same arguments as in the proof of Theorem 4.1.

Finally, having proved conditions (i)–(iii), we apply Foutz’s [7] consistency theorem and conclude that \((\hat{\nu}^{3}_{n},\hat{\beta}^{3}_{n})\) converges in probability to \((\nu_{0},\beta_{0})\) if \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled. We now turn to asymptotic normality. Straightforward calculations yield

$$\frac{\partial\delta_{i}^{3}(\theta,\gamma)}{\partial\theta^{\top}}=\dot{m}_{\theta}^{\top}(\mathbf{U}_{i})\left(1-\frac{\xi_{i}}{\pi_{\gamma}(\mathbf{U}_{i})}\right)\quad\textrm{ and }\quad\frac{\partial\delta_{i}^{3}(\theta,\gamma)}{\partial\gamma^{\top}}=(m_{\theta}(\mathbf{U}_{i})-\delta_{i})\xi_{i}\frac{\dot{\pi}_{\gamma}^{\top}(\mathbf{U}_{i})}{\pi_{\gamma}^{2}(\mathbf{U}_{i})},$$

from which we easily deduce that

$$\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{3}(\nu,\beta,\theta,\gamma)}{\partial\theta^{\top}}\stackrel{{\scriptstyle p}}{{\longrightarrow}}\Omega_{5}(\nu,\beta,\theta,\gamma)\quad\textrm{ and }\quad\frac{1}{n}\frac{\partial\dot{\ell}_{n}^{3}(\nu,\beta,\theta,\gamma)}{\partial\gamma^{\top}}\stackrel{{\scriptstyle p}}{{\longrightarrow}}\Omega_{6}(\nu,\beta,\theta,\gamma)$$

as \(n\rightarrow\infty\). If \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly modeled (i.e., \(\gamma_{\ast}=\gamma_{0}\)), then

$$\mathbb{E}\left(\left.1-\frac{\xi_{i}}{\pi_{\gamma_{\ast}}(\mathbf{U}_{i})}\right|\mathbf{U}_{i}\right)=0.$$

Thus, by iterating the conditional expectation with respect to \(\mathbf{U}_{i}\) in \(\Omega_{5}(\nu_{0},\beta_{0},\theta,\gamma)\), we easily see that \(\Omega_{5}(\nu_{0},\beta_{0},\theta_{\ast},\gamma_{\ast})=0\) if \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified. Similarly, \(\Omega_{6}(\nu_{0},\beta_{0},\theta_{\ast},\gamma_{\ast})=0\) if \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled.

Now, taking a first-order Taylor expansion of \(\dot{\ell}_{n}^{3}(\hat{\nu}^{3}_{n},\hat{\beta}^{3}_{n},\hat{\theta}_{n},\hat{\gamma}_{n})\) around \((\nu_{0},\beta_{0})\) and proceeding as in Appendix C, we obtain:

$$\sqrt{n}\begin{pmatrix}\hat{\nu}^{3}_{n}-\nu_{0}\\ \hat{\beta}^{3}_{n}-\beta_{0}\end{pmatrix}$$
$${}=\Omega_{1,0}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{L_{\nu_{0},\beta_{0},\theta_{0},\gamma_{\ast}}(\mathcal{D}_{i})+\Omega_{5}(\nu_{0},\beta_{0},\theta_{0},\gamma_{\ast})\Theta^{-1}_{\theta_{0}}\frac{\xi_{i}(\delta_{i}-m_{\theta_{0}}(\mathbf{U}_{i}))\dot{m}_{\theta_{0}}(\mathbf{U}_{i})}{m_{\theta_{0}}(\mathbf{U}_{i})(1-m_{\theta_{0}}(\mathbf{U}_{i}))}\right\}+o_{p}(1)$$

if \(m_{\theta}(\mathbf{U}_{i})\) is correctly modeled, and

$$\sqrt{n}\begin{pmatrix}\hat{\nu}^{3}_{n}-\nu_{0}\\ \hat{\beta}^{3}_{n}-\beta_{0}\end{pmatrix}$$
$${}=\Omega_{1,0}^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{L_{\nu_{0},\beta_{0},\theta_{\ast},\gamma_{0}}(\mathcal{D}_{i})+\Omega_{6}(\nu_{0},\beta_{0},\theta_{\ast},\gamma_{0})G^{-1}_{\gamma_{0}}\frac{(\xi_{i}-\pi_{\gamma_{0}}(\mathbf{U}_{i}))\dot{\pi}_{\gamma_{0}}(\mathbf{U}_{i})}{\pi_{\gamma_{0}}(\mathbf{U}_{i})(1-\pi_{\gamma_{0}}(\mathbf{U}_{i}))}\right\}+o_{p}(1)$$

if \(\pi_{\gamma}(\mathbf{U}_{i})\) is correctly specified. In both cases, asymptotic normality follows by applying the multivariate central limit theorem. Tedious albeit not difficult calculations yield the asymptotic variance formulas. If both \(m_{\theta}(\mathbf{U}_{i})\) and \(\pi_{\gamma}(\mathbf{U}_{i})\) are correctly modeled, then \(\Omega_{5}(\nu_{0},\beta_{0},\theta_{0},\gamma_{0})=\Omega_{6}(\nu_{0},\beta_{0},\theta_{0},\gamma_{0})=0\) and the asymptotic variance of \((\hat{\nu}^{3}_{n},\hat{\beta}^{3}_{n})\) reduces to \(\Omega_{1,0}^{-1}\). \(\Box\)

Cite this article

Dupuy, J.-F. Censored Gamma Regression with Uncertain Censoring Status. Mathematical Methods of Statistics 29, 172–196 (2020). https://doi.org/10.3103/S106653072004002X
