Abstract
The regularization approach to variable selection has been well developed for semiparametric accelerated failure time (AFT) models, in which the response variable is right censored. In the presence of missing data, this approach needs to be tailored to the missing data mechanism. In this paper, we propose a flexible and generally applicable missing data mechanism for AFT models that encompasses both ignorable and nonignorable missingness assumptions. Under this mechanism, we propose weighted rank (WR) estimators of the regression parameters, together with their penalized counterparts. An advantage of these estimators is that they do not require specifying a missing data model for the proposed missing data mechanism. We establish the theoretical properties of the WR and penalized WR estimators. Comprehensive simulation studies and a real data application further demonstrate the merits of our approach.
Data availability
Data will be made available on request.
References
Amemiya, T. (1985). Advanced econometrics. Harvard University Press.
Buckley, J., & James, I. (1979). Linear regression with censored data. Biometrika, 66(3), 429–436.
Cai, T., Huang, J., & Tian, L. (2009). Regularized estimation for the accelerated failure time model. Biometrics, 65(2), 394–404.
Ding, Y., & Nan, B. (2011). A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. The Annals of Statistics, 39(6), 3032–3061.
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J., & Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57(8), 5467–5484.
Fleming, T., & Harrington, D. (1991). Counting processes and survival analysis. Wiley.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.
Fu, W. J. (1998). Penalized regression: The bridge versus the LASSO. Journal of Computational and Graphical Statistics, 7, 397–416.
Fygenson, M., & Ritov, Y. (1994). Monotone estimating equations for censored data. The Annals of Statistics, 22, 732–746.
Huang, J., & Ma, S. (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2), 176–195.
Huang, J., Ma, S., & Xie, H. (2006). Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics, 62(3), 813–820.
Jin, Z., Lin, D. Y., Wei, L. J., & Ying, Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika, 90, 341–353.
Jin, Z., Lin, D. Y., & Ying, Z. (2006). Least-squares regression with censored data. Biometrika, 93, 147–161.
Jin, Z., Ying, Z., & Wei, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika, 88, 381–390.
Johnson, L. M., & Strawderman, R. L. (2009). Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika, 96, 577–590.
Jones, M. P. (1997). A class of semiparametric regressions for the accelerated failure time model. Biometrika, 84, 73–84.
Kalbfleisch, J. D. (1978). Likelihood methods and nonparametric tests. Journal of the American Statistical Association, 73, 167–170.
Kowalski, J., & Tu, X. M. (2008). Modern applied U-statistics. Wiley.
Lai, T. L., & Ying, Z. (1991). Rank regression methods for left-truncated and right-censored data. The Annals of Statistics, 19, 531–556.
Lai, T. L., & Ying, Z. (1991). Large-sample theory of a modified Buckley-James estimator for regression analysis with censored data. The Annals of Statistics, 19, 1370–1402.
Liang, K.-Y., & Qin, J. (2000). Regression analysis under non-standard situations: A pairwise pseudolikelihood approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 773–786.
Lin, Y., & Chen, K. (2013). Efficient estimation of the censored linear regression model. Biometrika, 100(2), 525–530.
Lin, D. Y., & Ying, Z. (1995). Semiparametric inference for the accelerated life model with time-dependent covariates. Journal of Statistical Planning and Inference, 44, 47–63.
Liu, T., Yuan, X., & Sun, J. (2021). Weighted rank estimation for nonparametric transformation models with nonignorable missing data. Computational Statistics and Data Analysis, 153, 107061.
Miller, R. G. (1976). Least squares regression with censored data. Biometrika, 63(3), 449–464.
Nan, B., Kalbfleisch, J. D., & Yu, M. (2009). Asymptotic theory for the semiparametric accelerated failure time model with missing data. The Annals of Statistics, 37, 2351–2376.
Nolan, D., & Pollard, D. (1987). U-processes: Rates of convergence. The Annals of Statistics, 15, 780–799.
Prentice, R. L. (1978). Linear rank tests with right-censored data. Biometrika, 65, 167–179.
Ritov, Y. (1990). Estimation in a linear regression model with censored data. The Annals of Statistics, 18(1), 303–328.
Sherman, R. (1993). The limiting distribution of the maximum rank correlation estimator. Econometrica, 61, 123–137.
Sherman, R. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. The Annals of Statistics, 22, 439–459.
Steingrimsson, J. A., & Strawderman, R. L. (2017). Estimation in the semiparametric accelerated failure time model with missing covariates: Improving efficiency through augmentation. Journal of the American Statistical Association, 112, 1221–1235.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58, 267–288.
Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics, 18(1), 354–372.
Wang, H., & Leng, C. (2007). Unified Lasso estimation via least squares approximation. Journal of the American Statistical Association, 102(479), 1039–1048.
Wang, H., Li, R., & Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94(3), 553–568.
Wang, X., & Song, L. (2011). Adaptive lasso variable selection for the accelerated failure models. Communications in Statistics - Theory and Methods, 40(24), 4372–4386.
Wei, L. J., Ying, Z., & Lin, D. Y. (1990). Linear regression analysis of censored survival data based on rank tests. Biometrika, 77, 845–851.
Xu, J., Leng, C., & Ying, Z. (2010). Rank-based variable selection with censored data. Statistics and Computing, 20, 165–176.
Yang, S. (1997). Extended weighted log-rank estimating functions in censored regression. Journal of the American Statistical Association, 92, 977–984.
Ying, Z. (1993). A large sample study of rank estimation for censored regression data. The Annals of Statistics, 21(1), 76–99.
Yuan, X., Wang, Y., & Liu, T. (2020). Variable selection for semiparametric random-effects conditional density models with longitudinal data. Communications in Statistics - Theory and Methods, 49(4), 977–996.
Zeng, D., & Lin, D. (2007). Efficient estimation for the accelerated failure time model. Journal of the American Statistical Association, 102(480), 1387–1396.
Zhao, J., Yang, Y., & Ning, Y. (2018). Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data. Statistica Sinica, 28, 2125–2148.
Zhou, M. (2005). Empirical likelihood analysis of the rank estimator for the censored accelerated failure time model. Biometrika, 92, 492–498.
Zou, H. (2006). The adaptive LASSO and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1566.
Ethics declarations
Conflict of interest
There are no conflicts of interest.
Appendices
Appendix A
The proofs of Theorems 3 and 4 are very similar to those of Theorems 3.1 and 3.2 in Yuan et al. (2020), respectively. However, our problem has some distinct features. In Yuan et al. (2020), the unpenalized objective function is a smooth function of the regression parameters. In this paper, the unpenalized objective function and its score function are not smooth, which makes a direct application of the standard penalized approach difficult.
Henceforth, we suppress \(\lambda\) from the quantities \(Q_{\varpi ,\lambda }({\varvec{\beta }})\) and \({\hat{{\varvec{\beta }}}}_{\varpi ,\lambda }\) for \(\varpi \in \{A,S\}\).
Proof of Theorem 1
Recall that \(\ell ({\varvec{\beta }})=E\{\ell _{N}({\varvec{\beta }})\}\). Our consistency proof follows the approach developed in Amemiya (1985, pp. 106–107), which requires verifying three conditions: (i) \(\ell ({\varvec{\beta }})\) is uniquely maximized at \({\varvec{\beta }}^*\); (ii) \(\sup _{{{\varvec{\beta }}}\in \Theta }|\ell _N({\varvec{\beta }})-\ell ({\varvec{\beta }})|=o_p(1)\); (iii) \(\ell ({\varvec{\beta }})\) is continuous on \(\Theta\).
Proof of (i) Note that \(\varepsilon _i\) \((i=1, \cdots , N)\) are independent error terms with a common density function \(f_\varepsilon (u)\). For convenience, let
then \(\ell ({\varvec{\beta }})=E\{{\bar{\ell }}_{N}({\varvec{\beta }})\}\). We first show that
Let \(O_i=(X_i^{\textsf {T}},C_i,\varepsilon _i)^{\textsf {T}}\), \(i=1,\ldots ,N\). On the one hand,
where
On the other hand,
It follows that \(\partial {\bar{\ell }}_N({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=0\) and thus \(\partial \ell ({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=E\{\partial {\bar{\ell }}_N({\varvec{\beta }}^*)/\partial {\varvec{\beta }}\}=0.\) In the following, we show that \(\ell ({\varvec{\beta }})=E\{{\bar{\ell }}_{N}({\varvec{\beta }})\}\) is a concave function of \({\varvec{\beta }}\). For two matrices A and B, we write \(A \le B\) if \(B - A\) is a nonnegative-definite matrix. Direct calculations give
Define \(g(v)= m_2(v)f_{\varepsilon }(v)\) and \(G(v)=\int _{v}^{+\infty }g(s)ds\). From C2, we have \(g(v)>0\) and \(G(v)>0\) for all \(v\in {\mathbb {R}}\). It follows that
Let
Utilizing the identity (A1), we can write
and
Recall that \(S(u|X)=P(\log (C)>u|X)\). Then, for \(i\ne j\), we can write
Using conditions C2–C3, for \(i\ne j\), we have
with probability one. By conditions C2, C4 and the inequality (A2), we obtain
which is a positive definite matrix for large enough N. Notice that
This fact implies that the matrix \(D({\varvec{\beta }})\) is positive definite for all \({\varvec{\beta }}\in \Theta\). Therefore, \(\ell ({\varvec{\beta }})\) is a concave function on \(\Theta\). Combining previous results, it follows that \({\varvec{\beta }}^*\) is the unique maximizer of \(\ell ({\varvec{\beta }})\) and thus (i) is proved.
Proof of (ii) For each \({\varvec{\beta }}\in \Theta\) and each \((z_1,z_2)\) in \({\mathcal {Z}}\times {\mathcal {Z}}\), we can write \(\ell _N({\varvec{\beta }})-\ell ({\varvec{\beta }})={\mathbb {U}}_Nf(\cdot ,\cdot ,{\varvec{\beta }})\), where
and \({\mathbb {U}}_N\) denotes the random measure putting mass \(1/(N^2-N)\) on each pair \((Z_i,Z_j)\), \(i\ne j\). Applying the arguments in Sherman (1993, Section 5) to the class \(\{f(\cdot ,\cdot ,{\varvec{\beta }}):{\varvec{\beta }}\in \Theta \}\) shows that it is Euclidean for the envelope \(|\Psi (Z_1,Z_2)|+E\{|\Psi (Z_1,Z_2)|\}\), where \(\Psi (Z_1,Z_2)=|\log (Y_1)-\log (Y_2)|+\Vert X_{1}-X_{2}\Vert \sup _{{{\varvec{\beta }}}\in \Theta }\Vert {\varvec{\beta }}\Vert\) and \(E\{|\Psi (Z_1,Z_2)|\}<+\infty\) under condition C4. An application of Corollary 7 in Sherman (1994, Section 6) then shows that \(\sup _{{{\varvec{\beta }}}\in \Theta }|{\mathbb {U}}_Nf(\cdot ,\cdot ,{\varvec{\beta }})|=O_p(N^{-1/2}).\) This establishes (ii).
Proof of (iii) Note that \(\ell _N({\varvec{\beta }})\) is continuous. It follows that \(\ell ({\varvec{\beta }})\) is continuous. Hence, consistency is proved. \(\square\)
Lemma 1
Under conditions C0–C4, for every sequence \(\kappa _N>0\) with \(\kappa _N\rightarrow 0\), we have
holds uniformly in \(\{{\varvec{\beta }}\in \Theta : \Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert \le \kappa _N\}\).
Proof of Lemma 1
Let \(\epsilon _N({\varvec{\beta }})=\ell _{N}({\varvec{\beta }})-{\ell }({\varvec{\beta }})\). A standard decomposition of U-statistics gives
where
Note that \(E\{b_i({\varvec{\beta }})\} = 0\) for \({\varvec{\beta }}\in \Theta\) and \(b_i({\varvec{\beta }}^*)=0\). A Taylor expansion gives
where \({\dot{b}}_{i}({\varvec{\beta }})=\partial b_{i}({\varvec{\beta }})/\partial {\varvec{\beta }}\).
Combining the identical subgraph set and Vapnik-Chervonenkis class set arguments of Sherman (1993, Section 5) with Corollaries 17 and 21 in Nolan and Pollard (1987) shows that the class of functions \(d_{ij}({\varvec{\beta }})\) is Euclidean. The Euclidean property, together with Corollary 8 of Sherman (1994), guarantees that, for any sequence \(\kappa _N\) of order o(1),
For \({\varvec{\beta }}\) in a neighbourhood of \({\varvec{\beta }}^*\), a Taylor expansion gives
where \(u({\varvec{\beta }})=\partial \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\), \(D({\varvec{\beta }})=-\partial ^2 \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}\) and \(u({\varvec{\beta }}^*)=0\). Under conditions C2–C4, the matrix \(D({\varvec{\beta }})\) is invertible for \(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert \le \kappa _N\).
By (A3), (A4), (A5), and (A6), we have
\(\square\)
Lemma 2
Under conditions C0–C4, for every sequence \(\kappa _N>0\) with \(\kappa _N\rightarrow 0\), we have
Proof of Lemma 2
We only need to show that, for \(l=1,\ldots ,p\),
where \(U_{Nl}({\varvec{\beta }})=\frac{1}{N(N-1)}\sum _{i\ne j}^Nh_l(Z_i,Z_j;{\varvec{\beta }})\), \(h_l(Z_i,Z_j;{\varvec{\beta }})=-R_{i}R_{j}\Delta _iI\{e_i({\varvec{\beta }})<e_j({\varvec{\beta }})\}(X_{il}-X_{jl})\) and \(\Gamma _{l}=\partial E\{h_l(Z_1,Z_2;{\varvec{\beta }})\}/\partial {\varvec{\beta }}^{\textsf {T}}\bigr |_{{{\varvec{\beta }}}={{\varvec{\beta }}}^*}.\) It is easy to verify that \(D=-(\Gamma _{1}^{\textsf {T}},\ldots ,\Gamma _{p}^{\textsf {T}})^{\textsf {T}}\). Let \(U_l({\varvec{\beta }})=E\{U_{Nl}({\varvec{\beta }})\}\) and \(\epsilon _{Nl}({\varvec{\beta }})=U_{Nl}({\varvec{\beta }})-U_{l}({\varvec{\beta }})\). A standard decomposition of U-statistics gives
where
Note that \(E\{b_{i,l}({\varvec{\beta }})\} = 0\) for \({\varvec{\beta }}\in \Theta\) and \(b_{i,l}({\varvec{\beta }}^*)=0\). A Taylor expansion gives
where \({\dot{b}}_{i,l}({\varvec{\beta }})=\partial b_{i,l}({\varvec{\beta }})/\partial {\varvec{\beta }}\) and \(N^{-1}\sum _{i=1}^N{\dot{b}}_{i,l}({\varvec{\beta }}^*)=O_p(N^{-1/2}).\) Arguments similar to those used to prove (A5) guarantee that, for any sequence \(\kappa _N\) of order o(1),
It is easy to see that
Combining (A7), (A8), (A9) and (A10), the desired result follows. \(\square\)
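For concreteness, the score \(U_N({\varvec{\beta }})=(U_{N1}({\varvec{\beta }}),\ldots ,U_{Np}({\varvec{\beta }}))^{\textsf {T}}\) analyzed above is straightforward to evaluate numerically. The following is a minimal Python sketch; the array names `logY`, `X`, `delta` and `R` are hypothetical stand-ins for \(\log Y_i\), \(X_i\), the censoring indicators \(\Delta _i\) and the missingness indicators \(R_i\):

```python
import numpy as np

def rank_score(beta, logY, X, delta, R):
    """Evaluate U_N(beta) with components
    U_{Nl}(beta) = (1/(N(N-1))) * sum_{i != j} h_l(Z_i, Z_j; beta),
    h_l = -R_i R_j Delta_i I{e_i(beta) < e_j(beta)} (X_{il} - X_{jl}),
    where delta and R are 0/1 indicator arrays."""
    N, p = X.shape
    e = logY - X @ beta                       # residuals e_i(beta)
    U = np.zeros(p)
    for i in range(N):
        for j in range(N):
            if i != j and R[i] * R[j] * delta[i] and e[i] < e[j]:
                U -= X[i] - X[j]              # accumulate -(X_i - X_j)
    return U / (N * (N - 1))
```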
Proof of Theorem 2
From the proof of Theorem 1, we know that the matrix \(D({\varvec{\beta }})\) is positive definite for all \({\varvec{\beta }}\in \Theta\). By Lemma 1, we obtain
where
Hence, the maximizer of \(\varrho _N({\varvec{\beta }})\) is \({\hat{{\varvec{\gamma }}}}={\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*).\) By Theorem 1, \({\hat{{\varvec{\beta }}}}_W={\varvec{\beta }}^*+o_p(1)\). Since \({\hat{{\varvec{\beta }}}}_W\) is the maximizer of \(\ell _{N}({\varvec{\beta }})\), we have
On the other hand, in view of the expression for \(\varrho _N\),
Combining (A11) and (A12), we obtain
Obviously, \({\hat{{\varvec{\gamma }}}}-{\varvec{\beta }}^*=D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)=O_p(N^{-1/2})\). It follows that \({\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*=O_p(N^{-1/2})\) and \({\hat{{\varvec{\beta }}}}_W={\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)+o_p(N^{-1/2}).\) By the multivariate central limit theorem, the proof of Theorem 2 is complete. \(\square\)
Proof of Theorem 3
Proof of part (1). Let \(\alpha _N=N^{-1/2}+\max \{P'_{S,\lambda _N}(|\beta _j^*|):\beta _j^*\ne 0\}\). We first prove that for any \(\varepsilon >0\), there exists a large constant C such that
where \({\varvec{u}}=(u_1,\ldots ,u_p)^\textsf {T}\). Observing that \(P_{S,\lambda _N}(0)=0\), we have
where s is the number of components of \({\varvec{\beta }}_1^*\). For any \({\varvec{\beta }}\in \{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert =C\}\), by Lemma 1, we get
Combining (A13) with (A14), we obtain
Note that the third term in (A15) is bounded by
From Fan and Li (2001), the SCAD penalty function \(P_{S,\lambda _N}(\cdot )\) satisfies
where \(\lambda _N \rightarrow 0\) and \(N^{1/2} \lambda _N \rightarrow \infty\) as \(N\rightarrow \infty\).
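The display (A17) is not reproduced here; for the reader's convenience, we record the standard form of the SCAD first derivative from Fan and Li (2001), on which conditions of this type rest (with \(a>2\); Fan and Li suggest \(a=3.7\)):
$$\begin{aligned} P'_{S,\lambda }(\theta )=\lambda \Bigl \{I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\Bigr \},\qquad \theta >0. \end{aligned}$$
In particular, \(P'_{S,\lambda }(\theta )=0\) whenever \(\theta >a\lambda\); since \(\lambda _N\rightarrow 0\) while the nonzero \(|\beta _j^*|\) are fixed, \(\max \{P'_{S,\lambda _N}(|\beta _j^*|):\beta _j^*\ne 0\}=0\) for all large N, so that \(\alpha _N=O(N^{-1/2})\) eventually.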
(A17) implies that the term in (A16) is further dominated by the second term of (A15), which, for a sufficiently large C, also dominates the first term. Hence, by choosing a sufficiently large C, (A12) is proved.
(A12) guarantees that, with probability at least \(1-\varepsilon\), there exists a local maximum in the ball \(\{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert \le C\}\). Let \({\hat{{\varvec{\beta }}}}_{S}\) denote this local maximizer; it follows that \(\Vert {\hat{{\varvec{\beta }}}}_{S}-{\varvec{\beta }}^*\Vert =O_p(\alpha _N)\).
Next, we show that
for any given \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1-{\varvec{\beta }}_1^*\Vert =O_p(N^{-1/2})\) and any constant C. Note that (A18) implies that \({\hat{{\varvec{\beta }}}}_{2,S}=0\). To prove (A18), we only need to show that, for some \(\varepsilon _N=CN^{-1/2}\) and \(j=s+1,\ldots , p\), \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j<0\) for \(0< \beta _j< \varepsilon _N\), and \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j>0\) for \(-\varepsilon _N< \beta _j<0\).
Using Lemma 2, we obtain
Then, we can write
From (A17), \(\lim \inf _{N\rightarrow \infty } \lim \inf _{\theta \rightarrow 0+} P'_{S,\lambda _N}(\theta )/\lambda _N >0\) and \(N^{-1/2}/\lambda _N\rightarrow 0\). These facts indicate that the sign of the derivative \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j\) is completely determined by that of \(\beta _j\). The desired result follows.
Proof of part (2). Consider \(Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\) as a function of \({\varvec{\beta }}_1\). Then there exists an \(N^{1/2}\)-consistent local maximizer of \(Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\). Let \({\hat{{\varvec{\beta }}}}_{1,S}\) denote this local maximizer, which solves the equations \(\frac{\partial Q_{S}(({{\varvec{\beta }}}_1^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {{\varvec{\beta }}}_1}=0.\) By Lemma 2 and a Taylor expansion, we get
where \(U_{1,N}({\varvec{\beta }})=\partial \ell _{N}({\varvec{\beta }})/\partial {\varvec{\beta }}_1\), and \(E_1\) and \(F_1\) are defined in Theorem 3. An application of Theorem 1 of Kowalski and Tu (2008, p. 158) yields
By Slutsky’s theorem and the central limit theorem, it follows that
This completes the proof. \(\square\)
Proof of Theorem 4
Proof of part (1). Let \(\alpha _N=N^{-1/2}\). We first prove that for any given \(\varepsilon >0\), there exists a large constant C such that
where \({\varvec{u}}=(u_1,\ldots ,u_p)^\textsf {T}\). By the definition of \(Q_{A}({\varvec{\beta }})\), we have
Thus, it is sufficient to show that \(\{\ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)\}+\lambda _N \alpha _N \sum _{j=1}^{s}w_j |u_j|\le 0\) for a sufficiently large C with probability approaching one.
From (A14), we get
for any \({\varvec{\beta }}\in \{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert =C\}\). Note that D is a positive definite matrix. Then \(\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}=O_p(\alpha _N^2C^2)\). Therefore, for a sufficiently large C, the first term dominates the other terms on the right-hand side of (A21).
Moreover, direct applications of Taylor’s expansion lead to
Based on this fact, we obtain
Combining (A21) and (A22), it follows that
for a sufficiently large C. Thus (A20) is proved and there exists a local maximizer \({\hat{{\varvec{\beta }}}}_{A}\) such that \(\Vert {\hat{{\varvec{\beta }}}}_{A}-{\varvec{\beta }}^*\Vert =O_p(\alpha _N)\).
Next, we show that
for any given \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1-{\varvec{\beta }}_1^*\Vert =O_p(N^{-1/2})\) and any constant C. By (A19), we can write
From the condition of Theorem 4, \(N^{\frac{\gamma +1}{2}}\lambda _N \rightarrow \infty\) as \(N\rightarrow \infty\). Then, \(-\beta _j\partial Q_{A}({\varvec{\beta }})/\partial \beta _j>0\) with probability approaching one as \(N\rightarrow \infty\). Consequently, (A23) is proved and thus \({\hat{{\varvec{\beta }}}}_{2,A}=0\).
Proof of part (2). Let \({\hat{{\varvec{\beta }}}}_{1,A}\) denote the local maximizer of \(Q_{A}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\), which is a function of \({\varvec{\beta }}_1\). Then, \({\hat{{\varvec{\beta }}}}_{1,A}\) is \(N^{1/2}\)-consistent and we have the following asymptotic expression
By Theorem 1 of Kowalski and Tu (2008, p. 158), we obtain
Since \(N^{1/2}\lambda _N=o_p(1)\), the desired result follows. \(\square\)
Appendix B
In this appendix, we give algorithms for maximizing \(Q_{\varpi ,\lambda }({\varvec{\beta }})\) in (5). Efficient algorithms for maximizing penalized likelihoods include the local quadratic approximation (LQA) algorithm (Fan & Li, 2001), the local linear approximation (LLA) algorithm (Zou & Li, 2008), and the coordinate optimization algorithm (Fu, 1998; Fan & Lv, 2011). Here, we adopt the LLA algorithm. Define
where \({\varvec{\alpha }}=(\alpha _1,\ldots ,\alpha _p)^\textsf {T}\). The LLA algorithm can be summarized as follows:
1. Initialize \({\hat{{\varvec{\beta }}}}^{(0)}=({\hat{\beta }}_1^{(0)},\ldots ,{\hat{\beta }}_p^{(0)})\) and compute the adaptive weight
$$\begin{aligned} {\hat{{\varvec{\alpha }}}}^{(0)}= & {} ({\hat{\alpha }}_1^{(0)},\ldots ,{\hat{\alpha }}_p^{(0)})^\textsf {T}\\= & {} (P_\lambda '(|{\hat{\beta }}_1^{(0)}|),\ldots ,P_\lambda '(|{\hat{\beta }}_p^{(0)}|))^\textsf {T}; \end{aligned}$$
2. Compute
$$\begin{aligned} {\hat{{\varvec{\beta }}}}^{(m)}=({\hat{\beta }}_1^{(m)},\ldots ,{\hat{\beta }}_p^{(m)})^\textsf {T} =\arg \max _{{{\varvec{\beta }}}} Q ({\varvec{\beta }}, {\hat{{\varvec{\alpha }}}}^{(m-1)}), \end{aligned}$$
where \(Q({\varvec{\beta }}, {\varvec{\alpha }})\) is defined in (B.1);
3. Update the adaptive weight vector
$$\begin{aligned} {\hat{{\varvec{\alpha }}}}^{(m)}=({\hat{\alpha }}_1^{(m)},\ldots ,{\hat{\alpha }}_p^{(m)})^\textsf {T}, \end{aligned}$$
where \({\hat{\alpha }}_j^{(m)}=P_\lambda '(|{\hat{\beta }}_j^{(m)}|)\), \(j=1,\ldots ,p\);
4. Repeat steps 2–3 until convergence.
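As a concrete illustration, a minimal Python sketch of this loop follows. It presumes that (B.1) has the standard LLA form \(Q({\varvec{\beta }},{\varvec{\alpha }})=\ell _N({\varvec{\beta }})-\sum _{j=1}^p\alpha _j|\beta _j|\) of Zou and Li (2008); the inner solver `maximize_Q` for step 2 and all other names are hypothetical:

```python
import numpy as np

def scad_prime(theta, lam, a=3.7):
    """First derivative of the SCAD penalty (Fan & Li, 2001)."""
    theta = np.abs(theta)
    return lam * ((theta <= lam)
                  + np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam)
                  * (theta > lam))

def lla(beta0, lam, maximize_Q, p_prime=scad_prime, tol=1e-6, max_iter=100):
    """LLA: alternate adaptive weights (steps 1 and 3) with the weighted-L1
    maximization of Q(beta, alpha) (step 2) until the iterates stabilize
    (step 4)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        alpha = p_prime(beta, lam)          # adaptive weights alpha_j
        beta_new = maximize_Q(alpha, beta)  # inner weighted-L1 problem
        if np.linalg.norm(beta_new - beta) < tol:
            break
        beta = beta_new
    return beta
```

The same loop covers the adaptive lasso penalty by taking the weights \(\alpha _j=\lambda w_j\) fixed across iterations.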
For fixed \({\varvec{\alpha }}\), to maximize \(Q ({\varvec{\beta }}, {\varvec{\alpha }})\) with respect to \({\varvec{\beta }}\) in step 2, we employ the idea of the coordinate optimization algorithm, which maximizes over one coordinate at a time, cycling through the coordinates successively. Let \(X_{i,-k}=(X_{i,1},\ldots ,X_{i,k-1},X_{i,k+1},\ldots ,X_{i,p})^\textsf {T}\) and \(X_{i}=(X_{i,1},\ldots ,X_{i,p})^\textsf {T}\). Define
where \(u_{ij,k}=\log (Y_i)-\log (Y_j)-({\varvec{\zeta }}^\textsf {T},{\varvec{\eta }}^\textsf {T})(X_{i,-k}-X_{j,-k}),\) \(v_{ij,k}= X_{i,k}-X_{j,k}\), \({\varvec{\zeta }}^\textsf {T}=(\zeta _1,\ldots ,\zeta _{k-1})\), and \({\varvec{\eta }}^\textsf {T}=(\eta _1,\ldots ,\eta _{p-k})\). Note that we set \({\varvec{\zeta }}=\emptyset\) if \(k=1\) and \({\varvec{\eta }}=\emptyset\) if \(k=p\). Differentiating \(\ell _{k}(\gamma |{\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})\) with respect to \(\gamma\), we obtain the following estimating equation
which is equivalent to
Let \(b_{00,k}=0\), \(b_{ij,k}=u_{ij,k}/v_{ij,k}\), \(w_{00,k}=2(N^2-N)\alpha _k\), \(w_{ij,k}=R_{i}R_{j}\Delta _i |v_{ij,k}|\), \(i,j=1,\ldots ,N\). Define
where \({\mathscr {S}}=\{(0,0)\}\cup \{(i,j):v_{ij,k}\ne 0,\ i,j=1,\ldots ,N\}.\) Then, we can write
where
Putting these pieces together, the proposed coordinate optimization algorithm for maximizing \(Q ({\varvec{\beta }}, {\varvec{\alpha }})\) with respect to \({\varvec{\beta }}\) in step 2 of the LLA algorithm is as follows:
(a) Set the initial value \({\varvec{\beta }}^{(0)}=(\beta _1^{(0)},\ldots ,\beta _p^{(0)})^\textsf {T}\);
(b) Given \({\varvec{\beta }}^{(m)}=(\beta _1^{(m)},\ldots ,\beta _p^{(m)})^\textsf {T}\), for \(k=1,\ldots ,p\), compute
$$\begin{aligned} \beta _k^{(m+1)}={\hat{\gamma }}_k((\beta _1^{(m+1)},\ldots ,\beta _{k-1}^{(m+1)})^\textsf {T},(\beta _{k+1}^{(m)},\ldots ,\beta _p^{(m)})^\textsf {T},{\varvec{\alpha }}), \end{aligned}$$
where \({\hat{\gamma }}_k({\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})\) is defined in (B.2). Then set \({\varvec{\beta }}^{(m+1)}=(\beta _1^{(m+1)},\ldots ,\beta _p^{(m+1)})^\textsf {T}\);
(c) Repeat step (b) until \(\Vert {\varvec{\beta }}^{(m+1)}-{\varvec{\beta }}^{(m)}\Vert <10^{-6}\).
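To make step (b) concrete: the construction of \(b_{ij,k}\) and \(w_{ij,k}\) above indicates that the update reduces to a weighted median, with the pseudo-point \(b_{00,k}=0\) and its weight \(2(N^2-N)\alpha _k\) producing the shrinkage. A minimal Python sketch under this reading, with the same hypothetical inputs as in the earlier snippets:

```python
import numpy as np

def weighted_median(b, w):
    """Return a minimizer of sum_i w_i |b_i - gamma|."""
    order = np.argsort(b)
    b, w = b[order], w[order]
    cum = np.cumsum(w)
    # first point where the cumulative weight reaches half the total
    return b[np.searchsorted(cum, 0.5 * cum[-1])]

def coordinate_update(k, beta, logY, X, delta, R, alpha):
    """Step (b): update beta_k with the other coordinates held fixed."""
    N = X.shape[0]
    # residuals excluding the k-th covariate: logY_i - sum_{l != k} beta_l X_il
    e = logY - X @ beta + X[:, k] * beta[k]
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    u = e[i] - e[j]                       # u_{ij,k}
    v = X[i, k] - X[j, k]                 # v_{ij,k}
    keep = (i != j) & (v != 0)
    b = u[keep] / v[keep]                 # b_{ij,k} = u_{ij,k} / v_{ij,k}
    w = (R[i] * R[j] * delta[i] * np.abs(v))[keep]
    b = np.append(b, 0.0)                 # pseudo-point b_{00,k} = 0
    w = np.append(w, 2 * (N**2 - N) * alpha[k])
    return weighted_median(b, w)
```

One full pass of step (b) then cycles `coordinate_update` over \(k=1,\ldots ,p\).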
About this article
Cite this article
Liu, T., Yuan, X. & Sun, L. Variable selection for semiparametric accelerated failure time models with nonignorable missing data. J. Korean Stat. Soc. 53, 100–131 (2024). https://doi.org/10.1007/s42952-023-00238-z