
Variable selection for semiparametric accelerated failure time models with nonignorable missing data

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

The regularization approach to variable selection has been well developed for semiparametric accelerated failure time (AFT) models, where the response variable is right censored. In the presence of missing data, this approach needs to be tailored to different missing data mechanisms. In this paper, we propose a flexible and generally applicable missing data mechanism for AFT models, which encompasses both ignorable and nonignorable missing data mechanism assumptions. We propose weighted rank (WR) estimators and corresponding penalized estimators of the regression parameters under this missing data mechanism. An advantage of the WR estimators and their penalized versions is that they do not require specifying a missing data model for the proposed missing data mechanism. The theoretical properties of the WR and corresponding penalized estimators are established. Comprehensive simulation studies and a real data application further demonstrate the merits of our approach.


Data availability

Data will be made available on request.

References

  • Amemiya, T. (1985). Advanced econometrics. Harvard University Press.

  • Buckley, J., & James, I. (1979). Linear regression with censored data. Biometrika, 66(3), 429–436.

  • Cai, T., Huang, J., & Tian, L. (2009). Regularized estimation for the accelerated failure time model. Biometrics, 65(2), 394–404.

  • Ding, Y., & Nan, B. (2011). A sieve M-theorem for bundled parameters in semiparametric models, with application to the efficient estimation in a linear model for censored data. The Annals of Statistics, 39(6), 3032–3061.

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fan, J., & Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Transactions on Information Theory, 57(8), 5467–5484.

  • Fleming, T., & Harrington, D. (1991). Counting processes and survival analysis. Wiley.

  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1–22.

  • Fu, W. J. (1998). Penalized regressions: The bridge versus the LASSO. Journal of Computational and Graphical Statistics, 7, 397–416.

  • Fygenson, M., & Ritov, Y. (1994). Monotone estimating equations for censored data. The Annals of Statistics, 22, 732–746.

  • Huang, J., & Ma, S. (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2), 176–195.

  • Huang, J., Ma, S., & Xie, H. (2006). Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics, 62(3), 813–820.

  • Jin, Z., Lin, D. Y., Wei, L. J., & Ying, Z. (2003). Rank-based inference for the accelerated failure time model. Biometrika, 90, 341–353.

  • Jin, Z., Lin, D. Y., & Ying, Z. (2006). On least-squares regression with censored data. Biometrika, 93, 147–161.

  • Jin, Z., Ying, Z., & Wei, L. J. (2001). A simple resampling method by perturbing the minimand. Biometrika, 88, 381–390.

  • Johnson, L. M., & Strawderman, R. L. (2009). Induced smoothing for the semiparametric accelerated failure time model: Asymptotics and extensions to clustered data. Biometrika, 96, 577–590.

  • Jones, M. P. (1997). A class of semiparametric regressions for the accelerated failure time model. Biometrika, 84, 73–84.

  • Kalbfleisch, J. D. (1978). Likelihood methods and nonparametric tests. Journal of the American Statistical Association, 73, 167–170.

  • Kowalski, J., & Tu, X. M. (2008). Modern applied U-statistics. Wiley.

  • Lai, T. L., & Ying, Z. (1991a). Rank regression methods for left-truncated and right-censored data. The Annals of Statistics, 19, 531–556.

  • Lai, T. L., & Ying, Z. (1991b). Large-sample theory of a modified Buckley-James estimator for regression analysis with censored data. The Annals of Statistics, 19, 1370–1402.

  • Liang, K.-Y., & Qin, J. (2000). Regression analysis under non-standard situations: A pairwise pseudolikelihood approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(4), 773–786.

  • Lin, D. Y., & Ying, Z. (1995). Semiparametric inference for the accelerated life model with time-dependent covariates. Journal of Statistical Planning and Inference, 44, 47–63.

  • Lin, Y., & Chen, K. (2013). Efficient estimation of the censored linear regression model. Biometrika, 100(2), 525–530.

  • Liu, T., Yuan, X., & Sun, J. (2021). Weighted rank estimation for nonparametric transformation models with nonignorable missing data. Computational Statistics and Data Analysis, 153, 107061.

  • Miller, R. G. (1976). Least squares regression with censored data. Biometrika, 63(3), 449–464.

  • Nan, B., Kalbfleisch, J. D., & Yu, M. (2009). Asymptotic theory for the semiparametric accelerated failure time model with missing data. The Annals of Statistics, 37, 2351–2376.

  • Nolan, D., & Pollard, D. (1987). U-processes: Rates of convergence. The Annals of Statistics, 15, 780–799.

  • Prentice, R. L. (1978). Linear rank tests with right-censored data. Biometrika, 65, 167–179.

  • Ritov, Y. (1990). Estimation in a linear regression model with censored data. The Annals of Statistics, 18(1), 303–328.

  • Sherman, R. (1993). The limiting distribution of the maximum rank correlation estimator. Econometrica, 61, 123–137.

  • Sherman, R. (1994). Maximal inequalities for degenerate U-processes with applications to optimization estimators. The Annals of Statistics, 22, 439–459.

  • Steingrimsson, J. A., & Strawderman, R. L. (2017). Estimation in the semiparametric accelerated failure time model with missing covariates: Improving efficiency through augmentation. Journal of the American Statistical Association, 112, 1221–1235.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 58, 267–288.

  • Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. The Annals of Statistics, 18(1), 354–372.

  • Wang, H., & Leng, C. (2007). Unified LASSO estimation by least squares approximation. Journal of the American Statistical Association, 102(479), 1039–1048.

  • Wang, H., Li, R., & Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94(3), 553–568.

  • Wang, X., & Song, L. (2011). Adaptive lasso variable selection for the accelerated failure models. Communications in Statistics - Theory and Methods, 40(24), 4372–4386.

  • Wei, L. J., Ying, Z., & Lin, D. Y. (1990). Linear regression analysis of censored survival data based on rank tests. Biometrika, 77, 845–851.

  • Xu, J., Leng, C., & Ying, Z. (2010). Rank-based variable selection with censored data. Statistics and Computing, 20, 165–176.

  • Yang, S. (1997). Extended weighted log-rank estimating functions in censored regression. Journal of the American Statistical Association, 92, 977–984.

  • Ying, Z. (1993). A large sample study of rank estimation for censored regression data. The Annals of Statistics, 21(1), 76–99.

  • Yuan, X., Wang, Y., & Liu, T. (2020). Variable selection for semiparametric random-effects conditional density models with longitudinal data. Communications in Statistics - Theory and Methods, 49(4), 977–996.

  • Zeng, D., & Lin, D. (2007). Efficient estimation for the accelerated failure time model. Journal of the American Statistical Association, 102(480), 1387–1396.

  • Zhao, J., Yang, Y., & Ning, Y. (2018). Penalized pairwise pseudo likelihood for variable selection with nonignorable missing data. Statistica Sinica, 28, 2125–2148.

  • Zhou, M. (2005). Empirical likelihood analysis of the rank estimator for the censored accelerated failure time model. Biometrika, 92, 492–498.

  • Zou, H. (2006). The adaptive LASSO and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.

  • Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1566.


Author information


Corresponding author

Correspondence to Tianqing Liu.

Ethics declarations

Conflict of interest

There are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 69 KB)

Appendices

Appendix A

The proofs of Theorems 3 and 4 are very similar to those of Theorems 3.1 and 3.2 in Yuan et al. (2020), respectively. However, our problem has some distinct features. In Yuan et al. (2020), the unpenalized objective function is a smooth function of the regression parameters. In this paper, the unpenalized objective function and its score function are not smooth, which makes a direct application of the standard penalized approach difficult.

Henceforth, we suppress \(\lambda\) from the quantities \(Q_{\varpi ,\lambda }({\varvec{\beta }})\) and \({\hat{{\varvec{\beta }}}}_{\varpi ,\lambda }\) for \(\varpi \in \{A,S\}\).

Proof of Theorem 1

Recall that \(\ell ({\varvec{\beta }})=E\{\ell _{N}({\varvec{\beta }})\}\). Our consistency proof uses the approach of Amemiya (1985, pp. 106–107), which requires verifying the following three conditions: (i) \(\ell ({\varvec{\beta }})\) is uniquely maximized at \({\varvec{\beta }}^*\); (ii) \(\sup _{{{\varvec{\beta }}}\in \Theta }|\ell _N({\varvec{\beta }})-\ell ({\varvec{\beta }})|=o_p(1)\); (iii) \(\ell ({\varvec{\beta }})\) is continuous on \(\Theta\).

Proof of (i) Note that \(\varepsilon _i\) \((i=1, \ldots , N)\) are independent error terms with a common density function \(f_\varepsilon (u)\). For convenience, let

$$\begin{aligned} {\bar{\ell }}_N({\varvec{\beta }})=E\{\ell _{N}({\varvec{\beta }})|X_1,C_1,\ldots ,X_N,C_N\}, \end{aligned}$$

then \(\ell ({\varvec{\beta }})=E\{{\bar{\ell }}_{N}({\varvec{\beta }})\}\). We first show that

$$\begin{aligned} \partial \ell ({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=:\partial \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}|_{{{\varvec{\beta }}}={{\varvec{\beta }}}^*}=E\{U_{N}({\varvec{\beta }}^*)\}=0. \end{aligned}$$

Let \(O_i=(X_i^{\textsf {T}},C_i,\varepsilon _i)^{\textsf {T}}\), \(i=1,\ldots ,N\). On one hand,

$$\begin{aligned}{} & {} -(N^2-N)\frac{\partial {\bar{\ell }}_N({\varvec{\beta }}^*)}{\partial {\varvec{\beta }}} =-(N^2-N)E\{U_{N}({\varvec{\beta }}^*)|X_1,C_1,\ldots ,X_N,C_N\}\\{} & {} \quad =\sum _{i\ne j}^NE(E[R_{i}R_{j}\Delta _iI\{e_i({\varvec{\beta }}^*)<e_j({\varvec{\beta }}^*)\}|O_i,O_j]|X_i,C_i,X_j,C_j)(X_i-X_j)\\{} & {} \quad =\sum _{i\ne j}^N(X_i-X_j)E[m_1(X_{i},C_i)m_1(X_{j},C_j)I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+\varepsilon _i\le \log (C_i)\}\\{} & {} \qquad \times I\{X_j^{\textsf {T}}{\varvec{\beta }}^*+\varepsilon _i\le \log (C_j)\} m_2(\varepsilon _i)m_2(\varepsilon _j)I(\varepsilon _i<\varepsilon _j)|X_i,C_i,X_j,C_j]\\{} & {} \quad =\sum _{i=1}^N\sum _{j=1}^N\Gamma (X_i,C_i,X_j,C_j)(X_i-X_j), \end{aligned}$$

where

$$\begin{aligned} \Gamma (X_i,C_i,X_j,C_j)= & {} E[m_1(X_{i},C_i)m_1(X_{j},C_j)I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+\varepsilon _i\le \log (C_i)\}\\{} & {} \quad \times I\{X_j^{\textsf {T}}{\varvec{\beta }}^*+\varepsilon _i\le \log (C_j)\} m_2(\varepsilon _i)m_2(\varepsilon _j)I(\varepsilon _i<\varepsilon _j)|X_i,C_i,X_j,C_j]\\= & {} m_1(X_{i},C_i)m_1(X_{j},C_j)\int _{-\infty }^{+\infty }\int _{-\infty }^{+\infty } I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\}\\{} & {} \quad \times I\{X_j^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_j)\} m_2(u)m_2(v)I(u<v) f_{\varepsilon }(u)f_{\varepsilon }(v)dudv\\= & {} \Gamma (X_j,C_j,X_i,C_i). \end{aligned}$$

On the other hand,

$$\begin{aligned} \sum _{i=1}^N\sum _{j=1}^N\Gamma (X_i,C_i,X_j,C_j)X_i= & {} \sum _{i=1}^N\sum _{j=1}^N\Gamma (X_j,C_j,X_i,C_i)X_j\\= & {} \sum _{i=1}^N\sum _{j=1}^N\Gamma (X_i,C_i,X_j,C_j)X_j. \end{aligned}$$

It follows that \(\partial {\bar{\ell }}_N({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=0\) and thus \(\partial \ell ({\varvec{\beta }}^*)/\partial {\varvec{\beta }}=E\{\partial {\bar{\ell }}_N({\varvec{\beta }}^*)/\partial {\varvec{\beta }}\}=0.\) In the following, we show that \(\ell ({\varvec{\beta }})=E\{{\bar{\ell }}_{N}({\varvec{\beta }})\}\) is a concave function of \({\varvec{\beta }}\). For two matrices A and B, we write \(A \le B\) if \(B - A\) is a nonnegative-definite matrix. Direct calculations give

$$\begin{aligned}{} & {} -(N^2-N){\bar{\ell }}_N({\varvec{\beta }})=\sum _{i=1}^N\sum _{j=1}^NE[R_{i}R_{j}\Delta _i\{e_i({\varvec{\beta }})-e_j({\varvec{\beta }})\}^{-}|X_i,C_i, X_j,C_j]\\{} & {} \quad =\sum _{i=1}^N\sum _{j=1}^Nm_1(X_{i},C_i)m_1(X_{j},C_j)E[ m_2(\varepsilon _i)m_2(\varepsilon _j) I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+\varepsilon _i\le \log (C_i)\}\\{} & {} \qquad \times \{\varepsilon _i-\varepsilon _j\wedge [\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*]-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\}^{-}|X_i,C_i, X_j,C_j]\\{} & {} \quad =\sum _{i=1}^N\sum _{j=1}^Nm_1(X_{i},C_i)m_1(X_{j},C_j)\\{} & {} \qquad \times \int _{-\infty }^{+\infty }\int _{-\infty }^{+\infty } m_2(u)m_2(v) I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\}\\{} & {} \qquad \times \{u-v\wedge [\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*]-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\}^{-} f_{\varepsilon }(u)f_{\varepsilon }(v)dudv. \end{aligned}$$

Define \(g(v)= m_2(v)f_{\varepsilon }(v)\) and \(G(v)=\int _{v}^{+\infty }g(s)ds\). From C2, we have \(g(v)>0\) and \(G(v)>0\) for all \(v\in {\mathbb {R}}\). It follows that

$$\begin{aligned}{} & {} \int _{-\infty }^{+\infty } I\{s\wedge [\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*]>v\}g(s)ds\nonumber \\{} & {} \quad =I\{\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*>v\}\int _{-\infty }^{+\infty } I(s>v)g(s)ds\nonumber \\{} & {} \quad =I\{\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*>v\}G(v). \end{aligned}$$
(A1)

Let

$$\begin{aligned} u_{ij}({\varvec{\beta }})=\log (C_j)-X_j^{\textsf {T}}{{\varvec{\beta }}}^*+(X_i-X_j)^{\textsf {T}}({{\varvec{\beta }}}-{{\varvec{\beta }}}^*). \end{aligned}$$

Utilizing the identity (A1), we can write

$$\begin{aligned}{} & {} -(N^2-N)\frac{\partial {\bar{\ell }}_N({\varvec{\beta }})}{\partial {\varvec{\beta }}}\\{} & {} \quad =\sum _{i\ne j}^N(X_i-X_j)m_1(X_{i},C_i)m_1(X_{j},C_j)\int _{-\infty }^{+\infty }\int _{-\infty }^{+\infty } I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\}\\{} & {} \qquad \times m_2(u)m_2(v)I\{u-v\wedge [\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*]-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)<0\}\\{} & {} \qquad \times f_{\varepsilon }(u)f_{\varepsilon }(v)dudv\\{} & {} \quad =\sum _{i\ne j}^N(X_i-X_j)m_1(X_{i},C_i)m_1(X_{j},C_j)\int _{-\infty }^{+\infty } I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\}\\{} & {} \qquad \times I\{\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*>u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\}\\{} & {} \qquad \times G\{u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\} g(u)du\\{} & {} \quad =\sum _{i=1}^N\sum _{j=1}^N(X_i-X_j)m_1(X_{i},C_i)m_1(X_{j},C_j)\\{} & {} \qquad \times \int _{-\infty }^{u_{ij}({{\varvec{\beta }}})}I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\} G\{u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\}g(u)du \end{aligned}$$

and

$$\begin{aligned}{} & {} -(N^2-N)\frac{\partial ^2 {\bar{\ell }}_N({\varvec{\beta }})}{\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}}=\sum _{i=1}^N\sum _{j=1}^N(X_i-X_j)^{\otimes 2}m_1(X_{i},C_i)m_1(X_{j},C_j)\\{} & {} \qquad \times \int _{-\infty }^{+\infty } I\{\log (C_j)-X_j^{\textsf {T}}{\varvec{\beta }}^*>u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\}\\{} & {} \qquad \times I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\} g\{u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\} g(u)du\\{} & {} \qquad +\sum _{i=1}^N\sum _{j=1}^N(X_i-X_j)^{\otimes 2}m_1(X_{i},C_i)m_1(X_{j},C_j)\\{} & {} \qquad \times \biggr [ I\{X_i^{\textsf {T}}{\varvec{\beta }}^*+u\le \log (C_i)\} G\{u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\}g(u)\biggr ]\biggr |_{u=u_{ij}({{\varvec{\beta }}})}\\{} & {} \quad \ge \sum _{i=1}^N\sum _{j=1}^N(X_i-X_j)^{\otimes 2}m_1(X_{i},C_i)m_1(X_{j},C_j)\\{} & {} \qquad \times \int _{-\infty }^{+\infty } I[\{\log (C_i)-X_i^{\textsf {T}}{{\varvec{\beta }}}^*\}\wedge u_{ij}({{\varvec{\beta }}})>u]\\{} & {} \qquad \times g\{u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\} g(u)du. \end{aligned}$$

Recall that \(S(u|X)=P(\log (C)>u|X)\). Then, for \(i\ne j\), we can write

$$\begin{aligned}{} & {} E(I[\{\log (C_i)-X_i^{\textsf {T}}{{\varvec{\beta }}}^*\}\wedge u_{ij}({{\varvec{\beta }}})>u]|X_i,X_j)\\{} & {} \quad =S(X_i^{\textsf {T}}{{\varvec{\beta }}}^*+u|X_i)S\{X_j^{\textsf {T}}{{\varvec{\beta }}}^*-(X_i-X_j)^{\textsf {T}}({{\varvec{\beta }}}-{{\varvec{\beta }}}^*)+u|X_j\}. \end{aligned}$$

Using conditions C2–C3, for \(i\ne j\), we have

$$\begin{aligned} & \int _{-\infty }^{+\infty } g(u)g\{u-(X_i-X_j)^{\textsf {T}}({\varvec{\beta }}-{\varvec{\beta }}^*)\} S(X_i^{\textsf {T}}{{\varvec{\beta }}}^*+u|X_i)\nonumber \\ & \quad \times S\{X_j^{\textsf {T}}{{\varvec{\beta }}}^*-(X_i-X_j)^{\textsf {T}}({{\varvec{\beta }}}-{{\varvec{\beta }}}^*)+u|X_j\}du\ge \varsigma _2^2\varsigma _4>0 \end{aligned}$$
(A2)

with probability one. By conditions C2, C4 and the inequality (A2), we obtain

$$\begin{aligned} E\biggr \{-\frac{\partial ^2 {\bar{\ell }}_N({\varvec{\beta }})}{\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}}\biggr |X_1,\ldots ,X_N\biggr \}\ge \varsigma _1^2\varsigma _2^2\varsigma _4\frac{1}{N^2-N}\sum _{i=1}^N\sum _{j=1}^N(X_i-X_j)^{\otimes 2}, \end{aligned}$$

which is a positive definite matrix for large enough N. Notice that

$$\begin{aligned} D({\varvec{\beta }})= & {} -\partial ^2 \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}=E[E\{-\partial ^2 {\bar{\ell }}_N({\varvec{\beta }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}|X_1,\ldots ,X_N\}]\\\ge & {} \varsigma _1^2\varsigma _2^2\varsigma _4E\{(X_1-X_2)^{\otimes 2}\}. \end{aligned}$$

This fact implies that the matrix \(D({\varvec{\beta }})\) is positive definite for all \({\varvec{\beta }}\in \Theta\). Therefore, \(\ell ({\varvec{\beta }})\) is a concave function on \(\Theta\). Combining previous results, it follows that \({\varvec{\beta }}^*\) is the unique maximizer of \(\ell ({\varvec{\beta }})\) and thus (i) is proved.

Proof of (ii) For each \({\varvec{\beta }}\in \Theta\), we can write \(\ell _N({\varvec{\beta }})-\ell ({\varvec{\beta }})={\mathbb {U}}_Nf(\cdot ,\cdot ;{\varvec{\beta }})\), where

$$\begin{aligned} f(Z_i,Z_j;{\varvec{\beta }})=-R_{i}R_{j}\Delta _i\{e_i({\varvec{\beta }})-e_j({\varvec{\beta }})\}^{-}-\ell ({\varvec{\beta }}) \end{aligned}$$

and \({\mathbb {U}}_N\) denotes the random measure putting mass \(1/(N^2-N)\) on each pair \((Z_i,Z_j)\), \(i\ne j\). Applying the arguments in Sherman (1993, Section 5) to the class \(\{f(\cdot ,\cdot ;{\varvec{\beta }}):{\varvec{\beta }}\in \Theta \}\) shows that it is Euclidean for the envelope \(|\Psi (Z_1,Z_2)|+E\{|\Psi (Z_1,Z_2)|\}\), where \(\Psi (Z_1,Z_2)=|\log (Y_1)-\log (Y_2)|+\Vert X_{1}-X_{2}\Vert \sup _{{{\varvec{\beta }}}\in \Theta }\Vert {\varvec{\beta }}\Vert\) and \(E\{|\Psi (Z_1,Z_2)|\}<+\infty\) under condition C4. An application of Corollary 7 in Sherman (1994, Section 6) shows that \(\sup _{{{\varvec{\beta }}}\in \Theta }|{\mathbb {U}}_Nf(\cdot ,\cdot ;{\varvec{\beta }})|=O_p(N^{-1/2}).\) This establishes (ii).

Proof of (iii) Note that \(\ell _N({\varvec{\beta }})\) is continuous in \({\varvec{\beta }}\); by the dominated convergence theorem, \(\ell ({\varvec{\beta }})=E\{\ell _N({\varvec{\beta }})\}\) is also continuous. Hence, consistency is proved. \(\square\)

Lemma 1

Under conditions C0–C4, for every sequence \(\kappa _N>0\) with \(\kappa _N\rightarrow 0\), the expansion

$$\begin{aligned} \ell _{N}({\varvec{\beta }})-{\ell }_N({\varvec{\beta }}^*)= & {} -\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\frac{1}{N}\sum _{i=1}^N{\dot{b}}_{i}({\varvec{\beta }}^*)\\{} & {} \qquad +o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2)+o_p(N^{-1}) \end{aligned}$$

holds uniformly in \(\{{\varvec{\beta }}\in \Theta : \Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert \le \kappa _N\}\).

Proof of Lemma 1

Let \(\epsilon _N({\varvec{\beta }})=\ell _{N}({\varvec{\beta }})-{\ell }({\varvec{\beta }})\). A standard decomposition of U-statistics gives

$$\begin{aligned} \epsilon _N({\varvec{\beta }})-\epsilon _N({\varvec{\beta }}^*)= & {} \frac{1}{N}\sum _{i=1}^Nb_i({\varvec{\beta }})+\frac{1}{N^2-N}\sum _{i<j}d_{ij}({\varvec{\beta }}), \end{aligned}$$
(A3)

where

$$\begin{aligned} b_i({\varvec{\beta }})= & {} E[a_{ij}({\varvec{\beta }})+a_{ji}({\varvec{\beta }})-2E\{a_{ij}({\varvec{\beta }})\}|Z_i], \\ d_{ij}({\varvec{\beta }})= & {} a_{ij}({\varvec{\beta }})+a_{ji}({\varvec{\beta }})-2E\{a_{ij}({\varvec{\beta }})\}-b_i({\varvec{\beta }})-b_j({\varvec{\beta }}), \\ a_{ij}({\varvec{\beta }})= & {} -R_{i}R_{j}\Delta _i[\{e_i({\varvec{\beta }})-e_j({\varvec{\beta }})\}^{-}-\{e_i({\varvec{\beta }}^*)-e_j({\varvec{\beta }}^*)\}^{-}]. \end{aligned}$$

Note that \(E\{b_i({\varvec{\beta }})\} = 0\) for \({\varvec{\beta }}\in \Theta\) and \(b_i({\varvec{\beta }}^*)=0\). A Taylor expansion gives

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^Nb_i({\varvec{\beta }})= & {} ({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\frac{1}{N}\sum _{i=1}^N{\dot{b}}_{i}({\varvec{\beta }}^*)+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2), \end{aligned}$$
(A4)

where \({\dot{b}}_{i}({\varvec{\beta }})=\partial b_{i}({\varvec{\beta }})/\partial {\varvec{\beta }}\).

Combining the identical subgraph set and Vapnik–Chervonenkis class arguments of Sherman (1993, Section 5) with Corollaries 17 and 21 in Nolan and Pollard (1987) shows that the class of functions \(\{d_{ij}({\varvec{\beta }}):{\varvec{\beta }}\in \Theta \}\) is Euclidean. The Euclidean property, together with Corollary 8 of Sherman (1994), guarantees that, for any sequence \(\kappa _N\) of order o(1),

$$\begin{aligned} \sup _{\Vert {{\varvec{\beta }}}-{{\varvec{\beta }}}^*\Vert \le \kappa _N}~\biggr |\frac{1}{N^2-N}\sum _{i<j}d_{ij}({\varvec{\beta }})\biggr |=o_p(N^{-1}). \end{aligned}$$
(A5)

For \({\varvec{\beta }}\) in a neighbourhood of \({\varvec{\beta }}^*\), a Taylor expansion gives

$$\begin{aligned} \ell ({\varvec{\beta }})= & {} \ell ({\varvec{\beta }}^*)+({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}u({\varvec{\beta }}^*)-\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2)\nonumber \\= & {} \ell ({\varvec{\beta }}^*)-\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2), \end{aligned}$$
(A6)

where \(u({\varvec{\beta }})=\partial \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\), \(D({\varvec{\beta }})=-\partial ^2 \ell ({\varvec{\beta }})/\partial {\varvec{\beta }}\partial {\varvec{\beta }}^{\textsf {T}}\) and \(u({\varvec{\beta }}^*)=0\). Under conditions C2–C4, the matrix \(D({\varvec{\beta }})\) is invertible for \(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert \le \kappa _N\).

By (A3), (A4), (A5), and (A6), we have

$$\begin{aligned}{} & {} \ell _{N}({\varvec{\beta }})-{\ell }_N({\varvec{\beta }}^*)=\ell ({\varvec{\beta }})-{\ell }({\varvec{\beta }}^*)+\frac{1}{N}\sum _{i=1}^Nb_i({\varvec{\beta }})+\frac{1}{N^2-N}\sum _{i<j}d_{ij}({\varvec{\beta }})\\{} & {} \quad =-\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\frac{1}{N}\sum _{i=1}^N{\dot{b}}_{i}({\varvec{\beta }}^*)\\{} & {} \qquad +o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2)+o_p(N^{-1}). \end{aligned}$$

\(\square\)

Lemma 2

Under conditions C0-C4, for every sequence \(\kappa _N>0\) with \(\kappa _N\rightarrow 0\), we have

$$\begin{aligned} \sup _{\Vert {{\varvec{\beta }}}-{{\varvec{\beta }}}^*\Vert \le \kappa _N}\biggr \Vert U_N({\varvec{\beta }})-U_N({\varvec{\beta }}^*)+D({\varvec{\beta }}-{\varvec{\beta }}^*)\biggr \Vert =o_p(N^{-1/2})+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ). \end{aligned}$$

Proof of Lemma 2

We only need to show that, for \(l=1,\ldots ,p\),

$$\begin{aligned} \sup _{\Vert {{\varvec{\beta }}}-{{\varvec{\beta }}}^*\Vert \le \kappa _N}\biggr |U_{Nl}({\varvec{\beta }})-U_{Nl}({\varvec{\beta }}^*)-\Gamma _{l}({\varvec{\beta }}-{\varvec{\beta }}^*)\biggr |=o_p(N^{-1/2})+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ), \end{aligned}$$

where \(U_{Nl}({\varvec{\beta }})=\frac{1}{N(N-1)}\sum _{i\ne j}^Nh_l(Z_i,Z_j;{\varvec{\beta }})\), \(h_l(Z_i,Z_j;{\varvec{\beta }})=-R_{i}R_{j}\Delta _iI\{e_i({\varvec{\beta }})<e_j({\varvec{\beta }})\}(X_{il}-X_{jl})\) and \(\Gamma _{l}=\partial E\{h_l(Z_1,Z_2;{\varvec{\beta }})\}/\partial {\varvec{\beta }}^{\textsf {T}}\bigr |_{{{\varvec{\beta }}}={{\varvec{\beta }}}^*}.\) It is easy to verify that \(D=-(\Gamma _{1}^{\textsf {T}},\ldots ,\Gamma _{p}^{\textsf {T}})^{\textsf {T}}\). Let \(U_l({\varvec{\beta }})=E\{U_{Nl}({\varvec{\beta }})\}\) and \(\epsilon _{Nl}({\varvec{\beta }})=U_{Nl}({\varvec{\beta }})-U_{l}({\varvec{\beta }})\). A standard decomposition of U-statistics gives

$$\begin{aligned} \epsilon _{Nl}({\varvec{\beta }})-\epsilon _{Nl}({\varvec{\beta }}^*)=\frac{1}{N}\sum _{i=1}^Nb_{i,l}({\varvec{\beta }})+\frac{1}{N^2-N}\sum _{i<j}d_{ij,l}({\varvec{\beta }}), \end{aligned}$$
(A7)

where

$$\begin{aligned} b_{i,l}({\varvec{\beta }})= & {} E[a_{ij,l}({\varvec{\beta }})+a_{ji,l}({\varvec{\beta }})-2E\{a_{ij,l}({\varvec{\beta }})\}|Z_i], \\ d_{ij,l}({\varvec{\beta }})= & {} a_{ij,l}({\varvec{\beta }})+a_{ji,l}({\varvec{\beta }})-2E\{a_{ij,l}({\varvec{\beta }})\}-b_{i,l}({\varvec{\beta }})-b_{j,l}({\varvec{\beta }}), \\ a_{ij,l}({\varvec{\beta }})= & {} h_l(Z_i,Z_j;{\varvec{\beta }})-h_l(Z_i,Z_j;{\varvec{\beta }}^*). \end{aligned}$$

Note that \(E\{b_{i,l}({\varvec{\beta }})\} = 0\) for \({\varvec{\beta }}\in \Theta\) and \(b_{i,l}({\varvec{\beta }}^*)=0\). A Taylor expansion gives

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^Nb_{i,l}({\varvec{\beta }})=({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\frac{1}{N}\sum _{i=1}^N{\dot{b}}_{i,l}({\varvec{\beta }}^*)+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2), \end{aligned}$$
(A8)

where \({\dot{b}}_{i,l}({\varvec{\beta }})=\partial b_{i,l}({\varvec{\beta }})/\partial {\varvec{\beta }}\) and \(N^{-1}\sum _{i=1}^N{\dot{b}}_{i,l}({\varvec{\beta }}^*)=O_p(N^{-1/2}).\) The arguments used to prove (A5) again guarantee that, for any sequence \(\kappa _N\) of order o(1),

$$\begin{aligned} \sup _{\Vert {{\varvec{\beta }}}-{{\varvec{\beta }}}^*\Vert \le \kappa _N}~\biggr |\frac{1}{N^2-N}\sum _{i<j}d_{ij,l}({\varvec{\beta }})\biggr |=o_p(N^{-1}). \end{aligned}$$
(A9)

It is easy to see that

$$\begin{aligned} \epsilon _{Nl}({\varvec{\beta }})-\epsilon _{Nl}({\varvec{\beta }}^*)= & {} U_{Nl}({\varvec{\beta }})-U_{Nl}({\varvec{\beta }}^*)-\{U_{l}({\varvec{\beta }})-U_{l}({\varvec{\beta }}^*)\}\nonumber \\= & {} U_{Nl}({\varvec{\beta }})-U_{Nl}({\varvec{\beta }}^*)-\Gamma _{l}({\varvec{\beta }}-{\varvec{\beta }}^*)+o(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ). \end{aligned}$$
(A10)

Combining (A7), (A8), (A9) and (A10), the desired result follows. \(\square\)

Proof of Theorem 2

From the proof of Theorem 1, we know that the matrix \(D({\varvec{\beta }})\) is positive definite for all \({\varvec{\beta }}\in \Theta\). By Lemma 1, we obtain

$$\begin{aligned} \ell _{N}({\varvec{\beta }})=\ell ({\varvec{\beta }})+\epsilon _N({\varvec{\beta }})= & {} -\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)\\{} & {} \quad +\ell ({\varvec{\beta }}^*)+\epsilon _N({\varvec{\beta }}^*)+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2)+o_p(N^{-1})\\= & {} \varrho _N({\varvec{\beta }})+\ell ({\varvec{\beta }}^*)+\epsilon _N({\varvec{\beta }}^*)+o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2)+o_p(N^{-1}), \end{aligned}$$

where

$$\begin{aligned} \varrho _N({\varvec{\beta }})= & {} -\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\left[ \frac{1}{N}\sum _{i=1}^N {\dot{b}}_i({\varvec{\beta }}^*)\right] \\= & {} -\frac{1}{2}\left\| D^{1/2}({\varvec{\beta }}^*)\left[ {\varvec{\beta }}-{\varvec{\beta }}^*-D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)\right] \right\| ^2\\{} & {} \quad +\frac{1}{2}\left[ \frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)\right] ^{\textsf {T}}D^{-1}({\varvec{\beta }}^*)\left[ \frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)\right] . \end{aligned}$$

Hence, the maximizer of \(\varrho _N({\varvec{\beta }})\) is \({\hat{{\varvec{\gamma }}}}={\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*).\) By Theorem 1, \({\hat{{\varvec{\beta }}}}_W={\varvec{\beta }}^*+o_p(1)\). Since \({\hat{{\varvec{\beta }}}}_W\) is the maximizer of \(\ell _{N}({\varvec{\beta }})\), we have

$$\begin{aligned}{} & {} 0\le \varrho _N({\hat{{\varvec{\gamma }}}})-\varrho _N({\hat{{\varvec{\beta }}}}_W)\\{} & {} \quad =\{\varrho _N({\hat{{\varvec{\gamma }}}})+\ell ({\varvec{\beta }}^*)+\epsilon _N({\varvec{\beta }}^*)-\ell _{N}({\hat{{\varvec{\gamma }}}})\}\nonumber \\{} & {} \qquad -\{\varrho _N({\hat{{\varvec{\beta }}}}_W)+\ell ({\varvec{\beta }}^*)+\epsilon _N({\varvec{\beta }}^*)-\ell _{N}({\hat{{\varvec{\beta }}}}_W)\}-\{\ell _N({\hat{{\varvec{\beta }}}}_W)-\ell _N({\hat{{\varvec{\gamma }}}})\}\nonumber \\{} & {} \quad \le \{\varrho _N({\hat{{\varvec{\gamma }}}})+\ell ({\varvec{\beta }}^*)+\epsilon _N({\varvec{\beta }}^*)-\ell _{N}({\hat{{\varvec{\gamma }}}})\}\nonumber \\{} & {} \qquad -\{\varrho _N({\hat{{\varvec{\beta }}}}_W)+\ell ({\varvec{\beta }}^*)+\epsilon _N({\varvec{\beta }}^*)-\ell _{N}({\hat{{\varvec{\beta }}}}_W)\}\nonumber \\{} & {} \quad =o_p(\Vert {\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*\Vert ^2)+o_p(\Vert {\hat{{\varvec{\gamma }}}}-{\varvec{\beta }}^*\Vert ^2)+o_p(N^{-1}).\nonumber \end{aligned}$$
(A11)

On the other hand, in view of the expression for \(\varrho _N\),

$$\begin{aligned} \varrho _N({\hat{{\varvec{\gamma }}}})-\varrho _N({\hat{{\varvec{\beta }}}}_W)=\frac{1}{2}\left\| D^{1/2}({\varvec{\beta }}^*)\left[ {\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*-D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)\right] \right\| ^2. \end{aligned}$$

Combining (A11) with the expression above, we obtain

$$\begin{aligned} {\hat{{\varvec{\beta }}}}_W= & {} {\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)+o_p(\Vert {\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*\Vert )+o_p(\Vert {\hat{{\varvec{\gamma }}}}-{\varvec{\beta }}^*\Vert )\\{} & {} +o_p(N^{-1/2}). \end{aligned}$$

Obviously, \({\hat{{\varvec{\gamma }}}}-{\varvec{\beta }}^*=D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)=O_p(N^{-1/2})\). It follows that \({\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*=O_p(N^{-1/2})\) and \({\hat{{\varvec{\beta }}}}_W={\varvec{\beta }}^*+D^{-1}({\varvec{\beta }}^*)\frac{1}{N}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)+o_p(N^{-1/2}).\)
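To spell out the final step: \({\dot{b}}_1({\varvec{\beta }}^*),\ldots ,{\dot{b}}_N({\varvec{\beta }}^*)\) are i.i.d. mean-zero random vectors, so writing \(V=E\{{\dot{b}}_1({\varvec{\beta }}^*){\dot{b}}_1^{\textsf {T}}({\varvec{\beta }}^*)\}\) (the symbol V is introduced here for this remark and may differ from the notation used in the statement of Theorem 2), the multivariate central limit theorem and Slutsky's theorem give

$$\begin{aligned} N^{1/2}({\hat{{\varvec{\beta }}}}_W-{\varvec{\beta }}^*)=D^{-1}({\varvec{\beta }}^*)\frac{1}{N^{1/2}}\sum _{i=1}^N{\dot{b}}_i({\varvec{\beta }}^*)+o_p(1){\mathop {\longrightarrow }\limits ^{d}}N\bigl (0,\ D^{-1}({\varvec{\beta }}^*)VD^{-1}({\varvec{\beta }}^*)\bigr ). \end{aligned}$$

This completes the proof of Theorem 2. \(\square\)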

Proof of Theorem 3

Proof of part (1). Let \(\alpha _N=N^{-1/2}+\max \{P'_{S,\lambda _N}(|\beta _j^*|):\beta _j^*\ne 0\}\). We first prove that for any \(\varepsilon >0\), there exists a large constant C such that

$$\begin{aligned} P\biggr \{\sup _{\Vert {{\varvec{u}}}\Vert =C}Q_{S}({\varvec{\beta }}^*+\alpha _N {\varvec{u}})<Q_{S}({\varvec{\beta }}^*)\biggr \}\ge 1-\varepsilon , \end{aligned}$$
(A12)

where \({\varvec{u}}=(u_1,\ldots ,u_p)^\textsf {T}\). Observing that \(P_{S,\lambda _N}(0)=0\), we have

$$\begin{aligned}{} & {} Q_{S}({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-Q_{S}({\varvec{\beta }}^*) \\{} & {} \quad \le \ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)-\sum _{j=1}^{s}\{ P_{S,\lambda _N}(|\beta _j^*+\alpha _N u_j|)-P_{S,\lambda _N}(|\beta _j^*|)\},\nonumber \end{aligned}$$
(A13)

where s is the number of components of \({\varvec{\beta }}_1^*\). For any \({\varvec{\beta }}\in \{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert =C\}\), by Lemma 1, we get

$$\begin{aligned} \ell _N({\varvec{\beta }})-\ell _N({\varvec{\beta }}^*)= & {} -\frac{1}{2}({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}D({\varvec{\beta }}^*)({\varvec{\beta }}-{\varvec{\beta }}^*)+({\varvec{\beta }}-{\varvec{\beta }}^*)^{\textsf {T}}\frac{1}{N}\sum _{i=1}^N{\dot{b}}_{i}({\varvec{\beta }}^*)\nonumber \\{} & {} \quad +o_p(\Vert {\varvec{\beta }}-{\varvec{\beta }}^*\Vert ^2)+o_p(N^{-1}) \nonumber \\= & {} \frac{1}{N}\sum _{i=1}^N{\dot{b}}_{i}^{\textsf {T}}({\varvec{\beta }}^*)\alpha _N {\varvec{u}}-\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}+o_p(\alpha _N^2C^2)+o_p(N^{-1}) \nonumber \\= & {} O_p (\alpha _N^2 C)-\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}+o_p(\alpha _N^2C^2)+o_p(N^{-1}). \end{aligned}$$
(A14)

Combining (A13) with (A14), we obtain

$$\begin{aligned}{} & {} Q_{S}({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-Q_{S}({\varvec{\beta }}^*) \\{} & {} \quad \le O_p (\alpha _N^2 C)-\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}-\sum _{j=1}^{s}\{ \alpha _N u_j P'_{S,\lambda _N}(|\beta _j^*|)\text{ sgn }(\beta _j^*)+ \frac{1}{2}\alpha _N^2 u_j^2 P''_{S,\lambda _N}(|\beta _j^*|)\nonumber \\{} & {} \qquad +o_p(\alpha _N^2u_j^2)\}+o_p(\alpha _N^2C^2)+o_p(N^{-1}).\nonumber \end{aligned}$$
(A15)

Note that the third term in (A15) is bounded by

$$\begin{aligned} \alpha _N^2 C + \frac{1}{2}\alpha _N^2 C^2 \max \{P''_{S,\lambda _N}(|\beta _j^*|):\beta _j^*\ne 0\}. \end{aligned}$$
(A16)

From Fan and Li (2001), the SCAD penalty function \(P_{S,\lambda _N}(\cdot )\) satisfies

$$\begin{aligned} \lim _{N\rightarrow \infty }\max \{|P_{S,\lambda _N}''(|\beta _j^*|)|: \beta _j^*\ne 0\}=0, \liminf _{N\rightarrow \infty }\liminf _{\theta \rightarrow 0_+}P_{S,\lambda _N}'(\theta )/\lambda _N>0, \end{aligned}$$
(A17)

where \(\lambda _N \rightarrow 0\) and \(N^{1/2} \lambda _N \rightarrow \infty\) as \(N\rightarrow \infty\).
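Both statements in (A17) can be read off the explicit form of the SCAD derivative in Fan and Li (2001); for reference, with a fixed constant \(a>2\) (commonly \(a=3.7\)),

$$\begin{aligned} P'_{S,\lambda }(\theta )=\lambda \left\{ I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\right\} ,\qquad \theta >0, \end{aligned}$$

so that \(P'_{S,\lambda }(\theta )=\lambda\) for \(\theta\) near \(0_+\), while once \(\lambda _N<|\beta _j^*|/a\) for every nonzero \(\beta _j^*\), both \(P'_{S,\lambda _N}(|\beta _j^*|)\) and \(P''_{S,\lambda _N}(|\beta _j^*|)\) vanish.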

(A17) implies that the term in (A16) is dominated by the second term of (A15). For a sufficiently large C, the second term of (A15) also dominates the first term. Hence (A12) is proved.

(A12) guarantees that, with probability at least \(1-\varepsilon\), there exists a local maximum in the ball \(\{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert \le C\}\). Letting \({\hat{{\varvec{\beta }}}}_{S}\) denote this local maximizer, it follows that \(\Vert {\hat{{\varvec{\beta }}}}_{S}-{\varvec{\beta }}^*\Vert =O_p(\alpha _N)\).

Next, we show that

$$\begin{aligned} Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})= & {} \max _{\Vert {{\varvec{\beta }}}_2\Vert \le CN^{-1/2}} Q_{S}(({\varvec{\beta }}_1^\textsf {T},{\varvec{\beta }}_2^\textsf {T})^\textsf {T}) \end{aligned}$$
(A18)

for any given \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1-{\varvec{\beta }}_1^*\Vert =O_p(N^{-1/2})\) and any constant C. Note that (A18) implies that \({\hat{{\varvec{\beta }}}}_{2,S}=0\). To prove (A18), we only need to show that, for some \(\varepsilon _N=CN^{-1/2}\) and each \(j=s+1,\ldots , p\), \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j<0\) for \(0< \beta _j< \varepsilon _N\), and \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j>0\) for \(-\varepsilon _N< \beta _j<0\).

Using Lemma 2, we obtain

$$\begin{aligned} \sup _{\Vert {{\varvec{\beta }}}-{{\varvec{\beta }}}^*\Vert \le \varepsilon _N}\left\| \frac{\partial \ell _N({\varvec{\beta }})}{\partial {\varvec{\beta }}}\right\|= & {} O_p(N^{-1/2}). \end{aligned}$$
(A19)

Then, we can write

$$\begin{aligned} \frac{\partial Q_{S}({\varvec{\beta }})}{\partial \beta _j}= & {} \frac{\partial \ell _N({\varvec{\beta }})}{\partial \beta _j}- P'_{S,\lambda _N}(|\beta _j|) \text{ sgn }(\beta _j)\\= & {} - \lambda _N \left\{ \text{ sgn }(\beta _j)P'_{S,\lambda _N}(|\beta _j|)/\lambda _N+O_p(N^{-1/2}/\lambda _N)\right\} . \end{aligned}$$

From (A17), \(\liminf _{N\rightarrow \infty } \liminf _{\theta \rightarrow 0_+} P'_{S,\lambda _N}(\theta )/\lambda _N >0\) and \(N^{-1/2}/\lambda _N\rightarrow 0\). These facts indicate that the sign of the derivative \(\partial Q_{S}({\varvec{\beta }})/\partial \beta _j\) is completely determined by that of \(\beta _j\). The desired result follows.

Proof of part (2). Consider \(Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\) as a function of \({\varvec{\beta }}_1\). Then there exists an \(N^{1/2}\)-consistent local maximizer \({\hat{{\varvec{\beta }}}}_{1,S}\) of \(Q_{S}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\), which solves the estimating equation \(\frac{\partial Q_{S}(({{\varvec{\beta }}}_1^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {{\varvec{\beta }}}_1}=0.\) By Lemma 2 and a Taylor expansion, we get

$$\begin{aligned} 0= & {} \frac{\partial Q_{S}(({\hat{{\varvec{\beta }}}}_{1,S}^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {\varvec{\beta }}_1}=\frac{\partial \ell _{N}(({\hat{{\varvec{\beta }}}}_{1,S}^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {\varvec{\beta }}_1}-\frac{\partial \sum _{j=1}^sP_{S,\lambda _N}(|{\hat{\beta }}_{1,S,j}|)}{\partial {\varvec{\beta }}_1}\\= & {} U_{1,N}(({\varvec{\beta }}_1^{*\textsf {T}},0^\textsf {T}))- F_1-D_{11}({\hat{{\varvec{\beta }}}}_{1,S}-{\varvec{\beta }}_1^*)\\{} & {} -(E_1+o_p(1))({\hat{{\varvec{\beta }}}}_{1,S}-{\varvec{\beta }}_1^*)+o_p(N^{-1})+o_p(\Vert {\hat{{\varvec{\beta }}}}_{1,S}-{\varvec{\beta }}_1^*\Vert ), \end{aligned}$$

where \(U_{1,N}({\varvec{\beta }})=\partial \ell _{N}({\varvec{\beta }})/\partial {\varvec{\beta }}_1\), and \(E_1\) and \(F_1\) are defined in Theorem 3. An application of Theorem 1 of Kowalski and Tu (2008, p. 158) yields

$$\begin{aligned} N^{1/2} U_{1,N}(({\varvec{\beta }}_1^{*\textsf {T}},0^\textsf {T})) {\mathop {\longrightarrow }\limits ^{d}}N(0,V_{11}). \end{aligned}$$

By Slutsky’s theorem and the central limit theorem, it follows that

$$\begin{aligned}{} & {} (D_{11}+E_1)N^{1/2}\{{\hat{{\varvec{\beta }}}}_{1,S}-{\varvec{\beta }}^*_1+(D_{11}+E_1)^{-1}F_1\}\\{} & {} \quad = N^{1/2} U_{1,N}(({\varvec{\beta }}_1^{*\textsf {T}},0^\textsf {T}))+o_p(1){\mathop {\longrightarrow }\limits ^{d}}N(0, V_{11}). \end{aligned}$$

This completes the proof. \(\square\)

Proof of Theorem 4

Proof of part (1). Let \(\alpha _N=N^{-1/2}\). We first prove that for any given \(\varepsilon >0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \sup _{\Vert {{\varvec{u}}}\Vert =C}Q_{A}({\varvec{\beta }}^*+\alpha _N {\varvec{u}})<Q_{A}({\varvec{\beta }}^*)\right\} \ge 1-\varepsilon , \end{aligned}$$
(A20)

where \({\varvec{u}}=(u_1,\ldots ,u_p)^\textsf {T}\). By the definition of \(Q_{A}({\varvec{\beta }})\), we have

$$\begin{aligned}{} & {} Q_{A}({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-Q_{A}({\varvec{\beta }}^*) \\{} & {} \quad \le \ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)-\sum _{j=1}^{s}\{\lambda _N w_j|\beta _j^*+\alpha _N u_j|-\lambda _N w_j|\beta _j^*|\} \\{} & {} \quad = \{\ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)\}-\lambda _N \sum _{j=1}^{s}w_j (|\beta _j^*+\alpha _N u_j|-|\beta _j^*|) \\{} & {} \quad \le \{\ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)\}+\lambda _N \alpha _N \sum _{j=1}^{s}w_j |u_j|. \\ \end{aligned}$$

Thus, it is sufficient to show that \(\{\ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)\}+\lambda _N \alpha _N \sum _{j=1}^{s}w_j |u_j|\le 0\) for a sufficiently large C with probability approaching one.

From (A14), we get

$$\begin{aligned} \ell _N({\varvec{\beta }})-\ell _N({\varvec{\beta }}^*)= -\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}+O_p (\alpha _N^2 C)+o_p(\alpha _N^2C^2)+o_p(N^{-1}), \end{aligned}$$
(A21)

for any \({\varvec{\beta }}\in \{{\varvec{\beta }}: {\varvec{\beta }}={\varvec{\beta }}^*+\alpha _N{\varvec{u}}, \Vert {\varvec{u}}\Vert =C\}\). Note that D is a positive definite matrix, so \(\frac{1}{2}\alpha _N^2 {\varvec{u}}^\textsf {T}D{\varvec{u}}=O_p(\alpha _N^2C^2)\). Therefore, for a sufficiently large C, the first term dominates the other terms on the right-hand side of (A21).

Moreover, direct applications of Taylor’s expansion lead to

$$\begin{aligned} w_j= & {} \frac{1}{|{\hat{\beta }}_{W,j}|^\gamma }=\frac{1}{|\beta _j^*|^\gamma }-\frac{\gamma \text{ sgn }(\beta _j^*)}{|\beta _j^*|^{\gamma +1}}({\hat{\beta }}_{W,j}-\beta _j^*)+o_p (|{\hat{\beta }}_{W,j}-\beta _j^*|)\\= & {} \frac{1}{|\beta _j^*|^\gamma }+O_p (N^{-1/2}). \end{aligned}$$

Based on this fact, we obtain

$$\begin{aligned} \lambda _N \alpha _N \sum _{j=1}^{s}w_j |u_j|= & {} N^{-1/2}\lambda _N \sum _{j=1}^{s}\left[ \frac{|u_j|}{|\beta _j^*|}+N^{-1/2}|u_j|O_p (1)\right] \\\le & {} CN^{-1/2}\lambda _N O_p (1)= CN^{-1} (N^{1/2}\lambda _N) O_p (1)=o_p(\alpha _N^2C).\nonumber \end{aligned}$$
(A22)

Combining (A21) and (A22), it follows that

$$\begin{aligned} Q_{A}({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-Q_{A}({\varvec{\beta }}^*) \le \{\ell _N({\varvec{\beta }}^*+\alpha _N {\varvec{u}})-\ell _N({\varvec{\beta }}^*)\}+\lambda _N \alpha _N \sum _{j=1}^{s}w_j |u_j| \le 0 \end{aligned}$$

for a sufficiently large C. Thus (A20) is proved and there exists a local maximizer \({\hat{{\varvec{\beta }}}}_{A}\) such that \(\Vert {\hat{{\varvec{\beta }}}}_{A}-{\varvec{\beta }}^*\Vert =O_p(\alpha _N)\).

Next, we show that

$$\begin{aligned} Q_{A}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})= & {} \max _{\Vert {{\varvec{\beta }}}_2\Vert \le CN^{-1/2}} Q_{A}(({\varvec{\beta }}_1^\textsf {T},{\varvec{\beta }}_2^\textsf {T})^\textsf {T}) \end{aligned}$$
(A23)

for any given \({\varvec{\beta }}_1\) satisfying \(\Vert {\varvec{\beta }}_1-{\varvec{\beta }}_1^*\Vert =O_p(N^{-1/2})\) and any constant C. By (A19), we can write

$$\begin{aligned} \frac{\partial Q_{A}({\varvec{\beta }})}{\partial \beta _j}= & {} \frac{\partial \ell _N({\varvec{\beta }})}{\partial \beta _j}-\lambda _N w_j \text{ sgn }(\beta _j)= -\lambda _N N^\frac{\gamma }{2} \frac{\text{ sgn }(\beta _j) }{ |N^{1/2}{\hat{\beta }}_{W,j}|^\gamma }+O_p (N^{-1/2}) \\= & {} N^{-1/2}\left[ O_p (1)-(N^\frac{\gamma +1}{2} \lambda _N)\frac{\text{ sgn }(\beta _j)}{|O_p(1)|} \right] ,\ \ j=s+1,\ldots ,p. \end{aligned}$$

From the condition of Theorem 4, \(N^{\frac{\gamma +1}{2}}\lambda _N \rightarrow \infty\) as \(N\rightarrow \infty\). Then, \(-\beta _j\partial Q_{A}({\varvec{\beta }})/\partial \beta _j>0\) with probability approaching one as \(N\rightarrow \infty\). Consequently, (A23) is proved and thus \({\hat{{\varvec{\beta }}}}_{2,A}=0\).

Proof of part (2). Let \({\hat{{\varvec{\beta }}}}_{1,A}\) denote the local maximizer of \(Q_{A}(({\varvec{\beta }}_1^\textsf {T},0^\textsf {T})^\textsf {T})\), viewed as a function of \({\varvec{\beta }}_1\). Then \({\hat{{\varvec{\beta }}}}_{1,A}\) is \(N^{1/2}\)-consistent, and we have the asymptotic expansion

$$\begin{aligned} 0= & {} \frac{\partial Q_{A}(({\hat{{\varvec{\beta }}}}_{1,A}^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {\varvec{\beta }}_1}\\= & {} \frac{\partial \ell _{N}(({\hat{{\varvec{\beta }}}}_{1,A}^\textsf {T},0^\textsf {T})^\textsf {T})}{\partial {\varvec{\beta }}_1} - \lambda _N \left( w_1 \text{ sgn }({\hat{\beta }}_{1,A,1}),\ldots , w_s \text{ sgn }({\hat{\beta }}_{1,A,s})\right) ^\textsf {T}\\= & {} U_{1,N}(({\varvec{\beta }}_1^{*\textsf {T}},0^\textsf {T}))-D_{11}({\hat{{\varvec{\beta }}}}_{1,A}-{\varvec{\beta }}_1^*)+o_p(N^{-1})+o_p(\Vert {\hat{{\varvec{\beta }}}}_{1,A}-{\varvec{\beta }}_1^*\Vert )\\{} & {} - \lambda _N \left( w_1 \text{ sgn }({\hat{\beta }}_{1,A,1}),\ldots , w_s \text{ sgn }({\hat{\beta }}_{1,A,s})\right) ^\textsf {T}. \end{aligned}$$

By Theorem 1 of Kowalski and Tu (2008, page 158), we obtain

$$\begin{aligned} N^{1/2} U_{1,N}(({\varvec{\beta }}_1^{*\textsf {T}},0^\textsf {T})) {\mathop {\longrightarrow }\limits ^{d}}N(0,V_{11}). \end{aligned}$$

Since \(N^{1/2}\lambda _N=o_p(1)\), the desired result follows. \(\square\)

Appendix B

In this appendix, we give algorithms for maximizing \(Q_{\varpi ,\lambda }({\varvec{\beta }})\) in (5). Efficient algorithms for maximizing penalized likelihood include the local quadratic approximation (LQA) algorithm (Fan & Li, 2001), the local linear approximation (LLA) algorithm (Zou & Li, 2008) and the coordinate optimization algorithm (Fu, 1998; Fan & Lv, 2011). Here, we adopt the LLA algorithm. Define

$$\begin{aligned} Q({\varvec{\beta }}, {\varvec{\alpha }})=\ell _{N}({\varvec{\beta }}) -\sum _{k=1}^p \alpha _k |\beta _k|, \end{aligned}$$
(B.1)

where \({\varvec{\alpha }}=(\alpha _1,\ldots ,\alpha _p)^\textsf {T}\). The LLA algorithm can be summarized as follows:

1. Initialize \({\hat{{\varvec{\beta }}}}^{(0)}=({\hat{\beta }}_1^{(0)},\ldots ,{\hat{\beta }}_p^{(0)})\) and compute the adaptive weight vector

$$\begin{aligned} {\hat{{\varvec{\alpha }}}}^{(0)}= & {} ({\hat{\alpha }}_1^{(0)},\ldots ,{\hat{\alpha }}_p^{(0)})^\textsf {T}\\= & {} (P_\lambda '(|{\hat{\beta }}_1^{(0)}|),\ldots ,P_\lambda '(|{\hat{\beta }}_p^{(0)}|))^\textsf {T}; \end{aligned}$$
2. Compute

$$\begin{aligned} {\hat{{\varvec{\beta }}}}^{(m)}=({\hat{\beta }}_1^{(m)},\ldots ,{\hat{\beta }}_p^{(m)})^\textsf {T} =\arg \max _{{{\varvec{\beta }}}} Q ({\varvec{\beta }}, {\hat{{\varvec{\alpha }}}}^{(m-1)}), \end{aligned}$$

where \(Q({\varvec{\beta }}, {\varvec{\alpha }})\) is defined in (B.1);

3. Update the adaptive weight vector

$$\begin{aligned} {\hat{{\varvec{\alpha }}}}^{(m)}=({\hat{\alpha }}_1^{(m)},\ldots ,{\hat{\alpha }}_p^{(m)})^\textsf {T}, \end{aligned}$$

where \({\hat{\alpha }}_j^{(m)}=P_\lambda '(|{\hat{\beta }}_j^{(m)}|)\), \(j=1,\ldots ,p\);

4. Repeat steps 2–3 until convergence.
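As an illustration, here is a minimal Python sketch of this outer loop, assuming the SCAD penalty. The function names and the array-based data layout (`logY`, `X`, `R` for the missingness indicators, `Delta` for the censoring indicators) are our own hypothetical choices, not from the paper; `wr_loss` is included only to fix the objective being maximized, and the inner maximization of step 2 is delegated to a user-supplied `inner_max`, such as the coordinate optimization routine described below.

```python
import numpy as np

def wr_loss(beta, logY, X, R, Delta):
    """Unpenalized weighted-rank objective ell_N(beta) =
    -(N^2 - N)^{-1} * sum_{i != j} R_i R_j Delta_i {e_i(beta) - e_j(beta)}^-,
    where e_i(beta) = log Y_i - X_i^T beta and a^- = -a I(a < 0)."""
    N = len(logY)
    e = logY - X @ beta
    neg = np.maximum(e[None, :] - e[:, None], 0.0)  # {e_i - e_j}^- at entry (i, j)
    w = R[:, None] * R[None, :] * Delta[:, None]    # R_i R_j Delta_i
    np.fill_diagonal(w, 0.0)                        # exclude i == j terms
    return -np.sum(w * neg) / (N * N - N)

def scad_deriv(theta, lam, a=3.7):
    """SCAD penalty derivative P'_lambda(theta) for theta >= 0 (Fan & Li, 2001)."""
    theta = np.asarray(theta, dtype=float)
    return lam * ((theta <= lam) +
                  np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam) * (theta > lam))

def lla(beta0, lam, inner_max, tol=1e-6, max_iter=100):
    """Steps 1-4: alternate weight updates alpha_k = P'_lambda(|beta_k|)
    with L1-penalized maximizations of Q(beta, alpha)."""
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(max_iter):
        alpha = scad_deriv(np.abs(beta), lam)      # steps 1 and 3
        beta_new = inner_max(alpha, beta)          # step 2: argmax_beta Q(beta, alpha)
        if np.linalg.norm(beta_new - beta) < tol:  # step 4: stop at convergence
            return beta_new
        beta = beta_new
    return beta
```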

For fixed \({\varvec{\alpha }}\), to maximize \(Q ({\varvec{\beta }}, {\varvec{\alpha }})\) with respect to \({\varvec{\beta }}\) in step 2, we employ the idea of coordinate optimization, maximizing over one coordinate at a time with successive updates. Let \(X_{i,-k}=(X_{i,1},\ldots ,X_{i,k-1},X_{i,k+1},\ldots ,X_{i,p})^\textsf {T}\) and \(X_{i}=(X_{i,1},\ldots ,X_{i,p})^\textsf {T}\). Define

$$\begin{aligned}{} & {} \ell _{k}(\gamma |{\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})=Q(({\varvec{\zeta }}^\textsf {T},\gamma ,{\varvec{\eta }}^\textsf {T})^\textsf {T}, {\varvec{\alpha }})\\{} & {} \quad =-\frac{1}{N^2-N}\sum _{i\ne j}^NR_{i}R_{j}\Delta _i (u_{ij,k}-v_{ij,k}\gamma )^{-} - \alpha _k|\gamma |-\sum _{l=1}^{k-1}\alpha _l|\zeta _{l}|-\sum _{l=k+1}^{p}\alpha _l|\eta _{l-k}|, \end{aligned}$$

where \(u_{ij,k}=\log (Y_i)-\log (Y_j)-({\varvec{\zeta }}^\textsf {T},{\varvec{\eta }}^\textsf {T})(X_{i,-k}-X_{j,-k}),\) \(v_{ij,k}= X_{i,k}-X_{j,k}\), \({\varvec{\zeta }}^\textsf {T}=(\zeta _1,\ldots ,\zeta _{k-1})\), and \({\varvec{\eta }}^\textsf {T}=(\eta _1,\ldots ,\eta _{p-k})\). Note that we set \({\varvec{\zeta }}=\emptyset\) if \(k=1\) and \({\varvec{\eta }}=\emptyset\) if \(k=p\). Differentiating \(\ell _{k}(\gamma |{\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})\) with respect to \(\gamma\), we obtain the estimating equation

$$\begin{aligned} \frac{1}{N^2-N}\sum _{i\ne j}^NR_{i}R_{j}\Delta _i v_{ij,k}I( u_{ij,k}-v_{ij,k}\gamma<0) + \alpha _k =2\alpha _k I(\gamma <0), \end{aligned}$$

which is equivalent to

$$\begin{aligned}{} & {} \sum _{i\ne j}^NR_{i}R_{j}\Delta _i |v_{ij,k}|I\left( \gamma< \frac{u_{ij,k}}{v_{ij,k}}\right) + 2(N^2-N)\alpha _k I(\gamma <0)\\{} & {} \quad =(N^2-N)\alpha _k+ \frac{1}{2 }\sum _{i\ne j}^NR_{i}R_{j}\Delta _i (|v_{ij,k}|+v_{ij,k}). \end{aligned}$$

Let \(b_{00,k}=0\), \(b_{ij,k}=u_{ij,k}/v_{ij,k}\), \(w_{00,k}=2(N^2-N)\alpha _k\), \(w_{ij,k}=R_{i}R_{j}\Delta _i |v_{ij,k}|\), \(i,j=1,\ldots ,N\). Define

$$\begin{aligned} \{(b_{ij,k},w_{ij,k}):(i,j)\in {\mathscr {S}}\}=\{(b_{l,k},w_{l,k}):b_{1,k}<\cdots <b_{L,k},\ L=|{\mathscr {S}}|\}, \end{aligned}$$

where \({\mathscr {S}}=\{(0,0)\}\cup \{(i,j):v_{ij,k}\ne 0,\ i,j=1,\ldots ,N\}.\) Then, we can write

$$\begin{aligned} {\hat{\gamma }}_k({\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})=\arg \max _{\gamma \in {\mathbb {R}}}\ell _{k}(\gamma |{\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})=b_{{\hat{s}}_k,k}, \end{aligned}$$
(B.2)

where

$$\begin{aligned} {\hat{s}}_k= & {} \min \biggr \{s: \sum _{l=1}^s w_{l,k} > (N^2-N)\alpha _k+ \frac{1}{2}\sum _{i\ne j}^NR_{i}R_{j}\Delta _i (|v_{ij,k}|+v_{ij,k}),\\{} & {} \ \ s=1,\ldots ,L \biggr \}. \end{aligned}$$

Putting these pieces together, the coordinate optimization algorithm for maximizing \(Q ({\varvec{\beta }}, {\varvec{\alpha }})\) with respect to \({\varvec{\beta }}\) in step 2 of the LLA algorithm is as follows:

(a) Set the initial value \({\varvec{\beta }}^{(0)}=(\beta _1^{(0)},\ldots ,\beta _p^{(0)})^\textsf {T}\);

(b) Given \({\varvec{\beta }}^{(m)}=(\beta _1^{(m)},\ldots ,\beta _p^{(m)})^\textsf {T}\), for \(k=1,\ldots ,p\), compute

$$\begin{aligned} \beta _k^{(m+1)}={\hat{\gamma }}_k((\beta _1^{(m+1)},\ldots ,\beta _{k-1}^{(m+1)})^\textsf {T},(\beta _{k+1}^{(m)},\ldots ,\beta _p^{(m)})^\textsf {T},{\varvec{\alpha }}), \end{aligned}$$

where \({\hat{\gamma }}_k({\varvec{\zeta }},{\varvec{\eta }},{\varvec{\alpha }})\) is defined in (B.2).

Then, set \({\varvec{\beta }}^{(m+1)}=(\beta _1^{(m+1)},\ldots ,\beta _p^{(m+1)})^\textsf {T};\)

(c) Repeat step (b) until \(\Vert {\varvec{\beta }}^{(m+1)}-{\varvec{\beta }}^{(m)}\Vert <10^{-6}\).
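Below is a minimal Python sketch of steps (a)–(c), implementing the breakpoint rule (B.2) literally by sorting the \(b_{ij,k}\) and scanning cumulative weights until the threshold above is exceeded. The function names and the array-based data layout (`logY`, `X`, `R`, `Delta`) are the same hypothetical choices as in the LLA sketch, and tie handling is left at the sketch level.

```python
import numpy as np

def gamma_hat(k, beta, logY, X, R, Delta, alpha_k):
    """Exact coordinate update (B.2): maximize ell_k(gamma | zeta, eta, alpha)
    by scanning the ordered breakpoints b_{ij,k} = u_{ij,k} / v_{ij,k}."""
    N = len(logY)
    r = logY - X @ beta + X[:, k] * beta[k]      # log Y_i - X_{i,-k}^T beta_{-k}
    u = r[:, None] - r[None, :]                  # u_{ij,k}
    v = X[:, k][:, None] - X[:, k][None, :]      # v_{ij,k} = X_{i,k} - X_{j,k}
    w = R[:, None] * R[None, :] * Delta[:, None] * np.abs(v)  # w_{ij,k}
    np.fill_diagonal(w, 0.0)
    m = v != 0
    b = np.append(u[m] / v[m], 0.0)                    # breakpoints, plus b_{00,k} = 0
    wt = np.append(w[m], 2.0 * (N * N - N) * alpha_k)  # weights, plus w_{00,k}
    order = np.argsort(b)
    b, wt = b[order], wt[order]
    # threshold (N^2 - N) alpha_k + (1/2) sum R_i R_j Delta_i (|v_{ij,k}| + v_{ij,k})
    thresh = (N * N - N) * alpha_k + 0.5 * np.sum(w * (1.0 + np.sign(v)))
    s_hat = np.argmax(np.cumsum(wt) > thresh)    # smallest index exceeding the threshold
    return b[s_hat]

def coordinate_max(alpha, beta0, logY, X, R, Delta, tol=1e-6, max_iter=200):
    """Steps (a)-(c): cyclic exact coordinate updates of Q(beta, alpha)."""
    beta = np.asarray(beta0, dtype=float).copy()
    for _ in range(max_iter):
        beta_old = beta.copy()
        for k in range(len(beta)):               # step (b): update each coordinate
            beta[k] = gamma_hat(k, beta, logY, X, R, Delta, alpha[k])
        if np.linalg.norm(beta - beta_old) < tol:  # step (c)
            break
    return beta
```

Plugged into the earlier LLA sketch as `inner_max = lambda alpha, beta: coordinate_max(alpha, beta, logY, X, R, Delta)`, this routine carries out step 2 of the LLA algorithm.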

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, T., Yuan, X. & Sun, L. Variable selection for semiparametric accelerated failure time models with nonignorable missing data. J. Korean Stat. Soc. 53, 100–131 (2024). https://doi.org/10.1007/s42952-023-00238-z
