Abstract
In this paper, our goal is to estimate the homogeneous parameter and cluster the heterogeneous parameters in a partially heterogeneous single index model (PHSIM). To this end, the minimization criterion for such a single index model is first transformed into a least-squares optimization problem in population form. Based on this least-squares objective function, we introduce an empirical version for the PHSIM. By minimizing this empirical version, we estimate the homogeneous parameter and the subgroup averages of the heterogeneous index directions, and then use a fusion-penalized method to identify the subgroup structure of the PHSIM. With the proposed methodologies, the homogeneous parameter and the heterogeneous index directions can be consistently estimated, and the heterogeneous parameters can be consistently clustered. Moreover, the new clustering procedure is simple and robust. Simulation studies are carried out to examine the performance of the proposed methodologies.
References
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Chen G, You J (2005) An asymptotic theory for semiparametric generalized least squares estimation in partially linear regression models. Stat Pap 46(2):173–193
Chi EC, Lange K (2015) Splitting methods for convex clustering. J Comput Graph Stat 24(4):994–1013
Cook RD, Ni L (2005) Sufficient dimension reduction via inverse regression: a minimum discrepancy approach. J Am Stat Assoc 100(470):410–428
Engle RF, Granger CW, Rice J, Weiss A (1986) Semiparametric estimates of the relation between weather and electricity sales. J Am Stat Assoc 81(394):310–320
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
Hall P, Li KC (1993) On almost linearity of low dimensional projections from high dimensional data. Ann Stat 21(2):867–889
Kravitz RL, Duan N, Braslow J (2004) Evidence-based medicine, heterogeneity of treatment effects, and the trouble with averages. Milbank Q 82(4):661–687
Lagakos SW (2006) The challenge of subgroup analyses-reporting without distorting. N Engl J Med 354(16):1667
Lee ER, Noh H, Park BU (2014) Model selection via Bayesian information criterion for quantile regression models. J Am Stat Assoc 109(505):216–229
Lin L, Dong P, Song Y, Zhu L (2017) Upper expectation parametric regression. Stat Sin 27(3):1265–1280
Ma S, Huang J, Zhang Z, Liu M (2019) Exploration of heterogeneous treatment effects via concave fusion. Int J Biostat 16(1)
Radchenko P, Mukherjee G (2017) Convex clustering via l1 fusion penalization. J R Stat Soc Ser B (Statistical Methodology) 79(5):1527–1546
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Rothwell PM (2005) Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. Lancet 365(9454):176–186
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J R Stat Soc Ser B (Statistical Methodology) 67(1):91–108
Van der Vaart AW (2000) Asymptotic statistics, vol 3. Cambridge University Press, Cambridge
Wang H, Li R, Tsai CL (2007) Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3):553–568
Wang T, Xu PR, Zhu LX (2012) Non-convex penalized estimation in high-dimensional models with single-index structure. J Multivar Anal 109:221–235
Wang T, Zhang J, Liang H, Zhu L (2015) Estimation of a groupwise additive multiple-index model and its applications. Stat Sin 25(2):551–566
Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37(2):673–696
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
The research was supported by NNSF Projects (11971265, 11901356) of China.
Appendix
Proof of Proposition 1
According to Proposition 2 in Wang et al. (2015), under conditions (B0), (B1) and (B2), there exist constants \(\phi _1\ne 0\) and \(\phi _2\ne 0\) such that \(\varvec{\rho }_{LS}=(\phi _1\varvec{\beta }^T,\phi _2\varvec{\theta }^T)^T\). Following the line of argument of Theorem 2 in Wang et al. (2015), we only sketch the proof that \(\phi _1=1\).
(I) Note that
When condition (A1) of Proposition 1 holds,
Clearly, we have \(\phi _1=1\).
(II) Under condition (A2) of Proposition 1, define
and
where \(\Sigma ={\begin{pmatrix}\Sigma _{X,X}&{}\Sigma _{X,Z}\\ \Sigma _{Z,X}&{}\Sigma _{Z,Z}\\ \end{pmatrix}}\). Because the link function \(g_0(\cdot )\) is analytic, it suffices to consider the case where \(g_0(t)=\sum _{k=1}^{\infty }a_kt^k\). Under condition (A2) of Proposition 1, the bivariate random vector \((\tilde{X},\tilde{Z})=(X^T\varvec{\beta }, Z^T\varvec{\theta })\) has an elliptically symmetric distribution with mean vector zero. It follows that \(E(\tilde{Z}^{k+1})=0\) if k is even; \(E(\tilde{Z}^{k+1})=c_r\{E(\tilde{Z}^2)\}^r\) and \(E(\tilde{X}\tilde{Z}^k)=c_rE(\tilde{X}\tilde{Z})\{E(\tilde{Z}^2)\}^{r-1}\) for some constant \(c_r\) if \(k+1=2r\). As a result, we have
where the summation \(\sum _{k+1=2r}\) is taken over the set \(\{k:k\ge 1, k+1=2r\}.\) Therefore, we can take \(\phi _2=\sum _{k+1=2r}a_kc_r\{E(\tilde{Z}^2)\}^{r-1}.\) We have completed the proof of Proposition 1. \(\square \)
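The moment identities for the elliptically symmetric pair \((\tilde{X},\tilde{Z})\) used in this proof can be checked numerically in the Gaussian special case, where Isserlis' theorem gives \(c_2=3\) for \(k+1=4\). The covariance matrix and sample size below are illustrative assumptions, not part of the paper.

```python
# Monte Carlo check of the elliptical-symmetry moment identities, for the
# bivariate normal case (a member of the elliptical family). With k = 3
# (so r = 2), Isserlis' theorem gives
#   E(Z~^4)    = 3 {E(Z~^2)}^2           (c_2 = 3)
#   E(X~ Z~^3) = 3 E(X~ Z~) E(Z~^2)
# and E(Z~^{k+1}) = 0 when k is even (odd moments vanish).
import numpy as np

rng = np.random.default_rng(0)
sigma = np.array([[1.0, 0.6], [0.6, 2.0]])   # illustrative covariance
n = 2_000_000
xt, zt = rng.multivariate_normal([0.0, 0.0], sigma, size=n).T

ez2 = np.mean(zt**2)                          # E(Z~^2)
exz = np.mean(xt * zt)                        # E(X~ Z~)

# Odd moment vanishes (k = 2 even, so E(Z~^3) = 0).
assert abs(np.mean(zt**3)) < 0.05
# E(Z~^4) = 3 {E(Z~^2)}^2.
assert abs(np.mean(zt**4) - 3 * ez2**2) < 0.1
# E(X~ Z~^3) = 3 E(X~ Z~) E(Z~^2).
assert abs(np.mean(xt * zt**3) - 3 * exz * ez2) < 0.1
print("moment identities hold up to Monte Carlo error")
```

The same relations hold for any member of the elliptical family; only the constants \(c_r\) change with the distribution.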
Proof of Theorem 1
Note that \(\varvec{\theta }\) is heterogeneous. By objective function (2.4), we have the following empirical versions:
Then
By the result above and estimator (2.7), we get
Because
we can obtain that \(\sqrt{n}\Phi (\widehat{\varvec{\beta }}-\varvec{\beta })\) has the same asymptotic distribution as
where the weight \(\Omega _i=\Big (\varvec{\theta }_i-(\overline{\mathbf {Z} \mathbf {Z}^T})^{-1}\frac{1}{n} \sum _{j=1}^n \mathbf {z}_j\mathbf {z}_j^T\varvec{\theta }_j\Big )\). Note that \((\mathbf {x}_i^T, \mathbf {z}_i^T)^T, i=1, \cdots , n\), are independent and identically distributed as \((X^T,Z^T)^T\). We get
where \(\Delta =E\bigg (\big (ZZ^T-E(ZZ^T)\big )\big (ZZ^T-E(ZZ^T)\big )\bigg )\). The above indicates that
and then \(\Omega _i\xrightarrow {P}(\varvec{\theta }_i-\overline{\varvec{\theta }})\). Thus, the weighted sum \(\frac{1}{\sqrt{n}}\sum _{i=1}^n\mathbf {x}_i\mathbf {z}_i^T\Omega _i\) can be expressed as
and hence has the same asymptotic distribution as \(\frac{1}{\sqrt{n}}\sum _{i=1}^n\mathbf {x}_i\mathbf {z}_i^T(\varvec{\theta }_i-\overline{\varvec{\theta }})\). It can be seen that
and
Therefore,
In addition, we can get that
where \(\Lambda =Var\bigg (X-E\big (XZ^T\big )\big (E(ZZ^T)\big )^{-1}Z\bigg )\). By (6.6), (6.7), (6.8) and the fact that (X, Z) is uncorrelated with \(\varepsilon \), we get that
Thus, the theorem is proved. \(\square \)
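The convergence of the plug-in weight \(\Omega _i\) to \(\varvec{\theta }_i-\overline{\varvec{\theta }}\) used in this proof can be illustrated numerically. The simulation design below (standard normal \(\mathbf {z}_i\), two equal-sized subgroups with fixed index directions) is an illustrative assumption, not the paper's simulation setting.

```python
# Numerical illustration of Omega_i -> theta_i - theta_bar:
#   Omega_i = theta_i - (mean of z z^T)^{-1} (1/n) sum_j z_j z_j^T theta_j.
# With z_j iid and independent of the subgroup labels, the correction term
# converges to theta_bar, so Omega_i approaches theta_i - theta_bar.
import numpy as np

rng = np.random.default_rng(1)
n, q = 100_000, 2
z = rng.standard_normal((n, q))                    # z_i iid N(0, I_q)
theta_groups = np.array([[1.0, 0.5], [-0.5, 1.0]]) # two index directions
labels = np.repeat([0, 1], n // 2)                 # equal-sized subgroups
theta = theta_groups[labels]                       # theta_i, one row per i
theta_bar = theta.mean(axis=0)

S = z.T @ z / n                                    # (1/n) sum_j z_j z_j^T
inner = np.einsum('ij,ij->i', z, theta)            # z_j^T theta_j
b = (z * inner[:, None]).mean(axis=0)              # (1/n) sum_j z_j z_j^T theta_j
corr = np.linalg.solve(S, b)                       # common correction term
omega = theta - corr                               # Omega_i for every i

err = np.max(np.abs(omega - (theta - theta_bar)))
print(f"max |Omega_i - (theta_i - theta_bar)| = {err:.4f}")
assert err < 0.05
```

Note that the correction term is the same vector for every \(i\), so the maximum deviation above is simply the estimation error of \(\overline{\varvec{\theta }}\) by the sample quantity, which shrinks at the usual \(n^{-1/2}\) rate.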
Proof of Theorem 2
We only prove the first result. The proof of the second result is similar. Based on the objective function (2.4), the estimator (2.8) can be written in the following empirical form:
From the proof of Theorem 1, \(\sqrt{n}\Phi (\widehat{\varvec{\beta }}-\varvec{\beta })\) has the same asymptotic distribution as
By the above results, we can easily see that \(\sqrt{n}(\widehat{\overline{\varvec{\theta }}}-\overline{\varvec{\theta }})\) has the same asymptotic distribution as
Thus,
\(\square \)
Proof of Theorem 3
For given \(\beta ^0\), based on the sample \(W_i(\beta ^0)=\mathbf {z}_i(y_i-\mathbf {x}_i^T\beta ^0), i=1,\cdots ,n\) of \(W(\beta ^0)\), the corresponding estimators of \(L, R, G_{L,R}, \cdots \), are denoted by \(\tilde{L}(\beta ^0), \tilde{R}(\beta ^0), \tilde{G}_{L,R}(\beta ^0), \cdots ,\) respectively. Using the sample \(\widehat{W}_i=\mathbf {z}_i(y_i-\mathbf {x}_i^T\widehat{\beta }), i=1,\cdots ,n\), the corresponding estimators are denoted by \(\widehat{L}, \widehat{R}, \widehat{G}_{L,R}, \cdots \), respectively. By the asymptotic property of the estimator \(\widehat{\beta }\) given in Theorem 1, and the continuous-mapping theorem for strong consistency (see, e.g., Van der Vaart (2000)), we have \(\widehat{L}-\tilde{L}(\beta ^0)\xrightarrow {a.s.}0, \widehat{R}-\tilde{R}(\beta ^0)\xrightarrow {a.s.}0\) and \(\widehat{G}_{L,R}-\tilde{G}_{L,R}(\beta ^0)\xrightarrow {a.s.}0, \cdots \). Then, for any \(\varepsilon >0\) and for all sufficiently large n, the following inequalities hold almost surely:
This implies that the estimators \(\widehat{L}, \widehat{R}, \widehat{G}_{L,R}, \cdots \), can be replaced by \(\tilde{L}(\beta ^0), \tilde{R}(\beta ^0), \tilde{G}_{L,R}(\beta ^0),\cdots \), respectively, in the proof procedure (see Lemmas 1–3 in the Supplementary Material of Radchenko and Mukherjee 2017). Then, based on the population \(W(\beta ^0)\), its expectation \(\alpha \) and the sample from \(W(\beta ^0)\), by the proof of Theorem 1 of Radchenko and Mukherjee (2017), we can prove Theorem 3. \(\square \)
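The clustering step itself can be illustrated with a minimal sketch. The function below is a simplified surrogate for the l1-fusion clustering of Radchenko and Mukherjee (2017): it merges observations whose sorted scalar estimates differ by less than a gap threshold, rather than computing the actual fusion solution path. The function name, data, and threshold are illustrative assumptions, not the paper's procedure.

```python
# Toy surrogate for fusion-based subgroup recovery in one dimension:
# consecutive sorted estimates closer than `gap` are fused into one cluster,
# mimicking the effect of an l1 fusion penalty without the solution path.
import numpy as np

def threshold_cluster(estimates, gap):
    """Assign cluster labels: a new cluster starts at each sorted jump > gap."""
    order = np.argsort(estimates)
    labels = np.empty(len(estimates), dtype=int)
    current = 0
    labels[order[0]] = 0
    for prev, cur in zip(order[:-1], order[1:]):
        if estimates[cur] - estimates[prev] > gap:
            current += 1
        labels[cur] = current
    return labels

rng = np.random.default_rng(2)
# Two latent subgroups centred at 0 and 3 with small estimation noise.
est = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(3.0, 0.1, 50)])
labels = threshold_cluster(est, gap=1.0)
print("clusters found:", labels.max() + 1)
```

With well-separated centres and estimation noise that is small relative to the gap, the two subgroups are recovered exactly; in the paper the fusion penalty plays the role of this threshold, with the tuning parameter chosen by a BIC-type criterion.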
Wang, F., Lin, L., Liu, L. et al. Estimation and clustering for partially heterogeneous single index model. Stat Papers 62, 2529–2556 (2021). https://doi.org/10.1007/s00362-020-01203-2