Abstract
In this paper, we study dimension folding for matrix/array-structured predictors with categorical variables. The categorical variable information is incorporated into dimension folding for regression and classification. The concepts of marginal, conditional, and partial folding subspaces are introduced, and their connections to the central folding subspace are investigated. Three estimation methods are proposed to estimate the desired partial folding subspace. An empirical maximal eigenvalue ratio criterion is used to determine the structural dimensions of the associated partial folding subspace. The effectiveness of the proposed methods is evaluated through simulation studies and an application to a longitudinal data set.
References
F. Chiaromonte, R.D. Cook, B. Li, Sufficient dimension reduction in regressions with categorical predictors. Ann. Stat. 30, 475–497 (2002)
R.D. Cook, On the interpretation of regression plots. J. Am. Stat. Assoc. 89, 177–189 (1994)
R.D. Cook, Graphics for regressions with a binary response. J. Am. Stat. Assoc. 91, 983–992 (1996)
R.D. Cook, Regression Graphics: Ideas for Studying Regressions Through Graphics (Wiley, New York, 1998)
R.D. Cook, Testing predictor contribution in sufficient dimension reduction. Ann. Stat. 32, 1062–1092 (2004)
R.D. Cook, S. Weisberg, Discussion of “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 328–332 (1991)
S. Ding, R.D. Cook, Dimension folding PCA and PFC for matrix-valued predictors. Stat. Sin. 24, 463–492 (2014)
S. Ding, R.D. Cook, Tensor sliced inverse regression. J. Multivar. Anal. 133, 216–231 (2015)
T.R. Fleming, D.P. Harrington, Counting Processes and Survival Analysis (Wiley, New York, 1991)
IBM Big Data and Analytics Hub. The Four V’s of Big Data (2014). http://www.ibmbigdatahub.com/infographic/four-vs-big-data
K.-C. Li, Sliced inverse regression for dimension reduction (with discussion). J. Am. Stat. Assoc. 86, 316–342 (1991)
B. Li, S. Wang, On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)
L. Li, X. Yin, Longitudinal data analysis using sufficient dimension reduction. Comput. Stat. Data Anal. 53, 4106–4115 (2009)
B. Li, R.D. Cook, F. Chiaromonte, Dimension reduction for the conditional mean in regressions with categorical predictors. Ann. Stat. 31, 1636–1668 (2003)
B. Li, H. Zha, F. Chiaromonte, Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)
B. Li, S. Wen, L. Zhu, On a projective resampling method for dimension reduction with multivariate responses. J. Am. Stat. Assoc. 103, 1177–1186 (2008)
B. Li, M. Kim, N. Altman, On dimension folding of matrix- or array-valued statistical objects. Ann. Stat. 38, 1094–1121 (2010)
W. Luo, B. Li, Combining eigenvalues and variation of eigenvectors for order determination. Biometrika 103, 875–887 (2016)
R. Luo, H. Wang, C.L. Tsai, Contour projected dimension reduction. Ann. Stat. 37, 3743–3778 (2009)
J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 2nd edn. (Wiley, New York, 1999)
P.A. Murtaugh, E.R. Dickson, G.M. Van Dam, M. Malinchoc, P.M. Grambsch, A.L. Langworthy, C.H. Gips, Primary biliary cirrhosis: prediction of short-term survival based on repeated patient visits. Hepatology 20, 126–134 (1994)
Y. Pan, Q. Mai, X. Zhang, Covariate-adjusted tensor classification in high dimensions. J. Am. Stat. Assoc. 114, 1305–1319 (2019)
R.M. Pfeiffer, L. Forzani, E. Bura, Sufficient dimension reduction for longitudinally measured predictors. Stat. Med. 31, 2414–2427 (2012)
J.A. Talwalkar, K.D. Lindor, Primary biliary cirrhosis. Lancet 362, 53–61 (2003)
Y. Xia, H. Tong, W. Li, L. Zhu, An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 64, 363–410 (2002)
Y. Xue, X. Yin, Sufficient dimension folding for regression mean function. J. Comput. Graph. Stat. 23, 1028–1043 (2014)
Y. Xue, X. Yin, Sufficient dimension folding for a functional of conditional distribution of matrix- or array-valued objects. J. Nonparametr. Stat. 27, 253–269 (2015)
Y. Xue, X. Yin, X. Jiang, Ensemble sufficient dimension folding methods for analyzing matrix-valued data. Comput. Stat. Data Anal. 103, 193–205 (2016)
Z. Ye, R.E. Weiss, Using the bootstrap to select one of a new class of dimension reduction methods. J. Am. Stat. Assoc. 98, 968–979 (2003)
Y. Zhu, P. Zeng, Fourier methods for estimating the central subspace and the central mean subspace in regression. J. Am. Stat. Assoc. 101, 1638–1651 (2006)
Acknowledgements
Yin’s work is supported in part by NSF grant CIF-1813330. Xue’s work is supported in part by the Fundamental Research Funds for the Central Universities in the University of International Business and Economics (CXTD11-05).
8 Appendix
8.1 Proofs
The following equivalent relationship will be used repeatedly in the proof of Proposition 1. For generic random variables \(V_1\), \(V_2\), \(V_3\), and \(V_4\), Cook (1998) showed that

\[ V_1 \perp\!\!\!\perp (V_2, V_3) \mid V_4 \quad \Longleftrightarrow \quad V_1 \perp\!\!\!\perp V_2 \mid (V_3, V_4) \ \text{ and } \ V_1 \perp\!\!\!\perp V_3 \mid V_4. \qquad (8.1) \]
Proof of Proposition 1 part (a)
In Eq. (8.1), let
and apply the first part of Eq. (8.1) together with the equivalent relationship stated above; we then have
Therefore, under the assumption that
we have \( S_{Y|\circ {\mathbf X}} \subseteq S_{Y|\circ {\mathbf X} }^{(W)} \), \( S_{Y|{\mathbf X} \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \), and \(S_{Y|\circ {\mathbf X} \circ } \subseteq S_{Y|\circ {\mathbf X} \circ }^{(W)} \). Now in Eq. (8.1), let
and again apply the first part of Eq. (8.1) together with the equivalent relationship stated above; we then have
Therefore, under the assumption that
we have \( S_{Y|\circ {\mathbf X}} \subseteq S_{Y|\circ {\mathbf X} }^{(W)}\), \( S_{Y|{\mathbf X} \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \), and \( S_{Y|\circ {\mathbf X} \circ } \subseteq S_{Y|\circ {\mathbf X} \circ }^{(W)} \). □
Proof of Proposition 1 part (b)
In Eq. (8.1), let
and apply the first part of Eq. (8.1) together with the equivalent relationship stated above; we then have
Therefore, under the given assumption, we also have the corresponding conditional independence. Thus,
and further we have \(S_{Y|\circ {\mathbf X} }^{(W)} \subseteq S_{Y|\circ {\mathbf X}} \), \( S_{Y|{\mathbf X} \circ }^{(W)} \subseteq S_{Y|{\mathbf X} \circ } \) and \( S_{Y|\circ {\mathbf X} \circ }^{(W)} \subseteq S_{Y|\circ {\mathbf X} \circ } \). □
Proof of Proposition 1 part (c)
For generic subspaces \(S_L\) and \(S_R\), we have
Since \(S_{Y|\circ {\mathbf X} \circ }^{(W)} = S_{Y|{\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X} }^{(W)} \), and since \(S_{Y|\circ {\mathbf X} }^{(W)}\) and \(S_{Y|{\mathbf X} \circ }^{(W)}\) satisfy the left-hand side of Eq. (8.2) by their definitions, they also satisfy
This implies that, for all w = 1, …, C, \(S_{Y_w|\circ {\mathbf X}_w} \subseteq S_{Y|\circ {\mathbf X}}^{(W)}\) and \( S_{Y_w|{\mathbf X}_w \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \), and thus \(\oplus _{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w} \subseteq S_{Y|\circ {\mathbf X}}^{(W)} \) and \( \oplus _{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \). Therefore,
Because \( S_{Y_w| \circ {\mathbf X}_w } \subseteq (\oplus _{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w}) \) and \(S_{Y_w| {\mathbf X}_w \circ } \subseteq (\oplus _{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ })\) for all w = 1, …, C, the two direct-sum spaces also satisfy the right-hand side of Eq. (8.2). Therefore, we have
This implies the reverse containment
We then conclude that \( S_{Y|\circ {\mathbf X} \circ }^{(W)} = (\oplus _{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ }) \otimes (\oplus _{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w}) \). □
Proof of Proposition 1 part (d)
For generic subspaces \(S_L\) and \(S_R\), we have
Since \(S_{Y|\circ {\mathbf X} \circ }^{(W)} = S_{Y|{\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X} }^{(W)} \), and since \(S_{Y|\circ {\mathbf X} }^{(W)}\) and \(S_{Y|{\mathbf X} \circ }^{(W)}\) satisfy the left-hand side of Eq. (8.3) by their definitions, they also satisfy
This implies that, for all w = 1, …, C, \( S_{Y_w|\circ {\mathbf X}_w} \subseteq S_{Y|\circ {\mathbf X}}^{(W)} \) and \(S_{Y_w|{\mathbf X}_w \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)}\), and thus \(S_{Y_w|{\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w}\subseteq S_{Y| {\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X}}^{(W)} = S_{Y|\circ {\mathbf X} \circ }^{(W)}\). Therefore,
Because \( S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} \subseteq \oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} )\) and \(S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} \) satisfies the second relation on the right-hand side of Eq. (8.3) for all w = 1, …, C, we have
where \({\mathbf U}^*\) is a random basis matrix of the space \(\oplus _{w=1}^C span(\beta _w \otimes \alpha _w)\) in \(\mathbb R^{p_l p_r \times k}\). Therefore, by the definition of the Kronecker envelope in Li et al. (2010), the Kronecker envelope of \({\mathbf U}^*\) with respect to the integers \(p_l\) and \(p_r\), that is, \(\epsilon ^{\otimes }_{p_l, p_r} ({\mathbf U}^*)= S_{{\mathbf U}^* \circ } \otimes S_{\circ {\mathbf U}^*} \), satisfies the following conditions:

1. \(span({\mathbf U}^*) = \oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} ) \subseteq S_{{\mathbf U}^* \circ } \otimes S_{\circ {\mathbf U}^*}\) almost surely.
2. If another pair of subspaces \(S_R \subseteq \mathbb R^{p_r}\) and \(S_L \subseteq \mathbb R^{p_l}\) satisfies condition 1, then \(S_{{\mathbf U}^* \circ } \otimes S_{\circ {\mathbf U}^*} \subseteq S_R \otimes S_L\).

However, from the previous proof,
and by definition, \(S_{Y|{\mathbf X} \circ }^{(W)} \subseteq \mathbb R^{p_r}\) and \(S_{Y|\circ {\mathbf X}}^{(W)} \subseteq \mathbb R^{p_l}\). Therefore,
On the other hand, for all w = 1, …, C,
Therefore, \(S_{\circ {\mathbf U}^*}\) and \(S_{{\mathbf U}^* \circ }\) satisfy the second relation on the right-hand side of Eq. (8.3). For the left-hand side of Eq. (8.3), we have
Thus \( S_{Y|\circ {\mathbf X} }^{(W)} \subseteq S_{\circ {\mathbf U}^*} \) and \( S_{Y|{\mathbf X} \circ }^{(W)} \subseteq S_{{\mathbf U}^* \circ } \), which implies the relationship
Therefore,
This shows that \(S_{Y|\circ {\mathbf X} \circ }^{(W)}\) equals the Kronecker envelope of \(\oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w})\). Thus, by estimating \(\oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w})\), we target a subspace of \(S_{Y|\circ {\mathbf X} \circ }^{(W)}\) that may be proper; in particular, estimating \(\oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w})\) need not recover \(S_{Y|\circ {\mathbf X} \circ }^{(W)}\) exhaustively. □
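To make the Kronecker envelope used above concrete, the following is a minimal numerical sketch (ours, not part of the paper) of \(\epsilon ^{\otimes }_{p_l, p_r}\) for a fixed basis matrix \({\mathbf U}\). It rests on the identity \(vec(\alpha C \beta ^{\top }) = (\beta \otimes \alpha ) vec(C)\): a column \(u = vec(M)\) lies in \(span(\beta \otimes \alpha )\) exactly when the columns of \(M\) lie in \(span(\alpha )\) and the rows of \(M\) lie in \(span(\beta )\), so the envelope's factors are the sums of the column spaces and of the row spaces of the refolded columns. The function name and tolerance are our own; the almost-sure version for random \({\mathbf U}^*\) in Li et al. (2010) is analogous.

import numpy as np

def kronecker_envelope(U, p_l, p_r, tol=1e-8):
    # Refold each column u = vec(M) into the p_l x p_r matrix M
    # (column-major vec, matching vec(alpha C beta') = (beta kron alpha) vec(C)).
    mats = [U[:, j].reshape((p_l, p_r), order="F") for j in range(U.shape[1])]
    left_stack = np.hstack(mats)                  # its column space is S_L
    right_stack = np.hstack([M.T for M in mats])  # its column space is S_R

    def orth(A):
        # Orthonormal basis of col(A) via SVD, dropping near-zero directions.
        q, s, _ = np.linalg.svd(A)
        return q[:, : int((s > tol * s.max()).sum())]

    return orth(left_stack), orth(right_stack)    # bases of S_L and S_R

# Toy check: columns of U drawn from a genuine Kronecker-structured space.
rng = np.random.default_rng(0)
p_l, p_r = 3, 4
alpha = rng.standard_normal((p_l, 2))
beta = rng.standard_normal((p_r, 2))
U = np.kron(beta, alpha) @ rng.standard_normal((4, 3))
L, R = kronecker_envelope(U, p_l, p_r)
print(L.shape[1], R.shape[1])  # generically 2 and 2; span(U) lies in span(np.kron(R, L))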
Proof of Proposition 1 part (e)
First note that if for each W = w, \(span({\mathbf U}_w) \subseteq S_{Y_w | vec({\mathbf X})_w}\) almost surely, then from part (d) of Proposition 1, we have
almost surely. Therefore, by the definition of Kronecker product, we have
□
Proof of Theorem 1
Using the double expectation formula, we can write the objective function as
where the inside expectation is with respect to the random matrices \({\mathbf U}_1, \ldots , {\mathbf U}_C\) and the outside expectation is with respect to the categorical variable W. This is equivalent to
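For concreteness, a plausible explicit form of this decomposition is the following sketch; it assumes (our assumption, not the paper's exact display) that the objective in (4.3) is the expected squared Frobenius distance between \(A{\mathbf U}_W\) and \(A(\beta \otimes \alpha ) f_W(Z)\), as suggested by the identity \(A{\mathbf U}_w = A(\beta _0 \otimes \alpha _0)\phi _w(Z)\) used below:

\[ \mathrm{E} \big\| A{\mathbf U}_W - A(\beta \otimes \alpha ) f_W(Z) \big\|_F^2 \;=\; \sum_{w=1}^{C} P(W=w) \, \mathrm{E}\Big[ \big\| A{\mathbf U}_w - A(\beta \otimes \alpha ) f_w(Z) \big\|_F^2 \,\Big|\, W=w \Big], \]

so that minimizing the objective amounts to minimizing each conditional least-squares term, weighted by \(P(W=w)\).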
Assume \(\epsilon ^{\otimes }({\mathbf U}^*) = span(\beta _0 \otimes \alpha _0)\). Because, for each W = w, \( span({\mathbf U}_w) \subseteq \oplus _{w=1}^C span({\mathbf U}_w)\subseteq {\epsilon }^{\otimes } ({\mathbf U}^*) = span(\beta _0 \otimes \alpha _0) \) and the elements of \({\mathbf U}_w\) are measurable with respect to Z, there exists a random projection matrix \(\phi _w(Z) \in L^{ d_l d_r \times k_w}\) such that \({\mathbf U}_w = (\beta _0 \otimes \alpha _0)\phi _w(Z)\), which is equivalent to \(A {\mathbf U}_w = A(\beta _0 \otimes \alpha _0)\phi _w(Z)\).
Thus (4.3), or equivalently (8.4), reaches its minimum of 0 within the range of \((\alpha , \beta , f_1,\ldots , f_C)\) given in the theorem. This implies that any minimizer \((\alpha ^*, \beta ^*, f_1^*,\ldots , f_C^*)\) of (4.3) must satisfy \(A(\beta ^* \otimes \alpha ^* ) f_w^* (Z) = A{\mathbf U}_w\) almost surely for every W = w and, consequently, \((\beta _0 \otimes \alpha _0)\phi _w(Z) = (\beta ^* \otimes \alpha ^*) f_w^*(Z)\) almost surely. But this means that \(span(\beta ^* \otimes \alpha ^*)\) contains each \({\mathbf U}_w\) almost surely; thus we have \(\oplus _{w=1}^C span({\mathbf U}_w) \subseteq span(\beta ^* \otimes \alpha ^*)\). Since \(span(\beta ^* \otimes \alpha ^*)\) has the same dimension as \(\epsilon ^{\otimes }({\mathbf U}^*)\), the theorem now follows from the uniqueness of the Kronecker envelope. □
8.2 Additional Simulation and Data Analysis
The following six examples parallel the simulation studies in Sect. 6.1, showing how the results change as the overlap among the individual subspaces changes.
Example A1
Example A1 keeps almost the same experimental setting as Example 1 but slightly changes the conditional distribution of Y given X and W, so that the two conditional central subspaces overlap but are not identical.
In this example,
The two conditional folding subspaces overlap, because their left conditional folding subspaces are identical and their right conditional folding subspaces share one common direction. By part (c) of Proposition 1, we have:
On the other hand, based on part (d) of Proposition 1, we have:
Therefore, all three methods can still recover \(S_{Y|{\circ }X{\circ }}^{(W)}\) exhaustively. Again, for vectorized data,
thus
Table 7 summarizes the simulation results for Example A1. Note that all three methods perform worse than in Example 1 in terms of accuracy and variability, because they estimate a larger partial folding subspace than in Example 1. The individual ensemble method and the LSFA method still provide similar accuracy, and both outperform the objective function method.
Example A2
Example A2 keeps the two conditional central subspaces overlapping, but to a smaller extent. This is achieved by setting the conditional distribution as
In this example,
The two conditional folding subspaces overlap slightly, but neither their left nor their right conditional folding subspaces coincide. By part (c) of Proposition 1, we have:
On the other hand, based on part (d) of Proposition 1, we have:
In this case, only the individual ensemble method and the objective function method recover \(S_{Y|{\circ }X{\circ }}^{(W)}\) exhaustively. Since the LSFA method targets the space \(S_{Y|{\circ }X_{w=0}{\circ }} \oplus S_{Y|{\circ }X_{w=1}{\circ }}\), which is a smaller subspace by part (d) of Proposition 1 and the experimental setting above, the LSFA method estimates a smaller subspace than the desired partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)} \). In practice, the accuracy of the LSFA method may not suffer, since we use the results from the individual ensemble method as initial values. Again, for vectorized data,
thus
Table 8 displays the simulation results for Example A2, where the individual ensemble method and the LSFA method still outperform the objective function method.
Example A3
Example A3 constrains the two conditional central subspaces to be the same as the conditional folding subspaces, i.e., for w = 0, 1,
However, the partial central subspace is still a proper subspace of the partial folding subspace. We achieve this by constraining the two partial central subspaces to be orthogonal to each other. The conditional distribution of Y given X and W is:
In this case,
Thus,
Based on part (c) of Proposition 1, we have:
Still, the LSFA method only targets \(S_{Y|{\circ }X_{w=0}{\circ }} \oplus S_{Y|{\circ }X_{w=1}{\circ }}\), which is a proper subspace of the desired space \(S_{Y|{\circ }X{\circ }}^{(W)}\).
The simulation results for this example in Table 9 indicate that the individual ensemble method and the LSFA method perform similarly.
Example A4
In Example A4, we constrain the conditional central subspaces and the partial central subspace to be the same as the conditional folding subspaces and the partial folding subspace, respectively. Since estimating the partial folding subspace greatly reduces the number of parameters, especially when the dimensions are large, in this example we are specifically interested in whether folding-based methods can achieve higher accuracy than traditional methods such as partial SIR (a rough parameter count is sketched below, after the discussion of Table 10). We modify the conditional distribution as:
In this case,
And most importantly,
Again, the results in Table 10 show that the individual ensemble and LSFA methods perform better than the other two approaches.
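To make Example A4's motivation concrete, here is a rough parameter count (our illustration, ignoring identifiability constraints on the basis matrices). Folding-based methods estimate the pair \((\alpha , \beta )\) with \(p_l d_l + p_r d_r\) entries, whereas vectorized methods such as partial SIR estimate an unstructured \(p_l p_r \times d\) basis for \(vec({\mathbf X})\):

\[ \underbrace{p_l d_l + p_r d_r}_{\text{folding}} \quad \text{versus} \quad \underbrace{p_l p_r d}_{\text{vectorized}}; \qquad \text{e.g., } p_l = p_r = 10, \ d_l = d_r = d = 1 \ \text{gives } 20 \text{ versus } 100. \]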
Figure 3 summarizes the first two examples in the paper and Examples A1–A4. The three estimation methods can be interpreted as follows. Since the partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)}\) has a basis matrix that must be expressed as a Kronecker product, it can only be covered by a “rectangle space.” Exhaustive methods, including the individual ensemble method and the objective function method, therefore attempt to find one minimal “rectangle space” that covers both conditional folding subspaces. The LSFA method instead estimates \(\oplus S_{Y|{\circ }X_{w}{\circ }}\), looking for two minimal “rectangles” that together cover all the conditional folding subspaces, which can therefore be smaller than the partial folding subspace. The traditional partial central subspace \(S_{Y| vec({\mathbf X}) } ^{(W)}\), which stacks the columns together, and its estimation method partial SIR look for “blocks” that cover all the conditional central subspaces.
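A toy illustration of the “rectangle” picture (ours, not one of the paper's examples): let \(p_l = p_r = 2\) and suppose the two conditional folding subspaces are \(S_0 = span(e_1) \otimes span(e_1)\) and \(S_1 = span(e_2) \otimes span(e_2)\). Then

\[ \dim (S_0 \oplus S_1) = 2, \qquad \text{while the smallest covering rectangle is } \ span(e_1, e_2) \otimes span(e_1, e_2) = \mathbb R^4. \]

In this toy case, LSFA targets the two-dimensional direct sum, whereas the partial folding subspace, being a single Kronecker product containing that direct sum, must be the full four-dimensional rectangle.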
Example A5
Example A5 follows closely from Example 3 and is intended to construct partial folding subspaces exactly the same as those of Examples 2 and A1. The details of the experimental setting are as follows. For W = 0, it follows exactly the same setting as Example 3. For W = 1, however, the conditional mean of X given Y is changed to:
Correspondingly, the conditional covariance structure stays the same as in Example 3 except for the index set A = {(1, 3), (2, 2)}. We can easily verify that the desired partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)}\) is the same as in Example A1. But for the vectorized data vec(X),
and
thus
From Table 11, it appears that the objective function method with pooled variance provides the smallest errors and the smallest variability across all sample sizes n. The individual direction ensemble method and the LSFA method produce similar accuracy and stability in Example A5.
Example A6
Example A6 also follows closely from Example 3, intending to construct partial folding subspaces exactly the same as those of Example 2 and Example A1. In this example, the two conditional folding subspaces are less overlapped, leading to a larger partial folding subspace. For W = 0, it follows exactly the same setting as Examples 3 and A5. For W = 1, however, the conditional mean of X given the response Y is changed to:
Correspondingly, the conditional covariance structure stays the same as in Example 3 except for the index set A = {(2, 3), (3, 2)}. We can easily verify that the desired partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)}\) is the same as in Example A1. But for the vectorized data vec(X),
and
thus
The results are listed in Table 12; as before, the proposed individual ensemble method and LSFA method outperform the objective function optimization method, although the objective function optimization method with pooled covariance yields the smallest errors and standard deviations.
Three Histograms for the Real Data
See Fig. 4.
The Bootstrap Confidence Interval Plots for Real Data
See Fig. 5.
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wang, Y., Xue, Y., Yuan, Q., Yin, X. (2021). Sufficient Dimension Folding with Categorical Predictors. In: Bura, E., Li, B. (eds) Festschrift in Honor of R. Dennis Cook. Springer, Cham. https://doi.org/10.1007/978-3-030-69009-0_7
Print ISBN: 978-3-030-69008-3
Online ISBN: 978-3-030-69009-0