Sufficient Dimension Folding with Categorical Predictors

Festschrift in Honor of R. Dennis Cook

Abstract

In this paper, we study dimension folding for matrix/array structured predictors with categorical variables. The categorical variable information is incorporated into dimension folding for regression and classification. The concepts of marginal, conditional, and partial folding subspaces are introduced, and their connections to the central folding subspace are investigated. Three estimation methods are proposed to estimate the desired partial folding subspace. An empirical maximal eigenvalue ratio criterion is used to determine the structural dimensions of the associated partial folding subspace. The effectiveness of the proposed methods is evaluated through simulation studies and an application to a longitudinal data set.


References

  • F. Chiaromonte, R.D. Cook, B. Li, Sufficient dimension reduction in regressions with categorical predictors. Ann. Stat. 30, 475–497 (2002)
  • R.D. Cook, On the interpretation of regression plots. J. Am. Stat. Assoc. 89, 177–189 (1994)
  • R.D. Cook, Graphics for regressions with a binary response. J. Am. Stat. Assoc. 91, 983–992 (1996)
  • R.D. Cook, Regression Graphics: Ideas for Studying Regressions Through Graphics (Wiley, New York, 1998)
  • R.D. Cook, Testing predictor contribution in sufficient dimension reduction. Ann. Stat. 32, 1062–1092 (2004)
  • R.D. Cook, S. Weisberg, Discussion of “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 328–332 (1991)
  • S. Ding, R.D. Cook, Dimension folding PCA and PFC for matrix-valued predictors. Stat. Sin. 24, 463–492 (2014)
  • S. Ding, R.D. Cook, Tensor sliced inverse regression. J. Multivar. Anal. 133, 216–231 (2015)
  • T.R. Fleming, D.P. Harrington, Counting Processes and Survival Analysis (Wiley, New York, 1991)
  • IBM Big Data and Analytics Hub, The Four V’s of Big Data (2014). http://www.ibmbigdatahub.com/infographic/four-vs-big-data
  • K.-C. Li, Sliced inverse regression for dimension reduction (with discussion). J. Am. Stat. Assoc. 86, 316–342 (1991)
  • B. Li, S. Wang, On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)
  • L. Li, X. Yin, Longitudinal data analysis using sufficient dimension reduction. Comput. Stat. Data Anal. 53, 4106–4115 (2009)
  • B. Li, R.D. Cook, F. Chiaromonte, Dimension reduction for the conditional mean in regression with categorical predictors. Ann. Stat. 31, 1636–1668 (2003)
  • B. Li, H. Zha, F. Chiaromonte, Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)
  • B. Li, S. Wen, L. Zhu, On a projective resampling method for dimension reduction with multivariate responses. J. Am. Stat. Assoc. 103, 1177–1186 (2008)
  • B. Li, M. Kim, N. Altman, On dimension folding of matrix- or array-valued statistical objects. Ann. Stat. 38, 1094–1121 (2010)
  • W. Luo, B. Li, Combining eigenvalues and variation of eigenvectors for order determination. Biometrika 103, 875–887 (2016)
  • R. Luo, H. Wang, C.L. Tsai, Contour projected dimension reduction. Ann. Stat. 37, 3743–3778 (2009)
  • J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, 2nd edn. (Wiley, New York, 1999)
  • P.A. Murtaugh, E.R. Dickson, G.M. Van Dam, M. Malinchoc, P.M. Grambsch, A.L. Langworthy, C.H. Gips, Primary biliary cirrhosis: prediction of short-term survival based on repeated patient visits. Hepatology 20, 126–134 (1994)
  • Y. Pan, Q. Mai, X. Zhang, Covariate-adjusted tensor classification in high dimensions. J. Am. Stat. Assoc. 114, 1305–1319 (2019)
  • R.M. Pfeiffer, L. Forzani, E. Bura, Sufficient dimension reduction for longitudinally measured predictors. Stat. Med. 31, 2414–2427 (2012)
  • J.A. Talwalkar, K.D. Lindor, Primary biliary cirrhosis. Lancet 362, 53–61 (2003)
  • Y. Xia, H. Tong, W. Li, L. Zhu, An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 64, 363–410 (2002)
  • Y. Xue, X. Yin, Sufficient dimension folding for regression mean function. J. Comput. Graph. Stat. 23, 1028–1043 (2014)
  • Y. Xue, X. Yin, Sufficient dimension folding for a functional of conditional distribution of matrix- or array-valued objects. J. Nonparametr. Stat. 27, 253–269 (2015)
  • Y. Xue, X. Yin, X. Jiang, Ensemble sufficient dimension folding methods for analyzing matrix-valued data. Comput. Stat. Data Anal. 103, 193–205 (2016)
  • Z. Ye, R.E. Weiss, Using the bootstrap to select one of a new class of dimension reduction methods. J. Am. Stat. Assoc. 98, 968–979 (2003)
  • Y. Zhu, P. Zeng, Fourier methods for estimating the central subspace and the central mean subspace in regression. J. Am. Stat. Assoc. 101, 1638–1651 (2006)


Acknowledgements

Yin’s work is supported in part by an NSF grant CIF-1813330. Xue’s work is supported in part by the Fundamental Research Funds for the Central Universities in University of International Business and Economics (CXTD11-05).

Author information

Corresponding author

Correspondence to Xiangrong Yin.

8 Appendix

8.1 Proofs

The following equivalent relationship will be used repeatedly in the proof of Proposition 1. For generic random variables \(V_1\), \(V_2\), \(V_3\), and \(V_4\), Cook (1998) showed that

(8.1)
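The display for Eq. (8.1) is not reproduced above. A standard form of this equivalence, reconstructed here from the conditional-independence properties in Cook (1998) and assumed to be the relationship the proofs below invoke (with the two right-hand conditions playing the role of its two parts), is

$$\displaystyle \begin{aligned} V_1 \perp\!\!\!\perp (V_2, V_3) \,|\, V_4 \quad \Longleftrightarrow \quad V_1 \perp\!\!\!\perp V_2 \,|\, (V_3, V_4) \ \ \text{and} \ \ V_1 \perp\!\!\!\perp V_3 \,|\, V_4. \end{aligned}$$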

Proof of Proposition 1 part (a)

In Eq. (8.1), let

$$\displaystyle \begin{aligned} \left\{ \begin{array}{c} V_1 = vec({\mathbf X}), \\ V_2 = W, \\ V_3 = P_{{S_R} \otimes {S_L}} vec({\mathbf X}), \\ V_4 = Y, \end{array} \right. \end{aligned}$$

and apply the first part of Eq. (8.1) and the equivalent relationship that , we have

Therefore, under the assumption that

we have \( S_{Y|\circ X} \subseteq S_{Y|\circ X }^{(W)} \), \( S_{Y|X \circ } \subseteq S_{Y|X \circ }^{(W)} \) and \(S_{Y|\circ X \circ } \subseteq S_{Y|\circ X \circ }^{(W)} \). Now, in Eq. (8.1), let

$$\displaystyle \begin{aligned} \left\{ \begin{array}{c} V_1 = Y, \\ V_2 = vec({\mathbf X}), \\ V_3 =P_{{S_R} \otimes {S_L}} vec({\mathbf X}), \\ V_4 = W, \end{array} \right. \end{aligned}$$

and again apply the first part of Eq. (8.1) and the equivalent relationship that , we have

Therefore, under the assumption that

we have \( S_{Y|\circ X} \subseteq S_{Y|\circ X }^{(W)}\), \( S_{Y|X \circ } \subseteq S_{Y|X \circ }^{(W)} \) and \( S_{Y|\circ X \circ } \subseteq S_{Y|\circ X \circ }^{(W)} \). □

Proof of Proposition 1 part (b)

In Eq. (8.1), let

$$\displaystyle \begin{aligned} \left\{ \begin{array}{c} V_1 = Y, \\ V_2 = W, \\ V_3 = P_{{S_R} \otimes {S_L}} vec({\mathbf X}), \\ V_4 = vec({\mathbf X}), \end{array} \right. \end{aligned}$$

and apply the first part of Eq. (8.1) and the equivalent relationship that , we have

Therefore, under the assumption that , we also have X). Thus,

and further we have \(S_{Y|\circ {\mathbf X} }^{(W)} \subseteq S_{Y|\circ {\mathbf X}} \), \( S_{Y|{\mathbf X} \circ }^{(W)} \subseteq S_{Y|{\mathbf X} \circ } \) and \( S_{Y|\circ {\mathbf X} \circ }^{(W)} \subseteq S_{Y|\circ {\mathbf X} \circ } \). □

Proof of Proposition 1 part (c)

For generic subspaces \(S_L\) and \(S_R\), we have

(8.2)

Since \(S_{Y|\circ {\mathbf X} \circ }^{(W)} = S_{Y|{\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X} }^{(W)} \), \(S_{Y|\circ {\mathbf X} }^{(W)}\), and \(S_{Y|{\mathbf X} \circ }^{(W)}\) satisfy the left-hand side of Eq. (8.2) by their definitions, they also satisfy

This implies that, for all w = 1, …, C, \(S_{Y_w|\circ {\mathbf X}_w} \subseteq S_{Y|\circ {\mathbf X}}^{(W)}\), \( S_{Y_w|{\mathbf X}_w \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \) and thus \(\oplus _{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w} \subseteq S_{Y|\circ {\mathbf X}}^{(W)} \), \( \oplus _{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \). Therefore,

$$\displaystyle \begin{aligned} (\oplus_{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ }) \otimes (\oplus_{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w}) \subseteq S_{Y|{\mathbf X} \circ}^{(W)} \otimes S_{Y|\circ {\mathbf X}}^{(W)}=S_{Y|\circ {\mathbf X} \circ}^{(W)} . \end{aligned}$$

Because \( S_{Y_w| \circ {\mathbf X}_w } \subseteq (\oplus _{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w}) \) and \(S_{Y_w| {\mathbf X}_w \circ } \subseteq (\oplus _{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ })\) for all w = 1, …, C, the two direct sum spaces also satisfy the right-hand side of Eq. (8.2). Therefore, we have

This implies the other containing relationship

$$\displaystyle \begin{aligned} S_{Y|\circ {\mathbf X} \circ}^{(W)} \subseteq (\oplus_{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ }) \otimes (\oplus_{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w}) . \end{aligned}$$

We then conclude that \( S_{Y|\circ {\mathbf X} \circ }^{(W)} = (\oplus _{w=1}^{C} S_{Y_w| {\mathbf X}_w \circ }) \otimes (\oplus _{w=1}^{C} S_{Y_w|\circ {\mathbf X}_w}) \). □

Proof of Proposition 1 part (d)

For generic subspaces \(S_L\) and \(S_R\), we have

(8.3)

Since \(S_{Y|\circ {\mathbf X} \circ }^{(W)} = S_{Y|{\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X} }^{(W)} \), \(S_{Y|\circ {\mathbf X} }^{(W)}\), and \(S_{Y|{\mathbf X} \circ }^{(W)}\) satisfy the left-hand side of Eq. (8.3) by their definitions, they also satisfy

This implies that, for all w = 1, …, C, \( S_{Y_w|\circ {\mathbf X}_w} \subseteq S_{Y|\circ {\mathbf X}}^{(W)} \), \(S_{Y_w|{\mathbf X}_w \circ } \subseteq S_{Y|{\mathbf X} \circ }^{(W)}\) and thus \(S_{Y_w|{\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w}\subseteq S_{Y| {\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X}}^{(W)} = S_{Y|\circ {\mathbf X} \circ }^{(W)}\). Therefore,

$$\displaystyle \begin{aligned} \oplus_{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ} \otimes S_{Y_w|\circ {\mathbf X}_w} ) \subseteq S_{Y|\circ {\mathbf X} \circ }^{(W)}. \end{aligned}$$

Because \( S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} \subseteq \oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} )\) and \(S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} \) satisfies the second relationship on the right-hand side of Eq. (8.3) for all w = 1, …, C, we have

$$\displaystyle \begin{aligned} \oplus_{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ} \otimes S_{Y_w|\circ {\mathbf X}_w} ) =span({\mathbf U}^*) \subseteq S_{Y|\circ {\mathbf X} \circ }^{(W)} = S_{Y| {\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X}}^{(W)}, \end{aligned}$$

where \({\mathbf U}^*\) is a random basis matrix of the space \(\oplus _{w=1}^C span(\beta _w \otimes \alpha _w)\) in \(\mathbb R^{p_l p_r \times k}\). Therefore, by the definition of the Kronecker envelope in Li et al. (2010), the Kronecker envelope of \({\mathbf U}^*\) with respect to the integers \(p_l\) and \(p_r\), that is \(\epsilon ^{\otimes }_{p_l, p_r} ({\mathbf U}^*)= S_{{\mathbf U}^* \circ } \otimes S_{\circ {\mathbf U}^*} \), satisfies the following conditions:

1. \(span({\mathbf U}^*) = \oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w} ) \subseteq S_{{\mathbf U}^* \circ } \otimes S_{\circ {\mathbf U}^*}\) almost surely.

2. If there is another pair of subspaces \(S_R \subseteq \mathbb R^{p_r}\) and \(S_L \subseteq \mathbb R^{p_l}\) that satisfies condition 1, then \(S_{{\mathbf U}^* \circ } \otimes S_{\circ {\mathbf U}^*} \subseteq S_R \otimes S_L\).

However, from the previous proof

$$\displaystyle \begin{aligned} span({\mathbf U}^*)=\oplus_{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ} \otimes S_{Y_w|\circ {\mathbf X}_w} ) \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X}}^{(W)} = S_{Y|\circ {\mathbf X} \circ }^{(W)} , \end{aligned}$$

and by definition, \(S_{Y|{\mathbf X} \circ }^{(W)} \subseteq \mathbb R^{p_r}\) and \(S_{Y|\circ {\mathbf X}}^{(W)} \subseteq \mathbb R^{p_l}\). Therefore,

$$\displaystyle \begin{aligned} \epsilon^{\otimes}_{p_l, p_r} ({\mathbf U}^*) = S_{{\mathbf U}^* \circ} \otimes S_{\circ {\mathbf U}^*} \subseteq S_{Y|{\mathbf X} \circ }^{(W)} \otimes S_{Y|\circ {\mathbf X}}^{(W)} = S_{Y|\circ {\mathbf X} \circ }^{(W)}. \end{aligned}$$

On the other hand, for all w = 1, …, C,

$$\displaystyle \begin{aligned} S_{Y_w| {\mathbf X}_w \circ} \otimes S_{Y_w|\circ {\mathbf X}_w} \subseteq \oplus_{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ} \otimes S_{Y_w|\circ {\mathbf X}_w} )= span({\mathbf U}^*) \subseteq S_{{\mathbf U}^* \circ} \otimes S_{\circ {\mathbf U}^*} {=} \epsilon^{\otimes}_{p_l, p_r} ({\mathbf U}^*). \end{aligned}$$

Therefore, \(S_{\circ {\mathbf U}^*}\) and \(S_{{\mathbf U}^* \circ }\) satisfy the second relationship on the right-hand side of Eq. (8.3). For the left-hand side of Eq. (8.3), we have

Thus \( S_{Y|\circ {\mathbf X} }^{(W)} \subseteq S_{\circ {\mathbf U}^*} \) and \( S_{Y|{\mathbf X} \circ }^{(W)} \subseteq S_{{\mathbf U}^* \circ } \), which implies the relationship

$$\displaystyle \begin{aligned} S_{Y|\circ {\mathbf X} \circ}^{(W)} \subseteq S_{{\mathbf U}^* \circ} \otimes S_{\circ {\mathbf U}^*} =\epsilon^{\otimes}_{p_l, p_r} ({\mathbf U}^*). \end{aligned}$$

Therefore,

$$\displaystyle \begin{aligned} S_{Y|\circ {\mathbf X} \circ}^{(W)} = S_{{\mathbf U}^* \circ} \otimes S_{\circ {\mathbf U}^*} = \epsilon^{\otimes}_{p_l, p_r} ({\mathbf U}^*). \end{aligned}$$

This shows that \(S_{Y|\circ {\mathbf X} \circ }^{(W)}\) equals the Kronecker envelope of \(\oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w})\). Thus, by estimating \(\oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w})\), we may be targeting only a proper subspace of \(S_{Y|\circ {\mathbf X} \circ }^{(W)}\); that is, estimation of \(\oplus _{w=1}^{C} (S_{Y_w| {\mathbf X}_w \circ } \otimes S_{Y_w|\circ {\mathbf X}_w})\) need not recover \(S_{Y|\circ {\mathbf X} \circ }^{(W)}\) exhaustively. □
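As a side remark, for a fixed (non-random) matrix the Kronecker envelope used above can be formed directly: if each column u of U is matricized (column-major) into a \(p_l \times p_r\) matrix M with u = vec(M), then the smallest space of the form \(S_R \otimes S_L\) containing span(U) takes \(S_L\) to be the sum of the column spaces and \(S_R\) the sum of the row spaces of these matricized columns. The following minimal Python sketch illustrates this; the function names and the deterministic setting are our own illustrative assumptions, not the construction for random matrices in Li et al. (2010).

```python
import numpy as np

def kronecker_envelope_bases(U, pl, pr, tol=1e-10):
    """Illustrative sketch: orthonormal bases of S_L (left) and S_R (right) such that
    S_R (x) S_L is the smallest Kronecker-product subspace containing span(U).
    Each column u of U (length pl*pr) is matricized column-major so that u = vec(M)
    with M of size pl x pr; S_L collects the column spaces and S_R the row spaces."""
    cols, rows = [], []
    for j in range(U.shape[1]):
        M = U[:, j].reshape((pl, pr), order="F")  # column-major matricization
        cols.append(M)
        rows.append(M.T)
    L = np.hstack(cols)  # columns span the sum of column spaces
    R = np.hstack(rows)  # columns span the sum of row spaces

    def basis(A):
        u, s, _ = np.linalg.svd(A, full_matrices=False)
        return u[:, s > tol * s.max()]  # truncate at the numerical rank

    return basis(L), basis(R)

# small check: U spanned by vec(a1 b1^T) and vec(a2 b2^T), so both factor spaces have dimension 2
pl, pr = 3, 4
rng = np.random.default_rng(0)
a1, a2 = rng.standard_normal(pl), rng.standard_normal(pl)
b1, b2 = rng.standard_normal(pr), rng.standard_normal(pr)
U = np.column_stack([np.kron(b1, a1), np.kron(b2, a2)])  # vec(a b^T) = b (x) a
BL, BR = kronecker_envelope_bases(U, pl, pr)
print(BL.shape[1], BR.shape[1])  # 2 2
```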

Proof of Proposition 1 part (e)

First note that if for each W = w, \(span({\mathbf U}_w) \subseteq S_{Y_w | vec({\mathbf X})_w}\) almost surely, then from part (d) of Proposition 1, we have

$$\displaystyle \begin{aligned} \oplus_{w=1}^C span ({\mathbf U}_w) &\subseteq \oplus_{w=1}^C S_{Y_w|vec({\mathbf X})_w} = S_{Y|vec({\mathbf X})}^{(W)}\\ &\subseteq \oplus_{w=1}^C (S_{Y_w|{\mathbf X}_w \circ} \otimes S_{Y_w| \circ {\mathbf X}_w})\\ &\subseteq S_{Y| \circ {\mathbf X} \circ}^{(W)} = (\oplus_{w=1}^C S_{Y_w| {\mathbf X}_w \circ} ) \otimes (\oplus_{w=1}^C S_{Y_w| \circ {\mathbf X}_w}) \end{aligned} $$

almost surely. Therefore, by the definition of Kronecker product, we have

$$\displaystyle \begin{aligned} S_{\circ {\mathbf U}_{new}} &\subseteq \oplus_{w=1}^C S_{Y_w| \circ {\mathbf X}_w},\\ S_{{\mathbf U}_{new} \circ} &\subseteq \oplus_{w=1}^C S_{Y_w| {\mathbf X}_w \circ}, \quad \text{and}\\ {\epsilon}^\otimes ({\mathbf U}_{new}) &= S_{{\mathbf U}_{new} \circ} \otimes S_{\circ {\mathbf U}_{new}} \subseteq S_{Y| \circ {\mathbf X} \circ}^{(W)} = (\oplus_{w=1}^C S_{Y_w| {\mathbf X}_w \circ} ) \otimes (\oplus_{w=1}^C S_{Y_w| \circ {\mathbf X}_w}). \end{aligned} $$

Proof of Theorem 1

Using the double expectation formula, we can further write the objective function as

$$\displaystyle \begin{aligned} E_W \left( E_{{\mathbf U}_w}\left[ \left\| A {\mathbf U}_w-A (\beta \otimes \alpha) f_w(Z) \right\|^2 \,\middle|\, W=w \right] \right), \end{aligned}$$

where the inside expectation is with respect to the random matrices \({\mathbf U}_1, \ldots, {\mathbf U}_C\) and the outside expectation is with respect to the categorical variable W. This is equivalent to

$$\displaystyle \begin{aligned} \sum_{w=1}^C p_w \, E\left[ \left\| A {\mathbf U}_w-A (\beta \otimes \alpha) f_w(Z) \right\|^2 \,\middle|\, W=w \right]. \end{aligned} $$
(8.4)

Assume \({\epsilon }^{\otimes } ({\mathbf U}^*) = span(\beta_0 \otimes \alpha_0)\). Because, for each W = w, \( span({\mathbf U}_w) \subseteq \oplus _{w=1}^C span({\mathbf U}_w)\subseteq {\epsilon }^{\otimes } ({\mathbf U}^*) = span(\beta _0 \otimes \alpha _0) \) and the elements of \({\mathbf U}_w\) are measurable with respect to Z, there exists a random projection matrix \(\phi _w(Z) \in L^{ d_l d_r \times k_w}\) such that \({\mathbf U}_w = (\beta_0 \otimes \alpha_0)\phi_w(Z)\), which is equivalent to \(A {\mathbf U}_w = A(\beta_0 \otimes \alpha_0)\phi_w(Z)\).

Thus (4.3), or equivalently (8.4), reaches its minimum 0 within the range of \((\alpha, \beta, f_1, \ldots, f_C)\) given in the theorem. This implies that any minimizer \((\alpha ^*, \beta ^*, f_1^*,\ldots , f_C^*)\) of (4.3) must satisfy \(A(\beta ^* \otimes \alpha ^* ) f_w^* (Z) = A{\mathbf U}_w\) almost surely for every W = w and, consequently, \((\beta_0 \otimes \alpha_0)\phi_w(Z) = (\beta^* \otimes \alpha^*)f_w^*(Z)\) almost surely. But this means that \(span(\beta^* \otimes \alpha^*)\) contains each \({\mathbf U}_w\) almost surely; thus we have \(\oplus _{w=1}^C span({\mathbf U}_w) \subseteq span(\beta ^* \otimes \alpha ^*)\). Since \(span(\beta^* \otimes \alpha^*)\) has the same dimension as \({\epsilon }^{\otimes } ({\mathbf U}^*)\), the theorem now follows from the uniqueness of the Kronecker envelope. □
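As an illustration of the objective in (8.4), the following Python sketch minimizes a simplified finite-sample analogue of it by alternating least squares, taking A to be the identity and replacing the unknown functions \(f_w(Z)\) by free coefficient matrices attached to the columns of each \({\mathbf U}_w\). It relies on the identity \(vec(\alpha G \beta^T) = (\beta \otimes \alpha) vec(G)\). This is only a minimal sketch under these simplifying assumptions; the names (kron_als, U_list, and so on) are illustrative and do not refer to the implementation used in the paper.

```python
import numpy as np

def kron_als(U_list, p_list, pl, pr, dl, dr, n_iter=100, seed=0):
    """Minimal sketch: minimize sum_w p_w ||U_w - (beta kron alpha) C_w||_F^2 over
    alpha (pl x dl), beta (pr x dr), and free coefficient matrices C_w, by alternating
    exact least-squares updates on the matricized columns M (pl x pr) of each U_w."""
    rng = np.random.default_rng(seed)
    alpha = np.linalg.qr(rng.standard_normal((pl, dl)))[0]
    beta = np.linalg.qr(rng.standard_normal((pr, dr)))[0]
    Ms, ws = [], []
    for U, p in zip(U_list, p_list):
        for j in range(U.shape[1]):
            Ms.append(U[:, j].reshape((pl, pr), order="F"))  # column-major matricization
            ws.append(p)
    for _ in range(n_iter):
        # G-step: for fixed (alpha, beta), G = alpha^+ M (beta^+)^T minimizes ||M - alpha G beta^T||_F
        Ap, Bp = np.linalg.pinv(alpha), np.linalg.pinv(beta)
        Gs = [Ap @ M @ Bp.T for M in Ms]
        # alpha-step: with N = G beta^T, alpha = (sum_w w M N^T)(sum_w w N N^T)^+
        Ns = [G @ beta.T for G in Gs]
        A1 = sum(w * M @ N.T for w, M, N in zip(ws, Ms, Ns))
        A2 = sum(w * N @ N.T for w, N in zip(ws, Ns))
        alpha = A1 @ np.linalg.pinv(A2)
        # beta-step: with P = alpha G, beta = (sum_w w M^T P)(sum_w w P^T P)^+
        Ps = [alpha @ G for G in Gs]
        B1 = sum(w * M.T @ P for w, M, P in zip(ws, Ms, Ps))
        B2 = sum(w * P.T @ P for w, P in zip(ws, Ps))
        beta = B1 @ np.linalg.pinv(B2)
    return alpha, beta  # span(beta kron alpha) is the fitted Kronecker-structured subspace
```

In a simulation one would pass the slice-wise candidate matrices (for instance, estimated \(\hat{\mathbf U}_w\)) together with empirical weights \(\hat p_w\), and read off the fitted subspace as the span of \(\hat\beta \otimes \hat\alpha\).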

8.2 Additional Simulation and Data Analysis

The following six examples are related to the examples in Sect. 6.1 of the simulation studies, showing how the results change as the overlap among the individual subspaces changes.

Example A1

Example A1 keeps almost the same experimental setting as Example 1 but slightly changes the conditional distribution of Y given X and W, so that the two conditional central subspaces overlap but are not identical.

$$\displaystyle \begin{aligned} Y&= X_{11} \times (X_{12} + X_{21} +1 ) + 0.2 \times \epsilon \quad \text{for } W=0,\\ Y&= X_{12} \times (X_{13} + X_{22} +1 ) + 0.2 \times \epsilon \quad \text{for } W=1. \end{aligned} $$

In this example,

$$\displaystyle \begin{aligned} S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}} & = span(e_1, e_2) \otimes span(e_1, e_2) = span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2),\\ S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}} & = span(e_2, e_3) \otimes span(e_1, e_2) =span(e_2 \otimes e_1, e_2 \otimes e_2, e_3 \otimes e_1, e_3 \otimes e_2). \end{aligned} $$

The two conditional folding subspaces overlap, because their left conditional folding subspaces are identical and their right conditional folding subspaces also share one common direction. By part (c) of Proposition 1, we have:

$$\displaystyle \begin{aligned} S_{Y|{\circ}X{\circ}}^{(W)} &= (\oplus_{w=1}^{C}S_{Y_{w}|X_{w} \circ }) \otimes (\oplus_{w=1}^{C}S_{Y_{w}|\circ X_{w}})\\ & = (span(e_1, e_2) \oplus span(e_2, e_3)) \otimes (span(e_1, e_2) \oplus span(e_1, e_2))\\ &= span(e_1, e_2, e_3) \otimes span(e_1, e_2)\\ &= span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2, e_3 \otimes e_1, e_3 \otimes e_2). \end{aligned} $$

On the other hand, based on part (d) of Proposition 1, we have:

$$\displaystyle \begin{aligned} S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}}& = (span(e_1, e_2) \otimes span(e_1, e_2)) \oplus (span(e_2, e_3) \otimes span(e_1, e_2)) \\ &= span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2, e_3 \otimes e_1, e_3 \otimes e_2)\\ &= S_{Y|{\circ}X{\circ}}^{(W)}. \end{aligned} $$

Therefore, all three methods can still recover \(S_{Y|{\circ }X{\circ }}^{(W)}\) exhaustively. Again, for vectorized data,

$$\displaystyle \begin{aligned} S_{Y_{w=0}| vec({\mathbf X})_{w=0}} &= span(e_1 \otimes e_1, e_1 \otimes e_2 + e_2 \otimes e_1) \subsetneq S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}},\\ S_{Y_{w=1}| vec({\mathbf X})_{w=1}} &= span(e_2 \otimes e_1, e_2 \otimes e_2 + e_3 \otimes e_1) \subsetneq S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}}, \end{aligned} $$

thus

$$\displaystyle \begin{aligned} S_{Y| vec({\mathbf X}) } ^{(W)} &= span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2 + e_3 \otimes e_1) \\ &\subsetneq S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}} = S_{Y|{\circ}X{\circ}}^{(W)}. \end{aligned} $$
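The subspace relationships derived above for Example A1 can be checked numerically. The following small sketch (the helpers e and span_equal are our own illustrative constructions, and only the three relevant coordinates are used) confirms that the direct sum of the two conditional folding subspaces has dimension 6 and coincides with the partial folding subspace given by part (c) of Proposition 1.

```python
import numpy as np

def e(i, p=3):
    """Standard basis vector e_i of R^p (1-indexed, as in the text)."""
    v = np.zeros(p)
    v[i - 1] = 1.0
    return v

def span_equal(A, B, tol=1e-10):
    """True if the columns of A and B span the same subspace (rank comparison)."""
    rA = np.linalg.matrix_rank(A, tol)
    rB = np.linalg.matrix_rank(B, tol)
    return rA == rB == np.linalg.matrix_rank(np.hstack([A, B]), tol)

# conditional folding subspaces of Example A1, written as span(right) (x) span(left)
S_w0 = np.column_stack([np.kron(e(i), e(j)) for i in (1, 2) for j in (1, 2)])
S_w1 = np.column_stack([np.kron(e(i), e(j)) for i in (2, 3) for j in (1, 2)])
# partial folding subspace from part (c): span(e1, e2, e3) (x) span(e1, e2)
S_c = np.column_stack([np.kron(e(i), e(j)) for i in (1, 2, 3) for j in (1, 2)])
S_sum = np.hstack([S_w0, S_w1])  # direct sum of the two conditional folding subspaces
print(np.linalg.matrix_rank(S_sum), span_equal(S_sum, S_c))  # 6 True
```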

Table 7 summarizes the simulation results for Example A1. Note that all three methods perform worse than in Example 1 in terms of accuracy and variability, because they are estimating a larger partial folding subspace than in Example 1. We observe that the individual ensemble method and the LSFA method still provide similar accuracy, and both outperform the objective function method.

Table 7 Example A1, accuracy of estimates on partial folding subspace

Example A2

Example A2 keeps the two conditional central subspaces overlapping, but to a smaller extent. This is achieved by setting the conditional distribution as

$$\displaystyle \begin{aligned} Y&= X_{11} \times (X_{12} + X_{21} +1 ) + 0.2 \times \epsilon \quad \text{for } W=0,\\ Y&= X_{22} \times (X_{23} + X_{32} +1 ) + 0.2 \times \epsilon \quad \text{for } W=1. \end{aligned} $$

In this example,

$$\displaystyle \begin{aligned} S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}} & = span(e_1, e_2) \otimes span(e_1, e_2) = span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2),\\ S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}} & = span(e_2, e_3) \otimes span(e_2, e_3) =span(e_2 \otimes e_2, e_2 \otimes e_3, e_3 \otimes e_2, e_3 \otimes e_3). \end{aligned} $$

The two conditional folding subspaces overlap only slightly, and neither their left nor their right conditional folding subspaces coincide. By part (c) of Proposition 1, we have:

$$\displaystyle \begin{aligned} S_{Y|{\circ}X{\circ}}^{(W)} &= (\oplus_{w=1}^{C}S_{Y_{w}|X_{w} \circ }) \otimes (\oplus_{w=1}^{C}S_{Y_{w}|\circ X_{w}})\\ & = (span(e_1, e_2) \oplus span(e_2, e_3)) \otimes (span(e_1, e_2) \oplus span(e_2, e_3))\\ &= span(e_1, e_2, e_3) \otimes span(e_1, e_2, e_3)\\ &= span(e_i \otimes e_j, \ i,j =1,\ldots,3). \end{aligned} $$

On the other hand, based on part (d) of Proposition 1, we have:

$$\displaystyle \begin{aligned} S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}}& = (span(e_1, e_2) \otimes span(e_1, e_2)) \oplus (span(e_2, e_3) \otimes span(e_2, e_3)) \\ &= span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2, e_2 \otimes e_3, e_3 \otimes e_2, e_3 \otimes e_3)\\ &\subsetneq S_{Y|{\circ}X{\circ}}^{(W)}. \end{aligned} $$

In this case, only the individual ensemble method and the objective function method recover \(S_{Y|{\circ }X{\circ }}^{(W)}\) exhaustively. Since the LSFA method targets the space \(S_{Y|{\circ }X_{w=0}{\circ }} \oplus S_{Y|{\circ }X_{w=1}{\circ }}\), which is a smaller subspace by part (d) of Proposition 1 and the experimental setting above, the LSFA method estimates a smaller subspace than the desired partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)} \). In practice, the accuracy of the LSFA method may not be affected, since we use the results from the individual ensemble method as initial values. Again, for vectorized data,

$$\displaystyle \begin{aligned} S_{Y_{w=0}| vec({\mathbf X})_{w=0}} &= span(e_1 \otimes e_1, e_1 \otimes e_2 + e_2 \otimes e_1) \subsetneq S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}},\\ S_{Y_{w=1}| vec({\mathbf X})_{w=1}} &= span(e_2 \otimes e_2, e_2 \otimes e_3 + e_3 \otimes e_2) \subsetneq S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}}, \end{aligned} $$

thus

$$\displaystyle \begin{aligned} S_{Y| vec({\mathbf X}) } ^{(W)} &= span(e_1 \otimes e_1, e_1 \otimes e_2 + e_2 \otimes e_1, e_2 \otimes e_3 + e_3 \otimes e_2) \\ &\subsetneq S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}} \subsetneq S_{Y|{\circ}X{\circ}}^{(W)}. \end{aligned} $$
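A similar numerical check illustrates the strict containment in Example A2: the direct sum of the two conditional folding subspaces has dimension 7, whereas the partial folding subspace \(span(e_1, e_2, e_3) \otimes span(e_1, e_2, e_3)\) has dimension 9. The short sketch below uses the same kind of illustrative basis-vector helper as in the Example A1 check.

```python
import numpy as np

def e(i, p=3):
    """Standard basis vector e_i of R^3 (illustrative helper)."""
    return np.eye(p)[:, i - 1]

S_w0 = np.column_stack([np.kron(e(i), e(j)) for i in (1, 2) for j in (1, 2)])
S_w1 = np.column_stack([np.kron(e(i), e(j)) for i in (2, 3) for j in (2, 3)])
S_partial = np.column_stack([np.kron(e(i), e(j)) for i in (1, 2, 3) for j in (1, 2, 3)])
# direct sum of the two conditional folding subspaces versus the partial folding subspace
print(np.linalg.matrix_rank(np.hstack([S_w0, S_w1])))  # 7: strictly smaller
print(np.linalg.matrix_rank(S_partial))                # 9
```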

Table 8 displays the simulation results for Example A2, where the individual ensemble method and the LSFA method still outperform the objective function method.

Table 8 Example A2, accuracy of estimates on partial folding subspace

Example A3

Example A3 requires the two conditional central subspaces to be the same as the conditional folding subspaces, i.e., for any w = 0, 1,

$$\displaystyle \begin{aligned} S_{Y_w| vec({\mathbf X})_w} = S_{Y_w|{\circ} {\mathbf X}_w {\circ}}. \end{aligned}$$

However, the partial central subspace is still a proper subspace of the partial folding subspace. We achieve this by constraining the two conditional central subspaces to be orthogonal to each other. The conditional distribution of Y given X and W is:

$$\displaystyle \begin{aligned} Y&= X_{11} \times (X_{21} +1 ) + 0.2 \times \epsilon \quad \text{for } W=0,\\ Y&= X_{32} \times (X_{42} +1 ) + 0.2 \times \epsilon \quad \text{for } W=1. \end{aligned} $$

In this case,

$$\displaystyle \begin{aligned} S_{Y_{w=0}| vec({\mathbf X})_{w=0}} = S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}} & = span(e_1) \otimes span(e_1, e_2),\\ S_{Y_{w=1}| vec({\mathbf X})_{w=1}} = S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}} & = span(e_2) \otimes span(e_3, e_4). \end{aligned} $$

Thus,

$$\displaystyle \begin{aligned} S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}} = S_{Y| vec({\mathbf X}) } ^{(W)} = span(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_3, e_2 \otimes e_4). \end{aligned}$$

Based on part (c) of Proposition 1, we have:

$$\displaystyle \begin{aligned} S_{Y|{\circ}X{\circ}}^{(W)} & = (span(e_1) \oplus span(e_2)) \otimes (span(e_1, e_2) \oplus span(e_3, e_4)) \\ &= span(e_1 \otimes e_1, e_1 \otimes e_2, e_1 \otimes e_3, e_1 \otimes e_4, e_2 \otimes e_1, e_2 \otimes e_2, e_2 \otimes e_3, e_2 \otimes e_4) \\ &\supsetneq S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}} = S_{Y| vec({\mathbf X}) } ^{(W)}. \end{aligned} $$

Still, the LSFA method only targets at \(S_{Y|{\circ }X_{w=0}{\circ }} \oplus S_{Y|{\circ }X_{w=1}{\circ }}\), which is a proper subspace of our desired space \(S_{Y|{\circ }X{\circ }}^{(W)}\).

Simulation results for this example in Table 9 indicate that the individual ensemble method and the LSFA method perform similarly.

Table 9 Example A3, accuracy of estimates on partial folding subspace

Example A4

In Example A4, we constrain the conditional central subspaces and the partial central subspace to be the same as the conditional folding subspaces and the partial folding subspace, respectively. Since estimating the partial folding subspace greatly reduces the number of parameters, especially when the dimension is large, in this example we are specifically interested in whether the folding-based methods can achieve higher accuracy than traditional methods such as partial SIR. We modify the conditional distribution as:

$$\displaystyle \begin{aligned} Y&= X_{11} \times (X_{21} +1 ) + 0.2 \times \epsilon \quad \text{for } W=0,\\ Y&= X_{21} \times (X_{31} +1 ) + 0.2 \times \epsilon \quad \text{for } W=1. \end{aligned} $$

In this case,

$$\displaystyle \begin{aligned} S_{Y_{w=0}| vec({\mathbf X})_{w=0}} = S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}} & = span(e_1) \otimes span(e_1, e_2),\\ S_{Y_{w=1}| vec({\mathbf X})_{w=1}} = S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}} & = span(e_1) \otimes span(e_2, e_3). \end{aligned} $$

And most importantly,

$$\displaystyle \begin{aligned} S_{Y|{\circ}X{\circ}}^{(W)} = S_{Y|{\circ}X_{w=0}{\circ}} \oplus S_{Y|{\circ}X_{w=1}{\circ}} = S_{Y| vec({\mathbf X}) } ^{(W)} = span(e_1 \otimes e_1, e_1 \otimes e_2, e_1 \otimes e_3). \end{aligned}$$

Again, the results in Table 10 show that the individual ensemble and LSFA methods perform better than the other two approaches.

Table 10 Example A4, accuracy of estimates on partial folding subspace

Figure 3 summarizes the first two examples in the paper and Examples A1–A4. The three estimation methods can be interpreted as follows. Since the partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)}\), together with its basis matrix, must be represented as a Kronecker product, it can only be covered by a “rectangle space.” Therefore, the exhaustive methods, namely the individual ensemble method and the objective function method, attempt to find one minimal “rectangle space” that covers both conditional folding subspaces. On the other hand, the LSFA method estimates \(\oplus S_{Y|{\circ }X_{w}{\circ }}\); it looks for two minimal “rectangles” that together cover all the conditional folding subspaces, and their direct sum can therefore be smaller than the partial folding subspace. The traditional partial central subspace \(S_{Y| vec({\mathbf X}) } ^{(W)}\), which stacks the columns together, and its estimation method, partial SIR, look for “blocks” that cover all the conditional central subspaces.

Example A5

Example A5 follows closely from Example 3 and is designed so that the corresponding partial folding subspace is exactly the same as that of Examples 2 and A1. We illustrate the details of the experimental setting as follows. For W = 0, it follows the exact same setting as in Example 3. For W = 1, however, the conditional mean of X given Y is changed to:

$$\displaystyle \begin{aligned} E(\mathbf{X} |Y=1, W=1)= \left( {\begin{array}{ccc} \mathbf{0}_{2 \times 1} &\mu \mathbf{I}_{2} & \mathbf{0}_{2 \times (p-3)} \\ \mathbf{0}_{(p-2) \times 1} &\mathbf{0}_{(p-2) \times 2} & \mathbf{0}_{(p-2) \times (p-3)}\\ \end{array} } \right). \end{aligned}$$

Correspondingly, the conditional covariance structure stays the same as in Example 3 except that the index set is A = {(1, 3), (2, 2)}. We can easily verify that the desired partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)}\) is the same as in Example A1. But for the vectorized data vec(X),

$$\displaystyle \begin{aligned} S_{Y_{w=0}| vec({\mathbf X})_{w=0}} = span(e_1 \otimes e_1 + e_2 \otimes e_2, e_1 \otimes e_2, e_2 \otimes e_1) \subsetneq S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}}, \end{aligned}$$

and

$$\displaystyle \begin{aligned} S_{Y_{w=1}| vec({\mathbf X})_{w=1}} = span(e_2 \otimes e_1 + e_3 \otimes e_2, e_3 \otimes e_1, e_2 \otimes e_2) \subsetneq S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}}, \end{aligned}$$

thus

$$\displaystyle \begin{aligned} S_{Y| vec({\mathbf X}) } ^{(W)} \subsetneq S_{Y|{\circ}X{\circ}}^{(W)}. \end{aligned}$$

From Table 11, it appears that the objective function method with pooled variance provides the smallest errors and the smallest variability across all sample sizes n. The individual direction ensemble method and the LSFA method produce similar accuracy and stability in Example A5.

Table 11 Example A5, accuracy of estimates on partial folding subspace

Example A6

Example A6 also follows closely from Example 3, but its two conditional folding subspaces are less overlapped, leading to a larger partial folding subspace. For W = 0, it follows the exact same setting as in Examples 3 and A5. For W = 1, however, the conditional mean of X given the response Y is changed to:

$$\displaystyle \begin{aligned} E(\mathbf{X} |Y=1, W=1)= \left( {\begin{array}{ccc} \mathbf{0}_{1 \times 1} &\mathbf{0}_{1 \times 2} & \mathbf{0}_{1 \times (p-3)} \\ \mathbf{0}_{2 \times 1} &\mu \mathbf{I}_{2} & \mathbf{0}_{2 \times (p-3)}\\ \mathbf{0}_{(p-3) \times 1} &\mathbf{0}_{(p-3) \times 2} & \mathbf{0}_{(p-3) \times (p-3)}\\ \end{array} } \right). \end{aligned}$$

Correspondingly, the conditional covariance structure stays the same as in Example 3 except that the index set is A = {(2, 3), (3, 2)}. We can easily verify that the desired partial folding subspace \(S_{Y|{\circ }X{\circ }}^{(W)}\) is the same as in Example A2. But for the vectorized data vec(X),

$$\displaystyle \begin{aligned} S_{Y_{w=0}| vec({\mathbf X})_{w=0}} = span(e_1 \otimes e_1 + e_2 \otimes e_2, e_1 \otimes e_2, e_2 \otimes e_1) \subsetneq S_{Y_{w=0}|{\circ} {\mathbf X}_{w=0} {\circ}}, \end{aligned}$$

and

$$\displaystyle \begin{aligned} S_{Y_{w=1}| vec({\mathbf X})_{w=1}} = span(e_2 \otimes e_2 + e_3 \otimes e_3, e_2 \otimes e_3, e_3 \otimes e_2) \subsetneq S_{Y_{w=1}|{\circ} {\mathbf X}_{w=1} {\circ}}, \end{aligned}$$

thus

$$\displaystyle \begin{aligned} S_{Y| vec({\mathbf X}) } ^{(W)} \subsetneq S_{Y|{\circ}X{\circ}}^{(W)}. \end{aligned}$$

The results are listed in Table 12. Similarly, the proposed individual ensemble method and LSFA method outperform the third estimation method, the objective function optimization method, while the objective function optimization method with pooled covariance yields the smallest errors and standard deviations.

Table 12 Example A6, accuracy of estimates on partial folding subspace

Three Histograms for the Real Data

See Fig. 4.

Fig. 4

Histogram of the data

The Bootstrap Confidence Interval Plots for Real Data

See Fig. 5.

Fig. 5

Confidence intervals for estimated directions

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this chapter

Wang, Y., Xue, Y., Yuan, Q., Yin, X. (2021). Sufficient Dimension Folding with Categorical Predictors. In: Bura, E., Li, B. (eds) Festschrift in Honor of R. Dennis Cook. Springer, Cham. https://doi.org/10.1007/978-3-030-69009-0_7
