
More About Asymptotic Properties of Some Binary Classification Methods for High Dimensional Data

  • Conference paper
Advances in Probability and Mathematical Statistics

Part of the book series: Progress in Probability (PRPR, volume 79)


Abstract

In this manuscript we study the asymptotic behavior of the following binary classification methods: Support Vector Machine, Mean Difference, Distance Weighted Discrimination and Maximal Data Piling, when the dimension of the data increases and the sample sizes of the classes are fixed. We consider multivariate data with the asymptotic geometric structure of an n-simplex, such as the multivariate standard Gaussian distribution, as the dimension increases and the sample size n is fixed. We provide the asymptotic behavior of the four methods in terms of the angle between the normal vector of the separating hyperplane of the method and the optimal direction for classification, under more general conditions than those of Bolivar-Cime and Cordova-Rodriguez (Commun Stat Theory Methods 47(11):2720–2740, 2018). We also analyze the asymptotic behavior of the probabilities of misclassification of the methods. A simulation study is performed to illustrate the theoretical results.


References

  1. Ahn, J., Marron, J.S.: The maximal data piling direction for discrimination. Biometrika 97(1), 254–259 (2010)

  2. Ahn, J., Marron, J.S., Muller, K.M., Chi, Y.: The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika 94(3), 760–766 (2007)

  3. Bolivar-Cime, A., Cordova-Rodriguez, L.M.: Binary discrimination methods for high dimensional data with a geometric representation. Commun. Stat. Theory Methods 47(11), 2720–2740 (2018)

  4. Bolivar-Cime, A., Marron, J.S.: Comparison of binary discrimination methods for high dimension low sample size data. J. Multivar. Anal. 115, 108–121 (2013)

  5. Hall, P., Marron, J.S., Neeman, A.: Geometric representation of high dimension, low sample size data. J. R. Stat. Soc. B 67(3), 427–444 (2005)

  6. Jung, S., Marron, J.S.: PCA consistency in high dimension, low sample size context. Ann. Stat. 37(6B), 4104–4130 (2009)

  7. Marron, J.S.: Distance-weighted discrimination. WIREs Comput. Stat. 7, 109–114 (2015)

  8. Marron, J.S., Todd, M.J., Ahn, J.: Distance-weighted discrimination. J. Am. Stat. Assoc. 102(480), 1267–1271 (2007)

  9. Qiao, X., Zhang, H.H., Liu, Y., Todd, M.J., Marron, J.S.: Weighted distance weighted discrimination and its asymptotic properties. J. Am. Stat. Assoc. 105(489), 401–414 (2010)

  10. Qiao, X., Zhang, L.: Flexible high-dimensional classification machines and their asymptotic properties. J. Mach. Learn. Res. 16, 1547–1572 (2015)

  11. Schölkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA (2002)

  12. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)

  13. Yata, K., Aoshima, M.: Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. J. Multivar. Anal. 105(1), 193–215 (2012)


Author information

Correspondence to Addy Bolivar-Cime.

Appendix

1.1 Proof of Theorem 2.1

Case 1: The vector v is the normal vector of the MD, SVM or DWD hyperplane. Let \(\widetilde {v}\) be the vector given in Lemma 2.1. Let \(X_i^{\prime }=X_i-\mu _+\) and \(Y_j^{\prime }=Y_j-\mu _-\), for i = 1, 2, …, m and j = 1, 2, …, n. We denote by \(\left \langle x,y \right \rangle \) the dot product between the d-dimensional vectors x and y. Note that

$$\displaystyle \begin{aligned} \parallel \widetilde{v} \parallel ^2&=\sum_{i=1}^m \alpha_{i+}^2\parallel X_i^{\prime} \parallel ^2+\sum_{j=1}^n \alpha_{j-}^2\parallel Y_j^{\prime} \parallel ^2+\parallel v_d \parallel ^2\\ &\quad +2\sum_{i<j}\alpha_{i+}\alpha_{j+}\left\langle X_i^{\prime},X_j^{\prime} \right\rangle +2\sum_{i<j}\alpha_{i-}\alpha_{j-}\left\langle Y_i^{\prime},Y_j^{\prime} \right\rangle \\ &\quad +2\sum_{i=1}^m \alpha_{i+}\left\langle X_i^{\prime},v_d \right\rangle -2\sum_{j=1}^n \alpha_{j-}\left\langle Y_j^{\prime},v_d \right\rangle -2\sum_{i=1}^m\sum_{j=1}^n\alpha_{i+}\alpha_{j-}\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle .{} \end{aligned} $$
(8)

Dividing both sides of (8) by d, we have by Lemma 2.1 and (1)–(3) that the sum of the first three terms of the right side converges in probability to \(\sigma^2/m+\tau^2/n+c^2\) as d → ∞. Now we will see that the sum of the last five terms converges in probability to zero as d → ∞. By (1) we have that

$$\displaystyle \begin{aligned} \frac{\left\langle X_i^{\prime},X_j^{\prime} \right\rangle }{d}=\frac{1}{2}\left(\frac{\parallel X_i^{\prime} \parallel ^2}{d}+\frac{\parallel X_j^{\prime} \parallel ^2}{d}-\frac{\parallel X_i-X_j \parallel ^2}{d}\right)\overset{P}{\longrightarrow}\frac{1}{2}(\sigma^2+\sigma^2-2\sigma^2)=0 \end{aligned} $$
(9)

as d → ∞, for i ≠ j. Analogously

$$\displaystyle \begin{aligned} \frac{\left\langle Y_i^{\prime},Y_j^{\prime} \right\rangle }{d}\overset{P}{\longrightarrow}0 \quad \text{as }d\to\infty, \text{ for }i\neq j. \end{aligned} $$
(10)

Observe that the sum of the last three terms of the right side of (8) is equal to

$$\displaystyle \begin{aligned} 2\sum_{i=1}^m\sum_{j=1}^n\alpha_{i+}\alpha_{j-} [\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle ]. \end{aligned} $$

From (1)–(4) and the equality

$$\displaystyle \begin{aligned} \frac{\parallel X_i-Y_j \parallel ^2}{d}&=\frac{\parallel X_i^{\prime} \parallel ^2}{d}+\frac{\parallel Y_j^{\prime} \parallel ^2}{d}+\frac{\parallel v_d \parallel ^2}{d}\\ &\quad +\frac{2}{d}\left[\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle \right], \end{aligned} $$

we have that

$$\displaystyle \begin{aligned} \frac{1}{d}[\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle ]\overset{P}{\longrightarrow} 0\quad \text{as }d\to\infty,\ \forall i, j. \end{aligned} $$
(11)
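In more detail, rearranging the last displayed identity and using (1)–(4) term by term gives

$$\displaystyle \begin{aligned} \frac{2}{d}\left[\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle \right]&=\frac{\parallel X_i-Y_j \parallel ^2}{d}-\frac{\parallel X_i^{\prime} \parallel ^2}{d}-\frac{\parallel Y_j^{\prime} \parallel ^2}{d}-\frac{\parallel v_d \parallel ^2}{d}\\ &\overset{P}{\longrightarrow}(\sigma^2+\tau^2+c^2)-\sigma^2-\tau^2-c^2=0. \end{aligned} $$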

Therefore, by Lemma 2.1 and (9)–(11), we have that the sum of the last five terms of the right side of (8) divided by d converges in probability to zero as d → ∞. Thus,

$$\displaystyle \begin{aligned} \frac{\parallel \widetilde{v} \parallel ^2}{d}\overset{P}{\longrightarrow} \frac{\sigma^2}{m}+\frac{\tau^2}{n}+c^2\quad \text{as }d\to\infty. \end{aligned} $$
(12)
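To make the first step concrete: assuming, as in the Mean Difference case of Lemma 2.1 (whose statement is not reproduced here), that the weights satisfy \(\alpha_{i+}\overset{P}{\longrightarrow}1/m\) and \(\alpha_{j-}\overset{P}{\longrightarrow}1/n\), the first three terms of the right side of (8), divided by d, satisfy

$$\displaystyle \begin{aligned} \sum_{i=1}^m \alpha_{i+}^2\frac{\parallel X_i^{\prime} \parallel ^2}{d}\overset{P}{\longrightarrow}\sum_{i=1}^m\frac{\sigma^2}{m^2}=\frac{\sigma^2}{m},\qquad \sum_{j=1}^n \alpha_{j-}^2\frac{\parallel Y_j^{\prime} \parallel ^2}{d}\overset{P}{\longrightarrow}\frac{\tau^2}{n},\qquad \frac{\parallel v_d \parallel ^2}{d}\longrightarrow c^2, \end{aligned} $$

which is consistent with the limit in (12).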

From the results of [5], under the asymptotic geometric structure of the data, if \(Y^*_1,Y^*_2,\dots , Y^*_k\) are independent and identically distributed d-dimensional random vectors with the same distribution as the vectors of the class with mean \(\mu_-\) and \(\overline {Y}^*_k=\sum _{j=1}^kY^*_j/k\), we have

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\overline{Y}^*_k \parallel ^2}{d}\overset{P}{\longrightarrow}c^2+\sigma^2+\frac{\tau^2}{k} \quad \text{as }d\to\infty. \end{aligned} $$

Since \(\overline {Y}^*_k\) converges in probability to \(\mu_-\) as k → ∞, we have that

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\mu_- \parallel ^2}{d}\overset{P}{\longrightarrow} c^2+\sigma^2\quad \text{as }d\to\infty. \end{aligned} $$

Furthermore, by (1) and (3)

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\mu_+ \parallel ^2}{d}\overset{P}{\longrightarrow}\sigma^2\quad \text{and}\quad \frac{\parallel \mu_+-\mu_- \parallel ^2}{d}\overset{P}{\longrightarrow} c^2, \end{aligned} $$

as d → ∞. Thus, by the Pythagorean theorem, after rescaling by \(d^{-1/2}\), the segments \(X_i-\mu_-\), \(X_i-\mu_+\) and \(\mu_+-\mu_-\) tend to form a right triangle as d → ∞, where the hypotenuse is \(X_i-\mu_-\). Therefore, \(X^{\prime }_i/d^{1/2}=(X_i-\mu _+)/d^{1/2}\) and \(v_d/d^{1/2}=(\mu_+-\mu_-)/d^{1/2}\) tend to be orthogonal as d → ∞, and then

$$\displaystyle \begin{aligned} \frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d}\overset{P}{\longrightarrow}0\quad \text{as }d\to\infty,\ \forall i. \end{aligned} $$
(13)
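The same conclusion can also be checked algebraically: since \(X_i-\mu_-=X^{\prime}_i+v_d\),

$$\displaystyle \begin{aligned} \frac{2\left\langle X^{\prime}_i,v_d \right\rangle }{d}=\frac{\parallel X_i-\mu_- \parallel ^2}{d}-\frac{\parallel X^{\prime}_i \parallel ^2}{d}-\frac{\parallel v_d \parallel ^2}{d}\overset{P}{\longrightarrow}(c^2+\sigma^2)-\sigma^2-c^2=0. \end{aligned} $$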

We also have that

$$\displaystyle \begin{aligned} \frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }=\frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d}\frac{d^{1/2}}{\parallel v_d \parallel }\overset{P}{\longrightarrow}0\cdot c^{-1}=0\quad \text{as }d\to\infty,\ \forall i. \end{aligned} $$
(14)

Analogously,

$$\displaystyle \begin{aligned} \frac{\left\langle Y^{\prime}_j,v_d \right\rangle }{d}\overset{P}{\longrightarrow}0,\quad \frac{\left\langle Y^{\prime}_j,v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }\overset{P}{\longrightarrow}0,\quad \text{as }d\to\infty,\ \forall j. \end{aligned} $$
(15)
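For completeness, the first limit in (15) follows from the identity \(Y_j-\mu_+=Y^{\prime}_j-v_d\), together with the analogue of the limit for \(\parallel X_i-\mu_- \parallel ^2/d\) above (namely \(\parallel Y_j-\mu_+ \parallel ^2/d\overset{P}{\longrightarrow}c^2+\tau^2\), by the same argument based on [5]):

$$\displaystyle \begin{aligned} \frac{2\left\langle Y^{\prime}_j,v_d \right\rangle }{d}=\frac{\parallel Y^{\prime}_j \parallel ^2}{d}+\frac{\parallel v_d \parallel ^2}{d}-\frac{\parallel Y_j-\mu_+ \parallel ^2}{d}\overset{P}{\longrightarrow}\tau^2+c^2-(c^2+\tau^2)=0, \end{aligned} $$

and the second limit in (15) then follows as in (14).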

Note that

$$\displaystyle \begin{aligned} \left\langle \widetilde{v},v_d \right\rangle =\sum_{i=1}^m\alpha_{i+}\left\langle X^{\prime}_i,v_d \right\rangle -\sum_{j=1}^n\alpha_{j-}\left\langle Y^{\prime}_j,v_d \right\rangle +\parallel v_d \parallel ^2. \end{aligned} $$
(16)

Therefore, dividing both sides of (16) by \(d^{1/2}\parallel v_d \parallel \), from Lemma 2.1, (3), (14) and (15) we have

$$\displaystyle \begin{aligned} \frac{\left\langle \widetilde{v},v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }\overset{P}{\longrightarrow} c\quad \text{as }d\to\infty. \end{aligned} $$
(17)
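Term by term, and under the additional assumption that the weights of Lemma 2.1 are bounded in probability (the lemma is not restated here), the right side of (16) divided by \(d^{1/2}\parallel v_d \parallel \) satisfies

$$\displaystyle \begin{aligned} \sum_{i=1}^m\alpha_{i+}\frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }\overset{P}{\longrightarrow}0,\qquad \sum_{j=1}^n\alpha_{j-}\frac{\left\langle Y^{\prime}_j,v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }\overset{P}{\longrightarrow}0,\qquad \frac{\parallel v_d \parallel ^2}{d^{1/2}\parallel v_d \parallel }=\frac{\parallel v_d \parallel }{d^{1/2}}\longrightarrow c, \end{aligned} $$

by (14), (15) and (3), respectively, which yields (17).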

By (12) and (17) we have

$$\displaystyle \begin{aligned} \frac{\left\langle \widetilde{v},v_d \right\rangle }{\parallel \widetilde{v} \parallel \parallel v_d \parallel }=\frac{\left\langle \widetilde{v},v_d \right\rangle /(d^{1/2}\parallel v_d \parallel )}{\parallel \widetilde{v} \parallel /d^{1/2}}\overset{P}{\longrightarrow} \frac{c}{\left(\sigma^2/m+\tau^2/n+c^2\right)^{1/2}} \end{aligned} $$

as d → ∞. Then

$$\displaystyle \begin{aligned} \mathrm{Angle}(\widetilde{v},v_d)=\arccos\left(\frac{\left\langle \widetilde{v},v_d \right\rangle }{\parallel \widetilde{v} \parallel \parallel v_d \parallel }\right)\overset{P}{\longrightarrow} \arccos\left[\frac{c}{(\sigma^2/m+\tau^2/n+c^2)^{1/2}}\right] \end{aligned} $$

as d → ∞.
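As a purely illustrative numerical check of the Case 1 limit, the following Python sketch (not the simulation study of Sect. 3; the Gaussian model and all parameter values are chosen here only for illustration) computes the angle between the Mean Difference direction \(\overline{X}-\overline{Y}\) and the optimal direction \(v_d\) for increasing d, and compares it with \(\arccos [c/(\sigma^2/m+\tau^2/n+c^2)^{1/2}]\).

```python
import numpy as np

# Minimal numerical sketch of the Case 1 limit, using only the Mean Difference
# direction (whose weights are exactly 1/m and 1/n).  The data model and all
# parameter values below are illustrative: X_i ~ N(mu_plus, sigma^2 I_d),
# Y_j ~ N(mu_minus, tau^2 I_d), with mu_plus - mu_minus = c * 1_d, so that
# conditions (1)-(4) hold.
rng = np.random.default_rng(0)
m, n = 5, 5                       # fixed sample sizes
sigma, tau, c = 1.0, 1.5, 1.0     # illustrative values of sigma, tau, c

theory = np.degrees(np.arccos(c / np.sqrt(sigma**2 / m + tau**2 / n + c**2)))
print(f"theoretical limiting angle: {theory:.2f} degrees")

for d in (10, 100, 1000, 10000):
    mu_plus, mu_minus = np.full(d, c), np.zeros(d)     # ||mu_plus - mu_minus||^2 / d = c^2
    X = mu_plus + sigma * rng.standard_normal((m, d))  # class with mean mu_plus
    Y = mu_minus + tau * rng.standard_normal((n, d))   # class with mean mu_minus

    v_md = X.mean(axis=0) - Y.mean(axis=0)             # Mean Difference direction
    v_d = mu_plus - mu_minus                           # optimal direction v_d
    cos_ang = v_md @ v_d / (np.linalg.norm(v_md) * np.linalg.norm(v_d))
    print(f"d = {d:6d}: Angle = {np.degrees(np.arccos(cos_ang)):6.2f} degrees")
```

As d grows, the empirical angle should stabilize near the theoretical value, in agreement with Case 1.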

Case 2: The vector v is the normal vector of the MDP hyperplane. Let \(X^{\prime }_i\) and \(Y^{\prime }_j\) be as in Case 1, for i = 1, 2, …, m and j = 1, 2, …, n. From (1) and the results of [5] we have

$$\displaystyle \begin{aligned} &\frac{\parallel \overline{X}-\mu_+ \parallel ^2}{d}\overset{P}{\longrightarrow}\frac{\sigma^2}{m},{} \end{aligned} $$
(18)
$$\displaystyle \begin{aligned} &\frac{\parallel X_i-\overline{X} \parallel ^2}{d}\overset{P}{\longrightarrow} \frac{m-1}{m}\sigma^2, {} \end{aligned} $$
(19)

as d → ∞. By (11), (13) and (15) we have

$$\displaystyle \begin{aligned} \frac{\left\langle X^{\prime}_i,Y^{\prime}_j \right\rangle }{d}\overset{P}{\longrightarrow}0\quad \text{as }d\to\infty,\ \forall i,j. \end{aligned} $$
(20)
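Both intermediate limits can also be verified directly: the first line below gives (18) from (1) and (9), writing \(\overline{X}-\mu_+=\frac{1}{m}\sum_{i=1}^m X^{\prime}_i\), and the second line gives (20) from (11), (13) and (15).

$$\displaystyle \begin{aligned} \frac{\parallel \overline{X}-\mu_+ \parallel ^2}{d}&=\frac{1}{m^2}\left[\sum_{i=1}^m\frac{\parallel X^{\prime}_i \parallel ^2}{d}+\sum_{i\neq j}\frac{\left\langle X^{\prime}_i,X^{\prime}_j \right\rangle }{d}\right]\overset{P}{\longrightarrow}\frac{m\sigma^2}{m^2}=\frac{\sigma^2}{m},\\ \frac{\left\langle X^{\prime}_i,Y^{\prime}_j \right\rangle }{d}&=\frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d}-\frac{\left\langle Y^{\prime}_j,v_d \right\rangle }{d}-\frac{1}{d}\left[\left\langle X^{\prime}_i,v_d \right\rangle -\left\langle Y^{\prime}_j,v_d \right\rangle -\left\langle X^{\prime}_i,Y^{\prime}_j \right\rangle \right]\overset{P}{\longrightarrow}0. \end{aligned} $$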

Note that

$$\displaystyle \begin{aligned} \left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle &=\left\langle X^{\prime}_i-(\overline{X}-\mu_+),(\overline{X}-\mu_+)-(\overline{Y}-\mu_-)+v_d \right\rangle \\ &=\frac{1}{m}\parallel X^{\prime}_i \parallel ^2+\frac{1}{m}\sum_{j\neq i}\left\langle X^{\prime}_i,X^{\prime}_j \right\rangle -\left\langle X^{\prime}_i,\overline{Y}-\mu_- \right\rangle +\left\langle X^{\prime}_i,v_d \right\rangle \\ &\quad -\parallel \overline{X}-\mu_+ \parallel ^2+\left\langle \overline{X}-\mu_+,\overline{Y}-\mu_- \right\rangle -\left\langle \overline{X}-\mu_+,v_d \right\rangle . \end{aligned} $$

Dividing both sides of the last equality by \(d^{1/2}\parallel v_d \parallel \), from (1), (9), (13), (18) and (20) we have

$$\displaystyle \begin{aligned} \frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle }{d^{1/2}\parallel v_d \parallel }=\frac{d^{1/2}}{\parallel v_d \parallel }\frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle }{d}\overset{P}{\longrightarrow}c^{-1}\left(\frac{\sigma^2}{m}-\frac{\sigma^2}{m}\right)=0 \end{aligned} $$
(21)

as d → ∞. Furthermore, by (3) and (12) we have

$$\displaystyle \begin{aligned} \frac{\parallel \overline{X}-\overline{Y} \parallel }{\parallel v_d \parallel }\overset{P}{\longrightarrow} \frac{(\sigma^2/m+\tau^2/n+c^2)^{1/2}}{c}\quad \text{as }d\to\infty. \end{aligned} $$
(22)

Thus, by (19), (21) and (22) it follows that

$$\displaystyle \begin{aligned} \mathrm{Angle}(X_i-\overline{X},\overline{X}-\overline{Y})&=\arccos\left[\frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle }{\parallel X_i-\overline{X} \parallel \parallel \overline{X}-\overline{Y} \parallel } \right]\\ &=\arccos\left[\frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle /(d^{1/2}\parallel v_d \parallel )}{(\parallel X_i-\overline{X} \parallel /d^{1/2})(\parallel \overline{X}-\overline{Y} \parallel /\parallel v_d \parallel )} \right]\\ &\overset{P}{\longrightarrow} \arccos(0)=\frac{\pi}{2} \end{aligned} $$
(23)

as d → ∞, ∀i. Analogously,

$$\displaystyle \begin{aligned} \mathrm{Angle}(Y_j-\overline{Y},\overline{X}-\overline{Y})\overset{P}{\longrightarrow}\frac{\pi}{2}\quad \text{as }d\to\infty,\ \forall j. \end{aligned} $$
(24)

As shown in the proof of Theorem 3.1 of [3], (23) and (24) imply that when d is large, the normal vector v of the MDP hyperplane is approximately in the same direction as \((\overline {X}-\overline {Y})/\parallel \overline {X}-\overline {Y} \parallel \). Hence, \(\mathrm {Angle}(v,v_d)=\arccos (\left \langle v,v_d \right \rangle /(\parallel v \parallel \parallel v_d \parallel ))\) is approximately

$$\displaystyle \begin{aligned} \arccos[\left\langle \overline{X}-\overline{Y},v_d \right\rangle /(\parallel \overline{X}-\overline{Y} \parallel \parallel v_d \parallel )] \end{aligned} $$
(25)

when d is large, which by Case 1 converges in probability to \(\arccos \left [\frac {c}{(\sigma ^2/m+\tau ^2/n+c^2)^{1/2}}\right ]\) as d → ∞.
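To make the last step explicit: if, consistently with the expansions (8) and (16), the vector of Lemma 2.1 has the form \(\widetilde{v}=\sum_{i=1}^m\alpha_{i+}X^{\prime}_i-\sum_{j=1}^n\alpha_{j-}Y^{\prime}_j+v_d\), then taking the Mean Difference weights \(\alpha_{i+}=1/m\) and \(\alpha_{j-}=1/n\) gives

$$\displaystyle \begin{aligned} \widetilde{v}=(\overline{X}-\mu_+)-(\overline{Y}-\mu_-)+(\mu_+-\mu_-)=\overline{X}-\overline{Y}, \end{aligned} $$

so the quantity in (25) is exactly the cosine studied in Case 1, whose limit was computed above.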

1.2 The Data in the Simulations Satisfy Conditions (1)–(4)

We have that \(X_i\) is equal in distribution to \(rZ_i+\beta {\mathbf {1}}_d\), for i = 1, 2, …, m, and \(Y_j\) is equal in distribution to \(Z_{m+j}\), for j = 1, 2, …, n, where \(Z_1, Z_2, \dots , Z_{m+n}\) are independent and identically distributed with the same distribution as the random vector Z given at the beginning of Sect. 3. Therefore, by (7)

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\mu_+ \parallel ^2}{d}&=\frac{\parallel Z_i \parallel ^2r^2}{d}\overset{P}{\longrightarrow}2r^2,\ \forall i;\\ \frac{\parallel X_i-X_j \parallel ^2}{d}&=\frac{\parallel Z_i-Z_j \parallel ^2r^2}{d}\overset{P}{\longrightarrow}4r^2,\ \forall i\neq j;\\ \frac{\parallel Y_i-\mu_- \parallel ^2}{d}&=\frac{\parallel Z_{m+i} \parallel ^2}{d}\overset{P}{\longrightarrow}2, \ \forall i;\\ \frac{\parallel Y_i-Y_j \parallel ^2}{d}&=\frac{\parallel Z_{m+i}-Z_{m+j} \parallel ^2}{d}\overset{P}{\longrightarrow}4,\ \forall i\neq j; \end{aligned} $$

as d → ∞. Thus conditions (1) and (2) hold with \(\sigma =\sqrt {2}r\) and \(\tau =\sqrt {2}\). We also have

$$\displaystyle \begin{aligned} \frac{\parallel v_d \parallel ^2}{d}=\frac{\parallel \mu_+-\mu_- \parallel ^2}{d}=\frac{\parallel \beta{\mathbf{1}}_d \parallel ^2}{d}=\beta^2, \end{aligned} $$
(26)

therefore condition (3) holds with c = β.

Now we will see that condition (4) holds. Observe that

$$\displaystyle \begin{aligned} \frac{\parallel X_i-Y_j \parallel ^2}{d}&=\frac{\parallel Z_i \parallel ^2}{d}r^2+\frac{\parallel Z_{m+j} \parallel ^2}{d}+\frac{\parallel v_d \parallel ^2}{d}\\ &\quad -2r\frac{\left\langle Z_i,Z_{m+j} \right\rangle }{d} +2r\frac{\left\langle Z_i,v_d \right\rangle }{d}-2\frac{\left\langle Z_{m+j},v_d \right\rangle }{d},{} \end{aligned} $$
(27)

for all i, j. From the properties of Z given in [3], we have that

$$\displaystyle \begin{aligned} \frac{\left\langle Z_i,Z_j \right\rangle }{d}\overset{P}{\longrightarrow}0,\ \forall i\neq j,\quad \frac{\left\langle Z_i,v_d \right\rangle }{d}\overset{P}{\longrightarrow}0,\ \forall i,\quad \text{as }d\to\infty. \end{aligned} $$
(28)

Therefore, by (7) and (26)–(28) we have

$$\displaystyle \begin{aligned} \frac{\parallel X_i-Y_j \parallel ^2}{d}\overset{P}{\longrightarrow}2r^2+2+\beta^2=\sigma^2+\tau^2+c^2\quad \text{as }d\to\infty. \end{aligned} $$

Then condition (4) holds.
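A quick numerical sanity check of conditions (1)–(4) under this model is sketched below in Python. For illustration the coordinates of Z are taken i.i.d. N(0, 2), which is consistent with (7); the actual vector Z defined at the beginning of Sect. 3 is not reproduced here and may differ. The values of d, m, n, r and β are arbitrary.

```python
import numpy as np

# Numerical sanity check of conditions (1)-(4) for the model X_i = r Z_i + beta 1_d,
# Y_j = Z_{m+j}.  For illustration the coordinates of Z are taken i.i.d. N(0, 2),
# which is consistent with (7); the actual vector Z of Sect. 3 may differ.
rng = np.random.default_rng(1)
d, m, n = 20000, 4, 6
r, beta = 1.5, 2.0

Z = rng.normal(0.0, np.sqrt(2.0), size=(m + n, d))
X = r * Z[:m] + beta          # class with mean mu_plus = beta * 1_d
Y = Z[m:]                     # class with mean mu_minus = 0

sigma2, tau2, c2 = 2 * r**2, 2.0, beta**2
print("||X_1 - mu_+||^2 / d =", np.sum((X[0] - beta) ** 2) / d, " limit:", sigma2)
print("||X_1 - X_2||^2 / d  =", np.sum((X[0] - X[1]) ** 2) / d, " limit:", 2 * sigma2)
print("||Y_1 - mu_-||^2 / d =", np.sum(Y[0] ** 2) / d, " limit:", tau2)
print("||Y_1 - Y_2||^2 / d  =", np.sum((Y[0] - Y[1]) ** 2) / d, " limit:", 2 * tau2)
print("||X_1 - Y_1||^2 / d  =", np.sum((X[0] - Y[0]) ** 2) / d, " limit:", sigma2 + tau2 + c2)
```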


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bolivar-Cime, A. (2021). More About Asymptotic Properties of Some Binary Classification Methods for High Dimensional Data. In: Hernández‐Hernández, D., Leonardi, F., Mena, R.H., Pardo Millán, J.C. (eds) Advances in Probability and Mathematical Statistics. Progress in Probability, vol 79. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-85325-9_3
