More About Asymptotic Properties of Some Binary Classification Methods for High Dimensional Data

Bolivar-Cime, Addy

doi:10.1007/978-3-030-85325-9_3

Addy Bolivar-Cime⁹

Part of the book series: Progress in Probability ((PRPR,volume 79))

388 Accesses

Abstract

In this manuscript we study the asymptotic behavior of the following binary classification methods: Support Vector Machine, Mean Difference, Distance Weighted Discrimination and Maximal Data Piling, when the dimension of the data increases and the sample sizes of the classes are fixed. We consider multivariate data with the asymptotic geometric structure of n-simplex, such that the multivariate standard Gaussian distribution, as the dimension increases and the sample size n is fixed. We provide the asymptotic behavior of the four methods in terms of the angle between the normal vector of the separating hyperplane of the method and the optimal direction for classification, under more general conditions than those of Bolivar-Cime and Cordova-Rodriguez (Commun Stat Theory Methods 47(11):2720–2740, 2018). We also analyze the asymptotic behavior of the probabilities of misclassification of the methods. A simulation study is performed to illustrate the theoretical results.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 139.00; Price excludes VAT (USA)

Softcover Book: USD 179.99; Price excludes VAT (USA)

Hardcover Book: USD 179.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Ahn, J., Marron, J.S.: The maximal data piling direction for discrimination. Biometrika 97(1), 254–259 (2010)
Article MathSciNet Google Scholar
Ahn, J., Marron, J.S., Muller, K.M., Chi, Y.: The high-dimension, low-sample-size geometric representation holds under mild conditions. Biometrika 94(3), 760–766 (2007)
Article MathSciNet Google Scholar
Bolivar-Cime, A., Cordova-Rodriguez, L.M.: Binary discrimination methods for high dimensional data with a geometric representation. Commun. Stat. Theory Methods 47(11), 2720–2740 (2018)
Article MathSciNet Google Scholar
Bolivar-Cime, A., Marron, J.S.: Comparison of binary discrimination methods for high dimension low sample size data. J. Multivar. Anal. 115, 108–121 (2013)
Article MathSciNet Google Scholar
Hall, P., Marron, J.S., Neeman, A.: Geometric representation of high dimension, low sample size data. J. R. Stat. Soc. B 67(3), 427–444 (2005)
Article MathSciNet Google Scholar
Jung, S., Marron, J.S.: PCA consistency in high dimension, low sample size context. Ann. Stat. 37(6B), 4104–4130 (2009)
Article MathSciNet Google Scholar
Marron, J.S.: Distance-weighted discrimination. WIREs Comput. Stat. 7, 109–114 (2015)
Article MathSciNet Google Scholar
Marron, J.S., Todd, M.J., Ahn, J.: Distance-weighted discrimination. J. Am. Stat. Assess. 102(480), 1267–1271 (2007)
Article MathSciNet Google Scholar
Qiao, X., Zhang, H.H., Liu, Y., Todd, M.J., Marron, J.S.: Weighted distance weighted discrimination and its asymptotic properties. J. Am. Stat. Assess. 105(489), 401–414 (2010)
Article MathSciNet Google Scholar
Qiao, X., Zhang, L.: Flexible high-dimensional classification machines and their asymptotic properties. J. Mach. Learn. Res. 16, 1547–1572 (2015)
MathSciNet MATH Google Scholar
Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. The MIT Press, Cambridge, Massachusetts (2002)
Google Scholar
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Book Google Scholar
Yata, K., Aoshima, M.: Effective PCA for high-dimension, low-sample-size data with noise reduction via geometric representations. J. Multivar. Anal. 105(1), 193–215 (2012)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

División Académica de Ciencias Básicas, Universidad Juárez Autónoma de Tabasco, Villahermosa, Mexico
Addy Bolivar-Cime

Authors

Addy Bolivar-Cime
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Addy Bolivar-Cime .

Editor information

Editors and Affiliations

Department of Probability and Statistics, Centro de Investigación en Matemáticas, Guanajuato, Mexico
Daniel Hernández‐Hernández
Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
Florencia Leonardi
Departamento de Probabilidad, IIMAS-UNAM, Mexico City, Mexico
Ramsés H. Mena
Department of Probability and Statistics, Centro de Investigación en Matemáticas, Guanajuato, Mexico
Juan Carlos Pardo Millán

Appendix

1.1 Proof of Theorem 2.1

Case 1: The vectorvis the normal vector of the MD, SVM or DWD hyperplane. Let $\widetilde {v}$ be the vector given in Lemma 2.1. Let $X_i^{\prime }=X_i-\mu _+$ and $Y_j^{\prime }=Y_j-\mu _-$, for i = 1, 2, …, m and j = 1, 2, …, n. We denote by $\left \langle x,y \right \rangle $ the dot product between the d-dimensional vectors x and y. Note that

$$\displaystyle \begin{aligned} \parallel \widetilde{v} \parallel ^2&=\sum_{i=1}^m \alpha_{i+}^2\parallel X_i^{\prime} \parallel ^2+\sum_{j=1}^n \alpha_{j-}^2\parallel Y_j^{\prime} \parallel ^2+\parallel v_d \parallel ^2\\ &\quad +2\sum_{i<j}\alpha_{i+}\alpha_{j+}\left\langle X_i^{\prime},X_j^{\prime} \right\rangle +2\sum_{i<j}\alpha_{i-}\alpha_{j-}\left\langle Y_i^{\prime},Y_j^{\prime} \right\rangle \\ &\quad +2\sum_{i=1}^m \alpha_{i+}\left\langle X_i^{\prime},v_d \right\rangle -2\sum_{j=1}^n \alpha_{j-}\left\langle Y_j^{\prime},v_d \right\rangle -2\sum_{i=1}^m\sum_{j=1}^n\alpha_{i+}\alpha_{j-}\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle .{} \end{aligned} $$

(8)

Dividing both sides of (8) by d, we have by Lemma 2.1 and (1)–(3) that the sum of the first three terms of the right side converges in probability to σ ²∕m + τ ²∕n + c ² as d →∞. Now we will see that the sum of the last five terms converges in probability to zero as d →∞. By (1) we have that

$$\displaystyle \begin{aligned} \frac{\left\langle X_i^{\prime},X_j^{\prime} \right\rangle }{d}=\frac{1}{2}\left(\frac{\parallel X_i^{\prime} \parallel ^2}{d}+\frac{\parallel X_j^{\prime} \parallel ^2}{d}-\frac{\parallel X_i-X_j \parallel ^2}{d}\right)\overset{P}{\longrightarrow}\frac{1}{2}(\sigma^2+\sigma^2-2\sigma^2)=0 \end{aligned} $$

(9)

as d →∞, for i ≠ j. Analogously

$$\displaystyle \begin{aligned} \frac{\left\langle Y_i^{\prime},Y_j^{\prime} \right\rangle }{d}\overset{P}{\longrightarrow}0 \quad \text{as }d\to\infty, \text{ for }i\neq j. \end{aligned} $$

(10)

Observe that sum of the last three terms of the right side of (8) is equal to

$$\displaystyle \begin{aligned} 2\sum_{i=1}^m\sum_{j=1}^n\alpha_{i+}\alpha_{j-} [\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle ]. \end{aligned} $$

From (1)–(4) and the equality

$$\displaystyle \begin{aligned} \frac{\parallel X_i-Y_i \parallel ^2}{d}&=\frac{\parallel X_i^{\prime} \parallel ^2}{d}+\frac{\parallel Y_j^{\prime} \parallel ^2}{d}+\frac{\parallel v_d \parallel ^2}{d}\\ &\quad +\frac{2}{d}\left[\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle \right], \end{aligned} $$

we have that

$$\displaystyle \begin{aligned} \frac{1}{d}[\left\langle X_i^{\prime},v_d \right\rangle -\left\langle Y_j^{\prime},v_d \right\rangle -\left\langle X_i^{\prime},Y_j^{\prime} \right\rangle ]\overset{P}{\longrightarrow} 0\quad \text{as }d\to\infty,\ \forall i, j. \end{aligned} $$

(11)

Therefore, by Lemma 2.1 and (9)–(11), we have that the sum of the last five terms of the right side of (8) divided by d converges in probability to zero as d →∞. Thus,

$$\displaystyle \begin{aligned} \frac{\parallel \widetilde{v} \parallel ^2}{d}\overset{P}{\longrightarrow} \frac{\sigma^2}{m}+\frac{\tau^2}{n}+c^2\quad \text{as }d\to\infty. \end{aligned} $$

(12)

From the results of [5], under the asymptotic geometric structure of the data, if $Y^*_1,Y^*_2,\dots , Y^*_k$ are independent and identically distributed d-dimensional random vectors with the same distribution as the vectors of the class C ₋ and $\overline {Y}^*_k=\sum _{j=1}^kY^*_j/k$, we have

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\overline{Y}^*_k \parallel ^2}{d}\overset{P}{\longrightarrow}c^2+\sigma^2+\frac{\tau^2}{k} \quad \text{as }d\to\infty. \end{aligned} $$

Since $\overline {Y}^*_k$ converges in probability to μ ₋ as k →∞, we have that

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\mu_- \parallel ^2}{d}\overset{P}{\longrightarrow} c^2+\sigma^2\quad \text{as }d\to\infty. \end{aligned} $$

Furthermore, by (1) and (3)

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\mu_+ \parallel ^2}{d}\overset{P}{\longrightarrow}\sigma^2\quad \text{and}\quad \frac{\parallel \mu_+-\mu- \parallel ^2}{d}\overset{P}{\longrightarrow} c^2, \end{aligned} $$

as d →∞. Thus by the Pythagoras theorem, after rescaling by d ^−1∕2, the segments X _i μ ₋, X _i μ ₊ and μ ₊ μ ₋ tend to form a right triangle as d →∞, where the hypotenuse is X _i μ ₋. Therefore, $X^{\prime }_i/d^{1/2}=(X_i-\mu _+)/d^{1/2}$ and v _d∕d ^1∕2 = (μ ₊ − μ ₋)∕d ^1∕2 tend to be orthogonal as d →∞, then

$$\displaystyle \begin{aligned} \frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d}\overset{P}{\longrightarrow}0\quad \text{as }d\to\infty,\ \forall i. \end{aligned} $$

(13)

We also have that

$$\displaystyle \begin{aligned} \frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }=\frac{\left\langle X^{\prime}_i,v_d \right\rangle }{d}\frac{d^{1/2}}{\parallel v_d \parallel }\overset{P}{\longrightarrow}0*c^{-1}=0\quad \text{as }d\to\infty,\ \forall i. \end{aligned} $$

(14)

Analogously,

$$\displaystyle \begin{aligned} \frac{\left\langle Y^{\prime}_j,v_d \right\rangle }{d}\overset{P}{\longrightarrow}0,\quad \frac{\left\langle Y^{\prime}_j,v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }\overset{P}{\longrightarrow}0,\quad \text{as }d\to\infty,\ \forall j. \end{aligned} $$

(15)

Note that

$$\displaystyle \begin{aligned} \left\langle \widetilde{v},v_d \right\rangle =\sum_{i=1}^m\alpha_{i+}\left\langle X^{\prime}_i,v_d \right\rangle -\sum_{j=1}^n\alpha_{j-}\left\langle Y^{\prime}_j,v_d \right\rangle +\parallel v_d \parallel ^2. \end{aligned} $$

(16)

Therefore, dividing both sides of (16) by d ^1∕2 ∥ v _d ∥, from Lemma 2.1, (3), (14) and (15) we have

$$\displaystyle \begin{aligned} \frac{\left\langle \widetilde{v},v_d \right\rangle }{d^{1/2}\parallel v_d \parallel }\overset{P}{\longrightarrow} c\quad \text{as }d\to\infty. \end{aligned} $$

(17)

By (12) and (17) we have

$$\displaystyle \begin{aligned} \frac{\left\langle \widetilde{v},v_d \right\rangle }{\parallel \widetilde{v} \parallel \parallel v_d \parallel }=\frac{\left\langle \widetilde{v},v_d \right\rangle /(d^{1/2}\parallel v_d \parallel )}{\parallel \widetilde{v} \parallel /d^{1/2}}\overset{P}{\longrightarrow} \frac{c}{\left(\sigma^2/m+\tau^2/n+c^2\right)} \end{aligned} $$

as d →∞. Then

$$\displaystyle \begin{aligned} \mathrm{Angle}(\widetilde{v},v_d)=\arccos\left(\frac{\left\langle \widetilde{v},v_d \right\rangle }{\parallel \widetilde{v} \parallel \parallel v_d \parallel }\right)\overset{P}{\longrightarrow} \arccos\left[\frac{c}{(\sigma^2/m+\tau^2/n+c^2)^{1/2}}\right] \end{aligned} $$

as d →∞.

Case 2: The vectorvis the normal vector of the MDP hyperplane. Let $X^{\prime }_i$ and $Y^{\prime }_j$ be as in case 1, for i = 1, …, m and j = 1, 2, …, n. From (1) and the results of [5] we have

$$\displaystyle \begin{aligned} &\frac{\parallel \overline{X}-\mu_+ \parallel ^2}{d}\overset{P}{\longrightarrow}\frac{\sigma^2}{m},{} \end{aligned} $$

(18)

$$\displaystyle \begin{aligned} &\frac{\parallel X_i-\overline{X} \parallel ^2}{d}\overset{P}{\longrightarrow} \frac{m-1}{m}\sigma^2, {} \end{aligned} $$

(19)

as d →∞. By (11), (13) and (15) we have

$$\displaystyle \begin{aligned} \frac{\left\langle X^{\prime}_i,Y^{\prime}_j \right\rangle }{d}\overset{P}{\longrightarrow}0\quad \text{as }d\to\infty,\ \forall i,j. \end{aligned} $$

(20)

Note that

$$\displaystyle \begin{aligned} \left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle &=\left\langle X^{\prime}_i-(\overline{X}-\mu_+),(\overline{X}-\mu_+)-(\overline{Y}-\mu_-)+v_d \right\rangle \\ &=\frac{1}{m}\parallel X^{\prime}_i \parallel ^2+\frac{1}{m}\sum_{j\neq i}\left\langle X^{\prime}_i,X^{\prime}_j \right\rangle -\left\langle X^{\prime}_i,\overline{Y}-\mu_- \right\rangle +\left\langle X^{\prime}_i,v_d \right\rangle \\ &\quad -\parallel \overline{X}-\mu_+ \parallel ^2+\left\langle \overline{X}-\mu_+,\overline{Y}-\mu_- \right\rangle -\left\langle \overline{X}-\mu_+,v_d \right\rangle . \end{aligned} $$

Dividing both sides of the last equality by d ^1∕2 ∥ v _d ∥, from (1), (9), (13), (18) and (20) we have

$$\displaystyle \begin{aligned} \frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle }{d^{1/2}\parallel v_d \parallel }=\frac{d^{1/2}}{\parallel v_d \parallel }\frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle }{d}\overset{P}{\longrightarrow}c^{-1}\left(\frac{\sigma^2}{m}-\frac{\sigma^2}{m}\right)=0 \end{aligned} $$

(21)

as d →∞. Furthermore, by (3) and (12) we have

$$\displaystyle \begin{aligned} \frac{\parallel \overline{X}-\overline{Y} \parallel }{\parallel v_d \parallel }\overset{P}{\longrightarrow} \frac{(\sigma^2/m+\tau^2/n+c^2)^{1/2}}{c}\quad \text{as }d\to\infty. \end{aligned} $$

(22)

Thus, by (19), (21) and (22) it follows that

$$\displaystyle \begin{aligned} \mathrm{Angle}(X_i-\overline{X},\overline{X}-\overline{Y})&=\arccos\left[\frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle }{\parallel X_i-\overline{X} \parallel \parallel \overline{X}-\overline{Y} \parallel } \right]\\ &=\arccos\left[\frac{\left\langle X_i-\overline{X},\overline{X}-\overline{Y} \right\rangle /(d^{1/2}\parallel v_d \parallel )}{(\parallel X_i-\overline{X} \parallel /d^{1/2})(\parallel \overline{X}-\overline{Y} \parallel /\parallel v_d \parallel )} \right]\\ &\overset{P}{\longrightarrow} \arccos(0)=\frac{\pi}{2} \end{aligned} $$

(23)

as d →∞, ∀i. Analogously

$$\displaystyle \begin{aligned} \mathrm{Angle}(Y_j-\overline{Y},\overline{X}-\overline{Y})\overset{P}{\longrightarrow}\frac{\pi}{2}\quad \text{as }d\to\infty,\ \forall j. \end{aligned} $$

(24)

As it was shown in the proof of Theorem 3.1 of [3], (23) and (24) imply that when d is large, the normal vector v of the MDP hyperplane is approximately in the same direction as $(\overline {X}-\overline {Y})/\parallel \overline {X}-\overline {Y} \parallel $. Hence, $\mathrm {Angle}(v,v_d)=\arccos (\left \langle v,v_d \right \rangle /(\parallel v \parallel \parallel v_d \parallel ))$ is approximately

$$\displaystyle \begin{aligned} \arccos[\left\langle \overline{X}-\overline{Y},v_d \right\rangle /(\parallel \overline{X}-\overline{Y} \parallel \parallel v_d \parallel )] \end{aligned} $$

(25)

when d is large, which by case 1 converges in probability to $\arccos \left [\frac {c}{(\sigma ^2/m+\tau ^2/n+c^2)^{1/2}}\right ]$ as d →∞.

1.2 The Data in the Simulations Satisfy Conditions (1)–(4)

We have that X _i is equal in distribution to rZ _i + β 1 _d, for i = 1, 2, …, m, and Y _j is equal in distribution to Z _m+j, for j = 1, 2, …, n, where Z ₁, Z ₂, …, Z _m+n are independent and identically distributed with the same distribution as the random vector Z given at the beginning of Sect. 3. Therefore, by (7)

$$\displaystyle \begin{aligned} \frac{\parallel X_i-\mu_+ \parallel ^2}{d}&=\frac{\parallel Z_i \parallel ^2r^2}{d}\overset{P}{\longrightarrow}2r^2,\ \forall i;\\ \frac{\parallel X_i-X_j \parallel ^2}{d}&=\frac{\parallel Z_i-Z_j \parallel ^2r^2}{d}\overset{P}{\longrightarrow}4r^2,\ \forall i\neq j;\\ \frac{\parallel Y_i-\mu_- \parallel ^2}{d}&=\frac{\parallel Z_{m+i} \parallel ^2}{d}\overset{P}{\longrightarrow}2, \ \forall i;\\ \frac{\parallel Y_i-Y_j \parallel ^2}{d}&=\frac{\parallel Z_{m+i}-Z_{m+j} \parallel ^2}{d}\overset{P}{\longrightarrow}4,\ \forall i\neq j; \end{aligned} $$

as d →∞. Thus conditions (1) and (2) hold with $\sigma =\sqrt {2}r$ and $\tau =\sqrt {2}$. We also have

$$\displaystyle \begin{aligned} \frac{\parallel v_d \parallel ^2}{d}=\frac{\parallel \mu_+-\mu_- \parallel ^2}{d}=\frac{\parallel \beta{\mathbf{1}}_d \parallel ^2}{d}=\beta^2, \end{aligned} $$

(26)

therefore condition (3) holds with c = β.

Now we will see that condition (4) holds. Observe that

$$\displaystyle \begin{aligned} \frac{\parallel X_i-Y_j \parallel ^2}{d}&=\frac{\parallel Z_i \parallel ^2}{d}r^2+\frac{\parallel Z_{m+j}^2 \parallel }{d}+\frac{\parallel v_d \parallel ^2}{d}\\ &\quad -2r\frac{\left\langle Z_i,Z_{m+j} \right\rangle }{d} +2r\frac{\left\langle Z_i,v_d \right\rangle }{d}-2\frac{\left\langle Z_{m+j},v_d \right\rangle }{d},{} \end{aligned} $$

(27)

for all i, j. From the properties of Z given in [3], we have that

$$\displaystyle \begin{aligned} \frac{\left\langle Z_i,Z_j \right\rangle }{d}\overset{P}{\longrightarrow}0,\ \forall i\neq j,\quad \frac{\left\langle Z_i,v_d \right\rangle }{d}\overset{P}{\longrightarrow}0,\ \forall i,\quad \text{as }d\to\infty. \end{aligned} $$

(28)

Therefore, by (7) and (26)–(28) we have

$$\displaystyle \begin{aligned} \frac{\parallel X_i-Y_j \parallel ^2}{d}\overset{P}{\longrightarrow}2r^2+2+\beta^2=\sigma^2+\tau^2+c^2\quad \text{as }d\to\infty. \end{aligned} $$

Then the condition (4) holds.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Bolivar-Cime, A. (2021). More About Asymptotic Properties of Some Binary Classification Methods for High Dimensional Data. In: Hernández‐Hernández, D., Leonardi, F., Mena, R.H., Pardo Millán, J.C. (eds) Advances in Probability and Mathematical Statistics. Progress in Probability, vol 79. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-85325-9_3

Download citation

DOI: https://doi.org/10.1007/978-3-030-85325-9_3
Published: 05 August 2021
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-85324-2
Online ISBN: 978-3-030-85325-9
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

More About Asymptotic Properties of Some Binary Classification Methods for High Dimensional Data

Abstract

Access this chapter

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

1.1 Proof of Theorem 2.1

1.2 The Data in the Simulations Satisfy Conditions (1)–(4)

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation