Abstract
We consider the nonparametric classification of high dimensional, low sample size (HDLSS) data where the classical discrimination methods break down due to the singularity of the sample covariance matrix. We present new dissimilarity indices, discuss their asymptotic properties in the HDLSS setting, use them in building powerful classifiers, and compare their behavior with existing methods. We illustrate the difficulties with the Euclidean nearest neighbor method and prove that dissimilarity-based classifiers produce misclassification rates that tend to zero as \(p\rightarrow \infty \). We present test-based classifiers in the HDLSS setting. A simulation study compares the misclassification rates of diagonal linear discriminant analysis with twelve other nonparametric classifiers. The methods are applied to microarray data for classification of prostate cancer.
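A small simulation (not from the paper) illustrates the difficulty with the Euclidean nearest neighbor method that the abstract alludes to: as the dimension \(p\) grows, pairwise Euclidean distances concentrate around a common value, so the relative contrast between the nearest and farthest neighbor vanishes.

```python
# Illustrative sketch (not the paper's code): distance concentration in high
# dimensions. For i.i.d. N(0, I_p) points, the relative contrast
# (max - min) / mean of pairwise distances shrinks as p grows.
import numpy as np

rng = np.random.default_rng(0)
contrast = {}
for p in (10, 2000):
    X = rng.standard_normal((30, p))                  # 30 points in R^p
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d = D[np.triu_indices(30, k=1)]                   # the 435 pairwise distances
    contrast[p] = (d.max() - d.min()) / d.mean()      # relative contrast

print(contrast)  # contrast is far smaller at p = 2000 than at p = 10
```

With concentration, the identity of the "nearest" neighbor becomes unstable, which is why the paper turns to dissimilarity indices designed for the HDLSS regime.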
Acknowledgements
I would like to thank two anonymous referees for constructive suggestions.
Appendix
Proof of Lemma 3
Assume (B1)–(B3) hold. If \(\mathbf{z}\) is from \(\pi _x\), then \(\mathbb {I}_j^\delta (\mathbf{X}_i)\rightarrow 1\) and \(\mathbb {I}_j^\delta (\mathbf{Y}_j)\rightarrow 0\), so that \(T_1(NN_{\delta _0})\rightarrow k>0\) while \(T_2(NN_{\delta _0})\rightarrow 0\) with probability 1 as \(p\rightarrow \infty \). Hence \(T_1(NN_{\delta _0})>T_2(NN_{\delta _0})\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \). Similarly, if \(\mathbf{z}\) is from \(\pi _y\), then \(\mathbb {I}_j^\delta (\mathbf{X}_i)\rightarrow 0\) and \(\mathbb {I}_j^\delta (\mathbf{Y}_j)\rightarrow 1\), so that \(T_1(NN_{\delta _0})\rightarrow 0\) while \(T_2(NN_{\delta _0})\rightarrow k>0\) with probability 1 as \(p\rightarrow \infty \). Hence \(T_1(NN_{\delta _0})<T_2(NN_{\delta _0})\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). Replacing \(\delta _0\) with \(\rho _0\) in the above argument establishes that the NN method with index \(\rho _0\) also has misclassification rate tending to zero as \(p\rightarrow \infty \). \(\square \)
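The counting rule behind Lemma 3 can be sketched as follows. This is a minimal illustration, not the paper's implementation: \(T_1\) and \(T_2\) count how many of the \(k\) nearest neighbors of \(\mathbf{z}\) come from the \(\mathbf{X}\) and \(\mathbf{Y}\) samples, and the mean squared coordinate difference below merely stands in for the index \(\delta _0\), which is not defined in this excerpt.

```python
# Sketch of the NN vote in Lemma 3 (stand-in dissimilarity, not delta_0).
import numpy as np

def nn_vote(z, X, Y, k=3, diss=lambda a, b: np.mean((a - b) ** 2)):
    # Score every training point by its dissimilarity to z, keep the k nearest,
    # and let the two classes vote: T1 counts X-neighbours, T2 counts Y-neighbours.
    scored = [(diss(z, x), "x") for x in X] + [(diss(z, y), "y") for y in Y]
    labels = [lab for _, lab in sorted(scored)[:k]]
    T1, T2 = labels.count("x"), labels.count("y")
    return "pi_x" if T1 > T2 else "pi_y"

X, Y = np.zeros((5, 20)), np.full((5, 20), 3.0)
print(nn_vote(np.full(20, 0.1), X, Y))  # a point near the X sample -> pi_x
```

The lemma says that, under (B1)–(B3), all \(k\) votes eventually go to the correct class as \(p\rightarrow \infty \).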
Proof of Lemma 4
Assume (B1)–(B3) hold. Using comment 1, if \(\mathbf{z}\) is from \(\pi _x\), then \(T_1(\delta _0)\rightarrow 0\) while \(T_2(\delta _0)\rightarrow m\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) with probability 1 as \(p\rightarrow \infty \). Hence \(T_1(\delta _0)<T_2(\delta _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \). If \(\mathbf{z}\) is from \(\pi _y\), then \(T_1(\delta _0)\rightarrow m\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) while \(T_2(\delta _0)\rightarrow 0\) with probability 1 as \(p\rightarrow \infty \). Hence \(T_1(\delta _0)>T_2(\delta _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \).
Comment 2 explains the behavior of \(T_1(\rho _0)\). If \(\mathbf{z}\) is from \(\pi _x\), then \(T_1(\rho _0)\rightarrow 0\) while \(T_2(\rho _0)\rightarrow m\tilde{\rho }_0({\mathbb F},{\mathbb G})>0\) with probability 1 as \(p\rightarrow \infty \). Hence \(T_1(\rho _0)<T_2(\rho _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \). If the new observation \(\mathbf{z}\) is from \(\pi _y\), then \(T_1(\rho _0)\rightarrow m\tilde{\rho }_0({\mathbb F},{\mathbb G})>0\) while \(T_2(\rho _0)\rightarrow 0\) with probability 1 as \(p\rightarrow \infty \). Hence \(T_1(\rho _0)>T_2(\rho _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). \(\square \)
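The classifier behind Lemma 4 compares the aggregate dissimilarity from \(\mathbf{z}\) to each training sample and assigns \(\mathbf{z}\) to the class for which the statistic is smaller. A minimal sketch, again with a stand-in dissimilarity in place of the paper's \(\delta _0\) or \(\rho _0\):

```python
# Sketch of the Lemma 4 rule: assign z to the class with the smaller average
# dissimilarity (the stand-in here is the mean squared coordinate difference).
import numpy as np

def mean_diss_classify(z, X, Y, diss=lambda a, b: np.mean((a - b) ** 2)):
    T1 = np.mean([diss(z, x) for x in X])  # average dissimilarity of z to pi_x
    T2 = np.mean([diss(z, y) for y in Y])  # average dissimilarity of z to pi_y
    return "pi_x" if T1 < T2 else "pi_y"
```

This mirrors the limits in the proof: for \(\mathbf{z}\) from \(\pi _x\), the own-class statistic tends to 0 while the other tends to a positive constant, so the rule is correct with probability 1 as \(p\rightarrow \infty \).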
Proof of Lemma 1
We obtain \(T_1(BF_\delta )=2\bar{\delta }_{(x\cup z)y}-\bar{\delta }_{x\cup z}-\bar{\delta }_{y}\) and \(T_2({BF}_\delta )=2\bar{\delta }_{x(y\cup z)}-\bar{\delta }_{x}-\bar{\delta }_{y\cup z}\). Assume (B1)–(B3) hold. If \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence, using comment 2, \(\bar{\delta }_{x\cup z}\rightarrow 0\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \), so \(T_1({BF}_\delta )\) converges to \(2\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If the new observation \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample. Hence \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow \frac{2}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \frac{n}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BF}_\delta )\) converges to \(\frac{2(n-1)}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(T_1(BF_{\delta _0})>T_2(BF_{\delta _0})\), \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \).
If \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence \(\bar{\delta }_{x\cup z}\rightarrow \frac{2}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \frac{m}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \), so \(T_1({BF}_\delta )\) converges to \(\frac{2(m-1)}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample; every pair between the two samples is then a between-class pair. Hence \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow 0\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BF}_\delta )\) converges to \(2\tilde{\delta }_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(T_1(BF_{\delta _0})<T_2(BF_{\delta _0})\), \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). \(\square \)
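The statistic in Lemma 1 is an energy-type (Baringhaus–Franz) contrast: twice the mean cross-sample dissimilarity minus the two within-sample means, computed once with \(\mathbf{z}\) pooled into each class. A minimal sketch, with a stand-in dissimilarity in place of \(\delta _0\) and the assignment rule read off the lemma's conclusion (\(\mathbf{z}\) goes to the class yielding the larger statistic):

```python
# Sketch of the BF-type classifier in Lemma 1 (stand-in dissimilarity).
import numpy as np

def mean_cross(A, B, diss):
    # average dissimilarity over all between-sample pairs
    return np.mean([diss(a, b) for a in A for b in B])

def mean_within(S, diss):
    # average dissimilarity over all within-sample pairs
    n = len(S)
    return np.mean([diss(S[i], S[j]) for i in range(n) for j in range(i + 1, n)])

def bf_classify(z, X, Y, diss=lambda a, b: np.mean((a - b) ** 2)):
    Xz, Yz = np.vstack([X, z[None, :]]), np.vstack([Y, z[None, :]])
    T1 = 2 * mean_cross(Xz, Y, diss) - mean_within(Xz, diss) - mean_within(Y, diss)
    T2 = 2 * mean_cross(X, Yz, diss) - mean_within(X, diss) - mean_within(Yz, diss)
    return "pi_x" if T1 > T2 else "pi_y"
```

Pooling \(\mathbf{z}\) into the wrong class contaminates that class's within-sample mean, which is exactly why the wrong-pooling statistic stays strictly below \(2\tilde{\delta }_0({\mathbb F},{\mathbb G})\) in the proof.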
Proof of Lemma 2
We obtain \(T_1({BG}_\delta )=(\bar{\delta }_{(x\cup z) y}- \bar{\delta }_{x\cup z})^2+(\bar{\delta }_{(x\cup z) y}- \bar{\delta }_{y})^2\) and \(T_2({BG}_\delta )=(\bar{\delta }_{x (y\cup z)}- \bar{\delta }_{y\cup z})^2+(\bar{\delta }_{x(y\cup z)}- \bar{\delta }_{x})^2\). Assume (B1)–(B3) hold. If \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence \(\bar{\delta }_{x\cup z}\rightarrow 0\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \), so, using comment 2, \(T_1({BG}_\delta )\) converges to \(2\tilde{\delta }^2_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample. Hence \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow \frac{2}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \frac{n}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BG}_\delta )\) converges to \(\frac{(n-2)^2+n^2}{(n+1)^2}\tilde{\delta }^2_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(T_1(BG_{\delta _0})>T_2(BG_{\delta _0})\), \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \).
If \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence, using comment 1, \(\bar{\delta }_{x\cup z}\rightarrow \frac{2}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \frac{m}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \), so \(T_1({BG}_\delta )\) converges to \(\frac{(m-2)^2+m^2}{(m+1)^2}\tilde{\delta }^2_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If the new observation \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample; every pair between the two samples is then a between-class pair. Hence \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow 0\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BG}_\delta )\) converges to \(2\tilde{\delta }^2_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(T_1(BG_{\delta _0})<T_2(BG_{\delta _0})\), \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). \(\square \)
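The BG statistic of Lemma 2 squares the gaps between the cross-sample mean and each within-sample mean, again once per pooling of \(\mathbf{z}\). A minimal sketch under the same stand-in dissimilarity, with the larger statistic winning as in the lemma:

```python
# Sketch of the BG-type classifier in Lemma 2 (stand-in dissimilarity).
import numpy as np

def _cross(A, B, diss):
    return np.mean([diss(a, b) for a in A for b in B])

def _within(S, diss):
    n = len(S)
    return np.mean([diss(S[i], S[j]) for i in range(n) for j in range(i + 1, n)])

def bg_classify(z, X, Y, diss=lambda a, b: np.mean((a - b) ** 2)):
    Xz, Yz = np.vstack([X, z[None, :]]), np.vstack([Y, z[None, :]])
    c1, c2 = _cross(Xz, Y, diss), _cross(X, Yz, diss)
    # squared gaps between the cross mean and the two within-sample means
    T1 = (c1 - _within(Xz, diss)) ** 2 + (c1 - _within(Y, diss)) ** 2
    T2 = (c2 - _within(Yz, diss)) ** 2 + (c2 - _within(X, diss)) ** 2
    return "pi_x" if T1 > T2 else "pi_y"
```

Squaring makes the statistic sensitive to both gaps symmetrically, which is why the limiting constants in the proof involve \(\tilde{\delta }^2_0({\mathbb F},{\mathbb G})\) rather than \(\tilde{\delta }_0({\mathbb F},{\mathbb G})\).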
Modarres, R. Nonparametric classification of high dimensional observations. Stat Papers 64, 1833–1859 (2023). https://doi.org/10.1007/s00362-022-01363-3