
Nonparametric classification of high dimensional observations


Abstract

We consider the nonparametric classification of high dimensional, low sample size (HDLSS) data where the classical discrimination methods break down due to the singularity of the sample covariance matrix. We present new dissimilarity indices, discuss their asymptotic properties in the HDLSS setting, use them in building powerful classifiers, and compare their behavior with existing methods. We illustrate the difficulties with the Euclidean nearest neighbor method and prove that dissimilarity-based classifiers produce misclassification rates that tend to zero as \(p\rightarrow \infty \). We present test-based classifiers in the HDLSS setting. A simulation study compares the misclassification rates of diagonal linear discriminant analysis with twelve other nonparametric classifiers. The methods are applied to microarray data for classification of prostate cancer.



Acknowledgements

I would like to thank two anonymous referees for constructive suggestions.

Author information

Correspondence to Reza Modarres.


Appendix

Proof of Lemma 3

Assuming (B1)–(B3) hold, if \(\mathbf{z}\) is from \(\pi _x\), then \(\mathbb {I}_j^\delta (\mathbf{X}_i)\rightarrow 1\) and \(\mathbb {I}_j^\delta (\mathbf{Y}_j)\rightarrow 0\), so that \(T_1(NN_{\delta _0})\rightarrow k>0\) while \(T_2(NN_{\delta _0})\rightarrow 0\) with probability 1 as \(p\rightarrow \infty \). Hence, \(T_1(NN_{\delta _0})>T_2(NN_{\delta _0})\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \). Similarly, if \(\mathbf{z}\) is from \(\pi _y\), then \(\mathbb {I}_j^\delta (\mathbf{X}_i)\rightarrow 0\) and \(\mathbb {I}_j^\delta (\mathbf{Y}_j)\rightarrow 1\), so that \(T_1(NN_{\delta _0})\rightarrow 0\) while \(T_2(NN_{\delta _0})\rightarrow k>0\) with probability 1 as \(p\rightarrow \infty \). Hence, \(T_1(NN_{\delta _0})<T_2(NN_{\delta _0})\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). Replacing \(\delta _0\) with \(\rho _0\) in the above argument establishes that the NN method with index \(\rho _0\) also has zero misclassification rate as \(p\rightarrow \infty \). \(\square \)
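The rule analyzed in Lemma 3 is an ordinary k-nearest-neighbor vote computed under a dissimilarity index in place of the Euclidean metric. The following is a minimal sketch, assuming a generic callable `delta`; the indices \(\delta _0\) and \(\rho _0\) are defined in the main text, which is not reproduced here, so a Euclidean stand-in is used purely for illustration.

```python
import numpy as np

def nn_classify(z, X, Y, delta, k=3):
    """NN rule of Lemma 3 (sketch): among the k training points
    nearest to z under the dissimilarity `delta`, T1 counts the
    neighbors from the X sample and T2 those from the Y sample;
    z is assigned to the class with the larger count."""
    d = np.array([delta(z, w) for w in np.vstack([X, Y])])
    labels = np.array([0] * len(X) + [1] * len(Y))  # 0 = pi_x, 1 = pi_y
    nearest = labels[np.argsort(d)[:k]]
    T1, T2 = np.sum(nearest == 0), np.sum(nearest == 1)
    return "pi_x" if T1 > T2 else "pi_y"

# Euclidean stand-in for the paper's delta_0 / rho_0 (placeholder only;
# the paper's indices should be substituted here).
euclidean = lambda u, v: float(np.linalg.norm(u - v))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (10, 500))  # m = 10 observations from pi_x
Y = rng.normal(1.0, 1.0, (12, 500))  # n = 12 observations from pi_y
print(nn_classify(rng.normal(0.0, 1.0, 500), X, Y, euclidean, k=5))
```

An odd k avoids ties in the vote; with a dissimilarity satisfying (B1)–(B3), Lemma 3 says the vote becomes unanimous as \(p\rightarrow \infty \).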

Proof of Lemma 4

Assuming (B1)–(B3) hold, and using comment 1, if \(\mathbf{z}\) is from \(\pi _x\), then \(T_1(\delta _0)\rightarrow 0\) while \(T_2(\delta _0)\rightarrow m\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) with probability 1 as \(p\rightarrow \infty \). Hence, \(T_1(\delta _0)<T_2(\delta _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \). If \(\mathbf{z}\) is from \(\pi _y\), then \(T_1(\delta _0)\rightarrow m\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) while \(T_2(\delta _0)\rightarrow 0\) with probability 1 as \(p\rightarrow \infty \). Hence, \(T_1(\delta _0)>T_2(\delta _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \).

Comment 2 explains the behavior of \(T_1(\rho _0)\). If \(\mathbf{z}\) is from \(\pi _x\), then \(T_1(\rho _0)\rightarrow 0\) while \(T_2(\rho _0)\rightarrow m\tilde{\rho }_0({\mathbb F},{\mathbb G})>0\) with probability 1 as \(p\rightarrow \infty \). Hence, \(T_1(\rho _0)<T_2(\rho _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \). If the new observation \(\mathbf{z}\) is from \(\pi _y\), then \(T_1(\rho _0)\rightarrow m\tilde{\rho }_0({\mathbb F},{\mathbb G})>0\) while \(T_2(\rho _0)\rightarrow 0\) with probability 1 as \(p\rightarrow \infty \). Hence, \(T_1(\rho _0)>T_2(\rho _0)\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). \(\square \)
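A minimal sketch of the rule treated in Lemma 4, assuming a generic dissimilarity `delta`: the statistics below are simple averages, whereas the exact normalization of \(T_1(\delta _0)\) and \(T_2(\delta _0)\) follows the definitions in the main text; only the comparison of the two statistics drives the assignment.

```python
import numpy as np

def dissimilarity_classify(z, X, Y, delta):
    """Sketch of the rule in Lemma 4: compare the average
    dissimilarity of z to the X sample (T1) with that to the
    Y sample (T2), and assign z to the class with the smaller
    statistic."""
    T1 = np.mean([delta(z, x) for x in X])
    T2 = np.mean([delta(z, y) for y in Y])
    return "pi_x" if T1 < T2 else "pi_y"
```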

Proof of Lemma 1

We obtain \(T_1({BF}_\delta )=2\bar{\delta }_{(x\cup z)y}-\bar{\delta }_{x\cup z}-\bar{\delta }_{y}\) and \(T_2({BF}_\delta )=2\bar{\delta }_{x(y\cup z)}-\bar{\delta }_{x}-\bar{\delta }_{y\cup z}\). Assuming (B1)–(B3) hold, if \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence, using comment 2, \(\bar{\delta }_{x\cup z}\rightarrow 0\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \), so \(T_1({BF}_\delta )\) converges to \(2\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If the new observation \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample. Hence, \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow \frac{2}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \frac{n}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BF}_\delta )\) converges to \(\frac{2(n-1)}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(\frac{2(n-1)}{n+1}<2\) for every \(n\ge 1\), \(T_1(BF_{\delta _0})>T_2(BF_{\delta _0})\) and \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \).

Assuming (B1)–(B3) hold, if \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence, \(\bar{\delta }_{x\cup z}\rightarrow \frac{2}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \frac{m}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \). Now, \(T_1({BF}_\delta )\) converges to \(\frac{2(m-1)}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample. Hence, \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow 0\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BF}_\delta )\) converges to \(2\tilde{\delta }_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(\frac{2(m-1)}{m+1}<2\), \(T_1(BF_{\delta _0})<T_2(BF_{\delta _0})\) and \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). \(\square \)
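A sketch of the classifier treated in Lemma 1, under the same caveats as above (a generic `delta` standing in for \(\delta _0\)): \(\mathbf{z}\) is pooled with each training sample in turn, the BF statistic \(T=2\bar{\delta }_{\text{between}}-\bar{\delta }_{\text{within},1}-\bar{\delta }_{\text{within},2}\) is recomputed, and \(\mathbf{z}\) is assigned to the class whose pooling leaves the two groups more separated.

```python
import numpy as np

def _within(S, delta):
    """Average dissimilarity over all distinct pairs within S
    (assumes S has at least two observations)."""
    n = len(S)
    return np.mean([delta(S[i], S[j]) for i in range(n) for j in range(i + 1, n)])

def _between(S, T, delta):
    """Average dissimilarity over all pairs taking one point from each sample."""
    return np.mean([delta(s, t) for s in S for t in T])

def bf_classify(z, X, Y, delta):
    """Sketch of the BF rule of Lemma 1: pool z with each sample in
    turn, recompute T = 2*between - within_1 - within_2, and assign
    z to the class whose pooling keeps the samples more separated
    (the larger statistic)."""
    Xz, Yz = np.vstack([X, z]), np.vstack([Y, z])
    T1 = 2 * _between(Xz, Y, delta) - _within(Xz, delta) - _within(Y, delta)
    T2 = 2 * _between(X, Yz, delta) - _within(X, delta) - _within(Yz, delta)
    return "pi_x" if T1 > T2 else "pi_y"
```

Pooling \(\mathbf{z}\) with the wrong sample dilutes the between-group contrast, which is exactly the limiting comparison computed in the proof.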

Proof of Lemma 2

We obtain \(T_1({BG}_\delta )=(\bar{\delta }_{(x\cup z) y}- \bar{\delta }_{x\cup z})^2+(\bar{\delta }_{(x\cup z) y}- \bar{\delta }_{y})^2\) and \(T_2({BG}_\delta )=(\bar{\delta }_{x (y\cup z)}- \bar{\delta }_{y\cup z})^2+(\bar{\delta }_{x(y\cup z)}- \bar{\delta }_{x})^2\). Assuming (B1)–(B3) hold, if \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence, \(\bar{\delta }_{x\cup z}\rightarrow 0\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \). Hence, using comment 2, \(T_1({BG}_\delta )\) converges to \(2\tilde{\delta }^2_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If \(\mathbf{z}\) is from \(\pi _x\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample. Hence, \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow \frac{2}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \frac{n}{n+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BG}_\delta )\) converges to \(\frac{(n-2)^2+n^2}{(n+1)^2}\tilde{\delta }^2_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(T_1(BG_{\delta _0})>T_2(BG_{\delta _0})\), \(\mathbf{z}\) will be assigned to \(\pi _x\) with probability 1 as \(p\rightarrow \infty \).

Assuming (B1)–(B3) hold, if \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _x\), then there are \(m+1\) observations in \(x\cup z\) and n observations in the \(\mathbf{Y}\) sample. Hence, using comment 1, \(\bar{\delta }_{x\cup z}\rightarrow \frac{2}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\), \(\bar{\delta }_{y}\rightarrow 0\), and \(\bar{\delta }_{(x\cup z)y}\rightarrow \frac{m}{m+1}\tilde{\delta }_0({\mathbb F},{\mathbb G})\) as \(p\rightarrow \infty \). Hence, \(T_1({BG}_\delta )\) converges to \(\frac{(m-2)^2+m^2}{(m+1)^2}\tilde{\delta }^2_0({\mathbb F},{\mathbb G})>0\) in probability as \(p\rightarrow \infty \). If the new observation \(\mathbf{z}\) is from \(\pi _y\) and assigned to \(\pi _y\), then there are m observations in the \(\mathbf{X}\) sample and \(n+1\) observations in the \(y\cup z\) sample. Hence, \(\bar{\delta }_{x}\rightarrow 0\), \(\bar{\delta }_{y\cup z}\rightarrow 0\), and \(\bar{\delta }_{x(y\cup z)}\rightarrow \tilde{\delta }_0({\mathbb F},{\mathbb G})\), so \(T_2({BG}_\delta )\) converges to \(2\tilde{\delta }^2_0({\mathbb F},{\mathbb G})\) in probability as \(p\rightarrow \infty \). Since \(T_1(BG_{\delta _0})<T_2(BG_{\delta _0})\), \(\mathbf{z}\) will be assigned to \(\pi _y\) with probability 1 as \(p\rightarrow \infty \). \(\square \)
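In both cases the final comparison of the limits reduces to an elementary bound:

\[
(k-2)^2+k^2=2k^2-4k+4<2k^2+4k+2=2(k+1)^2 \quad \text{for all } k\ge 1,
\]

so that \(\frac{(k-2)^2+k^2}{(k+1)^2}\,\tilde{\delta }^2_0({\mathbb F},{\mathbb G})<2\,\tilde{\delta }^2_0({\mathbb F},{\mathbb G})\), applied with \(k=n\) in the first case and \(k=m\) in the second.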


About this article


Cite this article

Modarres, R. Nonparametric classification of high dimensional observations. Stat Papers 64, 1833–1859 (2023). https://doi.org/10.1007/s00362-022-01363-3

