A Quadratic Classifier for High-Dimension, Low-Sample-Size Data Under the Strongly Spiked Eigenvalue Model
Abstract
We consider classification of high-dimensional data under the strongly spiked eigenvalue (SSE) model. Based on the high-dimensional eigenstructure, we propose a quadratic classification procedure that uses a data transformation. We prove that the proposed procedure has a consistency property for misclassification rates, and we examine its performance in simulations and in real data analyses using microarray data sets.
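To illustrate the general flavor of distance-based quadratic classification in the high-dimension, low-sample-size (HDLSS) setting, the following is a minimal sketch of a bias-corrected Euclidean-distance rule. This is an illustrative simplification, not the SSE-transformed procedure of the paper: the function name, the plain Euclidean distance, and the trace-based bias correction are assumptions chosen for a self-contained example.

```python
import numpy as np

def quadratic_distance_classify(x, X1, X2):
    """Assign x to class 1 or class 2 by comparing bias-corrected
    squared Euclidean distances to the class means (illustrative only).

    X1, X2: (n_k, p) training matrices, rows are observations.
    """
    n1, n2 = X1.shape[0], X2.shape[0]
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    # tr(S_k): total variance of each class (sum over all p coordinates)
    tr_S1 = ((X1 - xbar1) ** 2).sum() / (n1 - 1)
    tr_S2 = ((X2 - xbar2) ** 2).sum() / (n2 - 1)
    # Subtract tr(S_k)/n_k to remove the bias that the estimated mean
    # contributes to the squared distance when p is much larger than n_k
    d1 = np.sum((x - xbar1) ** 2) - tr_S1 / n1
    d2 = np.sum((x - xbar2) ** 2) - tr_S2 / n2
    return 1 if d1 < d2 else 2
```

Without the tr(S_k)/n_k correction, the class with the smaller sample size is systematically penalized when p is large, which is one motivation for the adjusted classifiers studied in the HDLSS literature.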
Keywords
Classification · Eigenstructure · Geometrical quadratic discriminant analysis · HDLSS · Noise reduction methodology · SSE model
Acknowledgements
We would like to thank an anonymous referee for his/her kind comments. Research of the first author was partially supported by Grant-in-Aid for Young Scientists, Japan Society for the Promotion of Science (JSPS), under Contract Number 18K18015. Research of the second author was partially supported by Grant-in-Aid for Scientific Research (C), JSPS, under Contract Number 18K03409. Research of the third author was partially supported by Grants-in-Aid for Scientific Research (A) and Challenging Research (Exploratory), JSPS, under Contract Numbers 15H01678 and 17K19956.