Abstract
We consider a regularized D-classification rule for high-dimensional binary classification, which adopts the linear shrinkage estimator of the covariance matrix in place of the sample covariance matrix used in the D-classification rule (D-rule for short). We derive an asymptotic expression for the misclassification rate of the regularized D-rule when the sample size n and the dimension p both increase and their ratio p/n approaches a positive constant γ. In addition, we compare its misclassification rate with that of the standard D-rule under various settings via simulation.
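The abstract does not state the rule explicitly, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes the rule is a linear discriminant built from the pooled sample covariance S, and that the shrinkage estimator has the form (1 − α)S + α(tr S / p)I with a fixed, user-chosen intensity α. The function names, the shrinkage target, the value of α, and the simulation settings are all hypothetical.

```python
import numpy as np


def shrinkage_covariance(S, alpha):
    """Linear shrinkage of S toward a scaled identity (assumed target)."""
    p = S.shape[0]
    target = (np.trace(S) / p) * np.eye(p)
    return (1.0 - alpha) * S + alpha * target


def fit_regularized_rule(X1, X2, alpha):
    """Return (midpoint, direction) of a linear rule from two training samples."""
    n1, n2 = X1.shape[0], X2.shape[0]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled sample covariance; it is singular whenever p > n1 + n2 - 2.
    R1, R2 = X1 - m1, X2 - m2
    S = (R1.T @ R1 + R2.T @ R2) / (n1 + n2 - 2)
    # Shrinkage with alpha > 0 makes the estimate invertible.
    sigma = shrinkage_covariance(S, alpha)
    direction = np.linalg.solve(sigma, m1 - m2)
    return (m1 + m2) / 2.0, direction


def classify(X, midpoint, direction):
    """Assign label 1 where the linear score is positive, otherwise label 2."""
    scores = (X - midpoint) @ direction
    return np.where(scores > 0, 1, 2)


def simulate_error(n=50, p=100, delta=0.5, alpha=0.5, n_test=2000, seed=0):
    """Monte Carlo estimate of the average misclassification rate.

    Assumed setting: two Gaussian groups with identity covariance and
    means +mu and -mu, where every coordinate of mu equals delta / 2.
    """
    rng = np.random.default_rng(seed)
    mu = np.full(p, delta / 2)
    X1 = rng.standard_normal((n, p)) + mu
    X2 = rng.standard_normal((n, p)) - mu
    midpoint, direction = fit_regularized_rule(X1, X2, alpha)
    T1 = rng.standard_normal((n_test, p)) + mu
    T2 = rng.standard_normal((n_test, p)) - mu
    err1 = np.mean(classify(T1, midpoint, direction) != 1)
    err2 = np.mean(classify(T2, midpoint, direction) != 2)
    return (err1 + err2) / 2.0
```

Calling `simulate_error()` with the defaults uses p = 100 and 50 observations per group, so the pooled sample covariance is singular and the unregularized rule could not be computed by direct inversion. Varying `alpha`, `n`, and `p` gives a crude picture of how the error behaves as p/n changes.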
Cite this article
Son, W., Lim, J. & Wang, X. Accuracy of regularized D-rule for binary classification. J. Korean Stat. Soc. 47, 150–160 (2018). https://doi.org/10.1016/j.jkss.2017.11.002
DOI: https://doi.org/10.1016/j.jkss.2017.11.002