Accuracy of regularized D-rule for binary classification

Journal of the Korean Statistical Society

Abstract

We consider a regularized D-classification rule for high-dimensional binary classification, which uses the linear shrinkage estimator of the covariance matrix in place of the sample covariance matrix in the D-classification rule (D-rule for short). We derive an asymptotic expression for the misclassification rate of the regularized D-rule when the sample size n and the dimension p both increase and their ratio p/n approaches a positive constant γ. In addition, we compare its misclassification rate with that of the standard D-rule under various settings via simulation.
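
As a rough illustration of the idea, the following Python sketch builds a linear classification rule from a shrunken pooled covariance matrix and estimates its misclassification rate on simulated Gaussian data. It rests on assumptions that this abstract does not spell out: that the D-rule is a linear discriminant based on the pooled sample covariance, and that the linear shrinkage estimator is a convex combination of that covariance with a scaled identity target. The function names, the weight `lam`, and the simulation settings are hypothetical and are not taken from the article.

```python
import numpy as np


def fit_regularized_d_rule(x1, x2, lam):
    """Fit a linear rule using a shrunken pooled covariance.

    x1, x2: training samples of shape (n1, p) and (n2, p).
    lam: shrinkage weight in (0, 1] (hypothetical parameterization).
    Returns the direction w and the midpoint of the two sample means.
    """
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    n1, n2, p = x1.shape[0], x2.shape[0], x1.shape[1]
    # Pooled sample covariance; it is singular whenever p > n1 + n2 - 2.
    r1, r2 = x1 - m1, x2 - m2
    s = (r1.T @ r1 + r2.T @ r2) / (n1 + n2 - 2)
    # Linear shrinkage toward a scaled identity (assumed target).
    target = (np.trace(s) / p) * np.eye(p)
    s_lam = (1.0 - lam) * s + lam * target
    # Solve s_lam @ w = m1 - m2 instead of forming an explicit inverse.
    w = np.linalg.solve(s_lam, m1 - m2)
    return w, 0.5 * (m1 + m2)


def predict(x, w, midpoint):
    """Assign label 1 when the discriminant score is positive, else 2."""
    scores = (x - midpoint) @ w
    return np.where(scores > 0, 1, 2)


# Simulated example with p/n = 2.5, so the unregularized covariance is singular.
rng = np.random.default_rng(0)
p, n = 50, 20
shift = np.full(p, 0.5)
x1 = rng.normal(size=(n, p)) + shift
x2 = rng.normal(size=(n, p)) - shift
w, mid = fit_regularized_d_rule(x1, x2, lam=0.5)

# Estimate the misclassification rate on fresh draws from each class.
test1 = rng.normal(size=(200, p)) + shift
test2 = rng.normal(size=(200, p)) - shift
err = 0.5 * (np.mean(predict(test1, w, mid) != 1)
             + np.mean(predict(test2, w, mid) != 2))
print(f"estimated misclassification rate: {err:.3f}")
```

Because the identity target has a positive weight, the shrunken matrix is positive definite and the linear system has a unique solution even when p exceeds the sample size.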

Author information

Corresponding author

Correspondence to Johan Lim.

About this article

Cite this article

Son, W., Lim, J. & Wang, X. Accuracy of regularized D-rule for binary classification. J. Korean Stat. Soc. 47, 150–160 (2018). https://doi.org/10.1016/j.jkss.2017.11.002

  • DOI: https://doi.org/10.1016/j.jkss.2017.11.002
