Statistics and Computing, Volume 27, Issue 2, pp 535–545

Robust rank screening for ultrahigh dimensional discriminant analysis

  • Guosheng Cheng
  • Xingxiang Li
  • Peng Lai (email author)
  • Fengli Song
  • Jun Yu

Abstract

In this paper, we consider sure independence feature screening for ultrahigh dimensional discriminant analysis. We propose a new method, named robust rank screening, based on the conditional expectation of the rank of the predictor's samples. We also establish the sure screening property of the proposed procedure under simple assumptions. The new procedure has several additional desirable characteristics. First, it is robust against heavy-tailed distributions, potential outliers, and sample shortages in some categories. Second, it is model-free, requiring no specification of a regression model, and is directly applicable to settings with many categories. Third, its theoretical derivation is simple because the resulting statistics are bounded. Fourth, it is relatively inexpensive computationally because of the simple structure of the screening index. Monte Carlo simulations and real data examples are used to demonstrate the finite sample performance.
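
To make the idea of a rank-based screening index concrete, the following is a minimal sketch and not the statistic defined in the paper. It assumes a hypothetical index: the class-weighted squared deviation of the mean scaled rank within each category from the overall mean scaled rank. The function names (`rank_fractions`, `rank_screening_index`, `screen`) and the tie-breaking rule are illustrative assumptions. The sketch does show two properties the abstract points to: the index depends on the predictor only through its ranks, so a single extreme value cannot change it, and it is bounded because scaled ranks lie in (0, 1].

```python
from collections import defaultdict


def rank_fractions(values):
    """Return rank/n for each observation (ranks 1..n, ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    frac = [0.0] * n
    for rank, i in enumerate(order, start=1):
        frac[i] = rank / n
    return frac


def rank_screening_index(x, y):
    """Hypothetical index (an assumption, not the paper's formula):
    sum over classes of p_k * (mean scaled rank in class k - overall mean)^2."""
    n = len(x)
    frac = rank_fractions(x)
    overall = sum(frac) / n
    groups = defaultdict(list)
    for f, label in zip(frac, y):
        groups[label].append(f)
    return sum(
        len(g) / n * (sum(g) / len(g) - overall) ** 2 for g in groups.values()
    )


def screen(columns, y, d):
    """Return the indices of the d predictors with the largest index values."""
    scores = [rank_screening_index(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]


if __name__ == "__main__":
    y = [0, 0, 0, 1, 1, 1]
    informative = [1, 2, 3, 4, 5, 6]        # separates the two classes
    with_outlier = [1, 2, 3, 4, 5, 1e9]     # same ranks, one extreme value
    weak = [3, 1, 5, 2, 6, 4]               # weakly related to the class label
    print(rank_screening_index(informative, y))
    print(rank_screening_index(with_outlier, y))  # same value as the line above
    print(screen([informative, weak], y, d=1))
```

Each predictor is scored independently after a single sort, so the cost is roughly one sort per predictor, which is in the spirit of the low computational cost claimed in the abstract.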

Keywords

  • Feature screening
  • Robust property of rank
  • Sure screening property
  • Ultrahigh dimensional discriminant analysis

Notes

Acknowledgments

The authors thank the editor and two referees for their valuable comments and suggestions. Peng Lai’s research was supported by National Natural Science Foundation of China (Grant No. 11301279). Fengli Song’s research was supported by Natural Science Foundation of Jiangsu Province for Youth (Grant No. BK20140983).

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Guosheng Cheng (1)
  • Xingxiang Li (1)
  • Peng Lai (1, email author)
  • Fengli Song (1)
  • Jun Yu (2)
  1. School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing, China
  2. Department of Mathematics and Statistics, University of Vermont, Burlington, USA
