Fast and Robust Supervised Learning in High Dimensions Using the Geometry of the Data

  • Ujjal Kumar Mukherjee
  • Subhabrata Majumdar
  • Snigdhansu Chatterjee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9165)

Abstract

We develop a method, called the data cloud wrapper (DCW), for tracing out the shape of a cloud of sample observations in arbitrary dimensions. The DCW has strong theoretical properties, scales well algorithmically, and lends itself to parallel computation. We further use the DCW to develop a new fast, robust, and accurate classification method for high-dimensional data, called the geometric learning algorithm (GLA). Two main features of the proposed algorithm are that it makes no assumptions about the geometric properties of the underlying data-generating distribution, and that it imposes no parametric or other restrictive assumptions on either the data or the algorithm. The proposed methods are typically faster and more robust than established classification techniques, while being comparably accurate in most cases.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ujjal Kumar Mukherjee (1)
  • Subhabrata Majumdar (1)
  • Snigdhansu Chatterjee (1)
  1. University of Minnesota, Minneapolis, USA
