Fast and Robust Supervised Learning in High Dimensions Using the Geometry of the Data

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9165)

Abstract

We develop a method, called the data cloud wrapper (DCW), for tracing out the shape of a cloud of sample observations in arbitrary dimensions. The DCW has strong theoretical properties, scales well algorithmically, and lends itself to parallel computation. We further use the DCW to develop a new fast, robust, and accurate classification method for high dimensions, called the geometric learning algorithm (GLA). Two main features of the proposed algorithm are that it makes no assumptions about the geometric properties of the underlying data-generating distribution, and that it imposes no parametric or other restrictive assumptions on either the data or the algorithm. The proposed methods are typically faster and more robust than established classification techniques, while being comparably accurate in most cases.

Keywords

  • Feature Selection
  • Random Forest
  • Supervised Learning
  • Data Cloud
  • Quadratic Discriminant Analysis

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Acknowledgements

This research is partially supported by NSF grant #IIS-1029711, NASA grant #-1502546, the Institute on the Environment (IonE), and the College of Liberal Arts (CLA) at the University of Minnesota.

Author information

Corresponding author

Correspondence to Snigdhansu Chatterjee.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mukherjee, U.K., Majumdar, S., Chatterjee, S. (2015). Fast and Robust Supervised Learning in High Dimensions Using the Geometry of the Data. In: Perner, P. (ed.) Advances in Data Mining: Applications and Theoretical Aspects. ICDM 2015. Lecture Notes in Computer Science, vol 9165. Springer, Cham. https://doi.org/10.1007/978-3-319-20910-4_9

  • DOI: https://doi.org/10.1007/978-3-319-20910-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20909-8

  • Online ISBN: 978-3-319-20910-4

  • eBook Packages: Computer Science, Computer Science (R0)