Fast and Robust Supervised Learning in High Dimensions Using the Geometry of the Data
We develop a method, called the data cloud wrapper (DCW), for tracing out the shape of a cloud of sample observations in arbitrary dimensions. The DCW has strong theoretical properties, scales well algorithmically, and lends itself to parallel computation. We further use the DCW to develop a new fast, robust, and accurate classification method in high dimensions, called the geometric learning algorithm (GLA). Two main features of the proposed algorithm are that it makes no assumptions about the geometric properties of the underlying data-generating distribution, and that it imposes no parametric or other restrictive assumptions on either the data or the algorithm. The proposed methods are typically faster and more robust than established classification techniques, while being comparably accurate in most cases.
Keywords: Feature Selection, Random Forest, Supervised Learning, Data Cloud, Quadratic Discriminant Analysis
This research is partially supported by NSF grant # IIS-1029711, NASA grant #-1502546, the Institute on the Environment (IonE), and the College of Liberal Arts (CLA) at the University of Minnesota.