Abstract
We present a novel method to tackle the multi-class classification problem with sparse grids and show how the computational procedure can be split into an Offline phase (pre-processing) and a very rapid Online phase. For each class of the training data, the underlying probability density function is estimated on a sparse grid. The class of a new data point is determined by the values of the density functions at this point. Our classification method can deal with more than two classes in a natural way, and it provides a stochastically motivated confidence value which indicates how to rate the response to a new point. Furthermore, the underlying density estimation method allows us to pre-compute the system matrix and store it in an appropriate format. This so-called Offline/Online splitting of the computational procedure allows an Online phase in which only a few matrix-vector products are necessary to learn a new, previously unseen training data set. In particular, we no longer have to solve a system of linear equations. We show that speedups by a factor of several hundred are possible. A typical application for such an Offline/Online splitting is cross validation. We present the algorithm and the computational procedure for our classification method, report on the employed density estimation method on sparse grids, and show by means of artificial and real-world data sets that we obtain competitive results compared to the classical sparse grid classification method based on regression.
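The procedure described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the chapter: a one-dimensional grid of hat functions stands in for a sparse grid, the regularized system is assumed to have the form (R + λI)α = b, the "appropriate format" is assumed to be an eigendecomposition of R, and the confidence value is a simple normalized density share. The function names (`offline_phase`, `online_phase`, `classify`) are hypothetical.

```python
import numpy as np

def hat_basis(x, nodes, h):
    """Evaluate all hat functions at the points x; shape (len(x), len(nodes))."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - nodes[None, :]) / h)

def offline_phase(level):
    """Data-independent work: build the Gram matrix R and factor it once."""
    n = 2**level - 1                      # interior grid nodes on [0, 1]
    h = 1.0 / (n + 1)
    nodes = h * np.arange(1, n + 1)
    # L2 Gram matrix of hat functions: 2h/3 on the diagonal, h/6 beside it.
    R = (2 * h / 3) * np.eye(n) + (h / 6) * (np.eye(n, k=1) + np.eye(n, k=-1))
    eigvals, U = np.linalg.eigh(R)        # R = U diag(eigvals) U^T (assumed format)
    return nodes, h, eigvals, U

def online_phase(samples, lam, nodes, h, eigvals, U):
    """Data-dependent work: matrix-vector products only, no linear solve."""
    b = hat_basis(samples, nodes, h).mean(axis=0)
    # Equals (R + lam * I)^{-1} b because R is diagonalized by U.
    return U @ ((U.T @ b) / (eigvals + lam))

def classify(x, alphas, nodes, h):
    """Assign each point to the class whose estimated density is largest."""
    dens = np.stack([hat_basis(x, nodes, h) @ a for a in alphas], axis=1)
    labels = dens.argmax(axis=1)
    # Assumed confidence: share of the winning (non-negative) density value.
    pos = np.clip(dens, 0.0, None)
    total = np.maximum(pos.sum(axis=1), np.finfo(float).tiny)
    return labels, pos.max(axis=1) / total

# Artificial two-class data on [0, 1] (made up for this sketch).
rng = np.random.default_rng(0)
samples_by_class = [
    np.clip(rng.normal(0.25, 0.05, 200), 0.0, 1.0),
    np.clip(rng.normal(0.75, 0.05, 200), 0.0, 1.0),
]

lam = 1e-3
nodes, h, eigvals, U = offline_phase(level=5)            # done once
alphas = [online_phase(s, lam, nodes, h, eigvals, U)     # repeated per data set
          for s in samples_by_class]
labels, conf = classify(np.array([0.2, 0.8]), alphas, nodes, h)
```

In a cross validation loop under these assumptions, `offline_phase` would run once, while `online_phase` would be repeated for every fold and every value of `lam`, which is where such a splitting pays off.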
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Peherstorfer, B., Franzelin, F., Pflüger, D., Bungartz, H.-J. (2014). Classification with Probability Density Estimation on Sparse Grids. In: Garcke, J., Pflüger, D. (eds) Sparse Grids and Applications - Munich 2012. Lecture Notes in Computational Science and Engineering, vol 97. Springer, Cham. https://doi.org/10.1007/978-3-319-04537-5_11
DOI: https://doi.org/10.1007/978-3-319-04537-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04536-8
Online ISBN: 978-3-319-04537-5
eBook Packages: Mathematics and Statistics (R0)