Abstract
Previous experiments with low-dimensional data sets have shown that Gabriel graph methods for instance-based learning are among the best machine learning algorithms for pattern classification applications. However, as the dimensionality of the data grows large, all data points in the training set tend to become Gabriel neighbors of each other, bringing the efficacy of this method into question. Indeed, it has been conjectured that for high-dimensional data, proximity graph methods would have to use sparser graphs, such as relative neighborhood graphs (RNGs) and minimum spanning trees (MSTs), in order to maintain their privileged status. Here the performance of proximity graph methods for instance-based learning that employ Gabriel graphs, relative neighborhood graphs, and minimum spanning trees is compared experimentally on high-dimensional data sets. These methods are also compared empirically against the traditional k-NN rule and support vector machines (SVMs), the leading competitors of proximity graph methods.
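The densification effect described above can be illustrated with a minimal brute-force sketch. It is not the implementation used in the paper. It relies only on the standard definitions: two points are Gabriel neighbors when no third point lies strictly inside the ball whose diameter is the segment joining them, and relative neighbors when no third point is strictly closer to both of them than they are to each other. The function names, the uniform random data, and the sample sizes are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_edges(points):
    """Brute-force O(n^3) Gabriel graph: (i, j) is an edge when no other
    point k lies strictly inside the ball with diameter points[i]-points[j],
    i.e. d(i,k)^2 + d(j,k)^2 >= d(i,j)^2 for every k."""
    n = len(points)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = sq_dist(points[i], points[j])
            if all(
                sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) >= d_ij
                for k in range(n)
                if k != i and k != j
            ):
                edges.add((i, j))
    return edges


def rng_edges(points):
    """Brute-force O(n^3) relative neighborhood graph: (i, j) is an edge
    when no other point k is strictly closer to both i and j than they
    are to each other."""
    n = len(points)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = sq_dist(points[i], points[j])
            if all(
                max(sq_dist(points[i], points[k]), sq_dist(points[j], points[k])) >= d_ij
                for k in range(n)
                if k != i and k != j
            ):
                edges.add((i, j))
    return edges


def edge_density(edges, n):
    """Fraction of all n*(n-1)/2 possible pairs that are edges."""
    return len(edges) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Illustrative settings only (assumed, not taken from the paper).
    rng = random.Random(0)
    n = 60
    for dim in (2, 5, 20, 50):
        pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
        print(
            dim,
            round(edge_density(gabriel_edges(pts), n), 3),
            round(edge_density(rng_edges(pts), n), 3),
        )
```

Running the script prints, for each dimension, the fraction of point pairs joined in the Gabriel graph and in the relative neighborhood graph, so the two densities can be compared as the dimension grows. Because every relative neighborhood edge is also a Gabriel edge, the second figure never exceeds the first.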
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Toussaint, G.T., Berzan, C. (2012). Proximity-Graph Instance-Based Learning, Support Vector Machines, and High Dimensionality: An Empirical Comparison. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_18
DOI: https://doi.org/10.1007/978-3-642-31537-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer Science (R0)