Proximity-Graph Instance-Based Learning, Support Vector Machines, and High Dimensionality: An Empirical Comparison

Toussaint, Godfried T.; Berzan, Constantin

doi:10.1007/978-3-642-31537-4_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7376)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

Previous experiments with low-dimensional data sets have shown that Gabriel graph methods for instance-based learning are among the best machine learning algorithms for pattern classification applications. However, as the dimensionality of the data grows large, all data points in the training set tend to become Gabriel neighbors of each other, bringing the efficacy of this method into question. Indeed, it has been conjectured that for high-dimensional data, proximity graph methods that use sparser graphs, such as relative neighborhood graphs (RNGs) and minimum spanning trees (MSTs), would have to be employed in order to maintain their privileged status. Here the performance of proximity graph methods for instance-based learning that employ Gabriel graphs, relative neighborhood graphs, and minimum spanning trees is compared experimentally on high-dimensional data sets. These methods are also compared empirically against the traditional k-NN rule and support vector machines (SVMs), the leading competitors of proximity graph methods.
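The densification effect described in the abstract can be illustrated with a small sketch. It is not the authors' code or experimental setup. It assumes the standard definitions: two points are Gabriel neighbors when no third point lies strictly inside the ball whose diameter is the segment joining them, and relative neighbors when no third point is strictly closer to both of them than they are to each other. The function names (`squared_distances`, `proximity_graph`, `edge_density`), the brute-force O(n³) construction, and the Gaussian test data are all illustrative choices.

```python
import numpy as np


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)


def proximity_graph(points, kind="gabriel"):
    """Brute-force O(n^3) boolean adjacency matrix of a Gabriel graph or RNG."""
    if kind not in ("gabriel", "rng"):
        raise ValueError("kind must be 'gabriel' or 'rng'")
    d2 = squared_distances(points)
    n = len(points)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            if kind == "gabriel":
                # r blocks (i, j) if it lies strictly inside the ball
                # whose diameter is the segment from i to j.
                blocked = d2[i] + d2[j] < d2[i, j]
            else:
                # r blocks (i, j) if it is strictly closer to both i and j
                # than i and j are to each other.
                blocked = np.maximum(d2[i], d2[j]) < d2[i, j]
            blocked[[i, j]] = False  # the endpoints never block their own edge
            adj[i, j] = adj[j, i] = not blocked.any()
    return adj


def edge_density(adj):
    """Fraction of all point pairs that are joined by an edge."""
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))


if __name__ == "__main__":
    gen = np.random.default_rng(0)  # fixed seed, arbitrary choice
    for dim in (2, 10, 50):
        pts = gen.standard_normal((60, dim))  # synthetic data, not from the paper
        gg = edge_density(proximity_graph(pts, "gabriel"))
        rn = edge_density(proximity_graph(pts, "rng"))
        print(f"dim={dim:3d}  Gabriel density={gg:.3f}  RNG density={rn:.3f}")
```

Running the script prints the edge density of both graphs as the dimension grows, which gives a rough feel for how quickly the Gabriel graph approaches the complete graph on a given sample. Every relative neighborhood edge is also a Gabriel edge, so the RNG density never exceeds the Gabriel density.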

Author information

Authors and Affiliations

Faculty of Science, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
Godfried T. Toussaint
Department of Computer Science, Tufts University, Medford, MA, 02155, USA
Constantin Berzan

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner

Cite this paper

Toussaint, G.T., Berzan, C. (2012). Proximity-Graph Instance-Based Learning, Support Vector Machines, and High Dimensionality: An Empirical Comparison. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_18

DOI: https://doi.org/10.1007/978-3-642-31537-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4