Abstract
This paper presents a data preprocessing procedure for selecting support vector (SV) candidates. We select decision boundary region vectors (BRVs) as SV candidates. Without needing the decision boundary itself, BRVs can be selected based on each vector’s nearest neighbor of the opposite class (NNO). To speed up the process, two spatial approximation sample hierarchy (SASH) trees are used to estimate the BRVs. Empirical results show that our selection procedure can reduce a full dataset to a size equal to, or only slightly above, the number of SVs. Training on the selected subset gives performance comparable to training on the full dataset. For large datasets, the combined time for selection and training on the reduced subset is significantly lower than the time for training on the full dataset.
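The NNO-based selection idea from the abstract can be sketched as follows. This is a minimal, brute-force illustration, not the paper's method: `select_brv_candidates` is a hypothetical name, the exact BRV criterion is an assumption (here, a point is kept if it is the nearest opposite-class neighbor of some other point), and the O(n²) search stands in for the two SASH trees the paper uses for approximate nearest-neighbor queries.

```python
import numpy as np

def select_brv_candidates(X, y):
    """Select boundary region vector (BRV) candidates via nearest
    neighbors of the opposite class (NNOs).

    Assumed rule (illustrative only): any point that serves as the NNO
    of some opposite-class point lies near the decision boundary and is
    kept as an SV candidate. Brute-force O(n^2); the paper replaces this
    search with two SASH trees, one per class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.zeros(len(X), dtype=bool)
    for i in range(len(X)):
        opp = np.where(y != y[i])[0]               # indices of the opposite class
        dists = np.linalg.norm(X[opp] - X[i], axis=1)
        keep[opp[np.argmin(dists)]] = True         # mark point i's NNO as a BRV
    return np.where(keep)[0]
```

On a toy one-dimensional layout with two facing clusters, only the points on the inner edges of each class survive the selection, which matches the intent of keeping vectors in the boundary region while discarding interior points.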
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, C., Vilalta, R. (2007). Data Selection Using SASH Trees for Support Vector Machines. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol. 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_22
DOI: https://doi.org/10.1007/978-3-540-73499-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73498-7
Online ISBN: 978-3-540-73499-4
eBook Packages: Computer Science; Computer Science (R0)