
Data Selection Using SASH Trees for Support Vector Machines

Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4571)

Abstract

This paper presents a data preprocessing procedure to select support vector (SV) candidates. We select decision boundary region vectors (BRVs) as SV candidates. Without needing the decision boundary itself, BRVs can be selected based on each vector's nearest neighbor of the opposite class (NNO). To speed up the process, two spatial approximation sample hierarchy (SASH) trees are used to estimate the BRVs. Empirical results show that our data selection procedure can reduce a full dataset to a subset whose size equals, or only slightly exceeds, the number of SVs. Training on the selected subset gives performance comparable to training on the full dataset. For large datasets, the overall time spent selecting and then training on the smaller dataset is significantly lower than the time required to train on the full dataset.
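The NNO-based selection described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the paper accelerates the opposite-class nearest-neighbor search with two SASH trees (one per class), whereas this sketch uses a brute-force search, and the function names (`nno_distance`, `select_brvs`) and the `keep_ratio` parameter are illustrative assumptions, not taken from the paper.

```python
import math

def nno_distance(x, opposite_class_points):
    """Distance from x to its nearest neighbor of the opposite class (NNO)."""
    return min(math.dist(x, y) for y in opposite_class_points)

def select_brvs(points, labels, keep_ratio=0.5):
    """Keep the fraction of points closest to the opposite class.

    Points with small NNO distance lie near the decision boundary and
    are retained as boundary region vectors (BRVs), i.e. SV candidates.
    """
    pos = [p for p, l in zip(points, labels) if l == 1]
    neg = [p for p, l in zip(points, labels) if l != 1]
    scored = [
        (nno_distance(p, neg if l == 1 else pos), p, l)
        for p, l in zip(points, labels)
    ]
    scored.sort(key=lambda t: t[0])  # boundary-near points first
    kept = scored[: max(1, int(keep_ratio * len(scored)))]
    return [(p, l) for _, p, l in kept]

# Toy 1-D-style example: the two points straddling x = 0 are selected first.
pts = [(-3.0, 0.0), (-0.5, 0.0), (0.4, 0.0), (2.5, 0.0)]
lbl = [-1, -1, 1, 1]
subset = select_brvs(pts, lbl, keep_ratio=0.5)
```

Replacing the brute-force `min` scan with an approximate nearest-neighbor query against a per-class index is what makes the procedure pay off on large datasets, since the exact scan is quadratic in the dataset size.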





Editor information

Petra Perner


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, C., Vilalta, R. (2007). Data Selection Using SASH Trees for Support Vector Machines. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2007. Lecture Notes in Computer Science, vol. 4571. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73499-4_22


  • DOI: https://doi.org/10.1007/978-3-540-73499-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73498-7

  • Online ISBN: 978-3-540-73499-4

  • eBook Packages: Computer Science (R0)
