Accelerating kernel classifiers through borders mapping

Original Research Paper

Abstract

Support vector machines (SVMs) and other kernel techniques form a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant fraction of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple piecewise linear classifier can be trained from a kernel-based classifier in order to improve classification speed. The method works by finding roots of the difference in conditional probabilities between pairs of points from opposite classes, building up a discrete representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of an SVM for 12 of them, by up to two orders of magnitude; of these 12, however, two were less accurate than a simple linear classifier. The method is best suited to problems with continuous feature data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.
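To make the procedure concrete, the following is a minimal sketch, not the authors' implementation, of how such a borders mapping might be sampled for a binary problem. It assumes a callable R(x) returning the difference in conditional probabilities, P(+1|x) − P(−1|x), for example from a probability-calibrated SVM; the names sample_borders and classify, the use of scipy.optimize.brentq for the one-dimensional root finding, and the finite-difference gradient are all illustrative choices.

    import numpy as np
    from scipy.optimize import brentq

    def sample_borders(R, X_pos, X_neg, n_borders, eps=1e-3, seed=None):
        """Collect points where R(x) = P(+1|x) - P(-1|x) crosses zero,
        plus unit normals, by root finding along segments joining
        randomly paired training points of opposite classes."""
        rng = np.random.default_rng(seed)
        borders, normals = [], []
        dim = X_pos.shape[1]
        while len(borders) < n_borders:
            a = X_pos[rng.integers(len(X_pos))]   # point with R(a) > 0
            b = X_neg[rng.integers(len(X_neg))]   # point with R(b) < 0
            f = lambda t: R(a + t * (b - a))      # R along the segment a -> b
            if f(0.0) * f(1.0) >= 0.0:            # no sign change; skip this pair
                continue
            t0 = brentq(f, 0.0, 1.0)              # 1-D root of the probability difference
            x0 = a + t0 * (b - a)
            # Normal vector = gradient of R at the border (central differences);
            # it points toward the positive class by construction.
            g = np.array([(R(x0 + eps * e) - R(x0 - eps * e)) / (2 * eps)
                          for e in np.eye(dim)])
            borders.append(x0)
            normals.append(g / np.linalg.norm(g))
        return np.array(borders), np.array(normals)

    def classify(x, borders, normals):
        """Fast classification: which side of the nearest border sample?"""
        i = np.argmin(np.sum((borders - x) ** 2, axis=1))
        return 1 if np.dot(x - borders[i], normals[i]) > 0.0 else -1

Under these assumptions, training cost is dominated by the calls to R, but once the border samples are stored, classifying a test point reduces to a nearest-neighbour search plus a single dot product, which is where the speed-up over evaluating the full kernel expansion would come from.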

Keywords

Class borders · Multi-dimensional root finding · Adaptive Gaussian filtering · Nonparametric statistics · Variable kernel density estimation

Acknowledgements

Thanks to Chih-Chung Chang and Chih-Jen Lin of the National Taiwan University for data from the LIBSVM archive, and to David Aha and the curators of the UCI Machine Learning Repository for statistical classification datasets.

References

1. Alimoglu, F.: Combining Multiple Classifiers for Pen-Based Handwritten Digit Recognition. Master’s thesis, Bogazici University (1996)
2. Bagirov, A.M.: Derivative-free methods for unconstrained nonsmooth optimization and its numerical analysis. Investigação Operacional 19, 75–93 (1999)
3. Bagirov, A.M.: Max-min separability. Optim. Methods Softw. 20(2–3), 277–296 (2005)
4. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
5. Crammer, K., Singer, Y.: On the learnability and design of output codes for multiclass problems. Mach. Learn. 47(2–3), 201–233 (2002)
6. Duarte, M.F., Hu, Y.H.: Vehicle classification in distributed sensor networks. J. Parallel Distrib. Comput. 64, 826–838 (2004)
7. Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
8. Feldkamp, L., Puskorius, G.V.: A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification. Proc. IEEE 86(11), 2259–2277 (1998)
9. Frey, P., Slate, D.: Letter recognition using Holland-style adaptive classifiers. Mach. Learn. 6(2), 161–182 (1991)
10. Gai, K., Zhang, C.: Learning discriminative piecewise linear models with boundary points. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pp. 444–450. Association for the Advancement of Artificial Intelligence (2010)
11. Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Proceedings of the 17th International Conference on Neural Information Processing Systems, pp. 545–552. MIT Press, Vancouver (2004)
12. Herman, G.T., Yeung, K.T.D.: On piecewise-linear classification. IEEE Trans. Pattern Anal. Mach. Intell. 14(7), 782–786 (1992)
13. Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
14. Huang, X., Mehrkanoon, S., Suykens, J.A.K.: Support vector machines with piecewise linear feature mapping. Neurocomputing 117(6), 118–127 (2013)
15. Hull, J.J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16(5), 550–554 (1994)
16. Iba, W., Wogulis, J., Langley, P.: Trading off simplicity and coverage in incremental concept learning. In: Proceedings of the Fifth International Conference on Machine Learning, pp. 73–79 (1988)
17. King, R.D., Feng, C., Sutherland, A.: Statlog: comparison of classification algorithms on large real-world problems. Appl. Artif. Intell. 9(3), 289–333 (1995)
18. Kohonen, T.: Self-Organizing Maps, 3rd edn. Springer, Berlin (2000)
19. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., Torkkola, K.: LVQ PAK: The Learning Vector Quantization Package, Version 3.1 (1995)
20. Kostin, A.: A simple and fast multi-class piecewise linear pattern classifier. Pattern Recogn. 39, 1949–1962 (2006)
21. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
22. Lee, T., Richards, J.A.: Piecewise linear classification using seniority logic committee methods with application to remote sensing. Pattern Recogn. 17(4), 453–464 (1984)
23. Lee, T., Richards, J.A.: A low cost classifier for multitemporal applications. Int. J. Remote Sens. 6(8), 1405–1417 (1985)
24. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml. Accessed 4 Mar 2017
25. Lin, H.T., Lin, C.J., Weng, R.C.: A note on Platt’s probabilistic outputs for support vector machines. Mach. Learn. 68(3), 267–276 (2007)
26. Michie, D., Spiegelhalter, D.J., Taylor, C.C. (eds.): Machine Learning, Neural and Statistical Classification. Ellis Horwood Series in Artificial Intelligence. Prentice Hall, Upper Saddle River, NJ (1994). http://www.amsta.leeds.ac.uk/~charles/statlog/. Accessed 13 May 2017
27. Mills, P.: Isoline retrieval: an optimal method for validation of advected contours. Comput. Geosci. 35(11), 2020–2031 (2009)
28. Mills, P.: Efficient statistical classification of satellite measurements. Int. J. Remote Sens. 32(21), 6109–6132 (2011)
29. Mohammad, R.M., Thabtah, F., McCluskey, L.: Predicting phishing websites based on self-structuring neural network. Neural Comput. Appl. 25(2), 443–458 (2014)
30. Müller, K.R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12(2), 181–201 (2001)
31. Osborne, M.: Seniority logic: a logic of a committee machine. IEEE Trans. Comput. 26(12), 1302–1306 (1977)
32. Ott, E.: Chaos in Dynamical Systems. Cambridge University Press, Cambridge (1993)
33. Pavlidis, N.G., Hofmeyr, D.P., Tasoulis, S.K.: Minimum density hyperplanes. J. Mach. Learn. Res. 17(156), 1–33 (2016)
34. Platt, J.: Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In: Advances in Large Margin Classifiers. MIT Press (1999)
35. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge (1992)
36. Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1963)
37. Sklansky, J., Michelotti, L.: Locally trained piecewise linear classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 2(2), 101–111 (1980)
38. Tenmoto, H., Kudo, M., Shimbo, M.: Piecewise linear classifiers with an appropriate number of hyperplanes. Pattern Recogn. 31(11), 1627–1634 (1998)
39. Terrell, G.R., Scott, D.W.: Variable kernel density estimation. Ann. Stat. 20, 1236–1265 (1992)
40. Uzilov, A.V., Keegan, J.M., Mathews, D.H.: Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinform. 7, 173 (2006)
41. Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 480–492 (2012)
42. Wang, J., Saligrama, V.: Locally-linear learning machines (L3M). Proc. Mach. Learn. Res. 29, 451–466 (2013)
43. Webb, D.: Efficient Piecewise Linear Classifiers and Applications. Ph.D. thesis, University of Ballarat, Victoria, Australia (2012)
44. Wu, T.F., Lin, C.J., Weng, R.C.: Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 5, 975–1005 (2004)

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

1. Cumberland, Canada
