
Accelerating kernel classifiers through borders mapping

  • Original Research Paper
  • Journal of Real-Time Image Processing

Abstract

Support vector machines (SVMs) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple piecewise linear classifier can be trained from a kernel-based classifier in order to improve the classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes to build up a representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of an SVM for 12 of them, by up to two orders of magnitude. Of these 12, two were less accurate than a simple linear classifier. The method is best suited to problems with continuous feature data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.
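To make the procedure concrete, the sketch below illustrates the idea for a binary problem in Python. It assumes an existing kernel classifier exposes an estimate of the conditional-probability difference \(R(x) = P(+1|x) - P(-1|x)\); all names (`cond_prob`, `sample_border`, `classify`) are hypothetical, and this is a minimal sketch of the approach under those assumptions, not the paper's implementation.

```python
# Minimal sketch of borders mapping for a binary classifier.
# Assumption: cond_prob(x) returns an estimate of
# R(x) = P(+1|x) - P(-1|x) from an existing kernel classifier,
# and R has opposite signs at a pair of opposite-class samples.
import numpy as np
from scipy.optimize import brentq

def sample_border(x_minus, x_plus, cond_prob):
    """Locate one border sample by root-finding on R along the line
    segment between a pair of opposite-class training points."""
    # Parametrize the segment: x(t) = (1 - t) * x_minus + t * x_plus
    f = lambda t: cond_prob((1.0 - t) * x_minus + t * x_plus)
    t0 = brentq(f, 0.0, 1.0)            # R changes sign between endpoints
    b = (1.0 - t0) * x_minus + t0 * x_plus
    # Finite-difference gradient of R at b supplies the normal vector
    # used to pick a side at classification time.
    eps = 1e-4
    grad = np.array([(cond_prob(b + eps * e) - cond_prob(b - eps * e)) / (2 * eps)
                     for e in np.eye(b.size)])
    return b, grad

def classify(x, borders, normals):
    """Nearest-border classification: find the closest border sample,
    then take the sign of (x - b) projected onto its normal."""
    k = np.argmin(((borders - x) ** 2).sum(axis=1))
    return 1 if np.dot(x - borders[k], normals[k]) > 0 else -1
```

Classification then costs a nearest-neighbour search over the border samples plus one dot product, independent of the number of support vectors, which is the source of the speedup.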



Acknowledgements

Thanks to Chih-Chung Chang and Chih-Jen Lin of the National Taiwan University for data from the LIBSVM archive, and to David Aha and the curators of the UCI Machine Learning Repository for statistical classification datasets.

Author information

Corresponding author

Correspondence to Peter Mills.

Appendix A: Subsampling

Let \(n_i\) be the number of samples in the \(i\)th class, with the classes ordered so that:

$$\begin{aligned} n_i \ge n_{i-1} \end{aligned}$$

Let \(0 \le \alpha (n) \le 1\) be a function used to subsample each of the class distributions in turn:

$$\begin{aligned} n_i^\prime = \alpha (n_i) n_i \end{aligned}$$

We wish to retain the rank ordering of the class sizes:

$$\begin{aligned} \alpha (n_i) n_i \ge \alpha (n_{i-1}) n_{i-1} \end{aligned}$$

while ensuring that the smallest classes have some minimum representation:

$$\begin{aligned} \alpha (n_i) \le \alpha (n_{i-1}) \end{aligned}$$
(11)

In the continuum limit, retaining the rank ordering means that \(n \alpha (n)\) must be non-decreasing; thus:

$$\begin{aligned} \frac{{\mathrm{d}}}{{\mathrm{d}} n} \left[ n \alpha (n) \right]&= \alpha (n) + n \frac{{\mathrm{d}} \alpha }{{\mathrm{d}} n} \ge 0\nonumber \\ \frac{{\mathrm{d}}\alpha }{{\mathrm{d}}n}&\ge - \frac{\alpha (n)}{n} \end{aligned}$$
(12)

The simplest means of ensuring that both (11) and (12) are fulfilled is to multiply the right side of (12) by a constant, \(0 \le \zeta \le 1\), and equate it with the left side:

$$\begin{aligned} \frac{{\mathrm{d}} \alpha }{{\mathrm{d}} n} = - \frac{\zeta \alpha (n)}{n} \end{aligned}$$

Separating variables and integrating, \(\ln \alpha = -\zeta \ln n + \mathrm{const.}\), so:

$$\begin{aligned} \alpha (n)=Cn^{-\zeta } \end{aligned}$$

The parameter \(C\) is set such that \(n_1^\prime = n_1\), i.e., \(\alpha (n_1) = 1\):

$$\begin{aligned} C = n_1^\zeta \end{aligned}$$

while \(\zeta\) is set such that:

$$\begin{aligned} f \sum _i n_i&= \sum _i \alpha (n_i) n_i \\&= n_1^\zeta \sum _i n_i^{1-\zeta } \end{aligned}$$

where \(0 < f < 1\) is the desired fraction of training data.
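As a concrete illustration of this scheme, the sketch below solves for \(\zeta\) by numerical root-finding and returns the subsampled class sizes; the function name and the use of SciPy are illustrative choices, not taken from the paper.

```python
# Sketch of the subsampling scheme: alpha(n) = C * n**(-zeta) with
# C = n_1**zeta, where n_1 is the smallest class size.  zeta is found
# numerically so that the subsampled total is a fraction f of the data.
# (Assumes the requested f is attainable with zeta in [0, 1].)
import numpy as np
from scipy.optimize import brentq

def subsample_sizes(n, f):
    n = np.sort(np.asarray(n, dtype=float))   # ascending: n[0] is n_1
    total = n.sum()
    # g(zeta) = n_1**zeta * sum(n_i**(1 - zeta)) - f * sum(n_i)
    g = lambda z: n[0] ** z * (n ** (1.0 - z)).sum() - f * total
    zeta = brentq(g, 0.0, 1.0)
    alpha = (n[0] / n) ** zeta                # alpha(n_i) = C * n_i**(-zeta)
    return np.round(alpha * n).astype(int), zeta

# Example: classes of 100, 1000 and 10000 samples, keep ~30% overall;
# the smallest class is fully retained while the largest is cut hardest.
sizes, zeta = subsample_sizes([100, 1000, 10000], 0.3)
```

Because \(\alpha (n_1) = 1\), the smallest class is never reduced, while larger classes are thinned progressively more, preserving the rank ordering of the class sizes.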

About this article

Cite this article

Mills, P. Accelerating kernel classifiers through borders mapping. J Real-Time Image Proc 17, 313–327 (2020). https://doi.org/10.1007/s11554-018-0769-9
