Abstract
Support vector machines (SVMs) and other kernel techniques form a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant fraction of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple piecewise linear classifier can be trained from a kernel-based classifier in order to improve classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes, thereby building up a representation of the decision boundary. When tested on 17 different datasets, it improved the classification speed of an SVM on 12 of them, by up to two orders of magnitude; of these 12, however, two were less accurate than a simple linear classifier. The method is best suited to problems with continuous feature data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the piecewise linear classifier is also fast to train.
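For intuition, the border-sampling step can be sketched as follows for a binary problem. This is a minimal, hypothetical illustration rather than the paper's implementation: it assumes a scikit-learn SVC with Platt-scaled probability estimates, and the function names (sample_borders, classify) and parameter choices are the sketch's own.

```python
# Minimal illustration (not the paper's implementation) of border sampling
# for a binary problem: find roots of R(x) = P(+1|x) - P(-1|x) on segments
# joining opposite-class samples, store each root with the gradient of R as
# its normal, then classify by the nearest border point.
import numpy as np
from scipy.optimize import brentq
from sklearn.svm import SVC

def sample_borders(svm, X, y, n_borders=64, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    pos, neg = X[y == 1], X[y == -1]
    # R(x) in [-1, 1]: difference of the two class probabilities.
    R = lambda x: 2.0 * svm.predict_proba(x.reshape(1, -1))[0, 1] - 1.0
    borders, normals, attempts = [], [], 0
    while len(borders) < n_borders and attempts < 100 * n_borders:
        attempts += 1
        x0 = neg[rng.integers(len(neg))]
        x1 = pos[rng.integers(len(pos))]
        f = lambda t: R(x0 + t * (x1 - x0))
        if f(0.0) * f(1.0) < 0.0:              # a root is bracketed
            t = brentq(f, 0.0, 1.0)            # root of probability difference
            b = x0 + t * (x1 - x0)
            # Finite-difference gradient of R serves as the boundary normal.
            g = np.array([(R(b + eps * e) - R(b - eps * e)) / (2.0 * eps)
                          for e in np.eye(X.shape[1])])
            borders.append(b)
            normals.append(g / np.linalg.norm(g))
    return np.array(borders), np.array(normals)

def classify(borders, normals, x):
    i = np.argmin(np.sum((borders - x) ** 2, axis=1))  # nearest border point
    return 1 if (x - borders[i]) @ normals[i] > 0.0 else -1

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 2))
    y = np.where(X[:, 0] + X[:, 1] ** 2 > 0.5, 1, -1)
    svm = SVC(probability=True).fit(X, y)          # Platt-scaled probabilities
    B, N = sample_borders(svm, X, y)
    print(classify(B, N, np.array([1.0, 0.0])), svm.predict([[1.0, 0.0]])[0])
```

Once the border points are mapped, classification reduces to a nearest-neighbour search plus one dot product, independent of the number of support vectors, which is the source of the speed-up.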
References
Alimoglu, F.: Combining Multiple Classifiers for Pen-Based Handwritten Digit Recognition. Master’s thesis, Bogazici University (1996)
Bagirov, A.M.: Derivative-free methods for unconstrained nonsmooth optimization and its numerical analysis. Investigação Operacional 19, 75–93 (1999)
Bagirov, A.M.: Max-min separability. Optim. Methods Softw. 20(2–3), 277–296 (2005)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
Crammer, K., Singer, Y.: On the learnability and design of output codes for multiclass problems. Mach. Learn. 47(2–3), 201–233 (2002)
Duarte, M.F., Hu, Y.H.: Vehicle classification in distributed sensor networks. J. Parallel Distrib. Comput. 64, 826–838 (2004)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
Feldkamp, L., Puskorius, G.V.: A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification. Proc. IEEE 86(11), 2259–2277 (1998)
Frey, P., Slate, D.: Letter recognition using Holland-style adaptive classifiers. Mach. Learn. 6(2), 161–182 (1991)
Gai, K., Zhang, C.: Learning discriminative piecewise linear models with boundary points. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pp. 444–450. Association for the Advancement of Artificial Intelligence (2010)
Guyon, I., Gunn, S., Hur, A.B., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Proceedings of the 17th International Conference on Neural Information Processing Systems, pp. 545–552. MIT Press, Vancouver (2004)
Herman, G.T., Yeung, K.T.D.: On piecewise-linear classification. IEEE Trans. Pattern Anal. Mach. Intell. 14(7), 782–786 (1992)
Hsu, C.W., Lin, C.J.: A comparison of methods for multiclass support vector machines. IEEE Trans. Neural Netw. 13(2), 415–425 (2002)
Huang, X., Mehrkanoon, S., Suykens, J.A.K.: Support vector machines with piecewise linear feature mapping. Neurocomputing 117(6), 118–127 (2013)
Hull, J.J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16(5), 550–554 (1994)
Iba, W., Wogulis, J., Langley, P.: Trading off simplicity and coverage in incremental concept learning. In: Proceedings of the Fifth International Conference on Machine Learning, pp. 73–79 (1988)
King, R.D., Feng, C., Sutherland, A.: Statlog: comparison of classification algorithms on large real-world problems. Appl. Artif. Intell. 9(3), 289–333 (1995)
Kohonen, T.: Self-Organizing Maps, 3rd edn. Springer, Berlin (2000)
Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., Torkkola, K.: LVQ PAK: The Learning Vector Quantization Package, Version 3.1 (1995)
Kostin, A.: A simple and fast multi-class piecewise linear pattern classifier. Pattern Recogn. 39, 1949–1962 (2006)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Lee, T., Richards, J.A.: Piecewise linear classification using seniority logic committee methods with application to remote sensing. Pattern Recogn. 17(4), 453–464 (1984)
Lee, T., Richards, J.A.: A low cost classifier for multitemporal applications. Int. J. Remote Sens. 6(8), 1405–1417 (1985)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml. Accessed 4 Mar 2017
Lin, H.T., Lin, C.J., Weng, R.C.: A note on Platt’s probabilistic outputs for support vector machines. Mach. Learn. 68(3), 267–276 (2007)
Michie, D., Spiegelhalter, D.J., Taylor, C.C. (eds.): Machine Learning, Neural and Statistical Classification. Ellis Horwood Series in Artificial Intelligence. Prentice Hall, Upper Saddle River, NJ (1994). http://www.amsta.leeds.ac.uk/~charles/statlog/. Accessed 13 May 2017
Mills, P.: Isoline retrieval: an optimal method for validation of advected contours. Comput. Geosci. 35(11), 2020–2031 (2009)
Mills, P.: Efficient statistical classification of satellite measurements. Int. J. Remote Sens. 32(21), 6109–6132 (2011)
Mohammad, R., Thabtah, F.A., McCluskey, T.: Predicting phishing websites based on self-structuring neural network. Neural Comput. Appl. 25(2), 443–458 (2014)
Müller, K.R., Mika, S., Rätsch, G., Tsuda, K., Schölkopf, B.: An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12(2), 181–201 (2001)
Osborne, M.: Seniority logic: a logic of a committee machine. IEEE Trans. Comput. 26(12), 1302–1306 (1977)
Ott, E.: Chaos in Dynamical Systems. Cambridge University Press, Cambridge (1993)
Pavlidis, N.G., Hofmeyr, D.P., Tasoulis, S.K.: Minimum density hyperplanes. J. Mach. Learn. Res. 17(156), 1–33 (2016)
Platt, J.: Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In: Advances in Large Margin Classifiers. MIT Press (1999)
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C, 2nd edn. Cambridge University Press, Cambridge (1992)
Shannon, C.E., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana (1963)
Sklansky, J., Michelotti, L.: Locally trained piecewise linear classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 2(2), 101–111 (1980)
Tenmoto, H., Kudo, M., Shimbo, M.: Piecewise linear classifiers with an appropriate number of hyperplanes. Pattern Recogn. 31(11), 1627–1634 (1998)
Terrell, G.R., Scott, D.W.: Variable kernel density estimation. Ann. Stat. 20, 1236–1265 (1992)
Uzilov, A.V., Keegan, J.M., Mathews, D.H.: Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinform. 7, 173 (2006)
Vedaldi, A., Zisserman, A.: Efficient additive kernels via explicit feature maps. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 480–492 (2012)
Wang, J., Saligrama, V.: Locally-linear learning machines (L3M). Proc. Mach. Learn. Res. 29, 451–466 (2013)
Webb, D.: Efficient Piecewise Linear Classifiers and Applications. Ph.D. thesis, University of Ballarat, Victoria, Australia (2012)
Wu, T.F., Lin, C.J., Weng, R.C.: Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 5, 975–1005 (2004)
Acknowledgements
Thanks to Chih-Chung Chang and Chih-Jen Lin of the National Taiwan University for data from the LIBSVM archive, and also to David Aha and the curators of the UCI Machine Learning Repository for statistical classification datasets.
Appendix A: Subsampling
Let \(n_i\) be the number of samples of the ith class, with the classes ordered by size such that:

$$ n_i \le n_{i+1} $$

Let \(0 \le \alpha (n) \le 1\) be a function used to subsample each of the class distributions in turn:

$$ n_i^\prime = \alpha (n_i) \, n_i $$

We wish to retain the rank ordering of the class sizes:

$$ \frac{\mathrm{d}}{\mathrm{d} n} \left[ \alpha (n) \, n \right] \ge 0 \qquad (11) $$

while ensuring that the smallest classes have some minimum representation, i.e., the retained fraction \(\alpha\) does not increase with class size:

$$ \frac{\mathrm{d}}{\mathrm{d} n} \left[ \alpha (n) \, n \right] \le \alpha (n) \qquad (12) $$

Thus:

$$ 0 \le \frac{\mathrm{d}}{\mathrm{d} n} \left[ \alpha (n) \, n \right] \le \alpha (n) $$

The simplest means of ensuring that both (11) and (12) are fulfilled is to multiply the right side of (12) with a constant, \(0 \le \zeta \le 1\), and equate it with the left side:

$$ \frac{\mathrm{d}}{\mathrm{d} n} \left[ \alpha (n) \, n \right] = \zeta \, \alpha (n) $$

Integrating:

$$ \alpha (n) = C n^{\zeta - 1} $$

The parameter, \(C\), is set such that \(n_1^\prime = n_1\):

$$ C = n_1^{1 - \zeta} $$

while \(\zeta\) is set such that:

$$ \sum_i \alpha (n_i) \, n_i = n_1^{1 - \zeta} \sum_i n_i^{\zeta} = f \sum_i n_i $$

where \(0< f< 1\) is the desired fraction of training data.
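In practice, the power-law rule \(n_i^\prime = n_1^{1-\zeta} n_i^{\zeta}\) that follows from the above can be applied by solving for \(\zeta\) numerically. Below is a minimal Python sketch under those assumptions; the function name subsample_sizes, the use of SciPy's brentq root finder, and the bracket endpoints are illustrative, not from the paper.

```python
# Sketch of the subsampling rule n_i' = n_1^(1 - zeta) * n_i^zeta derived
# above, with zeta found by solving sum_i n_i' = f * sum_i n_i.
import numpy as np
from scipy.optimize import brentq

def subsample_sizes(sizes, f):
    """Return subsampled class sizes for a desired training fraction f."""
    n = np.sort(np.asarray(sizes, dtype=float))   # n[0] is the smallest class
    # g is monotonic in zeta: zeta = 1 keeps every sample, while
    # zeta -> 0 flattens all classes down to the smallest size n[0].
    g = lambda zeta: n[0] ** (1.0 - zeta) * np.sum(n ** zeta) - f * n.sum()
    if g(1e-9) > 0.0:      # f below the flattened minimum: keep n[0] of each
        return np.full(len(n), int(n[0]))
    zeta = brentq(g, 1e-9, 1.0)
    return (n[0] ** (1.0 - zeta) * n ** zeta).astype(int)

# Example: three classes, keeping roughly a quarter of the training data.
print(subsample_sizes([100, 1000, 10000], f=0.25))
```

Note how the smallest class is retained in full while the largest is thinned most aggressively, which preserves both the rank ordering and the minimum representation of small classes.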
Cite this article
Mills, P. Accelerating kernel classifiers through borders mapping. J Real-Time Image Proc 17, 313–327 (2020). https://doi.org/10.1007/s11554-018-0769-9