Online learning comprises an important family of efficient and scalable algorithms for large-scale classification. Many online algorithms are linear and therefore computationally fast, but on complex classification problems they tend to achieve low accuracy. The kernel trick can be applied to improve accuracy; however, it often incurs high computational cost. Moreover, discriminative information, which is vital for classification, remains underexploited in these algorithms. In this paper, we propose a novel online linear method, called Sketch Discriminatively Regularized Online Gradient Descent Classification (SDROGD). To exploit inter-class separability and intra-class compactness, SDROGD uses a matrix to characterize the discriminative information and embeds it directly into a new regularization term. This matrix can be updated in an online manner by the sketch technique. After applying a simple but effective optimization, we show that SDROGD enjoys a good time complexity bound, which is linear in the feature dimension or the number of samples. Experimental results on both toy and real-world datasets demonstrate that SDROGD is not only faster but also substantially more accurate than several related kernelized algorithms.
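The abstract's "sketch technique" for maintaining a matrix in an online manner can be illustrated with Frequent Directions, a deterministic matrix-sketching method: each new sample row is folded into a small fixed-size sketch whose Gram matrix approximates that of the full data stream. The sketch below is a minimal generic illustration of that idea, not the authors' SDROGD update; the function name and parameters are hypothetical.

```python
import numpy as np

def fd_update(B, x):
    """One Frequent Directions step: fold a new data row x (shape (d,))
    into the sketch B (shape (ell, d)).

    After processing a stream A row by row, B.T @ B approximates A.T @ A,
    so B can stand in for A when computing second-order (e.g. scatter /
    discriminative) statistics in O(ell * d) memory.
    """
    zero_rows = np.where(~B.any(axis=1))[0]
    if zero_rows.size:                  # a free (all-zero) row exists: just insert
        B[zero_rows[0]] = x
        return B
    # Sketch is full: shrink all singular values by the smallest one,
    # which zeroes out the last row and makes room for x.
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
    B = s[:, None] * Vt                 # rebuilt sketch; last row is now zero
    B[-1] = x
    return B
```

Each update costs one SVD of an ell-by-d matrix, so the per-sample work stays linear in the feature dimension d for a fixed sketch size ell, matching the kind of complexity the abstract claims.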
This work was supported by the National Key R&D Program of China (Grant No. 2017YFB1002801) and the National Natural Science Foundation of China (Grant No. 61876091). It was also supported by the Collaborative Innovation Center of Wireless Communications Technology.
Cite this article
Xue, H., Ren, Z. Sketch discriminatively regularized online gradient descent classification. Appl Intell (2020). https://doi.org/10.1007/s10489-019-01590-6
Keywords: Machine learning; Online learning; Sketch technique