Sketch discriminatively regularized online gradient descent classification



Online learning represents an important family of efficient and scalable algorithms for large-scale classification problems. Many of these algorithms are linear and therefore computationally fast, but on complex classification tasks they tend to achieve low accuracy. The kernel trick is often applied to improve accuracy, but it usually incurs high computational cost. Moreover, discriminative information, which is vital for classification, is still not fully exploited by these algorithms. In this paper, we propose a novel online linear method called Sketch Discriminatively Regularized Online Gradient Descent Classification (SDROGD). To exploit inter-class separability and intra-class compactness, SDROGD uses a matrix to characterize the discriminative information and embeds it directly into a new regularization term. This matrix can be updated in an online manner by the sketch technique. After applying a simple but effective optimization, we show that SDROGD enjoys a good time complexity bound, linear in the feature dimension or the number of samples. Experimental results on both toy and real-world datasets demonstrate that SDROGD achieves not only faster computation but also markedly better classification accuracy than several related kernelized algorithms.
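The full algorithm is given in the paper itself; the fragment below is only a minimal illustrative sketch of the two ingredients the abstract names: a linear online gradient descent step, and a discriminative regularizer whose matrix is maintained in a streaming fashion by a deterministic (Frequent Directions style) sketch. The hinge loss, the particular shrinkage rule, and all names and parameters here are hypothetical choices for illustration, not the paper's actual update rules.

```python
import numpy as np

class FrequentDirections:
    """Deterministic matrix sketch: maintains an ell x d matrix B such that
    B^T B approximates A^T A for a stream of rows of A (illustrative variant)."""
    def __init__(self, d, ell):
        self.B = np.zeros((ell, d))

    def update(self, row):
        B = self.B
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # No free row: shrink singular values to empty out half the rows.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[len(s) // 2] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            self.B = B = s[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = row

def sdrogd_step(w, x, y, sketch, lam, eta):
    """One hypothetical online step: hinge-loss subgradient plus the gradient
    of a discriminative regularizer lam * w^T (B^T B) w built from the sketch."""
    M = sketch.B.T @ sketch.B          # sketched discriminative matrix
    grad = 2.0 * lam * (M @ w)         # gradient of the regularization term
    if y * (w @ x) < 1:                # hinge loss is active
        grad -= y * x
    return w - eta * grad
```

Because the sketch has a fixed number of rows ell, each step costs time linear in the feature dimension, which is the kind of per-step bound the abstract refers to.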






This work was supported by the National Key R&D Program of China (Grant No. 2017YFB1002801), the National Natural Science Foundation of China (Grant No. 61876091), and the Collaborative Innovation Center of Wireless Communications Technology.

Author information

Correspondence to Hui Xue.


Cite this article

Xue, H., Ren, Z. Sketch discriminatively regularized online gradient descent classification. Appl Intell (2020).



Keywords

  • Machine learning
  • Online learning
  • Classification
  • Sketch technique