Feature-aware regularization for sparse online learning

  • Research Paper
  • Published in: Science China Information Sciences

Abstract

Learning a compact predictive model in an online setting has recently gained a great deal of attention. Combining online learning with sparsity-inducing regularization enables faster learning with a smaller memory footprint than previous learning frameworks. Many optimization methods and learning algorithms have been developed on the basis of online learning with L1-regularization. L1-regularization tends to truncate some types of parameters, such as the weights of features that rarely occur or take values in a small range, unless those features are emphasized in advance. However, including such a pre-processing step would make it very difficult to preserve the advantages of online learning. We propose a new regularization framework for sparse online learning. We focus on the regularization term, enhancing the state-of-the-art regularization approach by integrating information on all previous subgradients of the loss function into the regularization term. The resulting algorithms enable online learning to adjust the truncation intensity for each feature without pre-processing and eventually eliminate the bias of L1-regularization. We establish theoretical properties of our framework, including its computational complexity and an upper bound on the regret. Experiments demonstrated that our algorithms outperformed previous methods in many classification tasks.
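
To make the idea concrete, the sketch below layers a feature-aware truncation threshold on top of a dual-averaging (RDA-style) L1 update: each feature's penalty is scaled by its accumulated absolute subgradients, so rarely active features are truncated less aggressively. The hinge loss, the specific scaling rule, and the function and parameter names are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    # Illustrative sketch only: an RDA-style L1 update whose truncation
    # threshold is scaled per feature by accumulated |subgradient| mass.
    # The weighting rule and hyperparameters are assumptions, not the
    # paper's exact algorithm.

    def hinge_subgradient(w, x, y):
        # Subgradient of the hinge loss max(0, 1 - y * <w, x>).
        return -y * x if y * np.dot(w, x) < 1.0 else np.zeros_like(x)

    def feature_aware_rda(X, Y, lam=0.1, gamma=1.0):
        _, d = X.shape
        w = np.zeros(d)
        g_sum = np.zeros(d)      # running sum of subgradients
        abs_sum = np.zeros(d)    # running per-feature sum of |subgradient|
        for t, (x, y) in enumerate(zip(X, Y), start=1):
            g = hinge_subgradient(w, x, y)
            g_sum += g
            abs_sum += np.abs(g)
            # Features with little accumulated subgradient mass receive a
            # weaker truncation threshold, so rare features survive longer.
            lam_t = lam * abs_sum / max(abs_sum.max(), 1e-12)
            g_bar = g_sum / t
            # Closed-form dual-averaging step with per-feature soft-thresholding.
            w = -(np.sqrt(t) / gamma) * np.sign(g_bar) * np.maximum(np.abs(g_bar) - lam_t, 0.0)
        return w

With a uniform threshold (lam_t equal to lam for every coordinate) this reduces to standard L1-regularized dual averaging; the per-feature scaling is what would let the learner attenuate truncation for infrequent or small-valued features without any pre-processing pass over the data.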

Author information

Corresponding author

Correspondence to Hidekazu Oiwa.

About this article

Cite this article

Oiwa, H., Matsushima, S. & Nakagawa, H. Feature-aware regularization for sparse online learning. Sci. China Inf. Sci. 57, 1–21 (2014). https://doi.org/10.1007/s11432-014-5082-z

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11432-014-5082-z
