Abstract
Learning a compact predictive model in an online setting has recently attracted a great deal of attention. Combining online learning with sparsity-inducing regularization enables faster learning with a smaller memory footprint than previous learning frameworks. Many optimization methods and learning algorithms have been developed on the basis of online learning with L1-regularization. However, L1-regularization tends to truncate certain types of parameters, such as those corresponding to features that rarely occur or have a small range of values, unless those features are emphasized in advance, and introducing such a pre-processing step would forfeit much of the advantage of online learning. We propose a new regularization framework for sparse online learning. We focus on the regularization term and enhance the state-of-the-art regularization approach by integrating information on all previous subgradients of the loss function into that term. The resulting algorithms allow online learning to adjust the intensity of each feature's truncation without pre-processing and eventually eliminate the bias of L1-regularization. We show theoretical properties of our framework, namely its computational complexity and an upper bound on the regret. Experiments demonstrated that our algorithms outperformed previous methods on many classification tasks.
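To make the idea concrete, the following is a minimal sketch of what a subgradient-weighted truncation could look like, assuming the base learner is L1-regularized dual averaging (RDA) with hinge loss. The function name feature_aware_rda and the specific per-feature weighting (each feature's average absolute subgradient scales its truncation threshold, so rarely occurring features are truncated less aggressively) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def feature_aware_rda(stream, dim, lam=0.01, gamma=1.0):
    """Illustrative sketch: L1-regularized dual averaging (RDA) with a
    per-feature truncation threshold weighted by past subgradients.

    The weighting below (average absolute subgradient per feature) is an
    assumed form chosen for illustration; it shrinks the L1 penalty on
    rarely active features so they survive truncation.
    `stream` yields (x, y) pairs, x a dense np.ndarray, y in {-1, +1}.
    """
    w = np.zeros(dim)        # current weight vector
    g_sum = np.zeros(dim)    # running sum of loss subgradients
    g_abs = np.zeros(dim)    # running sum of |g_i|, a per-feature activity measure

    for t, (x, y) in enumerate(stream, start=1):
        # Subgradient of the hinge loss at the current weights.
        g = -y * x if y * w.dot(x) < 1.0 else np.zeros(dim)
        g_sum += g
        g_abs += np.abs(g)

        g_bar = g_sum / t
        # Feature-aware threshold (assumed form): features with little
        # accumulated subgradient mass get a small threshold and are
        # therefore truncated less often.
        lam_i = lam * g_abs / t

        # Standard RDA closed-form update via per-feature soft-thresholding.
        shrunk = np.maximum(np.abs(g_bar) - lam_i, 0.0)
        w = -(np.sqrt(t) / gamma) * np.sign(g_bar) * shrunk
    return w
```

With a uniform threshold (lam_i = lam for all i) this reduces to plain L1-RDA; the per-feature threshold is the only change, which matches the abstract's claim that the framework acts purely through the regularization term and needs no pre-processing pass over the data.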
Cite this article
Oiwa, H., Matsushima, S. & Nakagawa, H. Feature-aware regularization for sparse online learning. Sci. China Inf. Sci. 57, 1–21 (2014). https://doi.org/10.1007/s11432-014-5082-z