Abstract
Adaptive subgradient methods leverage second-order information about the functions being optimized to improve regret, and have become popular for online learning and optimization. Depending on the amount of information used, they come in a diagonal-matrix version (ADA-DIAG) and a full-matrix version (ADA-FULL). In practice, ADA-DIAG is adopted far more often than ADA-FULL: although ADA-FULL attains smaller regret when gradients are correlated, it is computationally intractable in high dimensions. In this paper, we propose to accelerate ADA-FULL with matrix-approximation techniques, and develop two methods based on random projections. Compared with ADA-FULL, at each iteration our methods reduce the space complexity from \(O(d^2)\) to \(O(\tau d)\) and the time complexity from \(O(d^3)\) to \(O(\tau^2 d)\), where \(d\) is the dimensionality of the data and \(\tau \ll d\) is the number of random projections. Experiments on online convex optimization show that both methods are comparable to ADA-FULL and outperform other state-of-the-art algorithms, including ADA-DIAG. Experiments on training convolutional neural networks further show that our method outperforms state-of-the-art algorithms, including ADA-DIAG.
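To make the stated complexities concrete, below is a minimal NumPy sketch of one way a randomly projected full-matrix AdaGrad update can be organized, using a randomized low-rank eigendecomposition in the spirit of Halko, Martinsson, and Tropp (2011). This is an illustrative assumption, not the paper's exact algorithm: the class and parameter names (SketchedAdaFull, eta, delta), the single-pass recovery of the small matrix, and the treatment of the orthogonal complement are all hypothetical choices. Each step touches only a \(d \times \tau\) sketch, so memory is \(O(\tau d)\), and the per-step cost is dominated by the \(O(\tau d)\) sketch update and the \(O(\tau^2 d)\) QR factorization.

```python
import numpy as np

class SketchedAdaFull:
    """Hypothetical sketch of a randomly projected full-matrix AdaGrad step.

    Maintains Y_t = G_t @ Omega, a d x tau random sketch of the gradient
    outer-product matrix G_t = sum_s g_s g_s^T, instead of G_t itself.
    """

    def __init__(self, dim, tau, eta=0.1, delta=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed Gaussian random projection, d x tau.
        self.Omega = rng.standard_normal((dim, tau)) / np.sqrt(tau)
        # Sketch Y_t = G_t @ Omega: O(tau d) memory instead of O(d^2).
        self.Y = np.zeros((dim, tau))
        self.eta, self.delta = eta, delta

    def step(self, x, g):
        # Rank-one sketch update, O(tau d) time:
        # G_t = G_{t-1} + g g^T  implies  Y_t = Y_{t-1} + g (g^T Omega).
        self.Y += np.outer(g, g @ self.Omega)

        # Randomized eigendecomposition of G_t from the sketch alone:
        # Q spans the range of Y, O(tau^2 d); the small matrix
        # B ~= Q^T G_t Q is recovered from the single-pass identity
        # B (Q^T Omega) ~= Q^T Y (Halko et al., 2011).
        Q, _ = np.linalg.qr(self.Y)
        B = (Q.T @ self.Y) @ np.linalg.pinv(Q.T @ self.Omega)
        B = 0.5 * (B + B.T)                 # symmetrize
        w, V = np.linalg.eigh(B)
        w = np.maximum(w, 0.0)              # G_t is positive semidefinite
        U = Q @ V                           # approximate eigenvectors of G_t

        # Precondition g: exact inverse square root on the captured
        # subspace, 1/sqrt(delta) on its orthogonal complement.
        coeff = g @ U
        precond = U @ (coeff / np.sqrt(w + self.delta)) \
            + (g - U @ coeff) / np.sqrt(self.delta)
        return x - self.eta * precond
```

Under these assumptions, an online learner would call step once per round with the current iterate and subgradient, e.g. x = opt.step(x, grad); the update never forms the \(d \times d\) matrix \(G_t\), which is where the \(O(d^2)\) to \(O(\tau d)\) saving comes from.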
Acknowledgements
This work was partially supported by the NSFC (61603177), JiangsuSF (BK20160658), YESS (2017QNRC001), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wan, Y., Zhang, L. (2018). Accelerating Adaptive Online Learning by Matrix Approximation. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol. 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_32
DOI: https://doi.org/10.1007/978-3-319-93037-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)