Abstract
Adaptive subgradient methods leverage second-order information about the functions being optimized to improve regret, and have become popular for online learning and optimization. Depending on the amount of information used, they come in a diagonal-matrix version (ADA-DIAG) and a full-matrix version (ADA-FULL). In practice, ADA-DIAG is adopted far more often than ADA-FULL: although ADA-FULL attains smaller regret when gradients are correlated, it is computationally intractable in high dimensions. In this paper, we propose to accelerate ADA-FULL with matrix-approximation techniques, and develop two methods based on random projections. Compared with ADA-FULL, at each iteration our methods reduce the space complexity from \(O(d^2)\) to \(O(\tau d)\) and the time complexity from \(O(d^3)\) to \(O(\tau^2 d)\), where \(d\) is the dimensionality of the data and \(\tau \ll d\) is the number of random projections. Experiments on online convex optimization show that both methods are comparable to ADA-FULL and outperform other state-of-the-art algorithms, including ADA-DIAG. Experiments on training convolutional neural networks further show that our method outperforms state-of-the-art algorithms, including ADA-DIAG.
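To make the stated complexities concrete, below is a minimal NumPy sketch of one way a randomly projected full-matrix AdaGrad update can be organized, using a randomized low-rank eigendecomposition in the spirit of Halko, Martinsson, and Tropp (2011). This is an illustrative assumption, not the paper's exact algorithm: the class and parameter names (SketchedAdaFull, eta, delta), the single-pass recovery of the small matrix, and the treatment of the orthogonal complement are all hypothetical choices. Each step touches only a \(d \times \tau\) sketch, so memory is \(O(\tau d)\), and the per-step cost is dominated by the \(O(\tau d)\) sketch update and the \(O(\tau^2 d)\) QR factorization.

```python
import numpy as np

class SketchedAdaFull:
    """Hypothetical sketch of a randomly projected full-matrix AdaGrad step.

    Maintains Y_t = G_t @ Omega, a d x tau random sketch of the gradient
    outer-product matrix G_t = sum_s g_s g_s^T, instead of G_t itself.
    """

    def __init__(self, dim, tau, eta=0.1, delta=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed Gaussian random projection, d x tau.
        self.Omega = rng.standard_normal((dim, tau)) / np.sqrt(tau)
        # Sketch Y_t = G_t @ Omega: O(tau d) memory instead of O(d^2).
        self.Y = np.zeros((dim, tau))
        self.eta, self.delta = eta, delta

    def step(self, x, g):
        # Rank-one sketch update, O(tau d) time:
        # G_t = G_{t-1} + g g^T  implies  Y_t = Y_{t-1} + g (g^T Omega).
        self.Y += np.outer(g, g @ self.Omega)

        # Randomized eigendecomposition of G_t from the sketch alone:
        # Q spans the range of Y, O(tau^2 d); the small matrix
        # B ~= Q^T G_t Q is recovered from the single-pass identity
        # B (Q^T Omega) ~= Q^T Y (Halko et al., 2011).
        Q, _ = np.linalg.qr(self.Y)
        B = (Q.T @ self.Y) @ np.linalg.pinv(Q.T @ self.Omega)
        B = 0.5 * (B + B.T)                 # symmetrize
        w, V = np.linalg.eigh(B)
        w = np.maximum(w, 0.0)              # G_t is positive semidefinite
        U = Q @ V                           # approximate eigenvectors of G_t

        # Precondition g: exact inverse square root on the captured
        # subspace, 1/sqrt(delta) on its orthogonal complement.
        coeff = g @ U
        precond = U @ (coeff / np.sqrt(w + self.delta)) \
            + (g - U @ coeff) / np.sqrt(self.delta)
        return x - self.eta * precond
```

Under these assumptions, an online learner would call step once per round with the current iterate and subgradient, e.g. x = opt.step(x, grad); the update never forms the \(d \times d\) matrix \(G_t\), which is where the \(O(d^2)\) to \(O(\tau d)\) saving comes from.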
Acknowledgements
This work was partially supported by the NSFC (61603177), JiangsuSF (BK20160658), YESS (2017QNRC001), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wan, Y., Zhang, L. (2018). Accelerating Adaptive Online Learning by Matrix Approximation. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol. 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_32
DOI: https://doi.org/10.1007/978-3-319-93037-4_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93036-7
Online ISBN: 978-3-319-93037-4
eBook Packages: Computer Science, Computer Science (R0)