Algorithmic Connections between Active Learning and Stochastic Convex Optimization
Recent papers have established interesting theoretical connections between the fields of active learning and stochastic convex optimization, owing to the common role of feedback in sequential querying mechanisms. In this paper, we continue this thread in two parts by exploiting these relations for the first time to yield novel algorithms in both fields, further motivating the study of their intersection. First, inspired by a recent optimization algorithm that adapts to unknown uniform convexity parameters, we present a new active learning algorithm for one-dimensional thresholds that attains minimax rates by adapting to unknown noise parameters. Next, we show that one can perform d-dimensional stochastic minimization of smooth uniformly convex functions when granted oracle access only to noisy gradient signs along any coordinate, instead of real-valued gradients. This is done with a simple randomized coordinate descent procedure in which each line search is solved by one-dimensional active learning, provably achieving the same error convergence rate as with the entire real-valued gradient. Combining these two parts yields an algorithm that solves stochastic convex optimization of uniformly convex and smooth functions using only noisy gradient signs by repeatedly performing active learning; it achieves optimal rates and is adaptive to all unknown convexity and smoothness parameters.
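To make the second part of the abstract concrete, the following Python sketch illustrates randomized coordinate descent in which each line search uses only noisy gradient signs. It is a simplified illustration, not the paper's algorithm: it assumes a constant sign-flip probability, a fixed query budget, and plain bisection with majority voting in place of an adaptive active learning routine. All names (`noisy_sign`, `line_search`, `coordinate_descent`, `example_grad`) and all parameter values are hypothetical.

```python
import random


def noisy_sign(value, flip_prob, rng):
    """Sign of `value` (+1 or -1), flipped with probability `flip_prob`.

    Assumption: constant noise level; the paper's noise model is more general.
    """
    s = 1 if value >= 0 else -1
    return -s if rng.random() < flip_prob else s


def line_search(partial, lo, hi, rng, rounds=20, votes=101, flip_prob=0.2):
    """Locate the sign change of `partial` on [lo, hi] by noisy bisection.

    Each midpoint is queried `votes` times and the majority sign is trusted.
    """
    for _ in range(rounds):
        mid = 0.5 * (lo + hi)
        total = sum(noisy_sign(partial(mid), flip_prob, rng) for _ in range(votes))
        if total > 0:
            hi = mid  # derivative looks positive: minimizer lies to the left
        else:
            lo = mid  # derivative looks negative: minimizer lies to the right
    return 0.5 * (lo + hi)


def coordinate_descent(grad, x0, rng, epochs=60, radius=4.0):
    """Pick a random coordinate each epoch and minimize along it by sign queries.

    Assumption: the coordinate-wise minimizer lies within `radius` of the
    current iterate.
    """
    x = list(x0)
    for _ in range(epochs):
        i = rng.randrange(len(x))

        def partial(t, i=i):
            y = x[:]
            y[i] = t
            return grad(y)[i]

        x[i] = line_search(partial, x[i] - radius, x[i] + radius, rng)
    return x


def example_grad(x):
    """Gradient of f(x) = x0^2 + x0*x1 + x1^2 - 3*x0 - 3*x1, minimized at (1, 1)."""
    return [2 * x[0] + x[1] - 3, x[0] + 2 * x[1] - 3]


if __name__ == "__main__":
    print(coordinate_descent(example_grad, [0.0, 0.0], random.Random(0)))
```

The optimizer never sees a gradient magnitude: `grad` is called only inside `noisy_sign`, which returns a possibly flipped sign. Majority voting is a crude stand-in for the adaptive one-dimensional active learner described above, and it would not recover the rates claimed in the abstract.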