Theory of Computing Systems, Volume 58, Issue 1, pp 191–222

On Two Continuum Armed Bandit Problems in High Dimensions

  • Hemant Tyagi
  • Sebastian U. Stich
  • Bernd Gärtner


We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of the reward functions. In this work we assume the reward functions to be intrinsically of low dimension \(k \ll d\) and consider two models: (i) the reward functions depend on only an unknown subset of k coordinate variables, and (ii) a generalization of (i) where the reward functions depend on an unknown k-dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards, we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
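
The two models can be written out as follows. This is only an illustrative formalization: the symbols \(r\), \(g\), \(A\) and the indices \(i_1,\dots,i_k\) are notation assumed here for exposition and do not appear in the abstract, and the smoothness assumptions on \(g\) are left unspecified.

```latex
% Illustrative notation (assumed): r is a reward function on a compact set
% \mathcal{X} \subset \mathbb{R}^{d}; g is an unknown function of k variables,
% with k much smaller than d.
\begin{align*}
\text{(i)}\quad  & r(x_1,\dots,x_d) = g(x_{i_1},\dots,x_{i_k}),
  && \{i_1,\dots,i_k\} \subset \{1,\dots,d\} \text{ unknown},\\
\text{(ii)}\quad & r(x) = g(Ax),
  && A \in \mathbb{R}^{k \times d} \text{ unknown, of rank } k.
\end{align*}
```

In this notation, model (i) is the special case of model (ii) in which the rows of \(A\) are the standard basis vectors \(e_{i_1},\dots,e_{i_k}\), so that \(Ax\) simply selects the k relevant coordinates.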


Bandit problems, Continuum armed bandits, Functions of few variables, Online optimization, Low-rank matrix recovery



The project CG Learning acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 255827.

(A preliminary version of this paper appeared in the proceedings of the 11th Workshop on Approximation and Online Algorithms (WAOA). This is a significantly expanded version including analysis for a generalization of the problem considered in the WAOA paper.)



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Hemant Tyagi (1)
  • Sebastian U. Stich (1)
  • Bernd Gärtner (1)
  1. Department of Computer Science, Institute of Theoretical Computer Science, ETH Zürich, Zürich, Switzerland
