Theory of Computing Systems, Volume 58, Issue 1, pp 191–222

On Two Continuum Armed Bandit Problems in High Dimensions

  • Hemant Tyagi
  • Sebastian U. Stich
  • Bernd Gärtner

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of the reward functions. In this work we assume the reward functions to be intrinsically of low dimension \(k \ll d\) and consider two models: (i) the reward functions depend only on an unknown subset of k coordinate variables, and (ii) a generalization of (i) in which the reward functions depend on an unknown k-dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
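
Concretely, the two low-dimensional models and the regret notion can be sketched as follows; this formalization is a standard reading of the abstract rather than a verbatim statement of the paper's definitions, and the symbols f, g, A, \(x_t\) and \(R_n\) are illustrative notation. With arms \(x\) in a compact set \(\mathcal {X} \subset \mathbb {R}^{d}\) and \(k \ll d\):

\[ \text {(i)}\quad f(x) = g(x_{i_1},\dots ,x_{i_k}) \quad \text {for an unknown index set } \{i_1,\dots ,i_k\} \subset \{1,\dots ,d\}, \]
\[ \text {(ii)}\quad f(x) = g(Ax) \quad \text {for an unknown matrix } A \in \mathbb {R}^{k \times d}, \]

and after n rounds of playing arms \(x_1,\dots ,x_n\) the cumulative regret is measured against the best fixed arm,

\[ R_n = n \sup _{x \in \mathcal {X}} f(x) - \sum _{t=1}^{n} \mathbb {E}\left [ f(x_t) \right ]. \]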

Keywords

Bandit problems · Continuum armed bandits · Functions of few variables · Online optimization · Low-rank matrix recovery

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Hemant Tyagi (1)
  • Sebastian U. Stich (1)
  • Bernd Gärtner (1)

  1. Department of Computer Science, Institute of Theoretical Computer Science, ETH Zürich, Zürich, Switzerland
