Continuum Armed Bandit Problem of Few Variables in High Dimensions

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 8447)

Abstract

We consider the stochastic and adversarial settings of continuum armed bandits where the arms are indexed by \([0,1]^d\). The reward functions \(r:[0,1]^d \to \mathbb{R}\) are assumed to intrinsically depend on at most \(k\) coordinate variables, implying \(r(x_1,\dots,x_d) = g(x_{i_1},\dots,x_{i_k})\) for distinct and unknown \(i_1,\dots,i_k \in \{1,\dots,d\}\) and some locally Hölder continuous \(g:[0,1]^k \to \mathbb{R}\) with exponent \(\alpha \in (0,1]\). Firstly, assuming \((i_1,\dots,i_k)\) to be fixed across time, we propose a simple modification of the CAB1 algorithm where we construct the discrete set of sampling points to obtain a bound of \(O(n^{\frac{\alpha+k}{2\alpha+k}} (\log n)^{\frac{\alpha}{2\alpha+k}} C(k,d))\) on the regret, with \(C(k,d)\) depending at most polynomially on \(k\) and sub-logarithmically on \(d\). The construction is based on creating partitions of \(\{1,\dots,d\}\) into \(k\) disjoint subsets and is probabilistic, hence our result holds with high probability. Secondly, we extend our results to also handle the more general case where \((i_1,\dots,i_k)\) can change over time and derive regret bounds for the same.
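
The partition idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: the function names, the uniform random assignment of coordinates to subsets, and the grid {1/m, ..., m/m} are all assumptions made here for illustration. The sketch shows one plausible reading: a point is built to be constant on each subset of a partition, so if the partition places the k active coordinates in distinct subsets, the resulting points cover a full grid on those coordinates.

```python
import itertools
import random


def random_partition(d, k, rng):
    """Assign each coordinate 0..d-1 to one of k subsets uniformly at random.

    (Hypothetical helper: the uniform assignment is an assumption.)
    """
    return [rng.randrange(k) for _ in range(d)]


def separates(partition, active):
    """Return True if the active coordinates all fall in distinct subsets."""
    labels = [partition[i] for i in active]
    return len(set(labels)) == len(labels)


def sampling_points(partition, k, m):
    """Build points in [0,1]^d that are constant on each subset.

    Each of the k subsets takes a value from the grid {1/m, ..., m/m},
    so a single partition contributes m**k points.
    """
    grid = [j / m for j in range(1, m + 1)]
    return [
        tuple(values[label] for label in partition)
        for values in itertools.product(grid, repeat=k)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, m = 20, 2, 4
    active = (3, 17)  # hypothetical active coordinates, unknown to a learner
    partitions = [random_partition(d, k, rng) for _ in range(30)]
    good = [p for p in partitions if separates(p, active)]
    print(len(good), "of", len(partitions), "partitions separate", active)
```

A learner does not know which partition separates the active coordinates, so in this reading it would take the union of the sampling points over all drawn partitions and run a finite-armed bandit strategy on that set. Drawing enough random partitions makes it likely that at least one of them separates the active coordinates, which is consistent with the abstract's statement that the result holds with high probability.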

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Tyagi, H., Gärtner, B. (2014). Continuum Armed Bandit Problem of Few Variables in High Dimensions. In: Kaklamanis, C., Pruhs, K. (eds) Approximation and Online Algorithms. WAOA 2013. Lecture Notes in Computer Science, vol 8447. Springer, Cham. https://doi.org/10.1007/978-3-319-08001-7_10

  • DOI: https://doi.org/10.1007/978-3-319-08001-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08000-0

  • Online ISBN: 978-3-319-08001-7

  • eBook Packages: Computer Science, Computer Science (R0)
