Continuum Armed Bandit Problem of Few Variables in High Dimensions

  • Hemant Tyagi
  • Bernd Gärtner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8447)

Abstract

We consider the stochastic and adversarial settings of continuum armed bandits where the arms are indexed by \([0,1]^d\). The reward functions \(r:[0,1]^d \to \mathbb{R}\) are assumed to intrinsically depend on at most \(k\) coordinate variables, implying \(r(x_1,\dots,x_d) = g(x_{i_1},\dots,x_{i_k})\) for distinct and unknown \(i_1,\dots,i_k \in \{1,\dots,d\}\) and some locally Hölder continuous \(g:[0,1]^k \to \mathbb{R}\) with exponent \(\alpha \in (0,1]\). First, assuming \((i_1,\dots,i_k)\) to be fixed across time, we propose a simple modification of the CAB1 algorithm in which we construct the discrete set of sampling points so as to obtain a bound of \(O(n^{\frac{\alpha+k}{2\alpha+k}} (\log n)^{\frac{\alpha}{2\alpha+k}} C(k,d))\) on the regret, with \(C(k,d)\) depending at most polynomially on \(k\) and sub-logarithmically on \(d\). The construction is based on creating partitions of \(\{1,\dots,d\}\) into \(k\) disjoint subsets and is probabilistic; hence our result holds with high probability. Second, we extend our results to the more general case where \((i_1,\dots,i_k)\) can change over time and derive regret bounds for that case as well.
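The abstract does not spell out how the partitions yield sampling points, so the following is only a rough sketch under stated assumptions: each coordinate is assigned to one of k buckets uniformly at random, and a candidate arm takes one grid value per bucket. The helper names (`random_partition`, `separates`, `sampling_points`) and the grid resolution `m` are hypothetical and not taken from the paper; buckets may be empty in this simplified version.

```python
import itertools
import random


def random_partition(d, k, rng):
    """Assign each of the d coordinates to one of k buckets uniformly at random."""
    return [rng.randrange(k) for _ in range(d)]


def separates(partition, active):
    """Return True if the active coordinates all fall into different buckets."""
    return len({partition[i] for i in active}) == len(active)


def sampling_points(partition, k, m):
    """Points of [0,1]^d that are constant on each bucket.

    Each bucket takes a value from the grid {0, 1/m, ..., 1}, so there are
    (m + 1)**k points regardless of the ambient dimension d.
    """
    grid = [j / m for j in range(m + 1)]
    return [
        tuple(values[bucket] for bucket in partition)
        for values in itertools.product(grid, repeat=k)
    ]


# Illustrative values only (assumptions, not parameters from the paper).
rng = random.Random(0)
d, k, m = 20, 2, 4
partitions = [random_partition(d, k, rng) for _ in range(30)]
active = (3, 17)  # hypothetical unknown coordinates i_1, i_2

# Partitions that place the active coordinates in different buckets.
good = [p for p in partitions if separates(p, active)]
points = sampling_points(partitions[0], k, m)
```

If a partition separates the active coordinates, its points restricted to those coordinates cover the full k-dimensional grid, so a finite-armed bandit strategy run over the union of all such point sets effectively searches a k-dimensional discretization rather than a d-dimensional one. Drawing enough random partitions makes a separating one likely, which matches the high-probability nature of the stated result.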

Keywords

Bandit problems; continuum armed bandits; functions of few variables; online optimization

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hemant Tyagi (1)
  • Bernd Gärtner (1)
  1. Institute of Theoretical Computer Science, ETH Zürich (ETHZ), Zürich, Switzerland
