
On Two Continuum Armed Bandit Problems in High Dimensions

Theory of Computing Systems

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of the reward functions. In this work we assume the reward functions to be intrinsically of low dimension \(k \ll d\) and consider two models: (i) the reward functions depend on only an unknown subset of k coordinate variables, and (ii) a generalization of (i) in which the reward functions depend on an unknown k-dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
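As a toy illustration of model (i), the following numpy sketch runs a UCB1 strategy over a uniform grid of arms for a reward that depends on only one of d = 5 coordinates (k = 1). All function choices and constants here are illustrative, and the relevant coordinate is taken as known in advance; the algorithms in the paper must additionally discover the relevant coordinates or subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mean reward on [0,1]^d that depends only on coordinate 0,
# i.e. model (i) with k = 1. The function and all constants below
# are illustrative choices, not taken from the paper.
def mean_reward(x):
    return 1.0 - abs(x[0] - 0.2)

d, n_grid, horizon, sigma = 5, 6, 5000, 0.05
# Discretize the (here known) relevant coordinate only; the paper's
# algorithms must additionally identify the relevant coordinates/subspace.
arms = np.zeros((n_grid, d))
arms[:, 0] = np.linspace(0.0, 1.0, n_grid)

counts = np.zeros(n_grid)
sums = np.zeros(n_grid)
for t in range(1, horizon + 1):
    if t <= n_grid:                      # pull each arm once first
        i = t - 1
    else:                                # standard UCB1 index
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        i = int(np.argmax(ucb))
    counts[i] += 1
    sums[i] += mean_reward(arms[i]) + sigma * rng.standard_normal()

most_pulled = int(np.argmax(counts))
best = int(np.argmax([mean_reward(a) for a in arms]))
```

With the relevant coordinate known, the number of arms is independent of d; identifying the low-dimensional structure is precisely the additional work the paper's algorithms must perform.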


Notes

  1. Rewards sampled at each round in an i.i.d. manner from an unknown probability distribution.

  2. See Remark 1 in Section 3.1 for a discussion of how the log n factor can be removed.

  3. See Definition 2 in Section 2.1.

  4. Indeed, any function that depends on fewer than k coordinates also depends on at most k coordinates.

  5. Indeed, for a compact domain, any \(C^{2}\) function is Lipschitz continuous but the converse is not necessarily true. Therefore, the mean reward functions that we consider belong to a slightly restricted class of Lipschitz continuous functions.

  6. This theorem is stated again in Section 4 for completeness.

  7. This is actually true for k≥3. For k=1,2 the \(\left (n/\log n\right )^{\frac {4}{k+2}}\) factor dominates. See Remark 3 in Section 4.3 for details.

  8. The interested reader can find a full analysis for the case k=1 in [42].

  9. The above sampling scheme was first considered by Fornasier et al. [22], and later by Tyagi and Cevher [40], for the problem of approximating functions of the form f(x) = g(Ax) from point queries.

  10. Of course in practice we will not be able to solve (32) exactly, but will instead obtain a solution that can be made to come arbitrarily close to the actual solution. This difference will hence appear as an additional error term in the error bound of Lemma 4.

  11. In the absence of external stochastic noise (i.e., σ = 0) we can actually take 𝜖 to be arbitrarily small, as shown by Tyagi and Cevher [41, Lemma 2]. This can also be verified from (33) by plugging in σ = 0.

References

  1. Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012)

  2. Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008)

  3. Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control Optim. 33, 1926–1951 (1995)


  4. Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)


  5. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)


  6. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331 (1995)

  7. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)


  8. Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007)

  9. Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003)

  10. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)


  11. Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003)

  12. Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)


  13. Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011)

  14. Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010)

  15. Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012)

  16. Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012)

  17. Coifman, R., Maggioni, M.: Diffusion wavelets. Appl. Comput. Harmon. Anal. 21, 53–94 (2006)


  18. Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)


  19. DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)


  20. Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: Neural Information Processing Systems (NIPS) (2013)

  21. Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005)

  22. Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)


  23. Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)


  24. Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31 (3), 538–544 (1984)


  25. Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ1 constraint. Ann. Stat. 34, 2367–2386 (2006)


  26. Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004)

  27. Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005)

  28. Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, pp. 594–605 (2003)

  29. Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008)

  30. Körner, J.: Fredman-Komlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)


  31. Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28 (5), 1302–1338 (2000)


  32. Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007)

  33. McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004)

  34. Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003)

  35. Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995)

  36. Nilli, A.: Perfect hashing and probability. Comb. Probab. Comput. 3, 407–409 (1994)


  37. Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)


  38. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)


  39. Tropp, J.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 (4), 389–434 (2012)


  40. Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012)

  41. Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmon. Anal. (2014)

  42. Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013)

  43. Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013)

  44. Wedin, P.: Perturbation bounds in connection with singular value decomposition. BIT 12, 99–111 (1972)


  45. Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)



Acknowledgments

The project CG Learning acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 255827.

(A preliminary version of this paper appeared in the proceedings of the 11th Workshop on Approximation and Online Algorithms (WAOA). This is a significantly expanded version including analysis for a generalization of the problem considered in the WAOA paper.)

Author information


Corresponding author

Correspondence to Hemant Tyagi.

Appendix: Proofs of Results in Section 4

1.1 A.1 Proof of Lemma 3

Proof

We can bound \(R_{3}\) from above as follows.

$$\begin{array}{@{}rcl@{}} R_{3} &=& \sum\limits_{t=n_{1}+1}^{n}[\bar{r}(\mathbf x^{*}) - \bar{r}(\mathbf x^{**})] \end{array} $$
(44)
$$\begin{array}{@{}rcl@{}} &=& n_{2}[\bar{g}(\mathbf{A}\mathbf x^{*}) - \bar{g}(\mathbf{A}\mathbf x^{**})] \end{array} $$
(45)
$$\begin{array}{@{}rcl@{}} &=& n_{2}[\bar{g}(\mathbf{A}\mathbf x^{*}) - \bar{g}(\mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\mathbf x^{**})] \end{array} $$
(46)
$$\begin{array}{@{}rcl@{}} &\leq& n_{2}[\bar{g}(\mathbf{A}\mathbf x^{*}) - \bar{g}(\mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\mathbf x^{*})] \end{array} $$
(47)
$$\begin{array}{@{}rcl@{}} &\leq& n_{2} C_{2} \sqrt{k} \parallel{\mathbf{A}\mathbf x^{*} - \mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\mathbf x^{*}}\parallel \end{array} $$
(48)
$$\begin{array}{@{}rcl@{}} &\leq& n_{2} C_{2} \sqrt{k} (1+\nu) \parallel{\mathbf{A} - \mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}}\parallel_{F} \end{array} $$
(49)
$$\begin{array}{@{}rcl@{}} &=& \frac{n_{2} C_{2} \sqrt{k} (1+\nu)}{\sqrt{2}} \parallel{\mathbf{A}^{T}\mathbf{A} - \widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}}\parallel_{F}. \end{array} $$
(50)

In (46) we used the fact that \(\mathbf x^{**} = \widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf x^{**}\) since \(\mathbf x^{**} \in \mathcal {P}\). In (47) we used the fact that \(\bar {g}(\mathbf {A}\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf x^{**}) \geq \bar {g}(\mathbf {A}\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf {x}^{*})\) since \(\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf {x}^{*} \in \mathcal {P}\) and \(\mathbf {x}^{**} \in \mathcal {P}\) is an optimal strategy. Equation (48) follows from the mean value theorem along with the smoothness assumption made in (9). In (49) we used the simple inequality \(\parallel \mathbf{B}\mathbf{x}\parallel \leq \parallel \mathbf{B}\parallel_{F} \parallel \mathbf{x}\parallel\). Obtaining (50) from (49) is a straightforward exercise. □
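The step from (49) to (50) is in fact an exact identity when A and \(\widehat{\mathbf{A}}\) both have orthonormal rows: writing the projections \(\mathbf P = \widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\) and \(\mathbf Q = \mathbf{A}^{T}\mathbf{A}\), one has \(\parallel \mathbf A - \mathbf A \mathbf P\parallel_F^2 = k - \text{tr}(\mathbf Q \mathbf P) = \frac{1}{2}\parallel \mathbf Q - \mathbf P\parallel_F^2\). A quick numerical check (a numpy sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3

def random_orthonormal_rows(rng, k, d):
    # QR of a Gaussian d x k matrix gives orthonormal columns;
    # transposing yields a k x d matrix with orthonormal rows.
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T

A = random_orthonormal_rows(rng, k, d)
A_hat = random_orthonormal_rows(rng, k, d)

# Both sides of the identity behind (49)-(50).
lhs = np.linalg.norm(A - A @ A_hat.T @ A_hat, "fro")
rhs = np.linalg.norm(A.T @ A - A_hat.T @ A_hat, "fro") / np.sqrt(2)
print(lhs, rhs)
```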

1.2 A.2 Proof of Lemma 4

Proof

We first recall the following result of Candès and Plan [14, Theorem 1], which we will use in our setting to bound the error of the matrix Dantzig selector.

Theorem 4

For any \(\mathbf {X} \in \mathbb {R}^{d \times m_{\mathcal {X}}}\) such that rank(X) ≤ k, let \(\widehat {\mathbf {X}}_{DS}\) be the solution of (32). If \(\delta _{4k} < \delta < \sqrt {2}-1\) and \(\parallel {\Phi }^{*}(\mathbf H + \mathbf N)\parallel \leq \lambda\), then we have with probability at least \(1-2 e^{-m_{\Phi }q(\delta ) + 4k(d+m_{\mathcal X}+1)u(\delta )}\) that

$$\parallel{\mathbf{X} - \widehat{\mathbf{X}}_{DS}}\parallel_{F}^{2} \leq C_{0} k \lambda^{2} $$

where \(C_{0}\) depends only on the isometry constant \(\delta_{4k}\).

It remains to find λ, a bound on \(\parallel {\Phi }^{*}(\mathbf H + \mathbf N)\parallel\). First note that \(\parallel {\Phi }^{*}(\mathbf H + \mathbf N)\parallel \leq \parallel {\Phi }^{*}(\mathbf H)\parallel + \parallel {\Phi }^{*}(\mathbf N)\parallel\). From Tyagi and Cevher [41, Lemma 1, Corollary 1], we have that:

$$\parallel{{\Phi}^{*}(\mathbf H)}\parallel \leq \frac{C_{2} \epsilon d m_{\mathcal{X}} k^{2}}{2\sqrt{m_{\Phi}}}(1+\delta)^{1/2} $$

holds with probability at least \(1-2 e^{-m_{\Phi }q(\delta ) + 4k(d+m_{\mathcal {X}}+1)u(\delta )}\) where δ is such that \(\delta _{4k} < \delta < \sqrt {2}-1\). Next we note that \(\mathbf {N} = [N_{1} N_{2} {\dots } N_{m_{\Phi }}]\) where

$$N_{i} = \underbrace{\frac{1}{\epsilon}\sum\limits_{j=1}^{m_{\mathcal{X}}} \eta_{j}}_{L_{1,i}} - \underbrace{\frac{1}{\epsilon}\sum\limits_{j=1}^{m_{\mathcal X}}\eta_{i,j}}_{L_{2,i}} $$

with \(\mathbf {L_{1}} = [L_{1,1} {\dots } L_{1,m_{\Phi }}]\) and \(\mathbf {L_{2}} = [L_{2,1} {\dots } L_{2,m_{\Phi }}]\), so that \(\mathbf N = \mathbf L_{1} - \mathbf L_{2}\). We then have \(\parallel {\Phi }^{*}(\mathbf N)\parallel \leq \parallel {\Phi }^{*}(\mathbf L_{1})\parallel + \parallel {\Phi }^{*}(\mathbf L_{2})\parallel\). By using Lemma 1.1 of Candès and Plan [14] and denoting \(m=\max \left \{d,m_{\mathcal X}\right \}\), we first have that:

$$ \parallel{{\Phi}^{*}(\mathbf{L_{1}})}\parallel \leq \frac{2\gamma\sigma}{\epsilon} \sqrt{(1+\delta) m_{\Phi} m_{\mathcal{X}} m} $$
(51)

holds with probability at least \(1-2e^{-cm}\), where \(c = \frac {\gamma ^{2}}{2} - 2\log 12\) and \(\gamma > 2\sqrt {\log 12}\). This can be verified using the proof technique of Candès and Plan [14, Lemma 1.1]. Care must be taken since the entries of \(\mathbf L_{1}\) are correlated: they are identical copies of the same Gaussian random variable \(\frac {1}{\epsilon }{\sum }_{j=1}^{m_{\mathcal X}} \eta _{j}\). Furthermore, we also have that:

$$ \parallel{{\Phi}^{*}(\mathbf{L_{2}})}\parallel \leq \frac{2\gamma\sigma}{\epsilon} \sqrt{(1+\delta) m_{\mathcal{X}} m} $$
(52)

holds with probability at least \(1-2e^{-cm}\), with constants c, γ as defined earlier. This is again easily verified using the proof technique of Candès and Plan [14, Lemma 1.1], since the entries of \(\mathbf L_{2}\) are i.i.d. Gaussian random variables. Combining (51) and (52), we then have that the following holds with probability at least \(1-4e^{-cm}\):

$$ \parallel{{\Phi}^{*}(\mathbf{L_{1}})}\parallel + \parallel{{\Phi}^{*}(\mathbf{L_{2}})}\parallel \leq \frac{4\gamma\sigma}{\epsilon} \sqrt{(1+\delta) m_{\mathcal{X}} m_{\Phi} m}. $$
(53)

Lastly, it is fairly easy to see that \(\parallel {\widehat {\mathbf {X}}_{DS}^{(k)} - \mathbf {X}}\parallel _{F} \leq 2 \parallel {\widehat {\mathbf {X}}_{DS} - \mathbf {X}}\parallel _{F}\), where \(\widehat {\mathbf {X}}_{DS}^{(k)}\) is the best rank-k approximation to \(\widehat {\mathbf {X}}_{DS}\) (see, for example, the proof by Tyagi and Cevher [41, Corollary 1]). Combining the above observations, we arrive at the stated error bound with probability at least \(1-2 e^{-m_{\Phi }q(\delta ) + 4k(d+m_{\mathcal {X}}+1)u(\delta )} - 4 e^{-c m}\). □
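The truncation step uses only the triangle inequality and the Eckart-Young theorem: since rank(X) ≤ k, the best rank-k approximation satisfies \(\parallel \widehat{\mathbf X}_{DS}^{(k)} - \widehat{\mathbf X}_{DS}\parallel_F \leq \parallel \mathbf X - \widehat{\mathbf X}_{DS}\parallel_F\), whence the factor 2. A small numerical sanity check (a numpy sketch; the matrices below merely stand in for X and \(\widehat{\mathbf X}_{DS}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, k = 10, 7, 2

# Rank-k ground truth plus an arbitrary perturbation.
X = rng.standard_normal((d, k)) @ rng.standard_normal((k, m))
X_hat = X + 0.3 * rng.standard_normal((d, m))

# Best rank-k approximation via truncated SVD (Eckart-Young).
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
X_hat_k = (U[:, :k] * s[:k]) @ Vt[:k]

err_full = np.linalg.norm(X_hat - X, "fro")
err_trunc = np.linalg.norm(X_hat_k - X, "fro")
print(err_trunc, 2 * err_full)
```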

1.3 A.3 Proof of Lemma 5

Proof

Let τ denote the bound on \(\parallel {\widehat {\mathbf {X}}_{DS}^{(k)} - \mathbf {X}}\parallel _{F}\) stated in Lemma 4. We now make use of a result by Tyagi and Cevher [41, Lemma 2], which states that if \(\tau < \frac {\sqrt {(1-\rho )m_{\mathcal {X}}\alpha k}}{\sqrt {k}+\sqrt {2}}\) holds, then

$$ \parallel{\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}} - \mathbf{A}^{T}\mathbf{A}}\parallel_{F} \leq \frac{2\tau}{\sqrt{(1-\rho)m_{\mathcal{X}}\alpha} - \tau} $$
(54)

holds true for any 0 < ρ < 1 with probability at least \(1-k\exp \left (-\frac {m_{\mathcal {X}}\alpha \rho ^{2}}{2k {C_{2}^{2}}}\right )\). The proof makes use of Weyl’s inequality [45], Wedin’s perturbation bound [44] and a deviation bound for the extremal eigenvalues of the sum of random positive semidefinite matrices [39].

Therefore, upon plugging in the value of τ, we have that \(\tau < f \frac {\sqrt {(1-\rho )m_{\mathcal {X}}\alpha k}}{\sqrt {k}+\sqrt {2}}\) holds for any 0 < f < 1 if:

$$\begin{array}{@{}rcl@{}} C_{0}^{1/2} k^{1/2} (1+\delta)^{1/2} \left(\frac{C_{2}\epsilon d m_{\mathcal{X}} k^{2}}{\sqrt{m_{\Phi}}} + \frac{8\gamma\sigma\sqrt{m_{\mathcal X} m_{\Phi} m}}{\epsilon}\right) &<& f \frac{\sqrt{(1-\rho)m_{\mathcal{X}}\alpha k}}{\sqrt{k}+\sqrt{2}} \end{array} $$
(55)
$$\begin{array}{@{}rcl@{}} \Leftrightarrow \overbrace{C_{2} d k^{2}}^{a_{1}} \epsilon \sqrt{\frac{m_{\mathcal{X}}}{m_{\Phi}}} + \frac{8\gamma\sigma\sqrt{m_{\Phi} m}}{\epsilon} &<& f \left(\overbrace{\frac{1}{C_{0}^{1/2} (1+\delta)^{1/2}} \frac{\sqrt{(1-\rho)\alpha}}{\sqrt{k}+\sqrt{2}}}^{b_{1}}\right) \end{array} $$
(56)
$$\begin{array}{@{}rcl@{}} \Leftrightarrow a_{1} \sqrt{\frac{m_{\mathcal{X}}}{m_{\Phi}}} \epsilon^{2} - f b_{1} \epsilon + 8\gamma\sigma\sqrt{m_{\Phi} m} &<& 0. \end{array} $$
(57)

From (57) we obtain the stated condition on 𝜖. Lastly, upon using \(\tau < \frac {f\sqrt {(1-\rho )m_{\mathcal X}\alpha k}}{\sqrt {k}+\sqrt {2}}\) in (54), we obtain the stated bound on \(\parallel {\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}} - \mathbf {A}^{T}\mathbf {A}}\parallel _{F}\). □
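The condition on 𝜖 from (57) is simply the requirement that 𝜖 lie strictly between the two roots of the convex quadratic \(a\epsilon^{2} - b\epsilon + c\), where \(a = a_{1}\sqrt{m_{\mathcal X}/m_{\Phi}}\), \(b = f b_{1}\) and \(c = 8\gamma \sigma \sqrt{m_{\Phi} m}\); a feasible 𝜖 exists precisely when \(b^{2} > 4ac\). A sketch with illustrative constants (the numeric values below are not taken from the paper):

```python
import math

# Illustrative stand-ins for the coefficients of the quadratic in (57):
# a = a1 * sqrt(m_X / m_Phi), b = f * b1, c = 8 * gamma * sigma * sqrt(m_Phi * m).
a, b, c = 2.0, 5.0, 0.5

disc = b ** 2 - 4 * a * c            # a feasible epsilon exists iff b^2 > 4ac
eps_lo = (b - math.sqrt(disc)) / (2 * a)
eps_hi = (b + math.sqrt(disc)) / (2 * a)
eps = 0.5 * (eps_lo + eps_hi)        # any epsilon strictly between the roots works
print(eps_lo, eps_hi, a * eps ** 2 - b * eps + c)
```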


Cite this article

Tyagi, H., Stich, S.U. & Gärtner, B. On Two Continuum Armed Bandit Problems in High Dimensions. Theory Comput Syst 58, 191–222 (2016). https://doi.org/10.1007/s00224-014-9570-8

