
On Two Continuum Armed Bandit Problems in High Dimensions

Theory of Computing Systems

Abstract

We consider the problem of continuum armed bandits where the arms are indexed by a compact subset of \(\mathbb {R}^{d}\). For large d, it is well known that mere smoothness assumptions on the reward functions lead to regret bounds that suffer from the curse of dimensionality. A typical way to tackle this in the literature has been to make further assumptions on the structure of the reward functions. In this work we assume the reward functions to be intrinsically of low dimension \(k \ll d\) and consider two models: (i) the reward functions depend on only an unknown subset of k coordinate variables, and (ii) a generalization of (i) in which the reward functions depend on an unknown k-dimensional subspace of \(\mathbb {R}^{d}\). By placing suitable assumptions on the smoothness of the rewards we derive randomized algorithms for both problems that achieve nearly optimal regret bounds in terms of the number of rounds n.
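As a toy illustration of model (i), the following numpy sketch runs a UCB1 strategy over a uniform grid of arms for a reward that depends on only one of d = 5 coordinates (k = 1). All function choices and constants here are illustrative, and the relevant coordinate is taken as known in advance; the algorithms in the paper must additionally discover the relevant coordinates or subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mean reward on [0,1]^d that depends only on coordinate 0,
# i.e. model (i) with k = 1. The function and all constants below
# are illustrative choices, not taken from the paper.
def mean_reward(x):
    return 1.0 - abs(x[0] - 0.2)

d, n_grid, horizon, sigma = 5, 6, 5000, 0.05
# Discretize the (here known) relevant coordinate only; the paper's
# algorithms must additionally identify the relevant coordinates/subspace.
arms = np.zeros((n_grid, d))
arms[:, 0] = np.linspace(0.0, 1.0, n_grid)

counts = np.zeros(n_grid)
sums = np.zeros(n_grid)
for t in range(1, horizon + 1):
    if t <= n_grid:                      # pull each arm once first
        i = t - 1
    else:                                # standard UCB1 index
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        i = int(np.argmax(ucb))
    counts[i] += 1
    sums[i] += mean_reward(arms[i]) + sigma * rng.standard_normal()

most_pulled = int(np.argmax(counts))
best = int(np.argmax([mean_reward(a) for a in arms]))
```

With the relevant coordinate known, the number of arms is independent of d; identifying the low-dimensional structure is precisely the additional work the paper's algorithms must perform.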


Notes

  1. Rewards sampled at each round in an i.i.d. manner from an unknown probability distribution.

  2. See Remark 1 in Section 3.1 for a discussion of how the log n factor can be removed.

  3. See Definition 2 in Section 2.1.

  4. Indeed, any function that depends on fewer than k coordinates also depends on at most k coordinates.

  5. Indeed, for a compact domain, any \(C^{2}\) function is Lipschitz continuous but the converse is not necessarily true. Therefore, the mean reward functions that we consider belong to a slightly restricted class of Lipschitz continuous functions.

  6. This theorem is stated again in Section 4 for completeness.

  7. This is actually true for k≥3. For k=1,2 the \(\left (n/\log n\right )^{\frac {4}{k+2}}\) factor dominates. See Remark 3 in Section 4.3 for details.

  8. The interested reader can find a full analysis for the case k=1 in [42].

  9. The above sampling scheme was first considered by Fornasier et al. [22], and later by Tyagi and Cevher [40], for the problem of approximating functions of the form f(x) = g(Ax) from point queries.

  10. Of course in practice we will not be able to solve (32) exactly, but will instead obtain a solution that can be made to come arbitrarily close to the actual solution. This difference will hence appear as an additional error term in the error bound of Lemma 4.

  11. In the absence of external stochastic noise (i.e., σ = 0) we can actually take 𝜖 to be arbitrarily small, as shown by Tyagi and Cevher [41, Lemma 2]. This can also be verified from (33) by plugging in σ = 0.

References

  1. Abbasi-yadkori, Y., Pal, D., Szepesvari, C.: Online-to-confidence-set conversions and application to sparse stochastic bandits. In: Proceedings of AIStats (2012)

  2. Abernethy, J., Hazan, E., Rakhlin, A.: Competing in the dark: An efficient algorithm for bandit linear optimization. In: Proceedings of the 21st Annual Conference on Learning Theory (COLT) (2008)

  3. Agrawal, R.: The continuum-armed bandit problem. SIAM J. Control Optim. 33, 1926–1951 (1995)


  4. Audibert, J.Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11, 2635–2686 (2010)


  5. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 (2-3), 235–256 (2002)


  6. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: Gambling in a rigged casino: The adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331 (1995)

  7. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32 (1), 48–77 (2003)


  8. Auer, P., Ortner, R., Szepesvari, C.: Improved rates for the stochastic continuum-armed bandit problem. In: Proceedings of 20th Conference on Learning Theory (COLT), pp. 454–468 (2007)

  9. Bansal, N., Blum, A., Chawla, S., Meyerson, A.: Online oblivious routing. In: Proceedings of ACM Symposium in Parallelism in Algorithms and Architectures, pp. 44–49 (2003)

  10. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)


  11. Blum, A., Kumar, V., Rudra, A., Wu, F.: Online learning in online auctions. In: Proceedings of 14th Symp. on Discrete Alg., pp. 202–204 (2003)

  12. Bubeck, S., Munos, R., Stoltz, G., Szepesvari, C.: X-armed bandits. J. Mach. Learn. Res. (JMLR) 12, 1587–1627 (2011)


  13. Bubeck, S., Stoltz, G., Yu, J.: Lipschitz bandits without the Lipschitz constant. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT), pp. 144–158 (2011)

  14. Candès, E., Plan, Y.: Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. CoRR abs/1001.0339 (2010)

  15. Carpentier, A., Munos, R.: Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. In: Proceedings of AIStats, pp. 190–198 (2012)

  16. Chen, B., Castro, R., Krause, A.: Joint optimization and variable selection of high-dimensional gaussian processes. In: Proceedings International Conference on Machine Learning (ICML) (2012)

  17. Coifman, R., Maggioni, M.: Diffusion wavelets. Appl. Comput. Harmon. Anal. 21, 53–94 (2006)


  18. Cope, E.: Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Trans. Autom. Control 54, 1243–1253 (2009)


  19. DeVore, R., Petrova, G., Wojtaszczyk, P.: Approximation of functions of few variables in high dimensions. Constr. Approx 33, 125–143 (2011)


  20. Djolonga, J., Krause, A., Cevher, V.: High dimensional gaussian process bandits. In: Neural Information Processing Systems (NIPS) (2013)

  21. Flaxman, A., Kalai, A., McMahan, H.: Online convex optimization in the bandit setting: gradient descent without a gradient. In: Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 385–394 (2005)

  22. Fornasier, M., Schnass, K., Vybiral, J.: Learning functions of few arbitrary linear parameters in high dimensions. Found. Comput. Math. 12 (2), 229–262 (2012)


  23. Fredman, M., Komlós, J.: On the size of separating systems and families of perfect hash functions. SIAM. J. Algebr. Discret. Methods 5, 61–68 (1984)


  24. Fredman, M., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31 (3), 538–544 (1984)


  25. Greenshtein, E.: Best subset selection, persistence in high dimensional statistical learning and optimization under ℓ1 constraint. Ann. Stat. 34, 2367–2386 (2006)


  26. Kleinberg, R.: Nearly tight bounds for the continuum-armed bandit problem. In: 18th Advances in Neural Information Processing Systems (2004)

  27. Kleinberg, R.: Online decision problems with large strategy sets. Ph.D. thesis. MIT, Boston (2005)

  28. Kleinberg, R., Leighton, T.: The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In: Proceedings of Foundations of Computer Science, pp. 594–605 (2003)

  29. Kleinberg, R., Slivkins, A., Upfal, E.: Multi-armed bandits in metric spaces. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC ’08, pp. 681–690 (2008)

  30. Körner, J.: Fredman-Komlós bounds and information theory. SIAM J. Algebraic Discret. Methods 7 (4), 560–570 (1986)


  31. Laurent, B., Massart, P.: Adaptive estimation of a quadratic functional by model selection. Ann. Stat. 28 (5), 1302–1338 (2000)


  32. Li, Q., Racine, J.: Nonparametric econometrics: Theory and practice (2007)

  33. McMahan, B., Blum, A.: Online geometric optimization in the bandit setting against an adaptive adversary. In: Proceedings of the 17th Annual Conference on Learning Theory (COLT), pp. 109–123 (2004)

  34. Mossel, E., O’Donnell, R., Servedio, R.: Learning juntas. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC, pp. 206–212. ACM (2003)

  35. Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 182–191 (1995)

  36. Nilli, A.: Perfect hashing and probability. Comb. Probab. Comput. 3, 407–409 (1994)


  37. Orlitsky, A.: Worst-case interactive communication i: Two messages are almost optimal. IEEE Trans. Inf. Theory 36, 1111–1126 (1990)


  38. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)


  39. Tropp, J.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 (4), 389–434 (2012)


  40. Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012)

  41. Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Appl. Comput. Harmon. Anal. (2014)

  42. Tyagi, H., Gärtner, B.: Continuum armed bandit problem of few variables in high dimensions. CoRR abs/1304.5793 (2013)

  43. Wang, Z., Zoghi, M., Hutter, F., Matheson, D., de Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proc. IJCAI (2013)

  44. Wedin, P.: Perturbation bounds in connection with singular value decomposition. BIT 12, 99–111 (1972)


  45. Weyl, H.: Das asymptotische verteilungsgesetz der eigenwerte linearer partieller differentialgleichungen (mit einer anwendung auf die theorie der hohlraumstrahlung). Mathematische Annalen 71, 441–479 (1912)



Acknowledgments

The project CG Learning acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 255827.

(A preliminary version of this paper appeared in the proceedings of the 11th Workshop on Approximation and Online Algorithms (WAOA). This is a significantly expanded version including analysis for a generalization of the problem considered in the WAOA paper.)

Author information


Corresponding author

Correspondence to Hemant Tyagi.

Appendix: Proofs of Results in Section 4

1.1 A.1 Proof of Lemma 3

Proof

We can bound \(R_{3}\) from above as follows.

$$\begin{array}{@{}rcl@{}} R_{3} &=& \sum\limits_{t=n_{1}+1}^{n}[\bar{r}(\mathbf x^{*}) - \bar{r}(\mathbf x^{**})] \end{array} $$
(44)
$$\begin{array}{@{}rcl@{}} &=& n_{2}[\bar{g}(\mathbf{A}\mathbf x^{*}) - \bar{g}(\mathbf{A}\mathbf x^{**})] \end{array} $$
(45)
$$\begin{array}{@{}rcl@{}} &=& n_{2}[\bar{g}(\mathbf{A}\mathbf x^{*}) - \bar{g}(\mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\mathbf x^{**})] \end{array} $$
(46)
$$\begin{array}{@{}rcl@{}} &\leq& n_{2}[\bar{g}(\mathbf{A}\mathbf x^{*}) - \bar{g}(\mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\mathbf x^{*})] \end{array} $$
(47)
$$\begin{array}{@{}rcl@{}} &\leq& n_{2} C_{2} \sqrt{k} \parallel{\mathbf{A}\mathbf x^{*} - \mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\mathbf x^{*}}\parallel \end{array} $$
(48)
$$\begin{array}{@{}rcl@{}} &\leq& n_{2} C_{2} \sqrt{k} (1+\nu) \parallel{\mathbf{A} - \mathbf{A}\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}}\parallel_{F} \end{array} $$
(49)
$$\begin{array}{@{}rcl@{}} &=& \frac{n_{2} C_{2} \sqrt{k} (1+\nu)}{\sqrt{2}} \parallel{\mathbf{A}^{T}\mathbf{A} - \widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}}\parallel_{F}. \end{array} $$
(50)

In (46) we used the fact that \(\mathbf x^{**} = \widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf x^{**}\) since \(\mathbf x^{**} \in \mathcal {P}\). In (47) we used the fact that \(\bar {g}(\mathbf {A}\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf x^{**}) \geq \bar {g}(\mathbf {A}\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf {x}^{*})\) since \(\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}}\mathbf {x}^{*} \in \mathcal {P}\) and \(\mathbf {x}^{**} \in \mathcal {P}\) is an optimal strategy. Equation (48) follows from the mean value theorem along with the smoothness assumption made in (9). In (49) we used the simple inequality \(\parallel \mathbf{B}\mathbf{x}\parallel \leq \parallel \mathbf{B}\parallel_{F} \parallel \mathbf{x}\parallel\). Obtaining (50) from (49) is a straightforward exercise. □
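The step from (49) to (50) is in fact an exact identity when A and \(\widehat{\mathbf{A}}\) both have orthonormal rows: writing the projections \(\mathbf P = \widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}}\) and \(\mathbf Q = \mathbf{A}^{T}\mathbf{A}\), one has \(\parallel \mathbf A - \mathbf A \mathbf P\parallel_F^2 = k - \text{tr}(\mathbf Q \mathbf P) = \frac{1}{2}\parallel \mathbf Q - \mathbf P\parallel_F^2\). A quick numerical check (a numpy sketch; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3

def random_orthonormal_rows(rng, k, d):
    # QR of a Gaussian d x k matrix gives orthonormal columns;
    # transposing yields a k x d matrix with orthonormal rows.
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q.T

A = random_orthonormal_rows(rng, k, d)
A_hat = random_orthonormal_rows(rng, k, d)

# Both sides of the identity behind (49)-(50).
lhs = np.linalg.norm(A - A @ A_hat.T @ A_hat, "fro")
rhs = np.linalg.norm(A.T @ A - A_hat.T @ A_hat, "fro") / np.sqrt(2)
print(lhs, rhs)
```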

1.2 A.2 Proof of Lemma 4

Proof

We first recall the following result of Candès and Plan [14, Theorem 1], which we will use in our setting to bound the error of the matrix Dantzig selector.

Theorem 4

For any \(\mathbf {X} \in \mathbb {R}^{d \times m_{\mathcal {X}}}\) such that rank(X) ≤ k, let \(\widehat {\mathbf {X}}_{DS}\) be the solution of (32). If \(\delta _{4k} < \delta < \sqrt {2}-1\) and \(\parallel {\Phi }^{*}(\mathbf H + \mathbf N)\parallel \leq \lambda\), then we have with probability at least \(1-2 e^{-m_{\Phi }q(\delta ) + 4k(d+m_{\mathcal X}+1)u(\delta )}\) that

$$\parallel{\mathbf{X} - \widehat{\mathbf{X}}_{DS}}\parallel_{F}^{2} \leq C_{0} k \lambda^{2} $$

where \(C_{0}\) depends only on the isometry constant \(\delta_{4k}\).

It remains to find λ, a bound on \(\parallel {\Phi }^{*}(\mathbf H + \mathbf N)\parallel\). First note that \(\parallel {\Phi }^{*}(\mathbf H + \mathbf N)\parallel \leq \parallel {\Phi }^{*}(\mathbf H)\parallel + \parallel {\Phi }^{*}(\mathbf N)\parallel\). From Tyagi and Cevher [41, Lemma 1, Corollary 1], we have that:

$$\parallel{{\Phi}^{*}(\mathbf H)}\parallel \leq \frac{C_{2} \epsilon d m_{\mathcal{X}} k^{2}}{2\sqrt{m_{\Phi}}}(1+\delta)^{1/2} $$

holds with probability at least \(1-2 e^{-m_{\Phi }q(\delta ) + 4k(d+m_{\mathcal {X}}+1)u(\delta )}\) where δ is such that \(\delta _{4k} < \delta < \sqrt {2}-1\). Next we note that \(\mathbf {N} = [N_{1} N_{2} {\dots } N_{m_{\Phi }}]\) where

$$N_{i} = \underbrace{\frac{1}{\epsilon}\sum\limits_{j=1}^{m_{\mathcal{X}}} \eta_{j}}_{L_{1,i}} - \underbrace{\frac{1}{\epsilon}\sum\limits_{j=1}^{m_{\mathcal X}}\eta_{i,j}}_{L_{2,i}} $$

with \(\mathbf {L_{1}} = [L_{1,1} {\dots } L_{1,m_{\Phi }}]\) and \(\mathbf {L_{2}} = [L_{2,1} {\dots } L_{2,m_{\Phi }}]\), so that \(\mathbf N = \mathbf L_{1} - \mathbf L_{2}\). We then have \(\parallel {\Phi }^{*}(\mathbf N)\parallel \leq \parallel {\Phi }^{*}(\mathbf L_{1})\parallel + \parallel {\Phi }^{*}(\mathbf L_{2})\parallel\). By using Lemma 1.1 of Candès and Plan [14] and denoting \(m=\max \left \{d,m_{\mathcal X}\right \}\), we first have that:

$$ \parallel{{\Phi}^{*}(\mathbf{L_{1}})}\parallel \leq \frac{2\gamma\sigma}{\epsilon} \sqrt{(1+\delta) m_{\Phi} m_{\mathcal{X}} m} $$
(51)

holds with probability at least \(1-2e^{-cm}\), where \(c = \frac {\gamma ^{2}}{2} - 2\log 12\) and \(\gamma > 2\sqrt {\log 12}\). This can be verified using the proof technique of Candès and Plan [14, Lemma 1.1]. Care must be taken since the entries of \(\mathbf L_{1}\) are correlated: they are identical copies of the same Gaussian random variable \(\frac {1}{\epsilon }{\sum }_{j=1}^{m_{\mathcal X}} \eta _{j}\). Furthermore, we also have that:

$$ \parallel{{\Phi}^{*}(\mathbf{L_{2}})}\parallel \leq \frac{2\gamma\sigma}{\epsilon} \sqrt{(1+\delta) m_{\mathcal{X}} m} $$
(52)

holds with probability at least \(1-2e^{-cm}\), with constants c, γ as defined earlier. This is again easily verified using the proof technique of Candès and Plan [14, Lemma 1.1], since the entries of \(\mathbf L_{2}\) are i.i.d. Gaussian random variables. Combining (51) and (52), we then have that the following holds with probability at least \(1-4e^{-cm}\):

$$ \parallel{{\Phi}^{*}(\mathbf{L_{1}})}\parallel + \parallel{{\Phi}^{*}(\mathbf{L_{2}})}\parallel \leq \frac{4\gamma\sigma}{\epsilon} \sqrt{(1+\delta) m_{\mathcal{X}} m_{\Phi} m}. $$
(53)

Lastly, it is fairly easy to see that \(\parallel {\widehat {\mathbf {X}}_{DS}^{(k)} - \mathbf {X}}\parallel _{F} \leq 2 \parallel {\widehat {\mathbf {X}}_{DS} - \mathbf {X}}\parallel _{F}\), where \(\widehat {\mathbf {X}}_{DS}^{(k)}\) is the best rank-k approximation to \(\widehat {\mathbf {X}}_{DS}\) (see, for example, the proof by Tyagi and Cevher [41, Corollary 1]). Combining the above observations, we arrive at the stated error bound with probability at least \(1-2 e^{-m_{\Phi }q(\delta ) + 4k(d+m_{\mathcal {X}}+1)u(\delta )} - 4 e^{-c m}\). □
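The truncation step uses only the triangle inequality and the Eckart-Young theorem: since rank(X) ≤ k, the best rank-k approximation satisfies \(\parallel \widehat{\mathbf X}_{DS}^{(k)} - \widehat{\mathbf X}_{DS}\parallel_F \leq \parallel \mathbf X - \widehat{\mathbf X}_{DS}\parallel_F\), whence the factor 2. A small numerical sanity check (a numpy sketch; the matrices below merely stand in for X and \(\widehat{\mathbf X}_{DS}\)):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, k = 10, 7, 2

# Rank-k ground truth plus an arbitrary perturbation.
X = rng.standard_normal((d, k)) @ rng.standard_normal((k, m))
X_hat = X + 0.3 * rng.standard_normal((d, m))

# Best rank-k approximation via truncated SVD (Eckart-Young).
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
X_hat_k = (U[:, :k] * s[:k]) @ Vt[:k]

err_full = np.linalg.norm(X_hat - X, "fro")
err_trunc = np.linalg.norm(X_hat_k - X, "fro")
print(err_trunc, 2 * err_full)
```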

1.3 A.3 Proof of Lemma 5

Proof

Let τ denote the bound on \(\parallel {\widehat {\mathbf {X}}_{DS}^{(k)} - \mathbf {X}}\parallel _{F}\) stated in Lemma 4. We now make use of a result by Tyagi and Cevher [41, Lemma 2], which states that if \(\tau < \frac {\sqrt {(1-\rho )m_{\mathcal {X}}\alpha k}}{\sqrt {k}+\sqrt {2}}\) holds, then

$$ \parallel{\widehat{\mathbf{A}}^{T}\widehat{\mathbf{A}} - \mathbf{A}^{T}\mathbf{A}}\parallel_{F} \leq \frac{2\tau}{\sqrt{(1-\rho)m_{\mathcal{X}}\alpha} - \tau} $$
(54)

holds true for any 0 < ρ < 1 with probability at least \(1-k\exp \left (-\frac {m_{\mathcal {X}}\alpha \rho ^{2}}{2k {C_{2}^{2}}}\right )\). The proof makes use of Weyl’s inequality [45], Wedin’s perturbation bound [44] and a deviation bound for the extremal eigenvalues of the sum of random positive semidefinite matrices [39].

Therefore, upon plugging in the value of τ, we have that \(\tau < f \frac {\sqrt {(1-\rho )m_{\mathcal {X}}\alpha k}}{\sqrt {k}+\sqrt {2}}\) holds for any 0 < f < 1 if:

$$\begin{array}{@{}rcl@{}} C_{0}^{1/2} k^{1/2} (1+\delta)^{1/2} \left(\frac{C_{2}\epsilon d m_{\mathcal{X}} k^{2}}{\sqrt{m_{\Phi}}} + \frac{8\gamma\sigma\sqrt{m_{\mathcal X} m_{\Phi} m}}{\epsilon}\right) &<& f \frac{\sqrt{(1-\rho)m_{\mathcal{X}}\alpha k}}{\sqrt{k}+\sqrt{2}} \end{array} $$
(55)
$$\begin{array}{@{}rcl@{}} \Leftrightarrow \overbrace{C_{2} d k^{2}}^{a_{1}} \epsilon \sqrt{\frac{m_{\mathcal{X}}}{m_{\Phi}}} + \frac{8\gamma\sigma\sqrt{m_{\Phi} m}}{\epsilon} &<& f \left(\overbrace{\frac{1}{C_{0}^{1/2} (1+\delta)^{1/2}} \frac{\sqrt{(1-\rho)\alpha}}{\sqrt{k}+\sqrt{2}}}^{b_{1}}\right) \end{array} $$
(56)
$$\begin{array}{@{}rcl@{}} \Leftrightarrow a_{1} \sqrt{\frac{m_{\mathcal{X}}}{m_{\Phi}}} \epsilon^{2} - f b_{1} \epsilon + 8\gamma\sigma\sqrt{m_{\Phi} m} &<& 0. \end{array} $$
(57)

From (57) we obtain the stated condition on 𝜖. Lastly, upon using \(\tau < \frac {f\sqrt {(1-\rho )m_{\mathcal X}\alpha k}}{\sqrt {k}+\sqrt {2}}\) in (54), we obtain the stated bound on \(\parallel {\widehat {\mathbf {A}}^{T}\widehat {\mathbf {A}} - \mathbf {A}^{T}\mathbf {A}}\parallel _{F}\). □
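The condition on 𝜖 from (57) is simply the requirement that 𝜖 lie strictly between the two roots of the convex quadratic \(a\epsilon^{2} - b\epsilon + c\), where \(a = a_{1}\sqrt{m_{\mathcal X}/m_{\Phi}}\), \(b = f b_{1}\) and \(c = 8\gamma \sigma \sqrt{m_{\Phi} m}\); a feasible 𝜖 exists precisely when \(b^{2} > 4ac\). A sketch with illustrative constants (the numeric values below are not taken from the paper):

```python
import math

# Illustrative stand-ins for the coefficients of the quadratic in (57):
# a = a1 * sqrt(m_X / m_Phi), b = f * b1, c = 8 * gamma * sigma * sqrt(m_Phi * m).
a, b, c = 2.0, 5.0, 0.5

disc = b ** 2 - 4 * a * c            # a feasible epsilon exists iff b^2 > 4ac
eps_lo = (b - math.sqrt(disc)) / (2 * a)
eps_hi = (b + math.sqrt(disc)) / (2 * a)
eps = 0.5 * (eps_lo + eps_hi)        # any epsilon strictly between the roots works
print(eps_lo, eps_hi, a * eps ** 2 - b * eps + c)
```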


Cite this article

Tyagi, H., Stich, S.U. & Gärtner, B. On Two Continuum Armed Bandit Problems in High Dimensions. Theory Comput Syst 58, 191–222 (2016). https://doi.org/10.1007/s00224-014-9570-8

