
Online Learning in Budget-Constrained Dynamic Colonel Blotto Games

Dynamic Games and Applications (2023)

Abstract

In this paper, we study the strategic allocation of limited resources using a Colonel Blotto game (CBG) in a dynamic setting and analyze the problem through an online learning approach. In this model, one player is a learner with a limited number of troops to allocate over a finite time horizon, and the other player is an adversary. In each round, the learner plays a one-shot Colonel Blotto game against the adversary and strategically determines the allocation of troops among the battlefields based on past observations. The adversary chooses its allocation randomly from a fixed distribution that is unknown to the learner. The learner’s objective is to minimize its regret, defined as the difference between the cumulative reward of the best mixed strategy and the cumulative reward realized by following a learning algorithm, while not violating the budget constraint. Learning in the dynamic CBG is analyzed within the frameworks of combinatorial bandits and bandits with knapsacks. We first convert the budget-constrained dynamic CBG into a path planning problem on a directed graph. We then devise an efficient algorithm that combines a special combinatorial bandit algorithm for the path planning problem with a bandits-with-knapsacks algorithm to cope with the budget constraint. The theoretical analysis shows that the learner’s regret is bounded by a term sublinear in the time horizon and polynomial in the other parameters. Finally, we justify our theoretical results by carrying out simulations for various scenarios.


Availability of Data and Materials

This declaration is not applicable. This article does not use any datasets.

Notes

  1. A static game is a one-shot game where all players act simultaneously. A game is dynamic if players are allowed to act multiple times based on the history of their strategies and observations.

  2. Here, \(\mathcal {L}_t(a_t, i_t)=r_t(a_t)+1- \frac{T}{B_0}w_{t,i_t}(a_t)\) is the Lagrangian function at round t, where the subscript t again suppresses the dependency of the parameters on the adversary’s action.
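As a rough illustration of footnote 2 above (not part of the paper; the function and argument names are placeholders), the following sketch computes the Lagrangian reward that folds the per-round troop expenditure into the reward.

```python
# A minimal sketch, assuming the form of the Lagrangian in footnote 2:
# L_t = r_t + 1 - (T / B0) * w_t, where r_t is the round reward, w_t the troops
# spent in the round, and B0 the scaled budget. All names are placeholders.
def lagrangian_reward(r_t: float, w_t: float, T: int, B0: float) -> float:
    return r_t + 1.0 - (T / B0) * w_t
```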

References

  1. Agrawal S, Devanur NR, Li L (2016) An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In: Feldman V, Rakhlin A, Shamir O (eds) 29th Annual conference on learning theory, vol 49. Proceedings of machine learning research. Columbia University, New York, New York, USA, pp 4–18

  2. Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. In: Proceedings of the fifteenth ACM conference on economics and computation. EC ’14. Association for Computing Machinery, New York, NY, USA, pp 989–1006. https://doi.org/10.1145/2600057.2602844

  3. Agrawal S, Devanur NR (2016) Linear contextual bandits with knapsacks. In: Proceedings of the 30th international conference on neural information processing systems. NIPS’16. Curran Associates Inc., Red Hook, NY, USA, pp 3458–3467

  4. Ahmadinejad A, Dehghani S, Hajiaghayi M, Lucier B, Mahini H, Seddighin S (2019) From duels to battlefields: computing equilibria of Blotto and other games. Math Oper Res 44(4):1304–1325. https://doi.org/10.1287/moor.2018.0971


  5. Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J Comput 32(1):48–77. https://doi.org/10.1137/S0097539701398375


  6. Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. In: 2013 IEEE 54th annual symposium on foundations of computer science, Berkeley, California, USA, pp 207–216. IEEE

  7. Bartlett P, Dani V, Hayes T, Kakade S, Rakhlin A, Tewari A (2008) High-probability regret bounds for bandit online linear optimization. In: Proceedings of the 21st annual conference on learning theory-COLT 2008. Omnipress, pp 335–342

  8. Behnezhad S, Dehghani S, Derakhshan M, Hajiaghayi M, Seddighin S (2023) Fast and simple solutions of Blotto games. Oper Res 71(2):506–516. https://doi.org/10.1287/opre.2022.2261


  9. Behnezhad S, Dehghani S, Derakhshan M, Hajiaghayi MT, Seddighin S (2017) Faster and simpler algorithm for optimal strategies of Blotto game. In: Proceedings of the thirty-first AAAI conference on artificial intelligence. AAAI’17. AAAI Press, San Francisco, California, USA, pp 369–375

  10. Borel E (1921) La théorie du jeu et les équations intégrales à noyau symétrique. Comptes rendus de l’Acad des Sci 173(1304–1308):58


  11. Borel E (1953) The theory of play and integral equations with skew symmetric kernels. Econometrica 21(1):97–100


  12. Cesa-Bianchi N, Lugosi G (2012) Combinatorial bandits. J Comput Syst Sci 78(5):1404–1422. https://doi.org/10.1016/j.jcss.2012.01.001


  13. Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge, England. https://doi.org/10.1017/CBO9780511546921

  14. Combes R, Talebi Mazraeh Shahi MS, Proutiere A, Lelarge M (2015) Combinatorial bandits revisited. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates Inc, Montreal, Quebec, Canada

  15. Dani V, Hayes TP, Kakade SM (2007) The price of bandit information for online optimization. In: Proceedings of the 20th international conference on neural information processing systems. NIPS’07. Curran Associates Inc., Red Hook, NY, USA, pp 345–352

  16. Etesami SR (2021) Open-loop equilibrium strategies for dynamic influence maximization game over social networks. IEEE Control Syst Lett 6:1496–1500. https://doi.org/10.1109/LCSYS.2021.3116030


  17. Etesami SR, Başar T (2019) Dynamic games in cyber-physical security: an overview. Dyn Games Appl 9(4):884–913. https://doi.org/10.1007/s13235-018-00291-y


  18. Ferdowsi A, Sanjab A, Saad W, Basar T (2018) Generalized Colonel Blotto game. In: 2018 American Control Conference (ACC), Milwaukee, Wisconsin, USA, pp 5744–5749. https://doi.org/10.23919/ACC.2018.8431701

  19. Fréchet M (1953) Emile Borel, initiator of the theory of psychological games and its application. Econometrica 21(1):95


  20. Fréchet M (1953) Commentary on the three notes of Emile Borel. Econometrica 21(1):118–124


  21. Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1):119–139. https://doi.org/10.1006/jcss.1997.1504


  22. Gross OA, Wagner RA (1950) A continuous Colonel Blotto game. RAND Corporation, Santa Monica, CA


  23. Guan S, Wang J, Yao H, Jiang C, Han Z, Ren Y (2020) Colonel Blotto games in network systems: Models, strategies, and applications. IEEE Trans Netw Sci Eng 7(2):637–649. https://doi.org/10.1109/TNSE.2019.2904530


  24. Gupta A, Schwartz G, Langbort C, Sastry SS, Başar T (2014) A three-stage Colonel Blotto game with applications to cyberphysical security. In: 2014 American Control Conference (ACC), Portland, Oregon, USA, pp 3820–3825. https://doi.org/10.1109/ACC.2014.6859164

  25. Hajimirsaadeghi M, Mandayam NB (2017) A dynamic Colonel Blotto game model for spectrum sharing in wireless networks. In: 2017 55th Annual Allerton conference on communication, control, and computing (Allerton), pp 287–294. https://doi.org/10.1109/ALLERTON.2017.8262750

  26. Hajimirsadeghi M, Sridharan G, Saad W, Mandayam NB (2016) Inter-network dynamic spectrum allocation via a Colonel Blotto game. In: 2016 Annual conference on information science and systems (CISS), pp 252–257. https://doi.org/10.1109/CISS.2016.7460510

  27. Hortala-Vallve R, Llorente-Saguer A (2012) Pure strategy Nash equilibria in non-zero sum Colonel Blotto games. Int J Game Theory 41(2):331–343


  28. Immorlica N, Sankararaman KA, Schapire R, Slivkins A (2019) Adversarial bandits with knapsacks. In: 2019 IEEE 60th annual symposium on foundations of computer science (FOCS), pp 202–219. https://doi.org/10.1109/FOCS.2019.00022

  29. Immorlica N, Sankararaman KA, Schapire R, Slivkins A (2020) Adversarial bandits with knapsacks. arXiv:1811.11881

  30. Kovenock D, Roberson B (2012) Coalitional Colonel Blotto games with application to the economics of alliances. J Public Econ Theory 14(4):653–676. https://doi.org/10.1111/j.1467-9779.2012.01556.x


  31. Kovenock D, Roberson B (2021) Generalizations of the General Lotto and Colonel Blotto games. Econ Theor 71(3):997–1032


  32. Labib M, Ha S, Saad W, Reed JH (2015) A Colonel Blotto game for anti-jamming in the Internet of Things. In: 2015 IEEE global communications conference (GLOBECOM), pp 1–6. https://doi.org/10.1109/GLOCOM.2015.7417437

  33. Laslier J-F (2002) How two-party competition treats minorities. Rev Econ Design 7:297–307


  34. Laslier J-F, Picard N (2002) Distributive politics and electoral competition. J Econ Theory 103(1):106–130. https://doi.org/10.1006/jeth.2000.2775


  35. Lattimore T, Szepesvári C (2020) Bandit algorithms. Cambridge University Press, Cambridge. https://doi.org/10.1017/9781108571401

  36. Li X, Sun C, Ye Y (2021) The symmetry between arms and knapsacks: a primal-dual approach for bandits with knapsacks. In: Meila M, Zhang T (eds) Proceedings of the 38th international conference on machine learning. Proceedings of machine learning research, vol 139, pp 6483–6492

  37. Min M, Xiao L, Xie C, Hajimirsadeghi M, Mandayam NB (2018) Defense against advanced persistent threats in dynamic cloud storage: a Colonel Blotto game approach. IEEE Internet Things J 5(6):4250–4261. https://doi.org/10.1109/JIOT.2018.2844878


  38. Roberson B (2006) The Colonel Blotto game. Econ Theor 29(1):1–24


  39. Thomas C (2018) N-dimensional Blotto game with heterogeneous battlefield values. Econ Theor 65(3):509–544


  40. von Neumann J, Fréchet M (1953) Communication on the Borel notes. Econometrica 21(1):124–127

  41. Vu DQ, Loiseau P, Silva A (2018) Efficient computation of approximate equilibria in discrete Colonel Blotto games. In: Proceedings of the twenty-seventh international joint conference on artificial intelligence, IJCAI-18. International Joint Conferences on Artificial Intelligence Organization, Stockholm, Sweden, pp 519–526. https://doi.org/10.24963/ijcai.2018/72

  42. Vu DQ, Loiseau P, Silva A (2019) Combinatorial bandits for sequential learning in Colonel Blotto games. arXiv:1909.04912

  43. Vu DQ, Loiseau P, Silva A (2019) Combinatorial bandits for sequential learning in Colonel Blotto games. In: 2019 IEEE 58th conference on decision and control (CDC), pp 867–872. https://doi.org/10.1109/CDC40024.2019.9029186

  44. Zhang L, Wang Y, Han Z (2022) Safeguarding UAV-enabled wireless power transfer against aerial eavesdropper: a Colonel Blotto game. IEEE Wirel Commun Lett 11(3):503–507. https://doi.org/10.1109/LWC.2021.3133891



Funding

This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-23-1-0107 and the NSF CAREER Award under grant number EPCN-1944403.

Author information


Contributions

Vincent Leon wrote the main manuscript with support from S. Rasoul Etesami. All authors reviewed the manuscript.

Corresponding author

Correspondence to Vincent Leon.

Ethics declarations

Competing interests

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical Approval

This declaration is not applicable. This article does not contain any human or animal studies.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Existing Algorithms

For the sake of completeness, in this appendix, we provide a detailed description of the existing algorithms (Algorithms 4–6) that we use as subroutines in our main algorithm.

Algorithm 4: Weight-Pushing Algorithm, [42, Algorithm 2]
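Since the pseudocode itself appears only as a figure, the following is a hedged sketch of generic weight pushing on a layered directed acyclic graph, written in the spirit of [42, Algorithm 2] rather than as a transcription of it; the interface (succ, weight, source, sink) is assumed for illustration.

```python
import random

# Sketch of weight pushing: H[v] sums the product of edge weights over all v-to-sink
# paths; a path is then sampled edge by edge with probability proportional to
# weight(u, v) * H[v], so each complete path is drawn with probability proportional
# to the product of its edge weights. Assumes integer node labels in topological order.
def sample_path(succ, weight, source, sink):
    """succ[u]: successors of u; weight[(u, v)] > 0; every node reaches the sink."""
    H = {sink: 1.0}
    for u in sorted(succ, reverse=True):          # backward pass toward the source
        if u != sink:
            H[u] = sum(weight[(u, v)] * H[v] for v in succ[u])
    path, u = [source], source
    while u != sink:                              # forward pass: sample one edge at a time
        probs = [weight[(u, v)] * H[v] / H[u] for v in succ[u]]
        u = random.choices(succ[u], weights=probs)[0]
        path.append(u)
    return path

# Example: two source-to-sink paths with weights 2 and 1; [0, 1, 3] is drawn w.p. 2/3.
succ = {0: [1, 2], 1: [3], 2: [3]}
weight = {(0, 1): 2.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
print(sample_path(succ, weight, source=0, sink=3))
```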

Algorithm 5: Co-occurrence Matrix Computation Algorithm, [42, Algorithm 3]
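As a stand-in for the exact recursive computation in [42, Algorithm 3] (which is not reproduced here), the following Monte Carlo sketch approximates the co-occurrence matrix \(M(\mu )=\mathbb {E}_{\varvec{u}\sim \mu }[\varvec{u}\varvec{u}^{\intercal }]\) of the edge-indicator vectors; the sampler sample_path_indicator is a hypothetical placeholder.

```python
import numpy as np

# Monte Carlo approximation of M(mu) = E_{u ~ mu}[u u^T]; 'sample_path_indicator'
# is assumed to return a {0,1}^E indicator vector of a path drawn from mu.
def cooccurrence_mc(sample_path_indicator, E, num_samples=10_000, rng=None):
    rng = rng or np.random.default_rng()
    M = np.zeros((E, E))
    for _ in range(num_samples):
        u = sample_path_indicator(rng)
        M += np.outer(u, u)
    return M / num_samples
```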

Algorithm 6: Hedge, [21, Fig. 1]
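A minimal sketch of the exponential-weights (Hedge) update of [21] is shown below; it is not a transcription of the figure referenced above, and it uses rewards rather than losses.

```python
import numpy as np

# Hedge / exponential weights: maintain one weight per action, play proportionally
# to the weights, and multiply each weight by exp(eta * reward) after each round.
class Hedge:
    def __init__(self, num_actions: int, eta: float):
        self.eta = eta
        self.w = np.ones(num_actions)

    def distribution(self) -> np.ndarray:
        return self.w / self.w.sum()

    def update(self, rewards: np.ndarray) -> None:
        self.w *= np.exp(self.eta * rewards)
```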

Appendix B: Concentration Inequalities

Lemma 3

[Bernstein’s inequality for martingales, Lemma A.8 in [13]] Let \(Y_1, Y_2, \ldots \) be a martingale difference sequence (i.e., \({\mathbb {E}}[Y_t\vert Y_{t-1},\ldots ,Y_1]=0 \, \forall t \in {\mathbb {Z}}_+\)). Suppose that \(|Y_t |\le c\) and \({\mathbb {E}}[Y_t^2 \vert Y_{t-1}, \ldots , Y_1] \le v\) almost surely for all \(t \in {\mathbb {Z}}_+\). For any \(\delta > 0\),

$$\begin{aligned} \text {P }\left( \sum _{t=1}^T Y_t > \sqrt{2 Tv \ln (1/\delta )} + \frac{2}{3} c\ln (1/\delta ) \right) \le \delta . \end{aligned}$$

Lemma 4

[Azuma-Hoeffding’s inequality, Lemma A.7 in [13]] Let \(Y_1, Y_2, \ldots \) be a martingale difference sequence and \(\vert Y_t\vert \le c\) almost surely for all \(t \in {\mathbb {Z}}_+\). For any \(\delta > 0\),

$$\begin{aligned} \text {P }\left( \sum _{t=1}^T Y_t > \sqrt{2Tc^2 \ln (1/\delta )}\right) \le \delta . \end{aligned}$$
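The following Monte Carlo check (not part of the paper) illustrates Lemma 4 on an i.i.d. bounded sequence, which is a special case of a martingale difference sequence; the constants are arbitrary, and the bound is typically quite loose in this regime.

```python
import numpy as np

# Empirical tail probability vs. the Azuma-Hoeffding threshold of Lemma 4.
rng = np.random.default_rng(0)
T, c, delta, runs = 500, 1.0, 0.05, 10_000
threshold = np.sqrt(2 * T * c**2 * np.log(1 / delta))

Y = rng.uniform(-c, c, size=(runs, T))          # |Y_t| <= c, E[Y_t] = 0
exceed = (Y.sum(axis=1) > threshold).mean()     # should not exceed delta
print(f"empirical P(sum > threshold) = {exceed:.4f}  (bound: {delta})")
```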

Appendix C: Proof of Theorem 1

The proof of Theorem 1 is built upon several results from combinatorial bandits [7, 12, 15]. In the following, we first state some useful lemmas from [12, 43], and then use them to prove Theorem 1.

Throughout the appendix, let \(r_t(\varvec{u}) = \varvec{l}_t^{\intercal } \varvec{u}\) denote the reward obtained by playing arm \(\varvec{u}\) in round t, and let \({\hat{r}}_t(\varvec{u}) = \hat{\varvec{l}}_t^{\intercal } \varvec{u}\) denote the estimated reward, where \(\hat{\varvec{l}}_t\) is the unbiased cost estimator computed in Algorithm 2.
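For concreteness, a hedged sketch of the standard ComBand-style unbiased estimator from [12] is given below; it is only assumed here, not quoted from the paper, that Algorithm 2 uses an estimator of this form.

```python
import numpy as np

# ComBand-style loss estimator: hat_l_t = r_t(u_t) * C_t^+ u_t, where
# C_t = E_{u ~ p_t}[u u^T] is the co-occurrence matrix of the sampling distribution.
# On the span of the arms, E[hat_l_t] recovers the true loss vector l_t.
def estimate_loss(arms: np.ndarray, p: np.ndarray, played_idx: int, reward: float) -> np.ndarray:
    """arms: (S, E) binary matrix of arms; p: sampling distribution over the S arms."""
    C = (arms * p[:, None]).T @ arms             # C_t = sum_u p(u) u u^T
    u = arms[played_idx]
    return reward * (np.linalg.pinv(C) @ u)      # pseudo-inverse handles rank deficiency
```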

Lemma 5

Let \(\lambda ^*\) be the smallest nonzero eigenvalue of the co-occurrence matrix \(M(\mu )=\mathbb {E}_{\varvec{u}\sim \mu }[\varvec{u}\varvec{u}^{\intercal }]\) for the exploration distribution \(\mu \) in algorithm Edge. For all \(\varvec{u} \in {\mathcal {S}} \subseteq \{0,1\}^E\) and all \(t\in [T]\), the following relations hold:

  • \(\vert \vert \varvec{u}\vert \vert ^2 = \sum _{i=1}^{E} u_i^2 = n\)

  • \(\vert {\hat{r}}_t(\varvec{u})\vert = \vert \hat{\varvec{l}}_t^{\intercal } \varvec{u}\vert \le \frac{n}{\gamma \lambda ^*}\)

  • \(\varvec{u}^{\intercal } C_t^{-1} \varvec{u} \le \frac{n}{\gamma \lambda ^*}\)

  • \(\sum _{\varvec{u} \in \mathcal {S}} p_t(\varvec{u}) \varvec{u}^{\intercal } C_t^{-1} \varvec{u} \le E\)

  • \(\mathbb {E}_t[(\hat{\varvec{l}}_t^{\intercal } \varvec{u})^2] \le \varvec{u}^{\intercal } C_t^{-1} \varvec{u}\)

where in the last expression \(\mathbb {E}_t[\cdot ] := \mathbb {E}[\cdot \vert \varvec{u}_{t-1},\ldots ,\varvec{u}_1]\).

Lemma 6

[12, Appendix A] By choosing \(\eta = \frac{\gamma \lambda ^*}{n}\) such that \(\eta \vert {\hat{r}}_t(\varvec{u})\vert \le 1\), for all \(\varvec{u}^* \in \mathcal {S}\),

$$\begin{aligned} \sum _{t=1}^T {\hat{r}}_t(\varvec{u}^*) - \frac{1}{\eta }\ln S&\le \frac{1}{1-\gamma } \sum _{t=1}^T \sum _{\varvec{u}\in \mathcal {S}} p_t(\varvec{u}) {\hat{r}}_t(\varvec{u}) \\&\quad + \frac{\eta }{1-\gamma } \sum _{t=1}^T \sum _{\varvec{u}\in \mathcal {S}} p_t(\varvec{u}){\hat{r}}_t(\varvec{u})^2\\&\quad - \frac{\gamma }{1-\gamma } \sum _{t=1}^T \sum _{\varvec{u}\in \mathcal {S}} {\hat{r}}_t(\varvec{u}) \mu (\varvec{u}). \end{aligned}$$
(C1)

Proof

This is a straightforward result of Eqs. (A.1)–(A.3) in [12] by flipping the sign of \(\eta \). \(\square \)

Lemma 6 provides a baseline for bounding the regret. We will proceed to bound each summation in (C1). The following lemma, derived from Bernstein’s inequality (Lemma 3), provides a high-probability bound on the left side of (C1).

Lemma 7

With probability at least \(1 - \delta \), for all \(\varvec{u} \in \mathcal {S}\), it holds that

$$\begin{aligned} \sum _{t=1}^T r_t(\varvec{u}) - \sum _{t=1}^T {\hat{r}}_t(\varvec{u}) \le \sqrt{2T \left( \frac{n}{\gamma \lambda ^*}\right) \ln (S/\delta )} + \frac{2}{3}\left( \frac{n}{\gamma \lambda ^*}+1\right) \ln (S/\delta ). \end{aligned}$$

Proof

Fix \(\varvec{u} \in \mathcal {S}\). Define \(Y_t = r_t(\varvec{u}) - {\hat{r}}_t(\varvec{u})\). Then \(\{Y_t\}_{t=1}^T\) is a martingale difference sequence. From Lemma 5, we know that \(\vert Y_t\vert \le \frac{n}{\gamma \lambda ^*}+1\). Let \(\mathbb {E}_t[Y_t^2] = \mathbb {E}[Y_t^2\vert Y_{t-1}, \ldots , Y_1]\). Then,

$$\begin{aligned} \mathbb {E}_t[Y_t^2] \le \mathbb {E}_t[(\hat{\varvec{l}}_t^{\intercal } \varvec{u})^2] \le \varvec{u}^{\intercal } C_t^{-1} \varvec{u} \le \frac{n}{\gamma \lambda ^*}. \end{aligned}$$

Using Bernstein’s inequality, with probability at least \(1-\delta /S\),

$$\begin{aligned} \sum _{t=1}^T Y_t \le \sqrt{2T \left( \frac{n}{\gamma \lambda ^*}\right) \ln (S/\delta )} + \frac{2}{3}\left( \frac{n}{\gamma \lambda ^*}+1\right) \ln (S/\delta ) \end{aligned}$$

The lemma now follows by using the above inequality and taking a union bound over all \(\varvec{u} \in \mathcal {S}\). \(\square \)

The following two lemmas, obtained in [7], provide high-probability bounds on the first and second summands on the right side of (C1). The proofs of these lemmas, which are omitted here due to space limitations, rely on a direct application of Bernstein’s inequality and the Azuma-Hoeffding inequality.

Lemma 8

[7, Lemma 6] With probability at least \(1-\delta \),

$$\begin{aligned} \sum _{t=1}^T \sum _{\varvec{u}\in \mathcal {S}} p_t(\varvec{u}) {\hat{r}}_t(\varvec{u}) -\sum _{t=1}^T r_t(\varvec{u}_t) \le \left( \sqrt{E} + 1\right) \sqrt{2T\ln (1/\delta )}+\frac{4}{3}\ln (1/\delta )\left( \frac{n}{\gamma \lambda ^*}+1\right) . \end{aligned}$$

Lemma 9

[7, Lemma 8] With probability at least \(1 - \delta \),

$$\begin{aligned} \sum _{t=1}^T \sum _{\varvec{u}\in \mathcal {S}} p_t(\varvec{u}) {\hat{r}}_t(\varvec{u})^2 \le ET + \frac{n}{\gamma \lambda ^*}\sqrt{2T \ln (1/\delta )}. \end{aligned}$$

Now we are ready to complete the proof of Theorem 1. Using Lemma 7 and the fact that \(\sum _{t=1}^T r_t(\varvec{u}) \ge 0\) for every \(\varvec{u}\in \mathcal {S}\), we can bound the last term on the right side of (C1) with probability at least \(1-\delta \) as

$$\begin{aligned} - \gamma \sum _{t=1}^T \sum _{\varvec{u}\in \mathcal {S}} {\hat{r}}_t(\varvec{u}) \mu (\varvec{u}) \le \gamma \sqrt{2T\left( \frac{n}{\gamma \lambda ^*}\right) \ln (S/\delta )}+ \frac{2}{3}\gamma \left( \frac{n}{\gamma \lambda ^*}+1\right) \ln (S/\delta ) \end{aligned}$$

Using Lemmas 7 to 9 and the above inequality in Lemma 6, with probability at least \(1 - 4\delta \), for all \(\varvec{u} \in \mathcal {S}\) we have,

$$\begin{aligned}&\sum _{t=1}^T r_t(\varvec{u}) - \sum _{t=1}^T r_t(\varvec{u}_t) \le \sqrt{2T \left( \frac{n}{\gamma \lambda ^*}\right) \ln (S/\delta )} + 3\left( \frac{n}{\gamma \lambda ^*}+1\right) \ln (S/\delta ) \\&\quad + (\sqrt{E} + 1) \sqrt{2T \ln (1/\delta )} + \frac{\gamma \lambda ^*}{n} ET + \sqrt{2T \ln (S/\delta )} + \gamma T. \end{aligned}$$

Finally, if we set

$$\begin{aligned} \gamma = \frac{n}{\lambda ^*}\sqrt{\frac{\ln S}{\left( \frac{n}{E\lambda ^*}+1\right) ET^{2/3}}}, \end{aligned}$$

the following regret bound can be obtained:

$$\begin{aligned} \sum _{t=1}^T r_t(\varvec{u})&- \sum _{t=1}^T r_t(\varvec{u}_t) \le \sqrt{2 \left( \frac{n}{E\lambda ^*}+1\right) ^{1/2} T^{4/3} E^{1/2} (\ln (S/\delta ))^{1/2}} \\&+ 3 \sqrt{\left( \frac{n}{E\lambda ^*}+1\right) E T^{2/3} \ln (S/\delta )} + 3 \ln (S/\delta ) \\&+ \sqrt{\left( \frac{n}{E\lambda ^*}+1\right) E T^{4/3} \ln S}+ (\sqrt{E}+1)\sqrt{2T \ln (1/\delta )} + \sqrt{2T \ln (S/\delta )} \\&= O\left( T^{2/3}\sqrt{\left( \frac{n}{E\lambda ^*}+1\right) E \ln (S/\delta )}\right) . \end{aligned}$$
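Purely as a numerical illustration (not from the paper), the following sketch evaluates the above choice of \(\gamma \) and the order of the resulting regret bound for hypothetical values of n, E, S, and \(\lambda ^*\); the ratio of the bound to T shrinks as T grows, reflecting the \(O(T^{2/3})\) scaling.

```python
import numpy as np

# Hypothetical parameter values; only the scaling in T is meant to be illustrative.
def gamma_and_bound(T, n, E, S, lam_star, delta=0.05):
    gamma = (n / lam_star) * np.sqrt(np.log(S) / ((n / (E * lam_star) + 1) * E * T ** (2 / 3)))
    bound = T ** (2 / 3) * np.sqrt((n / (E * lam_star) + 1) * E * np.log(S / delta))
    return gamma, bound

for T in (10**3, 10**4, 10**5):
    g, b = gamma_and_bound(T, n=5, E=50, S=200, lam_star=0.5)
    print(f"T={T:>6}  gamma={g:.4f}  regret bound ~ {b:.0f}  bound/T = {b / T:.3f}")
```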

Appendix D: Proof of Theorem 2

It has been shown in [21, 28] that the algorithm Hedge achieves the high-probability regret bound of

$$\begin{aligned} R_{\delta }(T) = O\left( \sqrt{T \ln (\vert A\vert /\delta )}\right) , \end{aligned}$$

where \(\vert A\vert \) denotes the cardinality of the action set, which in our setting is the number of resources. Since we only have two types of resources, namely time and troops, we have \(\vert A\vert =2=O(1)\).

From Eq. (11), we have \(\vert \mathcal {L}_t^{\text {troop}}\vert \le \max \{2, \vert 1-c \vert \} \le 1+c\) for \(c \ge 1\). As a result, the actual reward \(r_t(\varvec{u})\), the estimated reward \({\hat{r}}_t(\varvec{u})\), and hence the regret bound of Theorem 1 are all scaled up by at most a constant factor \(1+c\). Hence, in order to satisfy the assumption \(\eta \vert {\hat{r}}_t(\varvec{u})\vert \le 1\) required by Theorem 1 and [12], we set

$$\begin{aligned} \eta = \frac{\gamma \lambda ^*}{(1+c)n} = \frac{1}{1+c} \sqrt{\frac{\ln S}{\left( \frac{n}{E\lambda ^*}+1\right) ET^{2/3}}}. \end{aligned}$$

On the other hand, from Theorem 1, the high-probability regret bound of algorithm Lagrange-Edge is at most

$$\begin{aligned} R_{\delta }(T) = O\left( T^{2/3}\sqrt{\left( \frac{n}{E\lambda ^*}+1\right) E \ln (S/\delta )}\right) . \end{aligned}$$

Now, using Lemma 1, and noting that \(B_0\) is scaled to \(\frac{B}{(cB/T)}=\frac{T}{c}\) such that \(O\left( \frac{T}{B_0}\right) =O(1)\), we obtain that with probability at least \(1-O(\delta T)\), it holds that

$$\begin{aligned} R(T)&\le O(1) \cdot \Bigg ( O\left( T^{2/3}\sqrt{\left( \frac{n}{E\lambda ^*}+1\right) E \ln (ST/\delta )}\right) + O\left( \sqrt{T \ln (T/\delta )}\right) \Bigg ) \\&= O\left( T^{2/3}\sqrt{\left( \frac{n}{E\lambda ^*}+1\right) E \ln (ST/\delta )}\right) . \end{aligned}$$

Finally, we recall that \(E = O(nm^2)\), \(m = O(B/T)\), and \(S = O\left( 2^{\min \{n-1,m\}}\right) \), so that \(\ln S \le O(m) = O(B/T)\). Substituting these relations into the above inequality, and noting that the dominant term under the square root simplifies as \(\frac{T^2}{B^2 \lambda ^*} \cdot n \left( \frac{B}{T}\right) ^3 = \frac{nB}{\lambda ^* T}\), we get

$$\begin{aligned} R(T)&\le O\left( T^{2/3}\sqrt{\left( \frac{T^2}{B^2 \lambda ^*}+1\right) n \left( \frac{B}{T}\right) ^3 \ln (T/\delta )}\right) \\&= O \left( T^{1/6}\sqrt{\frac{nB}{\lambda ^*} \ln \left( T/\delta \right) }\right) . \end{aligned}$$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Leon, V., Etesami, S.R. Online Learning in Budget-Constrained Dynamic Colonel Blotto Games. Dyn Games Appl (2023). https://doi.org/10.1007/s13235-023-00518-7
