Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference

Decisions in the presence of latent variables

Equations (14) and (15)

Nature’s probability of flipping either coin does not actually depend on the agent’s prediction, so we can replace the conditional probabilities p₀(θ|x) by p₀(θ). We then have an inner variational problem:

arg max_{\tilde{p} (θ | x)} \underset{θ}{Σ} \tilde{p} (θ | x) [- \frac{1}{β} log \frac{\tilde{p} (θ | x)}{p_{0} (θ)} + U (x, θ)]

(14)

with the solution

p (θ | x) = \frac{1}{Z_{β} (x)} p_{0} (θ) exp (β U (x, θ))

(15)

and the normalization constant $Z_{β} (x) = Σ_{θ} p_{0} (θ) exp (β U (x, θ))$, together with an outer variational problem as described by equation (16) in the main text. Note that deliberation renders the two variables x and θ dependent.
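As a rough numerical illustration of equation (15), the sketch below evaluates p(θ|x) for a small discrete problem. The function name, the two-by-two utility table, and the chosen values of β are assumptions made for illustration and do not come from the article.

```python
import math

def posterior_theta_given_x(prior_theta, utility, x, beta):
    """Equation (15): p(theta | x) is proportional to p0(theta) * exp(beta * U(x, theta))."""
    weights = [p * math.exp(beta * utility[x][t]) for t, p in enumerate(prior_theta)]
    z = sum(weights)  # normalization constant Z_beta(x)
    return [w / z for w in weights]

# Hypothetical example: two actions x, two coins theta (illustrative values only).
prior_theta = [0.5, 0.5]
utility = [[1.0, 0.0],   # U(x=0, theta=0), U(x=0, theta=1)
           [0.0, 1.0]]   # U(x=1, theta=0), U(x=1, theta=1)

# beta = 0 leaves the prior unchanged; beta != 0 makes theta depend on x.
post_neutral = posterior_theta_given_x(prior_theta, utility, x=0, beta=0.0)
post_tilted = posterior_theta_given_x(prior_theta, utility, x=0, beta=2.0)
```

In this toy setting, any nonzero β shifts p(θ|x) away from p₀(θ), which is one way to see the dependence between x and θ noted above.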

Equation (19)

In the case of α = β and a uniform prior $p_{0} (x) = U (x)$, equation (17) reduces to

p (x) = \underset{θ}{Σ} p_{0} (θ) \frac{e^{α U (x, θ)}}{Z_{α}},

(19)

where $Z_{α} = \underset{x}{Σ} \underset{θ}{Σ} p_{0} (θ) e^{α U (x, θ)}$. Note that $e^{α U (x, θ)} / Z_{α}$ is in general not a conditional distribution. However, equation (19) can be equivalently rewritten as

p (x) = \underset{θ}{Σ} \frac{p_{0} (θ) \underset{x^{'}}{Σ} e^{α U (x^{'}, θ)}}{Z_{α}} \frac{e^{α U (x, θ)}}{\underset{x^{'}}{Σ} e^{α U (x^{'}, θ)}} = \underset{θ}{Σ} p (θ) p (x | θ),

where we have expanded the fraction by $\underset{x^{'}}{Σ} e^{α U (x^{'}, θ)}$.
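The rewriting of equation (19) can be checked numerically. The following is a minimal sketch under assumed inputs: the function names, the prior over θ, the utility table, and α = 2 are all hypothetical choices, and the prior over x is taken to be uniform as in the text.

```python
import math

def eq19_direct(prior_theta, utility, alpha):
    """p(x) = sum_theta p0(theta) * exp(alpha * U(x, theta)) / Z_alpha."""
    joint = [[p * math.exp(alpha * row[t]) for t, p in enumerate(prior_theta)]
             for row in utility]
    z_alpha = sum(sum(row) for row in joint)
    return [sum(row) / z_alpha for row in joint]

def eq19_factorized(prior_theta, utility, alpha):
    """p(x) = sum_theta p(theta) * p(x | theta), after expanding the fraction."""
    n_theta = len(prior_theta)
    # col[t] = sum over x' of exp(alpha * U(x', theta=t))
    col = [sum(math.exp(alpha * row[t]) for row in utility) for t in range(n_theta)]
    z_alpha = sum(p * c for p, c in zip(prior_theta, col))
    p_theta = [p * c / z_alpha for p, c in zip(prior_theta, col)]
    return [sum(p_theta[t] * math.exp(alpha * row[t]) / col[t] for t in range(n_theta))
            for row in utility]

# Hypothetical inputs (illustrative values only).
prior_theta = [0.3, 0.7]
utility = [[1.0, 0.0],
           [0.2, 0.8]]
direct = eq19_direct(prior_theta, utility, alpha=2.0)
factorized = eq19_factorized(prior_theta, utility, alpha=2.0)
```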

This last equality can also be obtained by stating the same variational problem in reverse causal order of x and θ, which is the natural statement of the Thompson sampling problem. The nested variational problem then becomes

arg max_{\tilde{p} (x, θ)} \underset{θ}{Σ} \tilde{p} (θ) [- \frac{1}{β} log \frac{\tilde{p} (θ)}{p_{0} (θ)} + \underset{x}{Σ} \tilde{p} (x | θ) [U (x, θ) - \frac{1}{α} log \frac{\tilde{p} (x | θ)}{p_{0} (x)}]]

with the solutions

p (x | θ) = \frac{p_{0} (x) e^{α U (x, θ)}}{\underset{x^{'}}{Σ} p_{0} (x^{'}) e^{α U (x^{'}, θ)}}

(i)

and

p (θ) = \frac{1}{Z_{β α}} p_{0} (θ) exp (\frac{β}{α} log \underset{x}{Σ} p_{0} (x) e^{α U (x, θ)})

(ii)

with normalization constant $Z_{β α} = \underset{θ}{Σ} p_{0} (θ) exp (\frac{β}{α} log \underset{x}{Σ} p_{0} (x) e^{α U (x, θ)})$. In the limit α → ∞ and β → 0, the Thompson sampling agent is determined by the solutions $p (x | θ) = δ (x - arg max_{x^{'}} U (x^{'}, θ))$ and p(θ) = p₀(θ). Sampling an action from $p (x) = \underset{θ}{Σ} p (θ) p (x | θ)$ is much cheaper than sampling an action from equation (18) because of the reversed causal order of θ and x, which implies that β/α → 0 in equation (ii) instead of β/α → ∞ as in equation (17).
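To make solutions (i) and (ii) and their limiting behaviour concrete, here is a small sketch for a discrete problem. The function names and all numerical values are hypothetical, and a large finite α with a tiny β is used only as a stand-in for the limit α → ∞, β → 0.

```python
import math

def policy_given_theta(prior_x, utility, theta, alpha):
    """Equation (i): p(x | theta) proportional to p0(x) * exp(alpha * U(x, theta))."""
    weights = [p * math.exp(alpha * utility[x][theta]) for x, p in enumerate(prior_x)]
    z = sum(weights)
    return [w / z for w in weights]

def belief_over_theta(prior_theta, prior_x, utility, alpha, beta):
    """Equation (ii): p(theta) proportional to
    p0(theta) * exp((beta / alpha) * log sum_x p0(x) * exp(alpha * U(x, theta)))."""
    weights = []
    for t, p in enumerate(prior_theta):
        log_sum = math.log(sum(px * math.exp(alpha * utility[x][t])
                               for x, px in enumerate(prior_x)))
        weights.append(p * math.exp((beta / alpha) * log_sum))
    z = sum(weights)  # normalization constant Z_{beta alpha}
    return [w / z for w in weights]

# Hypothetical inputs (illustrative values only).
prior_theta = [0.3, 0.7]
prior_x = [0.5, 0.5]
utility = [[1.0, 0.0],   # U(x=0, theta=0), U(x=0, theta=1)
           [0.2, 0.8]]   # U(x=1, theta=0), U(x=1, theta=1)

# Approximate the limit with a large alpha and a tiny beta.
policy = policy_given_theta(prior_x, utility, theta=0, alpha=50.0)
belief = belief_over_theta(prior_theta, prior_x, utility, alpha=50.0, beta=1e-9)
```

With these assumed values, `policy` places nearly all of its mass on the utility-maximizing action for θ = 0, and `belief` stays very close to the prior p₀(θ), in line with the limiting solutions stated above.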

In the case of α = β the solutions for the two different causal orders of x and θ are equivalent. Assuming again a uniform prior $p_{0} (x) = U (x)$, we can compute the Thompson sampling agent from equation (i) and equation (ii) for α = β to be

p (x) = \underset{θ}{Σ} p (θ) p (x | θ) = \underset{θ}{Σ} \frac{p_{0} (θ) \underset{x^{'}}{Σ} e^{α U (x^{'}, θ)}}{\underset{x^{'}}{Σ} \underset{θ^{'}}{Σ} p_{0} (θ^{'}) e^{α U (x^{'}, θ^{'})}} \frac{e^{α U (x, θ)}}{\underset{x^{'}}{Σ} e^{α U (x^{'}, θ)}},

which is exactly equivalent to p(x) in equation (19). To sample from equation (19), we draw θ ~ p₀(θ) and accept $x ~ p_{0} (x) = U (x)$ if $u ≤ e^{α U (x, θ)} / e^{α T}$, where $u ~ U [0; 1]$.
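The sampling procedure just described can be sketched as a simple rejection sampler. This is an illustrative sketch and not the authors' implementation: the function name and example values are hypothetical, and T is assumed here to be the largest entry of the utility table so that the acceptance ratio never exceeds one (the text above does not define T).

```python
import math
import random

def sample_action(prior_theta, utility, alpha, rng, threshold=None):
    """Draw theta ~ p0(theta) and x ~ uniform, then accept x if
    u <= exp(alpha * U(x, theta)) / exp(alpha * T)."""
    if threshold is None:
        # Assumption: T is the maximum utility, which keeps the ratio <= 1.
        threshold = max(max(row) for row in utility)
    n_theta = len(prior_theta)
    while True:
        theta = rng.choices(range(n_theta), weights=prior_theta)[0]
        x = rng.randrange(len(utility))   # uniform proposal p0(x) = U(x)
        u = rng.random()                  # u ~ U[0; 1]
        if u <= math.exp(alpha * (utility[x][theta] - threshold)):
            return x

# Hypothetical inputs (illustrative values only).
prior_theta = [0.3, 0.7]
utility = [[1.0, 0.0],
           [0.2, 0.8]]
alpha = 2.0
rng = random.Random(0)
samples = [sample_action(prior_theta, utility, alpha, rng) for _ in range(20000)]
freq_x0 = samples.count(0) / len(samples)
```

Under these assumptions the accepted actions are distributed in proportion to Σ_θ p₀(θ) e^{αU(x,θ)}, so `freq_x0` should approach p(x = 0) from equation (19) as the number of samples grows.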

References

Ortega PA, Braun DA: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adaptive Systems Modeling 2014, 2:2.

Author information

Authors and Affiliations

GRASP Laboratory, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA, 19104, USA
Pedro A Ortega
Max Planck Institute for Biological Cybernetics and Max Planck Institute for Intelligent Systems, Spemannstrasse 38, Tübingen, 72076, Germany
Daniel A Braun

Corresponding authors

Correspondence to Pedro A Ortega or Daniel A Braun.

Additional information

The online version of the original article can be found at 10.1186/2194-3206-2-2

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Ortega, P.A., Braun, D.A. Erratum to: Generalized Thompson sampling for sequential decision-making and causal inference. Complex Adapt Syst Model 2, 4 (2014). https://doi.org/10.1186/s40294-014-0004-x

Received: 24 June 2014
Accepted: 07 August 2014
Published: 01 October 2014
DOI: https://doi.org/10.1186/s40294-014-0004-x
