Persuasion with Ambiguous Receiver Preferences

I describe a Bayesian persuasion problem where Receiver has a private type representing a cutoff for choosing Sender's preferred action, and Sender has maxmin preferences over all Receiver type distributions with known mean and bounds. This problem can be represented as a zero-sum game where Sender chooses a distribution of posterior mean beliefs that is a mean-preserving contraction of the prior over states, and an adversarial Nature chooses a Receiver type distribution with the known mean; the player with the higher realization from their chosen distribution wins. I formalize the connection between maxmin persuasion and similar games used to model political spending, all-pay auctions, and competitive persuasion. In both a standard binary-state setting and a new continuous-state setting, Sender optimally linearizes the prior distribution over states to create a distribution of posterior means that is uniform on a known interval with an atom at the lower bound of its support.


Introduction
Consider a politician who is deciding how to disclose information about the cost-effectiveness of a new welfare program, but does not know how much spending voters will support. All voters have the same prior beliefs, but some will only approve if they expect the program to provide a high level of benefits per dollar spent, while others are willing to support even a moderately inefficient government outlay. Rather than imposing a prior distribution over preferences, the politician wishes to be robust to the worst-case distribution she may face given a known threshold for the average voter. In this setting, what disclosure rule maximizes the share of voters who approve of the welfare program after taking into account the politician's message? How do the optimal rule and the politician's utility differ from the case where the politician faces a known distribution of citizen preferences? I address and generalize those questions through a model of Bayesian persuasion (Kamenica and Gentzkow, 2011), where a Sender commits to a message distribution in each state of the world and a Receiver uses Bayesian updating to form a posterior belief about the state based on the message structure. To represent Receiver's preferences, I use private types denoting the cutoff above which Receiver chooses Sender's preferred action. Sender knows the mean and support of Receiver types, and has maxmin preferences (Gilboa and Schmeidler, 1989) over all Receiver type distributions satisfying those constraints. Regardless of the true state of the world, Sender maximizes the probability of inducing the favorable action. This model captures situations where all Receiver types process information in the same way, but may have different preferences over outcomes.
In addition to the political spending example described above, a model of this style also applies to a variety of other situations, such as disclosing information about product quality (if potential customers share a prior belief about quality, but may be more or less picky about when they buy) or screening job candidates (if all firms have a common prior about candidate quality and see the same resumé, but have different thresholds for hiring).
This persuasion model can be reinterpreted as a zero-sum game between Sender and an adversarial Nature. Following the Bayesian persuasion literature, I can allow Sender to directly choose any distribution of posterior mean beliefs about the state that is a mean-preserving contraction of the prior. Then, Nature chooses a Receiver type distribution with the appropriate mean and domain; this choice is equivalent to choosing a mean-preserving contraction of a Receiver type distribution with support {0, 1}. The player with the higher realization from their chosen distribution wins the game. Such mean-preserving contraction games (henceforth MPC games), albeit with simultaneous moves, have been studied in prior literature outside of the persuasion context (for example by Myerson 1993), as well as being used to represent competition between many Senders persuading a single Receiver (as in Boleslavsky and Cotton 2015). Many of those works emphasize the role of uniform distributions, which induce indifference among many possible strategies for the opposing players. Adapting these results to my setting, I show that in a binary-state setting where the probability of the high state is weakly less than 1/2, Sender's unique optimal posterior distribution places an atom at 0 and is uniform on an interval [0, c] for c ≤ 1. In doing so, I formalize the connection between maxmin persuasion and MPC games and show that the sequential timing of the maxmin persuasion game does not affect Sender's optimal distribution but the tie-breaking rule sometimes does. I also use a geometric approach based on the concavification argument of Kamenica and Gentzkow (2011) to show that for any finite number of states of the world, or when the state is continuous and unimodal, a similar distribution (uniform on [a, b] ⊂ (0, 1) with an atom at a) is one of many optimal distributions for Sender. The continuous-state setting is a novel specification of both the MPC game and the maxmin persuasion problem.

Related Literature
This work builds on the Bayesian persuasion problem of Kamenica and Gentzkow (2011), and adopts a similar approach to existing work in robust mechanism design.
In addition, my model resembles a class of games I call MPC games, which include a continuous version of the Colonel Blotto game as well as competitive Bayesian persuasion by multiple Senders. I discuss the first two topics here and postpone discussion of the third to Section 3.3, after presenting the formal model.
In the baseline Bayesian persuasion model of Kamenica and Gentzkow (2011), Receiver has no private information. Subsequent literature in this area is surveyed in detail by Kamenica (2019) and Bergemann and Morris (2019), so I focus on the two works most directly related to the model I propose, Kolotilin et al. (2017) and Hu and Weng (2021).¹ The former has an interval state space, Receiver types that enter payoffs linearly, and a binary action, as in my model; however, it endows Sender with a prior distribution over Receiver types. If that prior distribution is log-concave, then the optimal distribution for Sender can be generated by upper censorship; the resulting distribution of posterior means is essentially a truncated version of the prior where states in some interval [α, 1] are replaced with an atom at β ∈ (α, 1). In the continuous-state version of my model, linearizing the prior rather than censoring high states helps Sender avoid facing a tailored Receiver type distribution in response.
To make sure this strategy respects Bayes-plausibility, Sender may use a truncated uniform distribution with interior support.
The model of Hu and Weng (2021) is most similar to the one considered here: it is a binary-action model where Sender has maxmin preferences over Receiver types and maximizes the probability of inducing the favorable action. However, Receiver types represent an ambiguous posterior about a binary state of the world rather than a payoff-relevant characteristic which does not directly interact with beliefs about the state. This model captures substantively different applications, e.g., voters with common ideology who privately read outside news sources before listening to a politician's speech, rather than the equally-informed voters with different ideological positions in my model. The Extensions section discusses potential relaxations of these assumptions which endow Sender with less precise information about states or Receiver types.

¹ Other works use maxmin preferences in Bayesian persuasion settings, but are much more distinct. In Kosterina (2022), possible Receiver type distributions are distortions of a "reference distribution;" in Dworczak and Pavan (2022), there is full ambiguity about Receiver's posterior belief; and in Laclau and Renou (2016) and Beauchêne et al. (2019), Receiver has maxmin preferences.
Sender considers potential Receiver type distributions T in the set

T = { cdf T over [0, 1] : ∫₀¹ r dT(r) = r* }.
I restrict Sender to the standard Bayesian persuasion tool of committing ex-ante to a Blackwell experiment, i.e., a state-dependent signal distribution, and in particular do not allow her to elicit Receiver's type; this restriction captures the public-communication interpretation of the model. After Sender communicates, Receiver chooses a binary action a ∈ {0, 1} whose utility depends on the state and on Receiver's type: u_R(a, ω, r) = a(ω − r). Sender's goal is to maximize the probability of inducing the high action a = 1 independent of the true state ω and true Receiver type r: u_S(a, ω, r) = a.
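As a minimal sketch of these primitives (the function names are my own; the model itself does not specify code), Receiver's optimal action depends only on her posterior mean and her type:

```python
# Receiver's expected utility from a = 1 is E[omega] - r, since u_R is linear
# in omega, while a = 0 yields 0. Ties (E[omega] = r) are broken against
# Sender, matching the tie-breaking assumption discussed in Section 4.1.
def receiver_action(posterior_mean: float, r: float) -> int:
    return 1 if posterior_mean > r else 0

# Sender's realized utility u_S(a, omega, r) = a does not depend on the state:
def sender_utility(posterior_mean: float, r: float) -> int:
    return receiver_action(posterior_mean, r)
```

Because u_R is linear in ω, only the posterior mean matters, which is what licenses the reduction to distributions of posterior means below.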

Thus when Receiver believes E[ω | message] > r, the action a = 1 is strictly optimal, and when E[ω | message] < r, the action a = 0 is strictly optimal.
3.2. The Maxmin Persuasion Problem. Since Receiver's choice of action depends only on the mean q of the posterior belief distribution, I can follow Blackwell (1953) and directly consider Sender choosing a distribution of posterior means G such that G is a mean-preserving contraction of the prior distribution F.³ The set of feasible distributions of posterior means is therefore

G = { cdf G over [0, 1] : ∫₀ˣ G(q) dq ≤ ∫₀ˣ F(q) dq for all x ∈ [0, 1], and ∫₀¹ G(q) dq = ∫₀¹ F(q) dq }.

³ Receiver's choice when indifferent will not affect equilibrium outcomes, but will affect Sender's equilibrium strategy. I discuss this tie-breaking issue in Section 4.1.
I follow the literature in referring to this constraint as Bayes-plausibility. Note that when supp(F ) = {0, 1}, a case which I refer to as binary support, any posterior distribution that satisfies the equality at x = 1 satisfies the inequality for all x ∈ [0, 1).
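As a quick numerical sketch (the discretization and names are my own), Bayes-plausibility can be checked on a grid: G is feasible if its integrated cdf never exceeds that of F and the two agree at x = 1.

```python
import numpy as np

def is_mpc(G_vals, F_vals, grid, tol=1e-3):
    """Check the mean-preserving contraction conditions via left Riemann sums."""
    dq = grid[1] - grid[0]
    cumG = np.concatenate(([0.0], np.cumsum(G_vals[:-1]) * dq))
    cumF = np.concatenate(([0.0], np.cumsum(F_vals[:-1]) * dq))
    return bool(np.all(cumG <= cumF + tol) and abs(cumG[-1] - cumF[-1]) <= tol)

grid = np.linspace(0.0, 1.0, 10001)
F_bin = np.where(grid < 1.0, 0.6, 1.0)   # binary-support prior with pi = 0.4
G_unif = np.clip(grid / 0.8, 0.0, 1.0)   # U[0, 2*pi]: mean 0.4, feasible
G_U01 = grid.copy()                      # U[0, 1]: mean 0.5, infeasible here
```

Consistent with the binary-support observation in the text, any distribution matching the prior mean passes the check against F_bin; U[0, 1] fails only because its mean differs from π.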
Using this formulation and Receiver's preferences, I rewrite Sender's utility as

max_{G ∈ G} inf_{T ∈ T} ∫₀¹ ∫₀¹ 1{q > r} dT(r) dG(q),   (1)

where I assume that an indifferent Receiver chooses Sender's less-preferred action, so a posterior mean q convinces exactly the types r strictly below it. Two results connect this problem to the MPC game. The first (Lemma 1 in Appendix A1) applies a minimax theorem of Fan (1953), which shows that Sender's maxmin and minmax utilities are equal, and therefore equal to the utility from the MPC game.
The second result (Lemma 2 in Appendix A1) is that tie-breaking against Sender is equivalent to ignoring tie-breaking but allowing the Receiver type distribution to be unbounded above.⁷ Since Nature moves second, I use tie-breaking against Sender to ensure that the minimizing Receiver type distribution for each posterior distribution is well-defined; however, most results on MPC games use even tie-breaking, where a posterior q = r convinces that Receiver type with probability 1/2. With unfavorable tie-breaking, in order to persuade Receiver type r, Sender must generate posterior q_r^ε = r + ε for arbitrary ε > 0. Thus, unlike in an MPC game with even tie-breaking, Sender can never persuade type r = 1. I can restore the equivalence between maxmin persuasion and the even tie-breaking MPC game by allowing Nature's type distribution to be unbounded above, as Lemma 2 describes.⁶

⁶ This equivalence also holds when F does not have binary support. However, I do not use it in characterizing Sender's optimal distribution with a continuous-support prior in Section 5.
⁷ Similar observations have been made in the context of all-pay auctions by Szech (2015) and Gelder et al. (2022). The latter finds that when ties may occur on intervals with positive measure, players' equilibrium strategies involve multiple disjoint intervals with an atom at 0, rather than the single interval and atom at 0 that arises in my model.
[Figure 1: examples of optimal posterior distributions; in green, π = 1/3 < 3/5 = r*.]

Nature can always choose the binary Receiver type distribution supported on {0, 1} with mean r*, so Sender's utility cannot exceed 1 − r*. Whenever Sender chooses a posterior distribution with a convex cdf, this Receiver type distribution is indeed optimal for Nature, and Sender attains her maximum payoff. Since the uniform distribution U[0, 1] has the smallest mean among distributions with convex cdfs, this choice is feasible for Sender if and only if π ≥ 1/2 (and multiple such distributions are feasible when π > 1/2). When π < 1/2, Sender chooses a distribution that is as close to uniform as possible given her Bayes-plausibility constraint. This choice requires her to place an atom at q = 0, truncate the upper bound of the distribution's support below q = 1, or both. Fixing π ≤ 1/2, for small r* Sender truncates the support at 2π but places no atom at 0.
As r * increases, Sender simultaneously increases the size of the atom and moves the upper bound of the support towards 1. A higher average Receiver type makes high posteriors more valuable to Sender, so she is willing to sometimes fully reveal the low state in order to generate more of these posteriors. Figure 1 shows three examples of optimal posterior distributions, corresponding to the three cases of Proposition 1.
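The role of the uniform distribution can be illustrated numerically. In this sketch (the discretization is my own, and I use the standard reduction of Nature's linear moment problem to mixtures of at most two atoms), Sender's worst-case payoff from a cdf G is the lower convex envelope of r ↦ P(q > r) evaluated at r*:

```python
import numpy as np

def worst_case(win, grid, r_star):
    """Min of E[win(r)] over two-atom type distributions on grid with mean r_star."""
    best = np.inf
    for i, r1 in enumerate(grid):
        for j, r2 in enumerate(grid):
            if r1 <= r_star <= r2:
                w = 1.0 if r1 == r2 else (r2 - r_star) / (r2 - r1)
                best = min(best, w * win[i] + (1.0 - w) * win[j])
    return best

grid = np.linspace(0.0, 1.0, 101)
pi_, r_star = 0.4, grid[30]                           # pi = 0.4 <= 1/2, r* = 0.3 <= pi
win_unif = np.clip(1.0 - grid / (2 * pi_), 0.0, 1.0)  # G = U[0, 2*pi]: win prob linear in r
win_full = np.where(grid < 1.0, pi_, 0.0)             # full disclosure: wins iff q = 1 > r
```

Here worst_case(win_unif, grid, r_star) ≈ 1 − r*/(2π) = 0.625, while full disclosure earns only (1 − r*)π = 0.28: linearizing the prior removes Nature's ability to target gaps in Sender's posterior distribution.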
Hart (2015) does not establish uniqueness of this Nash equilibrium of the MPC game.
A related work, Amir (2017), shows through explicit calculations of players' utilities under various distributions that Sender's Nash equilibrium strategy is unique when π ≤ 1/2 (Theorems 4 and 5 in that work) but gives only a partial characterization of optimal strategies for Sender when π > 1/2 (Theorem 10 in that work). In the maxmin persuasion setting with tie-breaking against Sender, I am able to avoid issues with limits of ε-approximating distributions and close that gap: Lemma 4 in Appendix A2 gives a necessary and sufficient condition for Sender's optimal distribution when π > 1/2. Additionally, in Lemma 8 of Appendix A3, I provide a novel geometric proof of Sender's optimal posterior distribution for the case π ≤ 1/2, including its uniqueness, which leverages the concavification approach of Kamenica and Gentzkow (2011). This proof informs my approach in the continuous-state setting.

The Continuous-State Setting
While the solution when F has binary support is especially sharp, that restriction may not always be plausible. In this section, I consider the maxmin persuasion problem of Equation (1) when F is a continuously differentiable and unimodal cdf over [0, 1] with F(0) = 0. I assume that, for some mode m ∈ [0, 1], the density f is strictly increasing on [0, m) and strictly decreasing on (m, 1]. To rule out the binary-state solution, I also assume that f′(0) < 1 − 2π. In this setting, a double-truncated uniform distribution of posterior means is optimal when r* is sufficiently small (Proposition 2) or large (Proposition 3). Analogous results hold when, rather than being continuous, F is supported on N values {q₁, . . ., q_N} ⊂ [0, 1] with N > 2; I provide full details and proofs of this extension in Appendix B6. Before turning to the main result, I first discuss two simple cases which extend the intuitions of the binary-state setting.

Simple Continuous Priors.
In the continuous-state setting, Sender's constraint is different from Nature's. It is no longer true that any cdf G with support [0, 1] and mean π is a mean-preserving contraction of the prior F. Instead, the chosen cdf must additionally satisfy the integral constraint

∫₀ˣ G(q) dq ≤ ∫₀ˣ F(q) dq for all x ∈ [0, 1].

Under the assumptions F(0) = 0 and f′(0) < 1 − 2π, this constraint prevents Sender from choosing any of the optimal distributions in Proposition 1, as they violate it in the interval (0, ε) for ε > 0 sufficiently small.
Despite this new constraint, two cases of the continuous-state model are easy to solve using the intuitions of the previous section. Nature may still choose the binary support distribution which generates only Receiver types r = 0 and r = 1, so Sender's utility is still upper-bounded by 1 − r*. Thus for any F that first-order stochastically dominates U[0, 1] (so that F(q) ≤ q for all q ∈ [0, 1]), it is easy to see that full disclosure is optimal, since it ensures that G* = F and Sender's utility attains the upper bound.
This condition generalizes the case where F is unimodal with m = 1.
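This claim can be checked numerically under an assumed prior F(q) = q², which satisfies F(q) ≤ q (the discretization and the two-atom reduction of Nature's problem are my own):

```python
import numpy as np

def worst_case(win, grid, r_star):
    """Min of E[win(r)] over two-atom type distributions on grid with mean r_star."""
    best = np.inf
    for i, r1 in enumerate(grid):
        for j, r2 in enumerate(grid):
            if r1 <= r_star <= r2:
                w = 1.0 if r1 == r2 else (r2 - r_star) / (r2 - r1)
                best = min(best, w * win[i] + (1.0 - w) * win[j])
    return best

grid = np.linspace(0.0, 1.0, 101)
r_star = grid[40]            # r* = 0.4
win_fd = 1.0 - grid**2       # G = F: Sender wins against type r with prob 1 - F(r)
# 1 - F(r) is concave and lies above the line 1 - r, so Nature's best response
# (atoms at 0 and 1) pins Sender's worst case to exactly the upper bound 1 - r*.
```

The computed worst case equals 1 − r*, confirming that full disclosure attains the upper bound for this FOSD prior.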
When F is concave but not uniform, it must be that F lies strictly above U[0, 1] on (0, 1) and therefore that π < 1/2. Additionally, the uniform distribution U[0, 2π], which was uniquely optimal when r* ≤ π ≤ 1/2 in the binary-state setting of Proposition 1, satisfies the integral constraint. To see why, note that the shape of F ensures that U[0, 2π] lies strictly below F on (0, c) for some c < 1; therefore the integral constraint is satisfied with equality at x = 0 and strict inequality for x ∈ (0, c]. The difference between the left- and right-hand sides of the constraint strictly decreases for x ∈ (c, 1), but only reaches 0 at x = 1: thus the weak inequality is preserved on the entire interval [0, 1].⁸ The binary-state maxmin persuasion problem is a relaxed version of the continuous-state maxmin persuasion problem, so a feasible optimal solution for Sender in the former must be optimal in the latter. Thus, if r* ≤ π and F is concave but not uniform, then G* = U[0, 2π] is the unique optimal distribution for Sender. This condition generalizes the case of unimodal F with m = 0.
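This argument can be verified numerically for an assumed concave prior F(q) = 2q − q² (mode m = 0, prior mean π = 1/3; the discretization is my own):

```python
import numpy as np

q = np.linspace(0.0, 1.0, 100001)
dq = q[1] - q[0]
F = 2 * q - q**2                                 # concave cdf, F(0) = 0, F(1) = 1
pi_ = 1.0 - ((F[:-1] + F[1:]) / 2 * dq).sum()    # prior mean: 1 - integral of F = 1/3
G = np.clip(q / (2 * pi_), 0.0, 1.0)             # candidate U[0, 2*pi]
diff = F - G
v = np.concatenate(([0.0], np.cumsum((diff[:-1] + diff[1:]) / 2 * dq)))
# v(x) = integral of (F - G) on [0, x]: nonnegative on [0, 1] with v(0) = v(1) = 0,
# so U[0, 2*pi] is a mean-preserving contraction of F, as claimed.
```

The slack is strict in the interior (e.g. v(1/2) > 0), matching the strict inequality on (0, c] described above.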
In the maxmin setting, Nature's mean constraint represents information Sender possesses. Given truncation points ℓ and u, there is only one double-truncated uniform distribution (DTU) with mean π and support [ℓ, u]. I can thus characterize a DTU by the slope β > 0 and intercept y ∈ [0, 1) of the uniform portion of its cdf, writing it as G_y^β. Explicit formulas for the relationship between truncation lengths, atom size, and slope, as well as bounds on these parameters, are in Appendix B1.
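To make the parameterization concrete, here is a hypothetical reconstruction of the mapping from (β, y) to truncation points; the closed forms are my own, standing in for Appendix B1, and assume the cdf shape G = 0 on [0, ℓ), G(q) = y + βq on [ℓ, u], G = 1 on (u, 1]:

```python
import math

def dtu_truncations(beta: float, y: float, pi: float):
    """Return (l, u) for the unique DTU G_y^beta with mean pi.

    The upper truncation solves y + beta*u = 1; the mean constraint
    pi = integral of (1 - G) over [0, 1] reduces to the quadratic
    beta*l**2/2 + y*l + (1 - y)**2/(2*beta) = pi in l.
    Valid only when the solution satisfies 0 <= l < u <= 1.
    """
    u = (1.0 - y) / beta
    disc = y * y - (1.0 - y) ** 2 + 2.0 * beta * pi
    if disc < 0.0:
        raise ValueError("no DTU with mean pi exists for these (beta, y)")
    l = (-y + math.sqrt(disc)) / beta
    return l, u
```

As a consistency check, when y = 0 and β = 1/(2π) this gives ℓ = 0 and u = 2π, recovering the untruncated U[0, 2π] of Proposition 1.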
For each y, there is a slope β(y) which delivers Sender's highest utility among DTUs with intercept y (Lemma 9 in Appendix B2); I refer to the DTU G_y^{β(y)} as y-optimal.
Let q_i(β, 0) be the smallest nonzero point of intersection between the DTU G_0^β and the prior F. Figure 2 shows an example of the 0-optimal DTU when F is a truncated normal distribution, highlighting the notation above. With notation fixed, Proposition 2 describes Sender's choice for small r*.

The key step in the proof is to show that the integral constraint binds only at a single interior point, the intersection q_i(β(0), 0) between the DTU and the prior cdf (Lemma 11 in Appendix B3).⁹ Using this result, I can adapt the strategy used in my geometric proof of Proposition 1. Towards simplifying the integral constraint, note that a DTU G_0^β will have zero, one, or two interior intersections with F depending on its slope.¹⁰ Writing the integral constraint as a function v(x) = ∫₀ˣ [F(q) − G(q)] dq, the intersections of F and G_0^β can be used to infer whether v is increasing or decreasing on particular intervals. Figure 3 shows an example of this approach. Combined with the observation that v(0) = v(1) = 0, this behavior allows me to show that if G_0^β has two interior intersections with F, then it satisfies the integral constraint if and only if v(q_i(β, 0)) ≥ 0. To select among these Bayes-plausible DTUs, note that when y = 0 the slope of a concavified DTU G_0^β equals β in the lower truncation interval [0, ℓ] and the uniform interval [ℓ, u]. Thus Sender's 0-optimal DTU is given by making β small (to minimize Nature's utility) while satisfying the simplified integral constraint.

⁹ When F has finite support, the integral constraint need not bind at any interior points.
The remainder of the proof uses the fact that the integral constraint binds only at that single interior point and adapts the strategy used in my geometric proof of Proposition 1. First, I show that any DTU which delivers higher utility for Sender than G_0^{β(0)} cannot be Bayes-plausible. I then approximate an arbitrary optimal distribution H by a DTU, using the concavification of H to ensure that this DTU upper-bounds H everywhere above the lower truncation. If H delivers Sender strictly higher utility than G_0^{β(0)}, the approximating DTU must do so as well; therefore it cannot be Bayes-plausible, and neither is H itself. If H delivers Sender the same utility as G_0^{β(0)}, then the approximating DTU is precisely G_0^{β(0)} and the concavification of H equals that of G_0^{β(0)}. This last step relies crucially on the slope of G_0^{β(0)} being the same as that of its concavification in the lower truncation interval [0, ℓ]. For a DTU with intercept y > 0, this property will no longer hold, and as a result the concavified optimal distribution will no longer be unique.
The 0-optimal DTU G_0^{β(0)} is not itself a unique solution to the maxmin persuasion problem. In the binary-state setting, Sender's optimal distribution was equal to its concavification everywhere on [0, 1], and any other distribution with the same concavification would have a different mean. In the continuous-state setting, a DTU differs from its concavification on the lower truncation interval, so it is possible for a non-DTU distribution H to have the same mean and concavification as a DTU. Thus uniqueness of the concavification is the strongest result that can be obtained.

5.3. Optimal Distributions with Large r*. When r* > q_i(β(0), 0), characterizing both the optimal DTU and optimal distributions more generally becomes more difficult. In fact, an optimal distribution may not exist, though Sender's supremum utility over a sequence of distributions converging to optimality is always well-defined.
Despite these challenges, I can still show that for sufficiently large values of r*, DTUs are not dominated by other distributions. This result holds without alteration when F has finite support.

Proposition 3. Let r* ∈ [π, 1). Then no distribution of posterior means gives Sender strictly higher utility than all DTUs.

Proof. See Appendix B5, or Appendix B6 for the finite-state case. □

This proof is similar in approach to that of Proposition 2. When y > 0, the slope of a concavified DTU G_y^β is larger in the lower truncation interval [0, ℓ] than that of the DTU itself (since the concavification passes through the origin, while the DTU has intercept y), but is again equal to β in the uniform interval [ℓ, u]. Thus the concavified DTU has a kink at q = ℓ. However, setting r* ≥ π ensures that the kink does not affect the value of the concavified DTU at r*. Then, as in Proposition 2, Sender's y-optimal DTU for each intercept y is given by minimizing the slope β subject to the integral constraint. Unlike in that proposition, I cannot directly characterize which choice of intercept is optimal. In fact, since the set [0, 1) of possible intercept choices is not compact, it may be that the optimal choice is y = 1 and Sender's highest utility is attained only in the limit. However, I can still use the fact that each y-optimal DTU has minimal slope among Bayes-plausible DTUs with intercept y to extend the bounding argument of Proposition 2. This approach rules out as infeasible any distribution that delivers strictly higher utility than all DTUs, but again leaves room for non-DTU distributions that attain Sender's highest possible utility.

5.4. Non-Uniform Optimal Distributions. The result of Proposition 2 provides an appealing reason for focusing on DTUs as opposed to other maxmin-optimal posterior distributions: outside of the lower-truncation region, the optimal DTU is precisely equal to the unique optimal concavification. However, in the large-r* case of Proposition 3, the optimal concavification is no longer unique. To see why, assume an optimal DTU with y > 0 exists. Its concavification passes through the origin rather than the point (0, y), so it has a kink at q = ℓ. This kink can be used to alter the DTU without affecting Sender's utility.
In particular, consider a distribution that places slightly positive mass in the interval [ℓ − ε, ℓ), has a smaller atom than the DTU at q = ℓ, and places slightly less mass than the DTU in the interval (ℓ, ℓ + ε]. This distribution, shown in Figure 4, changes slope at ℓ and ℓ + ε, but is equal to the DTU for q / ∈ (ℓ − ε, ℓ + ε). Whenever r * ≥ ℓ + ε, and in particular when r * ≥ π (the case in Proposition 3) the deviation delivers the same utility for Sender.

5.5. Intermediate r*. The deviation in Figure 4 also sheds light on the difficulty of characterizing the optimal distribution when r* ∈ (q_i(β(0), 0), π). In the binary-state setting the optimal distribution equals its concavification. With a continuous state, lower truncation is one possible response to the integral constraint, but it is not a unique solution for Sender because that constraint may only bind at a finite set of interior points. For instance, the deviation in Figure 4 gives Sender a greater utility than the corresponding DTU when r* ∈ (0, ℓ + ε) and is feasible whenever the integral constraint does not bind in that interval. Without further structure on the space of possible deviations from DTUs, even numerical approaches with a parametric prior distribution provide no insight, since they would require a novel algorithm to search over all mean-preserving contractions of the prior.
Despite this challenge, I am able to shed light on the prevalence of the intermediate-r* case by numerically estimating q_i(β(0), 0) within a class of parametric prior distributions. For truncated normal priors (generated by taking a N(µ, σ²) distribution and truncating it to lie in the unit interval), I show numerically that there is a gap between Propositions 2 and 3 only when µ < 1/2 and σ is large enough. For example, when µ = 0.2, shown in orange in Figure 5, there is a gap only when σ ≥ 0.135. Full details of the algorithm for computing q_i(β(0), 0) are in Appendix C; Figure 5 shows an example of the output from these computations. Each color represents a fixed mean µ of the generating normal distribution, with the x-axis representing that distribution's standard deviation σ. Because the normal distribution is truncated to produce a prior in [0, 1], the "true mean" π of that prior depends on both µ and σ; it is shown as a dashed line. The solid line shows the numerically computed value q_i(β(0), 0). Thus there is a gap between the small-r* case of Proposition 2 and the large-r* case of Proposition 3 if and only if a dashed line lies above its corresponding same-color solid line. When µ > 1/2, this property never holds and there is no gap between Proposition 2 and Proposition 3. When µ < 1/2, there is no gap for σ small enough, but a gap arises for larger σ. However, making σ too large violates the assumption f′(0) < 1 − 2π, invalidating the propositions. These results suggest that double-truncated uniform distributions are optimal for many possible priors.
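The "true mean" π plotted as the dashed lines in Figure 5 follows the standard truncated-normal mean formula; the implementation below is my own sketch:

```python
import math

def phi(z: float) -> float:
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_mean(mu: float, sigma: float) -> float:
    """Mean pi of a N(mu, sigma^2) distribution truncated to [0, 1]."""
    a, b = (0.0 - mu) / sigma, (1.0 - mu) / sigma
    return mu + sigma * (phi(a) - phi(b)) / (Phi(b) - Phi(a))
```

For µ < 1/2 the truncation at 0 removes more mass than the truncation at 1, so π rises above µ as σ grows; this is why π must be computed rather than read off from µ.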

Extensions
In the motivating example, a politician has a well-defined prior belief about the state of the world and knows the average voter's cost-effectiveness threshold, but she makes no further assumptions on the distribution. This difference in information is not unreasonable: the politician can fine-tune the details of her welfare program, but voter preferences are subject to a number of factors outside her control, e.g., opposition campaigning and news coverage. Loosely speaking, limited data about voter preferences allows the politician to estimate the population mean, a single scalar, far more reliably than the entire distribution (or any given quantile); she may thus be more willing to base her strategy on the former than the latter. Despite these justifications, it may still be realistic to weaken these informational assumptions; I do so in this section and discuss how my existing results extend.

Let u_s(r*, r) be Sender's utility if she chooses the optimal distribution for mean Receiver type r*, but in fact faces a realized mean Receiver type r (drawn from some distribution chosen by Nature). For any r*, the value of u_s(r*, r) is given by the concavification of a truncated uniform distribution (following Proposition 1 and Lemma 3), so it is linear in r.¹² Thus if the expected mean Receiver type chosen by Nature is r**, Sender's utility from choosing the r*-optimal distribution is u_s(r*, r**) regardless of the full distribution of mean Receiver types. This expression is maximized by choosing the r**-optimal posterior distribution, in which case Sender could do no better even if she knew the mean Receiver type was r** with certainty. Therefore, fixing some zero-cost mean Receiver type r*, Nature's gain from any distribution with mean Receiver type r** is captured by the difference between Sender's maxmin utility at r* and her maxmin utility at r**. Sender's maxmin utility is a well-defined, continuously differentiable function of the mean Receiver type (computed using Proposition 1), so Nature's choice of r** can be straightforwardly found by, e.g., setting the marginal cost of increasing the expected mean Receiver type equal to the marginal decrease in Sender's utility. Sender's optimal distribution when facing this new mean Receiver type is given by using r** in Proposition 1.

¹² It is linear if r is in the support of Sender's optimal distribution. Any mass strictly above the support brings Nature no additional benefit, so I assume without loss that this choice is never made.
In the continuous-state case, the presence of multiple optimal distributions (some of which do not have linear concavifications), the kink in the concavification of a double-truncated uniform distribution (as discussed in Section 5.4), and the potential gap between Propositions 2 and 3 rule out the approach above. Finding Sender's optimal response in this case would require a different characterization of her maxmin utility, so I leave it for future work.

Conclusion
Bayesian persuasion provides a tractable model of communication that can be extended to include rich uncertainty about the Receiver who is the target of persuasion.
This work contributes to a growing literature that also introduces ambiguity by posing a maxmin persuasion problem, where the Sender seeks to be robust to any possible prior belief about Receiver types with a known mean. In a binary-state setting, I show a connection to mean-preserving contraction (MPC) games, where competing players choose mean-preserving contractions of probability distributions to obtain the highest realization, and fully characterize Sender's optimal distribution. As in many other MPC games, when her constraint is strong enough Sender chooses a uniform distribution mixed with atoms at the lowest and highest posterior beliefs. These results highlight the importance of the tie-breaking assumption in persuasion problems and emphasize the strength of the maxmin criterion, which delivers strictly lower utility for Sender than she would obtain under any known prior over Receiver types when the probability of the high state is less than 1/2. I then use a geometric approach to show, in both a finite-support setting and a novel continuous-state setting, that uniform distributions with an atom at the lower bound of their support are in many cases still optimal.
Unlike in the binary-state setting, these distributions now have support in the interior of [0, 1], and Sender's optimal distribution is no longer unique. However, the intuition of linearizing the prior belief over states in order to make Nature indifferent between many worst-case Receiver type distributions is preserved.
and Nature's choice set is defined as

T = { cdf T over [0, 1] : ∫₀¹ r dT(r) = r* }.

The functional of Equation (1) is linear in both distributions, so it is convex in G and concave in T. For fixed G ∈ G, it is also lower semicontinuous on T: the result follows from lower semicontinuity of the indicator function (which applies for q > r; any q ∈ [0, r] produces the same value) and an application of Fatou's Lemma.
Therefore, I can apply Theorem 2 of Fan (1953) to state that

sup_{G ∈ G} inf_{T ∈ T} u_S(G, T) = inf_{T ∈ T} sup_{G ∈ G} u_S(G, T),

where u_S(G, T) denotes the functional of Equation (1). Because G is compact, I can in fact replace the supremum on the left-hand side of the equation with a maximum, giving precisely the expression in Equation (1).¹³ It is then clear that Sender's utility with simultaneous moves in the MPC game must be equal to her utility in the maxmin persuasion problem.

¹³ I cannot replace the supremum on the right-hand side with a maximum, and indeed the results for MPC games are often stated using limits of sequences of distributions.
I now prove the "if" portion of the lemma. Let G* solve the maxmin persuasion problem. Since the maxmin and minmax utilities for Sender are equal, the game has a value, and both Sender and Nature have strategies that guarantee them at least the value. G* is by definition such a strategy; let T* be such a strategy for Nature.

It must be that the pair (G*, T*) guarantees each player exactly the value of the game because it is zero-sum: if either player's utility were strictly above the value, then the other's would be strictly below it. Thus G* is a best response to T*, since no other strategy gives Sender strictly higher utility (or T* would not guarantee Nature the value). Similarly, T* is a best response to G*. Thus G* is a Nash equilibrium strategy for Sender in the MPC game.
For the "only if" portion, let G* be a Nash equilibrium strategy in the MPC game for Sender. Then the Nash equilibrium payoff for Sender results from taking Nature's best response to G*. Since Nature's payoff is the opposite of Sender's, that payoff is therefore Sender's minimum utility from G*. Thus a Nash equilibrium distribution for Sender has the same payoff in the MPC game and the maxmin persuasion problem, and that utility is precisely equal to Sender's maximum utility in the maxmin persuasion problem, so G* solves the maxmin persuasion problem by definition. □

(1) The support of Receiver's distribution is not bounded above: Receiver may choose any mean-preserving contraction that obeys the lower bound.
(2) Tie-breaking is even: in the case of a tie, a winner is randomly chosen.
Proof. The game F is (modulo simultaneous moves, which Lemma 1 shows are irrelevant) the same as the maxmin persuasion problem. There, Sender persuades Receiver type r < 1 by generating a posterior q_r^0 = r; with unfavorable tie-breaking, she must instead generate q_r^ε = r + ε for arbitrary ε > 0. Since posteriors must lie in [0, 1], the posteriors q_1^ε are infeasible and Sender can never persuade Receiver type r = 1. As ε → 0, the effect on the Bayes-plausibility constraint from replacing any q_r^0 with q_r^ε vanishes, allowing a Sender facing unfavorable tie-breaking to match her utility under favorable tie-breaking (and thus under any intermediate tie-breaking rule) for interior Receiver types, but not for type r = 1. Thus Sender's utility is affected by the tie-breaking rule if and only if she chooses a posterior distribution with an atom at q = 1.
The results of Amir (2017) show that the Nash equilibrium strategy for Player B in the Captain Lotto game is unique when π ≤ 1/2, and therefore so is Sender's optimal posterior distribution. To complete the proof of Proposition 1, I replace the sufficient condition for Nash equilibrium when π > 1/2 in Theorem 10 of Amir (2017).

Lemma 3. For any posterior distribution G chosen by Sender, her worst-case utility satisfies the equality min_T ∫ (1 − G(r)) dT(r) = 1 − Ḡ(r*), where Ḡ is the concavification of G over [0, 1] and the minimum is taken over Receiver type distributions with mean r*.

Proof. Manipulating the bounds of integration to rewrite Sender's objective function from Equation (1) gives ∫₀¹ (1 − G(r)) dT(r).
Then the minimization portion of the problem can be written as a maximization for Nature, where I have dropped the constant, rewritten the min as a max, and explicitly included the mean restriction to highlight the similarity to a Bayesian persuasion problem. In this case, the Receiver type r fills the role of the "posterior belief," Nature's utility from a realized Receiver type is G(r), and the "prior" is the distribution with support {0, 1} and mean r*. This final point follows from the observation in Section 3 that when the prior distribution has binary support, the Bayes-plausibility constraint is the same as a mean restriction. Thus by Corollary 2 of Kamenica and Gentzkow (2011), Nature's utility is given by Ḡ(r*), the concavification of G over the interval [0, 1] evaluated at the prior mean r*. Flipping the sign again, Sender's utility is 1 − Ḡ(r*). □
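The concavification step in Lemma 3 can be verified numerically for a concrete example. The sketch below is an illustration of mine, not the paper's method: it fixes the convex cdf G(q) = q², whose concavification is the chord Ḡ(q) = q, and brute-forces Nature's best two-point type distribution with mean r*. The resulting maximum should equal Ḡ(r*) = r*, so Nature holds Sender to exactly 1 − r*.

```python
# Sketch: check of Lemma 3 for one example cdf. Nature's maximized payoff
# against G(q) = q^2 over two-point distributions with mean r* should equal
# the concavification Gbar(r*) = r*, so Sender's worst case is 1 - r*.
G = lambda q: q * q
r_star = 0.3
grid = [i / 500 for i in range(501)]

best = G(r_star)  # the degenerate distribution at r* is always feasible
for a in grid:
    for b in grid:
        if a <= r_star <= b and a < b:
            t = (b - r_star) / (b - a)        # weight on a so the mean is r*
            best = max(best, t * G(a) + (1 - t) * G(b))

nature_min_payoff_for_sender = 1 - best
print(best, nature_min_payoff_for_sender)
```

The optimal two-point distribution here puts weight on the extremes 0 and 1, which is exactly the chord defining the concavification of a convex cdf.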

My necessary and sufficient condition is an immediate consequence of this result:
Lemma 4. Let π > 1/2. Then a posterior distribution G* is optimal for Sender if and only if E_{G*}[ω] = π and G*(q) ≤ q for all q ∈ [0, 1]. More than one distribution satisfying this condition always exists.
Proof. As established in Section 3, the only constraint on a feasible distribution G for Sender is that E_G[ω] = π; I show that the second condition, G*(q) ≤ q, is both necessary and sufficient for optimality.
Assume G*(q) ≤ q for all q ∈ [0, 1]. Then the function U(q) = q upper-bounds G* and is concave. Since U is the pointwise-smallest concave function on [0, 1] passing through the point (1, 1), it must therefore be the concavification of G*, and Sender's utility from G* is 1 − U(r*) = 1 − r*. Because Nature may always choose a Receiver type distribution T with supp(T) = {0, 1}, Sender's utility from any posterior distribution is no more than 1 − r* (the probability that Receiver type r = 0 is drawn from T).
Thus G* attains the upper bound and is optimal for Sender. There are at least two such distributions for any π > 1/2: one is given by solving π = n/(n + 1) for n and setting G*(q) = q^n, and a second, distinct construction also satisfies both conditions. Therefore an optimal distribution always exists and is non-unique.
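The first family in the proof of Lemma 4 is easy to verify directly. The snippet below (illustrative only) checks that G(q) = q^n with n solving π = n/(n + 1), i.e. n = π/(1 − π), has mean π and satisfies G(q) ≤ q on [0, 1].

```python
# Sketch: verify one optimal family from Lemma 4 for pi > 1/2.
# Solving pi = n/(n+1) gives n = pi/(1-pi); G(q) = q^n then has
# mean E_G[w] = int_0^1 (1 - G(q)) dq = 1 - 1/(n+1) = pi,
# and G(q) <= q on [0, 1] because n >= 1.
pi = 0.75
n = pi / (1 - pi)                  # here n = 3
mean = 1 - 1 / (n + 1)             # mean of G via integration by parts
assert abs(mean - pi) < 1e-12
assert all((i / 100) ** n <= i / 100 + 1e-12 for i in range(101))
print(n, mean)
```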
Now assume G* is optimal for Sender; then, since I have just shown that the value 1 − r* is attainable, it must be that 1 − Ḡ*(r*) = 1 − r*. If G*(q₀) > q₀ for some q₀, concavity of Ḡ* would force Ḡ*(r*) > r*, so optimality requires G*(q) ≤ q for all q ∈ [0, 1]. □

An alternative proof of Proposition 1 proceeds by showing that if G gives Sender a higher utility than the optimal distribution G*, then the tangent line to 1 − Ḡ at r* implicitly defines a cdf whose mean is greater than π. Because the tangent line lies below the function 1 − G, it must therefore be that G itself has a mean greater than π, and thus G is not a Bayes-plausible posterior distribution.
Towards establishing this result, consider upper-truncated uniform posterior distributions (henceforth UTUs), a class of posterior distributions which place mass x ≥ 0 on the posterior q = 0, spread the remaining mass uniformly over the posteriors q ∈ (0, r_h] for some r_h ≤ 1, and place no mass on the posteriors q ∈ (r_h, 1]. I can use Bayes-plausibility to solve for the unique value of r_h corresponding to a given x, namely r_h = 2π/(1 − x), so that a UTU is fully characterized by x; the requirement r_h ≤ 1 restricts x to [0, 1 − 2π]. Since r_h is uniquely determined by x, I denote a UTU by G_x. The following lemma shows that a single choice of x is optimal among all UTUs and can be written as a closed-form function of r*:

Lemma 5. Let π ≤ 1/2. Then if r* ≤ π, Sender's unique optimal UTU is G_0; if π ≤ r* ≤ 1/2, it is G_{1−π/r*}; and if 1/2 ≤ r*, it is G_{1−2π}.
Proof. By construction, any UTU G_x is concave and is therefore equal to its concavification Ḡ_x. By Lemma 3, the utility from a UTU G_x with r* ≤ r_h is therefore

1 − Ḡ_x(r*) = 1 − x − (1 − x)² r*/(2π).

The first-order condition in x yields x_FOC = 1 − π/r*. The utility is increasing in x when x < x_FOC and decreasing in x when x > x_FOC. Since x ∈ [0, 1 − 2π], if r* < π the constrained optimal solution is x* = 0 and if r* > 1/2 the constrained optimal solution is x* = 1 − 2π; otherwise the optimum is the interior solution x* = x_FOC = 1 − π/r*. □

I now prove two lemmas describing the relationship between the UTU G_{1−2π} and the function 1 − Ḡ derived from an arbitrary posterior distribution G. The first establishes that if, for some posterior distribution G, the function 1 − Ḡ falls below 1 − G_{1−2π} at some mean Receiver type q, Sender's utility from G remains below her utility from G_{1−2π} for all higher Receiver types:

Lemma 6. If there is q ∈ [0, 1) such that 1 − Ḡ(q) < 1 − G_{1−2π}(q), then 1 − Ḡ(q′) < 1 − G_{1−2π}(q′) for all q′ ∈ [q, 1).

Proof. The proof is by contradiction. Assume there is q such that 1 − Ḡ(q) < 1 − G_{1−2π}(q), but that there is q′ ∈ [q, 1) such that 1 − Ḡ(q′) ≥ 1 − G_{1−2π}(q′). Then there must be q₁ ∈ [q, q′] where the slope of 1 − Ḡ is strictly greater than that of 1 − G_{1−2π}. But because G and G_{1−2π} are cdfs and 1 − Ḡ is weakly positive, there must also be q₂ ∈ [q′, 1] where the slope of 1 − Ḡ is weakly less than that of 1 − G_{1−2π}. Then q₁ ≤ q₂ but the slope of 1 − Ḡ at q₁ is strictly greater than at q₂, violating convexity of 1 − Ḡ, and thus concavity of Ḡ. □

The next lemma describes features of 1 − Ḡ when the posterior distribution G weakly improves on Sender's utility from G_{1−2π}:

Lemma 7. If G ≠ G_{1−2π} is a cdf such that 1 − Ḡ(r*) ≥ 1 − G_{1−2π}(r*) and ∫ q dG(q) = π, then the slope¹⁴ of 1 − Ḡ at r* is strictly less than the slope of 1 − G_{1−2π} at r*.
Proof. I first show that there is q_d ∈ (r*, 1] such that 1 − G(q_d) < 1 − G_{1−2π}(q_d). Note that for G to be distinct from G_{1−2π}, there must be some posterior q_d ∈ [0, 1] where the two cdfs differ. It cannot be the case that 1 − G ≥ 1 − G_{1−2π} everywhere on [0, 1] with strict inequality at some q_d: if it were, then because G is a cdf, it is right-continuous, and therefore fixing ε > 0 there is δ(ε) > 0 such that 1 − G(q) > 1 − G(q_d) − ε for all q ∈ [q_d, q_d + δ(ε)). Since the slope of 1 − G_{1−2π} is no greater than 0, setting ε ∈ (0, G_{1−2π}(q_d) − G(q_d)) ensures that 1 − G > 1 − G_{1−2π} on that interval. Therefore there would be a non-degenerate interval where 1 − G > 1 − G_{1−2π}, while 1 − G ≥ 1 − G_{1−2π} everywhere on [0, 1], so integrating the inequality would give a violation of Bayes-plausibility:

∫₀¹ q dG(q) = ∫₀¹ (1 − G(q)) dq > ∫₀¹ (1 − G_{1−2π}(q)) dq = ∫₀¹ q dG_{1−2π}(q) = π.

Thus by contradiction there must be q_d ∈ [0, 1] such that 1 − G(q_d) < 1 − G_{1−2π}(q_d).

¹⁴ Because 1 − Ḡ is convex, it is continuous on (0, 1) and its left and right derivatives are always well-defined. The function 1 − G for any UTU G is also continuous with well-defined left and right derivatives. When referring to the slope or to a tangent line I consider the right derivative.
By Lemma 6, since 1 − Ḡ(r*) ≥ 1 − G_{1−2π}(r*), there is no q ∈ [0, r*) where 1 − Ḡ(q) < 1 − G_{1−2π}(q). Thus, because 1 − G ≥ 1 − Ḡ, it must be that 1 − G(q) ≥ 1 − G_{1−2π}(q) for all q ≤ r*, and therefore q_d ∈ (r*, 1]. The claim now follows by the argument in Lemma 6: since 1 − Ḡ(r*) ≥ 1 − G_{1−2π}(r*) while 1 − Ḡ(q_d) < 1 − G_{1−2π}(q_d), there is a point in [r*, q_d] where the slope of 1 − Ḡ is strictly less than that of 1 − G_{1−2π}. But since Ḡ is concave, 1 − Ḡ is convex and its slope cannot increase as q decreases; the slope of 1 − Ḡ at r* must therefore be strictly less than that of 1 − G_{1−2π} at r*. □ The implication is vacuous for r* ≤ 1/2, where there are no posterior distributions that meet the conditions; however, even in that case the result is central to a proof by contradiction.
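The closed-form optimizers in Lemma 5 can be recovered by brute force. The sketch below is my own check: it uses the facts that a UTU G_x has r_h = 2π/(1 − x) pinned down by the mean constraint, that G_x is concave and hence equals its concavification, and that Sender's utility is therefore 1 − G_x(r*) = 1 − x − (1 − x)²r*/(2π), capped below at zero once r* exceeds r_h.

```python
# Sketch: grid check of Lemma 5. Maximizing Sender's UTU utility over
# x in [0, 1 - 2*pi] should return x* = 0 when r* <= pi, x* = 1 - pi/r*
# when pi <= r* <= 1/2, and x* = 1 - 2*pi when r* >= 1/2.
pi = 0.3

def utu_utility(x, r_star):
    # 1 - G_x(r*) for r* <= r_h; zero once the UTU's support ends below r*.
    return max(0.0, 1 - x - (1 - x) ** 2 * r_star / (2 * pi))

def best_x(r_star):
    xs = [i * (1 - 2 * pi) / 10000 for i in range(10001)]   # x in [0, 1 - 2*pi]
    return max(xs, key=lambda x: utu_utility(x, r_star))

# Three regimes of Lemma 5: r* below pi, interior, and above 1/2.
print(best_x(0.2), best_x(0.4), best_x(0.8))
```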
With these three lemmas in hand, I now provide an alternative proof of the case π ≤ 1/2 in Proposition 1:

Lemma 8. If π ≤ 1/2, Sender's unique optimal posterior distribution G* is the optimal UTU of Lemma 5:
• If r* ≤ π ≤ 1/2, G* = G_0.
• If π ≤ r* ≤ 1/2, G* = G_{1−π/r*}.
• If π ≤ 1/2 ≤ r*, G* = G_{1−2π}.

Proof. The proof is by contradiction. Let G be a proposed alternative posterior distribution that delivers weakly greater utility for Sender than G*. By Lemma 3 (which defines the utility from each posterior distribution) and Lemma 5 (since G* is a UTU, it is uniquely optimal among UTUs), it is the case that 1 − Ḡ(r*) ≥ 1 − Ḡ*(r*). Consider the line L that is tangent to 1 − Ḡ at r*.¹⁵ Because 1 − Ḡ is convex and weakly positive (recall that the line ℓ(q) = 0 lower-bounds 1 − G), it is lower-bounded by L⁺(q) = max{L(q), 0}. Furthermore, by Lemma 7, the slope of L is less than that of 1 − G_{1−2π}. For the intercept L⁺(0), there exists a UTU (call it G_alt, for alternative) with 1 − G_alt(0) = L⁺(0). If G_alt ≠ G*, then because G* is uniquely optimal among UTUs, it must be that 1 − G_alt(r*) < 1 − Ḡ*(r*) ≤ L⁺(r*). Then, because L and 1 − G_alt intersect at q = 0 but L is greater than 1 − G_alt at q = r*, it must be that the slope of L is strictly greater than the slope of the strictly downward-sloping portion of 1 − G_alt; therefore in fact L⁺ weakly exceeds 1 − G_alt everywhere, and strictly exceeds it on a set of positive measure. Integrating the expression and using the fact that L⁺ lower-bounds 1 − Ḡ, which in turn lower-bounds 1 − G, it is the case that

∫₀¹ q dG(q) = ∫₀¹ (1 − G(q)) dq ≥ ∫₀¹ L⁺(q) dq > ∫₀¹ (1 − G_alt(q)) dq = ∫₀¹ q dG_alt(q) = π.

The first and penultimate equalities are both from integration by parts, and the final equality is because all UTUs (including G_alt) are Bayes-plausible by construction.
Therefore G violates Bayes-plausibility and is not a valid alternative distribution.

¹⁵ Recall that if r* is a kink point of 1 − Ḡ, I use the right derivative of 1 − Ḡ to define the slope.
Even when G_alt = G*, the same argument applies whenever the slope of L is strictly greater than the slope of the strictly downward-sloping portion of 1 − G*: in this case, L⁺(q) > 1 − G*(q) for all q ∈ (0, r*], and integrating just as before shows that G again violates Bayes-plausibility. If instead G_alt = G* but now L⁺(r*) = 1 − G*(r*), it must be the case that L and the strictly downward-sloping portion of 1 − G* have the same slope, so in fact L⁺ coincides with 1 − G* wherever the latter is strictly decreasing. Then there are two possible cases. The first is trivial: G = G*, so that G is not a deviation at all. In the second, there must be some q ∈ [0, 1] such that 1 − G(q) > L⁺(q); recall that L⁺ lower-bounds 1 − G, and thus the direction of the inequality is known. Because G is a cdf, it is right-continuous, and therefore fixing ε > 0 there is δ(ε) > 0 such that 1 − G(q′) > 1 − G(q) − ε for all q′ ∈ [q, q + δ(ε)). Since the slope of L⁺ is no greater than 0, setting ε ∈ (0, 1 − G(q) − L⁺(q)) ensures that 1 − G > L⁺ on that interval. Therefore there is a non-degenerate interval where 1 − G > L⁺, and 1 − G ≥ L⁺ everywhere on [0, 1], so integrating the inequality gives

∫₀¹ q dG(q) = ∫₀¹ (1 − G(q)) dq > ∫₀¹ L⁺(q) dq = ∫₀¹ (1 − G*(q)) dq = ∫₀¹ q dG*(q) = π,

as desired. Having covered both the case G_alt ≠ G* and the case G_alt = G*, I have shown that in all cases G violates Bayes-plausibility; by contradiction, G* is uniquely optimal. □

The full proof of Proposition 1 without reference to MPC games is therefore obtained by combining Lemmas 4 and 8. One optimal posterior distribution for Sender is as follows:
• If r* ≤ π ≤ 1/2,
• If π ≤ r* ≤ 1/2,
• If 1/2 < r*, G*(q) = (1 − π) δ_0 + π δ_1.
• If r* ≤ 1/2 < π, Sender's optimal distribution places an atom on the posterior q = 1, so that no Receiver type lies strictly above that posterior belief. This choice allows Sender to obtain a utility higher than 1 − r*, since even if the Receiver type is r = 1, Receiver is now persuaded whenever the posterior q = 1 is realized. However, creating this atom tightens the Bayes-plausibility constraint, so if neither Sender's nor Nature's constraint is slack enough to allow frequent realizations of 1, Sender uses the same approach as with unfavorable tie-breaking. Thus when the probability of the high state and the mean Receiver type are both small, Sender's maxmin utility remains strictly below her utility with even the most unfavorable prior belief about Receiver types, regardless of whether tie-breaking is favorable or not.
In the maxmin persuasion context, it seems natural to break ties either entirely in favor of or entirely against Sender. Those rules allow me to interpret a Receiver of type r either as the highest Receiver type who is convinced by the posterior belief q = r or as the lowest Receiver type who is not convinced by that belief, respectively. However, if the MPC game is interpreted as competitive persuasion, as in Boleslavsky and Cotton (2015), then it also seems reasonable to consider breaking ties evenly, so as to treat the two competing players symmetrically.¹⁶

Corollary 2. The unique optimal posterior distribution G* for Sender is as follows:
• If r* ≤ π ≤ 1/2,
• If π ≤ r* ≤ 1/2,
• If 1/2 ≤ π and r* ≤ π,
• If 1/2 ≤ r* and π ≤ r*,

Proof. This result appears verbatim in Boleslavsky and Cotton (2015), with Sender as Player A when r* ≤ π and Player B when r* ≥ π. □

Finally, note for general interest that in the MPC game when both players' feasible distributions have domain R⁺ (and are mean-preserving contractions of binary-support distributions), the unique solution is the same as in the cases r* ≤ π ≤ 1/2 and π ≤ r* ≤ 1/2 of Corollary 2, with the relationship between π and r* determining which case applies. The solution when π = r*, so that the constraints are symmetric, first appears in Bell and Cover (1980), and also appears in Myerson (1993). The solution for the asymmetric case first appears in Sahuguet and Persico (2006), and also appears in Hart (2008).

¹⁶ When the players are persuading a Receiver about a common state of the world, as in Au and Kawai (2020), it is reasonable to also require π = r*. In Boleslavsky and Cotton (2015), the players are schools convincing a Receiver about the binary ability of a student drawn from a school-specific distribution, so π ≠ r* represents one school producing more high-type students on average.
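The symmetric-constraint solution referenced above can be illustrated numerically. The sketch below is my own check, not part of the paper: a uniform distribution on [0, 2μ] (the symmetric Bell and Cover solution for common mean μ) guarantees a win probability of at least 1/2 against any nonnegative rival distribution with mean μ, since P(U > v) = max(0, 1 − v/(2μ)) is convex, equals 1/2 at v = μ, and lies weakly above the line 1 − v/(2μ), whose expectation under any mean-μ rival is exactly 1/2.

```python
# Sketch: the uniform-[0, 2*mu] strategy guarantees at least a 1/2 win
# probability against several hand-picked rival distributions with mean mu.
mu = 0.5

def win_prob_against(rival):                # rival: list of (value, prob) atoms
    assert abs(sum(v * p for v, p in rival) - mu) < 1e-9   # mean check
    return sum(p * max(0.0, 1 - v / (2 * mu)) for v, p in rival)

rivals = [
    [(mu, 1.0)],                            # point mass at the mean
    [(0.0, 0.5), (2 * mu, 0.5)],            # extreme two-point distribution
    [(0.25, 0.5), (0.75, 0.5)],             # interior two-point distribution
    [(0.0, 0.9), (5.0, 0.1)],               # mass far above the support of U
]
probs = [win_prob_against(r) for r in rivals]
print(probs)
```

The last rival shows why unbounded support does not help the opponent here: mass placed above 2μ wins those realizations, but the mean constraint forces so much mass near zero that the uniform strategy still wins more than half the time.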
Appendix B: Omitted Proofs for Section 5

B1: Properties of DTUs. To begin, I describe DTUs in more detail. The uniform portion of the DTU (between the lower and upper truncations) has slope β, which I refer to as the slope of the DTU. The line L(q) = βq + y, which forms that uniform portion, intersects the vertical axis at y; I refer to this value as the intercept of the DTU. To derive a relationship between β, y, and ℓ, I use the fact that Bayes-plausibility requires E_G[ω] = π. This condition immediately imposes the restriction that ℓ ∈ [0, π]; using simple geometry to compute the integral of a DTU's cdf and set it equal to 1 − π shows that

β(ℓ, y) = [(π − yℓ) − √((π − yℓ)² − (1 − y)²ℓ²)] / ℓ².

This expression is continuously differentiable for ℓ ∈ (0, π] and y ∈ [0, 1). Fixing ℓ, β(ℓ, y) is injective and decreasing in y. Fixing y, β(ℓ, y) is injective and increasing in ℓ, attaining a maximum of β(π, y) = (1 − y)/π. While β(0, y) is not defined using the expression above, the limit from the right exists; I thus define β(0, y) = (1 − y)²/(2π) explicitly. For y ∈ [0, 1 − 2π], β(0, y) is the slope of the UTU with intercept y. When y > 1 − 2π, there is no corresponding UTU; instead, the lower bound of interest is β(ℓ, y) = 1 − y, the slope that satisfies β(ℓ, y) · 1 + y = 1, so that the uniform portion reaches the point (1, 1).¹⁷ The assumption y > 1 − 2π implies 1 − y ∈ ((1 − y)²/(2π), (1 − y)/π), so the lower bound is attained at an interior ℓ ∈ (0, π); I call this value ℓ_y^min. Because the function β(ℓ, y) − (1 − y) is continuously differentiable, the Implicit Function Theorem ensures that I can write ℓ_y^min as a continuously differentiable function of y.
¹⁷ This is the desired lower bound because any cdf H over [0, 1] must satisfy H(1) = 1, and I wish to use DTUs to upper-bound other feasible probability distributions.
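The two boundary values of β stated above can be checked numerically. The sketch below is my own verification, and the cdf form it uses is an assumed reading of the text: the DTU cdf is taken to be 0 on [0, ℓ), equal to βq + y on [ℓ, (1 − y)/β], and 1 above. Solving the mean condition ∫₀¹ (1 − G(q)) dq = π for β by bisection should then reproduce β → (1 − y)²/(2π) as ℓ → 0 and β = (1 − y)/π at ℓ = π, where the DTU collapses to a point mass at π.

```python
# Sketch: recover the slope beta of a DTU from the mean condition and check
# the two boundary values stated in the text (cdf form is an assumption).
pi, y, N = 0.3, 0.2, 20000

def mean_of_dtu(beta, l):
    # Midpoint-rule integral of (1 - G) over [0, 1].
    h = 1.0 / N
    total = 0.0
    for i in range(N):
        q = (i + 0.5) * h
        G = 0.0 if q < l else min(1.0, beta * q + y)
        total += (1.0 - G) * h
    return total

def solve_beta(l):
    lo, hi = 1e-6, 10.0
    for _ in range(50):                     # the mean is decreasing in beta
        mid = (lo + hi) / 2
        lo, hi = ((mid, hi) if mean_of_dtu(mid, l) > pi else (lo, mid))
    return (lo + hi) / 2

beta_near_zero = solve_beta(1e-4)
print(beta_near_zero, (1 - y) ** 2 / (2 * pi))   # limit as l -> 0
print(mean_of_dtu((1 - y) / pi, pi))             # point mass at pi has mean pi
```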
The concavification of a DTU is straightforward to compute: it follows the line through (0, 0) and (ℓ, β(ℓ, y)ℓ + y), whose slope is β(ℓ, y) + y/ℓ, on [0, ℓ], and coincides with G thereafter; the result is concave because β(ℓ, y) + y/ℓ ≥ β(ℓ, y).

Lemma 9. Given r* ∈ (0, 1) and y ∈ [0, 1), there is a well-defined DTU G_y^{β(y)} with lower truncation length ℓ_y^* that maximizes Sender's utility among all Bayes-plausible DTUs with intercept y.
Proof. Let V_y ⊂ [0, π] be the set of ℓ such that a DTU with lower truncation ℓ and intercept y is Bayes-plausible. I first show that V_y is closed; since it is clearly also bounded, V_y is therefore compact. To do so, I define the function

v(x, ℓ) = ∫₀^x (F(q) − G_y^ℓ(q)) dq

for a DTU G_y^ℓ with intercept y and lower truncation ℓ. This function captures the value of the Bayes-plausibility integral constraint for G_y^ℓ at x ∈ [0, 1]. Clearly v(0, ℓ) = 0, and v(1, ℓ) = 0 as well, since G_y^ℓ and F have the same mean by construction. At any x, the integral of G_y^ℓ on [0, x] is continuous in ℓ. This result is obvious for x ≠ ℓ (since G_y^ℓ(q) is continuous in ℓ at those points) and holds for x = ℓ because the left and right limits as x → ℓ are both 0. Therefore v(x, ℓ) is also continuous in ℓ for fixed x, since it depends on ℓ only through that integral. If G_y^ℓ is not Bayes-plausible, then (since it satisfies E_{G_y^ℓ}[ω] = π by construction) there must be some x_neg ∈ (0, 1) for which v(x_neg, ℓ) < 0. Because v(x_neg, ℓ) is continuous in ℓ, there is ε > 0 such that for any ℓ′ in an ε-neighborhood of ℓ, v(x_neg, ℓ′) < 0. Therefore any such G_y^{ℓ′} is not Bayes-plausible, so U ⊂ [0, π], the set of ℓ where Bayes-plausibility fails, is open. Since V_y = [0, π] \ U, it must be that V_y is closed.
By Lemma 3, Sender's utility from a DTU is given by the function u_S(r*, ℓ), defined piecewise according to whether r* lies below or above ℓ. This function is continuous in ℓ on [0, π]. Since β(ℓ) is continuous in ℓ on [0, π], each of the two piecewise portions of u_S is clearly continuous in ℓ; it remains only to check the case ℓ = r*. But because the left and right limits as ℓ → r* exist (by continuity of each piecewise portion) and are equal (by construction of u_S), u_S is continuous at ℓ = r* as well. Therefore the image of V_y under u_S must be compact, and thus contains a well-defined maximum, which is attained by some (possibly multiple) ℓ ∈ V_y. □

Unlike in the binary-state setting, it is not possible to solve analytically for G_y^{β(y)}. However, appropriate sufficient conditions can ensure that G_y^{β(y)} is both unique and slope-minimizing among Bayes-plausible DTUs with intercept y:

Lemma 10. Fix y ∈ [0, 1) and r* ∈ (0, 1). There is a unique and well-defined DTU G_y^{sm} that has minimal slope among all Bayes-plausible DTUs with intercept y. If y = 0 or r* ∈ [π, 1), then the y-optimal DTU G_y^{β(y)} equals G_y^{sm}.

Proof. Fix y ∈ [0, 1). By Lemma 9, the set V_y of values of ℓ such that G_y^ℓ is Bayes-plausible is closed, and the function β(ℓ, y) is continuous and monotonic in ℓ for fixed y, so there is a unique ℓ^{sm} ∈ V_y such that β(ℓ^{sm}, y) = inf_{ℓ ∈ V_y} β(ℓ, y). Now I show that either of the conditions provided in the lemma is sufficient for the slope-minimizing DTU to be optimal. First fix y = 0. Then β(ℓ) + y/ℓ = β(ℓ), so u_S(r*, ℓ) = 1 − G(r*); that is, there is no kink at ℓ in Sender's utility from DTUs with intercept 0. Thus Sender's utility from G_y^ℓ is strictly greater than her utility from G_y^{ℓ′} if and only if β(ℓ) < β(ℓ′). By Lemma 9, there exists a DTU G_0^{β(0)} with lower truncation length ℓ_0^* that maximizes Sender's utility among all Bayes-plausible DTUs with intercept 0. No other Bayes-plausible DTU can have a strictly smaller slope, since it would then deliver a strictly higher utility. But no other Bayes-plausible DTU can have the same slope β(ℓ_0^*), since there can be no ℓ′ ≠ ℓ_0^* with β(ℓ′) = β(ℓ_0^*). Therefore all other Bayes-plausible DTUs have strictly larger slope, and so G_0^{β(0)} is both utility-maximizing and slope-minimizing.
If instead r* ∈ [π, 1), then similarly u_S(r*, ℓ) = 1 − G(r*): since ℓ ∈ [0, π], r* surely lies weakly above ℓ. The argument is then the same; a DTU is utility-maximizing if and only if it is slope-minimizing, Lemma 9 guarantees the existence of a utility-maximizing DTU, and the injectivity of the map from ℓ to β(ℓ) guarantees uniqueness. □

B3: Simplifying the Integral Constraint. Let U_{r*} be the set of utilities attained by any y-optimal DTU:

U_{r*} = { u_S(r*, ℓ, y) : G_y^ℓ = G_y^{β(y)} for some y ∈ [0, 1) },

where I restore the dependence on y in u_S, since y is no longer fixed. That set is a subset of [0, 1], and is therefore bounded, so sup U_{r*}, Sender's supremum utility over all y-optimal DTUs (and thus over all DTUs), is well-defined and contained in the closure of U_{r*}. Further restrictions on F and r* provide sufficient conditions for U_{r*} to be closed, and thus for the maximum to exist. In order to state these sufficient conditions, I first prove Lemma 11. In its proof, I again drop the dependence on y from all functions since y is fixed, but note important changes in the argument for different values of y.
Lemma 11. Let G_y^{sm} be the DTU with the minimal slope among all Bayes-plausible DTUs with intercept y, and let ℓ_y^{sm} be its lower truncation length. If y ∈ [0, 1 − 2π], then the minimal interior q where G_y^β(q) = F(q), call it q₁(ℓ, y), is well-defined, and ℓ_y^{sm} satisfies v(q₁(ℓ_y^{sm}, y), ℓ_y^{sm}) = 0, with v(x, ℓ_y^{sm}) > 0 if and only if x ∈ (0, q₁(ℓ_y^{sm}, y)) ∪ (q₁(ℓ_y^{sm}, y), 1).
If instead y ∈ (1 − 2π, 1), then either the two conditions above hold or ℓ_y^{sm} equals the minimum lower truncation length ℓ_y^{min}.
Proof. By Lemma 10, there exists a unique minimal-slope Bayes-plausible DTU with intercept y.
Because of the shape of F, the equation L(q) = β(ℓ)q + y = F(q) has at most two solutions with q ∈ (0, 1]. In particular, if the slope of L is such that L lies completely above F on (0, 1], then there are no solutions in that interval; if the slope of L is such that L is tangent to F, then there is one;¹⁸ and if the slope of L is less than that of the tangent to F through y, there are two. Consider a DTU G_y^{β(ℓ)} with lower truncation length ℓ. If L(q) ≥ F(q) for all q ∈ (0, 1], that is, if L is either tangent to F at a point q_t or lies entirely above F, then this DTU satisfies Bayes-plausibility. The function v(x, ℓ), which gives the value of the Bayes-plausibility integral constraint for G_y^{β(ℓ)} at x ∈ [0, 1], is weakly decreasing whenever L(x) ≥ F(x).¹⁹ Thus v(x, ℓ) is weakly decreasing for all x ∈ (ℓ, 1). Since v(1, ℓ) = 0, it must therefore be that v(x, ℓ) ≥ 0 for all x ∈ (ℓ, 1); of course v(x, ℓ) ≥ 0 on [0, ℓ] as well, since G_y^{β(ℓ)} places no mass there and v is therefore weakly increasing on that interval.

¹⁸ There is at most one value of ℓ such that β(ℓ)q + y is tangent to F in (0, 1].
¹⁹ When L(x) > 1, G_y^ℓ(x) = 1 rather than following L(x), but since the line y = 1 is an upper bound on F as well, the upper truncation does not affect the behavior of v(x, ℓ).
The case where L intersects F twice in (0, 1] will form the bulk of the proof. In particular, let q₁ be the smallest q ∈ (0, 1] such that β(ℓ)q + y = F(q), and let q₂ be the largest.²⁰ By the Implicit Function Theorem, since the function β(ℓ)q + y − F(q) is continuously differentiable in all variables, I can write q₁ and q₂ as continuous functions of ℓ. Note that because of this definition, q₁ and q₂ are both well-defined (and satisfy q₁ = q₂) if β(ℓ)q + y is tangent to F, as well as for all smaller values of ℓ. I now address two-intersection DTUs by focusing on the cases q₁(ℓ) > ℓ and q₁(ℓ) ≤ ℓ. When q₁(ℓ) > ℓ, if v(q₁(ℓ), ℓ) ≥ 0 then v(q, ℓ) ≥ 0 for all q ∈ [0, 1] and Bayes-plausibility is satisfied: given the increasing and decreasing behavior of v(x, ℓ), it is clear that v(q₁(ℓ), ℓ) = min_{x ∈ [0, 1]} v(x, ℓ). Therefore if a DTU violates Bayes-plausibility, it must be because v(x, ℓ) < 0 for some x ∈ (0, 1), which in turn implies that v(q₁(ℓ), ℓ) < 0. Thus when q₁(ℓ) > ℓ,

v(q₁(ℓ), ℓ) ≥ 0    (2)

Equation (2) is a necessary and sufficient condition for a DTU to be Bayes-plausible.
Furthermore, if the inequality is strict for some ℓ, then because v(q₁(ℓ), ℓ) is continuous in ℓ, it is also strict for ℓ − ε.
To close out the case q₁(ℓ) ≤ ℓ, I show that either q₁ < q₂ < 1 or β(ℓ_y^{min})q + y does not intersect F twice. To see why, note that if q₂(ℓ) = 1 then either y = 1 − 2π and ℓ = 0, or y ∈ (1 − 2π, 1) and ℓ = ℓ_y^{min}. In the former case, the DTU is actually a UTU, and any UTU intersects F twice: otherwise it would lie weakly above F on the interval [0, 1] and strictly above F on some measurable subset of [0, 1], and so could not have the same mean as F, contradicting the construction of UTUs. Thus q₁(ℓ) < 1, v(x, ℓ) is strictly increasing on (q₁, q₂) and is negative at x = q₁, and G_y^{β(ℓ)} is not Bayes-plausible. In the latter case, if β(ℓ_y^{min})q + y intersects F twice, then the same argument applies and G_y^ℓ is not Bayes-plausible.

Equation (2) is a necessary and sufficient condition for Bayes-plausibility of G_y^{β(ℓ−ε)}, and therefore G_y^{β(ℓ−ε)} is Bayes-plausible and has a smaller slope than G_y^ℓ.
²¹ Of course, this choice may not be valid for y > 1 − 2π, since the lower bound on the set of valid ℓ is strictly above ℓ = 0; if so, I cannot rule out that q₁(ℓ) ≤ ℓ for the minimum permissible ℓ.
Having established sufficient conditions for when Bayes-plausibility is satisfied, I can now use them to obtain the desired characterization of the slope-minimizing lower truncation length ℓ_y^{sm}. I begin with the case y ∈ [0, 1 − 2π] and show that ℓ_y^{sm} satisfies v(q₁(ℓ_y^{sm}), ℓ_y^{sm}) = 0. When y ∈ [0, 1 − 2π], the lowest permissible slope for a DTU is (1 − y)²/(2π), the slope of the UTU with intercept y. Therefore the line L(q) = q(1 − y)²/(2π) + y must intersect F twice in (0, 1]. Furthermore, the line L(q) = q(1 − y)/π + y corresponds to the maximum permissible slope for a DTU, and thus must lie above F for the mean of that DTU to equal the mean of F. Therefore, by continuity of β(ℓ) in ℓ and continuity of f, there exists a value ℓ_t ∈ (0, π) where the line L(q) = β(ℓ)q + y is tangent to F. The point of tangency must be interior, as β(ℓ_t)·1 + y = 1 only if ℓ_t = 0, in which case the line β(ℓ_t)q + y forms part of a UTU and (as argued above) cannot be tangent to F. Therefore, for ε > 0 sufficiently small the line β(ℓ_t − ε)q + y intersects F twice, and both intersections are bounded strictly below 1. As argued when showing that q₁(ℓ) ≤ ℓ implies Bayes-plausibility of G_y^{β(ℓ)}, the constraint in Equation (2) does not bind for G_y^{ℓ_t}, so it does not bind for G_y^{ℓ_t−ε}, and the latter DTU is therefore Bayes-plausible. Thus the slope-minimizing DTU G_y^{sm} cannot be tangent to F and must intersect F twice in (0, 1]. Since y ∈ [0, 1 − 2π], as shown for the case q₁(ℓ) ≤ ℓ, it cannot be that q₁(ℓ_y^{sm}) ≤ ℓ_y^{sm}. Therefore q₁(ℓ) > ℓ, and the necessary and sufficient condition for Bayes-plausibility in Equation (2) applies.
To show that it holds with equality, consider the UTU corresponding to ℓ = 0. It is not Bayes-plausible²² and intersects F twice, so it must be that v(q₁(0), 0) < 0.
²² Any UTU with y > 0 has an atom at 0 while F does not. If y = 0, the restriction that f(0) < 1/(2π) ensures that the UTU is not Bayes-plausible, since there is ε > 0 such that the UTU places more mass in the interval [0, ε] than does F.
To complete the proof of the lemma, I show that if y ∈ (1 − 2π, 1), then either ℓ_y^{sm} = ℓ_y^{min} or v(q₁(ℓ_y^{sm}), ℓ_y^{sm}) = 0. Assume that β(ℓ_y^{min})q + y intersects F twice; otherwise clearly G_y^{ℓ_y^min} is Bayes-plausible and ℓ_y^{sm} = ℓ_y^{min}. Assume also that the smallest ℓ for which v(q₁(ℓ), ℓ) = 0, which I label ℓ_y^0, satisfies ℓ_y^0 > ℓ_y^{min}; otherwise clearly G_y^{ℓ_y^0} is both Bayes-plausible and slope-minimizing, so again ℓ_y^{sm} = ℓ_y^{min} (if no ℓ satisfying v(q₁(ℓ), ℓ) = 0 exists, I let ℓ_y^0 = π, and the argument still holds). If ℓ_y^{sm} ∈ (ℓ_y^{min}, ℓ_y^0), then it must be that β(ℓ_y^{sm})q + y intersects F twice, because β(ℓ_y^0)q + y does. By the definition of ℓ_y^0, v(q₁(ℓ_y^{sm}), ℓ_y^{sm}) ≠ 0. Clearly that expression cannot be strictly positive, or by continuity there would be ε > 0 small enough so that ℓ_y^{sm} − ε is both a valid choice of ℓ (i.e., greater than ℓ_y^{min}) and generates a Bayes-plausible DTU. It must therefore be strictly negative, which means that q₁(ℓ_y^{sm}) ≤ ℓ_y^{sm}; otherwise G_y^{sm} would not be Bayes-plausible. But then the proof that q₁(ℓ) ≤ ℓ cannot occur for small ℓ implies that there is ε > 0 small enough so that ℓ_y^{sm} − ε > ℓ_y^{min} and G_y^{ℓ_y^{sm}−ε} is Bayes-plausible, which contradicts the slope-minimizing property of ℓ_y^{sm} (the caveat for y > 1 − 2π does not apply, since we have already covered and ruled out the case ℓ_y^{sm} = ℓ_y^{min}). Thus it cannot be true that ℓ_y^{sm} ∈ (ℓ_y^{min}, ℓ_y^0), so it must be that either ℓ_y^{sm} = ℓ_y^{min} or ℓ_y^{sm} = ℓ_y^0; the latter implies the desired condition v(q₁(ℓ_y^{sm}), ℓ_y^{sm}) = 0. □

B4: Overall-Optimal DTUs. The simplified integral constraint in Lemma 11 can be used as a key step in deriving the continuity of ℓ_y^* in y, and thus in providing sufficient conditions for the existence of an overall-optimal DTU in Lemma 12. As an immediate corollary, though, it allows a characterization of the overall-optimal DTU when r* is small:

Corollary 3. Let β(0) be the slope of the 0-optimal double-truncated uniform distribution G_0^{β(0)}, and let q_i(β(0), 0) be the smallest q ∈ (0, 1) that satisfies β(0)q = F(q).²³ If r* ≤ q_i(β(0), 0), then G_0^{β(0)} is uniquely optimal among all double-truncated uniform distributions.

²³ The existence of q_i(β(0), 0) is guaranteed by the proof of Lemma 11.
Proof. The proof is by contradiction, and resembles the geometric proof of Proposition 1 for the binary-state setting. Fix r* and assume some other DTU G does weakly better than G_0^{β(0)} for Sender. It must therefore have a smaller slope than G_0^{β(0)}: the intercept of G is larger than that of G_0^{β(0)}, and G must intersect the horizontal line y = 1 at a larger value of q than G_0^{β(0)}, or the concavification of G would lie everywhere weakly below that of G_0^{β(0)}. The inequality in the first line is strict because F(q₁(β(0), 0)) < 1, so r* is not in the upper-truncated region of G and there is some strict difference between G_0^{β(0)} and G captured in the integral. The first implication follows from the bound on r*. The inequality in the third line holds because all DTUs have equal means. The equality in the third line is by Lemma 11, since by Lemma 10 the DTU G_0^{β(0)} has minimal slope among Bayes-plausible DTUs with intercept 0. □

Using the characterization of Lemma 11, I now prove a sufficient condition on F for U_{r*}, the set of utilities attained by y-optimal DTUs, to be compact, and thus for Sender to have a well-defined overall-optimal DTU:

Lemma 12. Let r* ∈ [π, 1] and f(1) > 0. Then Sender's maximum utility over all double-truncated uniform distributions is well-defined, and is attained by a double-truncated uniform distribution G*.
Proof. I first show that the optimal lower truncation length ℓ_y^* is continuous in y at any y ∈ [0, 1). Given the restriction on r*, Sender's utility from a y-optimal DTU G_y^{β(y)} is 1 − G_y^{β(y)}(r*) = 1 − (β(ℓ_y^*, y) r* + y). Thus continuity of ℓ_y^* in y ensures that Sender's maximum utility over DTUs with intercept y is continuous in y. I can then provide sufficient conditions for the intercept of a potential overall-optimal DTU to lie in a compact set. Together with the continuity condition, this implies that U_{r*} is compact, and so it contains its supremum. Therefore there is some DTU that attains Sender's supremum utility over all DTUs.
To show continuity, I first work with y ∈ [0, 1 − 2π), where the argument is most straightforward. Since in that range v(q_1(ℓ*_y, y), ℓ*_y) = 0 by Lemma 11, and the proof of that lemma shows that ℓ*_y is the minimal ℓ at which the property holds, I can apply the Implicit Function Theorem to write ℓ*_y as a continuous function of y.
Because both ℓ_y^min and the minimal ℓ satisfying v(q_1(ℓ, y), ℓ) = 0 are continuous in y, the minimum of those two choices is also continuous in y. Thus ℓ*_y is continuous in y for y ∈ (1 − 2π, 1).
All that remains is to show that ℓ*_y is continuous in y at y = 1 − 2π. The continuity of ℓ_y^min in y ensures that the first candidate value is continuous at y = 1 − 2π. Because u(y) > 0 for any y ∈ [0, 1 − 2π], as shown in the proof that q_1(ℓ) > ℓ for small enough ℓ, it must be that for δ > 0 sufficiently small, u(y′) > 0 for any y′ ∈ (1 − 2π, 1 − 2π + δ). Since the line β(0, 1 − 2π)q + (1 − 2π) intersects F twice in (0, 1], it must therefore be that for δ > 0 sufficiently small and y′ ∈ (1 − 2π, 1 − 2π + δ), so do the lines β(ℓ_{y′}^min, 1 − 2π)q + (1 − 2π), β(ℓ_{y′}^min, y′)q + (1 − 2π), and β(ℓ_{y′}^min, y′)q + y′. Because the last intersects F twice in (0, 1], and both intersections occur at values q > ℓ_{y′}^min, the proof of Lemma 11 shows that v(q_1(ℓ*_{y′}, 0), ℓ*_{y′}) = 0 and that ℓ*_{y′} is the minimal value of ℓ such that this property holds. Therefore, by the continuity of the minimal value of ℓ satisfying this equation, ℓ*_y is continuous in y at y = 1 − 2π.

Having shown continuity of ℓ*_y in y, I use the second part of the lemma statement to show that the set of possibly overall-optimal DTU intercepts is compact. Given that f(1) > 0, there must be ȳ ∈ (0, 1) such that 1 − ȳ < f(1). Then for any intercept y ≥ ȳ, the DTU with minimal permissible slope lies above F on (0, 1), and is therefore Bayes-plausible. Since any DTU with intercept y > ȳ surely lies above the slope-minimal DTU with intercept ȳ for all q ∈ [π, 1], no DTU with intercept in (ȳ, 1) can be optimal among all DTUs. Thus the intercept of the overall-optimal DTU lies in the compact set [0, ȳ]. □

Note that only the last step of the proof relies on f(1) > 0; if this condition is violated, then (as in the statement of Proposition 3 in the text) it may be that no DTU attains Sender's supremum utility, but there exists a limiting sequence of DTUs converging to that value, so no distribution delivers strictly higher utility than all DTUs.
B5: Optimal Posterior Distributions. Having established properties of overall-optimal DTUs, I can now jointly prove the optimal-distribution portions of Propositions 2 and 3:

Proof. Let H be a candidate optimal distribution of posterior means. I approximate H̄, the concavification of H, by a tangent at r*, which I call L(q); let L(0) = y_L ∈ [0, 1) be its intercept. Consider the y_L-optimal DTU G_{y_L}^{β(y_L)}. In order for H to do at least as well for Sender as G_{y_L}^{β(y_L)}, by Lemma 3 it must be that H̄(r*) ≤ G_{y_L}^{β(y_L)}(r*). Thus L must have a weakly smaller slope than G_{y_L}^{β(y_L)}; otherwise L(r*) > G_{y_L}^{β(y_L)}(r*) and the above inequality is violated.
If y_L ∈ [0, 1 − 2π], then for any slope β ∈ ((1 − y_L)²/(2π), β(y_L)] there is a DTU with that slope and intercept y_L. If instead y_L ∈ (1 − 2π, 1), then for any slope β ∈ [1 − y_L, β(y_L)] there is a DTU with that slope and intercept y_L. In the first case, the slope of L cannot lie below that interval, or L would have a weakly smaller slope than the UTU with intercept y_L; then the argument of Proposition 1 applies and H is not Bayes-plausible. In the second case, L must have a slope weakly greater than that of the lowest-slope DTU with intercept y_L, or it would fail to pass through (1, 1), and therefore so would H̄ and H. Thus there is a DTU G_L with the same slope as L.
If instead r* ∈ [π, 1) and G_L has a strictly smaller slope than G_{y_L}^{β(y_L)}, then by Lemma 10, G_L is not Bayes-plausible.
In either case, given that G L violates Bayes-plausibility, H must violate it as well.
Because G_L upper-bounds H̄ beyond ℓ, it must be that H(q) ≤ H̄(q) ≤ G_L(q) for any q ∈ [ℓ, 1]. Since G_L violates Bayes-plausibility, there is some q_v at which ∫_0^{q_v} G_L(t) dt > ∫_0^{q_v} F(t) dt, and since the left-hand side equals 0 for any q ∈ [0, ℓ), it must be that q_v ∈ [ℓ, 1].
Then because H and G_L have the same mean, the pointwise upper bound on H over [ℓ, 1] implies ∫_0^{q_v} H(t) dt ≥ ∫_0^{q_v} G_L(t) dt > ∫_0^{q_v} F(t) dt. Therefore H violates Bayes-plausibility and is not a valid distribution.
If G_L has the same slope as G_0^{β(0)} (i.e., it has no kink at ℓ*_0), there is no smaller concave function that takes the same values at q = 0 and q = r*; thus H̄ = Ḡ_0^{β(0)} on [0, r*] as well.
If instead ℓ*_0 > r*, then the equality H̄ = Ḡ_0^{β(0)} holds only on the range where the latter is linear. However, the upper-bounding relationship still holds on [0, ℓ*_0], and thus the argument above still applies and H̄ = Ḡ_0^{β(0)}.

The Finite-State Case. Note that the proofs and results of Lemmas 9 and 10 go through unchanged. Thus I can in fact easily prove an analogue of Proposition 3 by following the proof in Appendix B5. In particular, a candidate optimal distribution can be approximated by a DTU G_L. If G_L has a strictly smaller slope than the y-optimal DTU with the same intercept y, then by assuming r* ∈ [π, 1] and following Lemma 10, it cannot be Bayes-plausible. Given that G_L is not Bayes-plausible, neither is the candidate optimal distribution. Since there is a y-optimal DTU for any possible y, no distribution can give Sender strictly higher utility than all DTUs.
To obtain a tighter characterization of which DTU is optimal in this setting, I can prove an analogue of Lemma 11, showing where the integral constraint binds for the y-optimal DTU:

Lemma 13. Let y = 0 or r* ∈ [π, 1], so that the slope-minimizing Bayes-plausible DTU is also Sender's optimal DTU. Fix the value of y, and let q_i be the location of the i-th interior atom of the prior F. Then either the optimal lower truncation length ℓ*_y is equal to the minimum lower truncation length ℓ_y^min, or the integral constraint binds at the left limit of q_i for at least one i ∈ {1, ..., N − 2}.

Proof. For completeness, define q_0 = 0.
The proof is algorithmic; the algorithm for finding the optimal DTU is as follows.
(1) Initialize β as the minimal feasible slope for a DTU with intercept y.
(2) For i ∈ 1, ..., N − 2:
(a) Check whether the integral constraint is satisfied at the left limit of q_i; that is, whether ∫_0^{q_i} G_y^β(t) dt ≤ ∫_0^{q_i} F(t) dt.
(b) If the constraint is violated, increase β to the value at which it binds at the left limit of q_i.
(3) Once all i have been checked, exit.
In step (a), if the integral constraint is satisfied at the left limit of q_i, it must be satisfied everywhere in [q_{i−1}, q_i), since in that interval F is constant while G_y^β weakly increases, so the slack in the constraint attains its minimum at an endpoint of the interval. The algorithm first finds the minimal value of β where G_y^β satisfies the integral constraint in [0, q_1), then proceeds across subsequent intervals, increasing β when necessary to keep the integral constraint satisfied.²⁴ Finally, the integral constraint is automatically satisfied on [q_{N−2}, 1] because G_y^β has the appropriate mean.

²⁴By the proof of Lemma 9, the difference of integrals is continuous and monotonically increasing in β, so there will be exactly one value where the constraint binds.
Thus either the initial value of β satisfies the integral constraint for all i, in which case that minimal feasible slope is y-optimal, or the constraint binds at the left limit of q_i for at least one i, giving the result in the lemma. □

As in the continuous-state case, the integral constraint binds at a finite and possibly empty set of points for each y-optimal DTU. In the continuous-state case, this set was guaranteed to be nonempty for all y ∈ [0, 1 − 2π]; however, if F(0) > 0 in this finite-support setting, the set may be empty even for the 0-optimal DTU G_0^{β(0)}. In the case where there does exist a minimal q_i, call it q_min, at which the integral constraint binds for G_0^{β(0)} (a fact which depends on the specification of F), there is a natural analogue of Corollary 3: if r* ∈ [0, q_min], then G_0^{β(0)} is Sender's overall-optimal DTU. The proof exactly parallels that of the original corollary. Indeed, the analogue of Proposition 2 also follows, since the proof of optimality in Appendix B5 then goes through in the same way.
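The algorithm of Lemma 13 can be illustrated with a short numerical sketch. The code below is my own illustration, not the paper's implementation: it assumes a DTU cdf that is 0 below ℓ, equal to y + βt on [ℓ, (1 − y)/β], and 1 above, with ℓ pinned down by the mean constraint (my algebra gives (β/2)ℓ² + yℓ + (1 − y)²/(2β) = π), feasible slopes between max((1 − y)²/(2π), 1 − y) and (1 − y)/π, and Bayes-plausibility expressed as ∫_0^q G ≤ ∫_0^q F. All function names are mine.

```python
import math

def trunc_len(beta, y, pi):
    """Lower truncation length pinned down by the mean constraint
    (my algebra: solve (beta/2)*ell^2 + y*ell + (1-y)^2/(2*beta) = pi)."""
    disc = max(y * y - (1 - y) ** 2 + 2 * beta * pi, 0.0)
    return (-y + math.sqrt(disc)) / beta

def int_G(q, beta, y, pi):
    """Integral of the DTU cdf on [0, q]: the cdf is 0 below ell,
    y + beta*t on [ell, (1-y)/beta], and 1 above."""
    ell = trunc_len(beta, y, pi)
    if q <= ell:
        return 0.0
    a = min(q, (1 - y) / beta)
    val = y * (a - ell) + beta * (a * a - ell * ell) / 2.0
    return val + max(q - (1 - y) / beta, 0.0)

def int_F(q, atoms):
    """Integral of a finite-support cdf on [0, q]; atoms = [(location, mass)]."""
    total, cum, prev = 0.0, 0.0, 0.0
    for x, p in sorted(atoms):
        if x >= q:
            break
        total += cum * (x - prev)
        cum += p
        prev = x
    return total + cum * (q - prev)

def optimal_beta(y, atoms, tol=1e-12):
    """Sketch of the Lemma 13 algorithm: start from the minimal feasible
    slope and raise it whenever the integral constraint fails at the left
    limit of an interior atom (bisection uses the monotonicity of the
    slack in beta noted in the footnote)."""
    pi = sum(x * p for x, p in atoms)
    beta = max((1 - y) ** 2 / (2 * pi), 1 - y)      # step (1)
    slack = lambda b, q: int_F(q, atoms) - int_G(q, b, y, pi)
    for qi in sorted(x for x, _ in atoms if 0 < x < 1):
        if slack(beta, qi) >= -tol:
            continue                                 # step (2a): satisfied
        lo, hi = beta, (1 - y) / pi                  # step (2b): raise beta
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if slack(mid, qi) < -tol else (lo, mid)
        beta = hi
    return beta
```

For instance, with a symmetric two-atom prior at 0.2 and 0.8 (so π = 1/2), the loop first raises β until the lower truncation covers the atom at 0.2, then raises it again so the constraint binds at the atom at 0.8.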
Appendix C: A Numerical Approach to the Continuous-State Setting

C1: Summary of Numerical Results. Propositions 2 and 3 leave open the optimal distribution of posterior means when the mean Receiver type r* lies in the interval (q_i(β(0), 0), π). While the value q_i(β(0), 0) is well-defined for a given prior distribution, a closed-form solution may not exist. However, fixing a prior distribution, I can use a two-step solution algorithm to compute q_i(β(0), 0) numerically and to show qualitatively how the size of the intermediate interval changes with various properties of the prior distribution.
Informally, given a prior F with mean π, the first step is to find the 0-optimal slope β(0). To check Bayes-plausibility, I use the simplified integral constraint from Lemma 11: starting with the minimum feasible β, I increase β only while the constraint is violated and stop once it binds. The second step checks for intersections between the 0-optimal DTU and F; by definition, the smallest interior intersection is q_i(β(0), 0). A full formal description is in Appendix C2.
I briefly discuss some intuition for the results of the numerical computation below.
A reader interested in further detail may refer to Appendix C3 for thorough figures showing the output of the algorithm at various parameter values, or to Appendix C4 for a detailed exposition of those figures. Throughout this section, µ and σ refer to the mean and standard deviation of the generating normal distribution N(µ, σ²), while π refers to the mean of the prior F, i.e., of N(µ, σ²) truncated to [0, 1].
For fixed µ, the 0-optimal slope β(0) and the 0-optimal lower truncation length ℓ(0) are decreasing in σ. A smaller slope is better for Sender, but may be ruled out by the integral constraint; increasing σ means the prior cdf increases less steeply, so the integral constraint allows Sender's chosen distribution to increase less steeply as well.
Increasing σ lowers the slope of the 0-optimal DTU, which would decrease q i (β(0), 0) if the shape of the prior were unchanged. However, holding fixed a DTU's slope, increasing σ spreads out the prior mass and increases q i (β(0), 0). For small µ, either of these effects can dominate. For large µ, most of the prior mass is far enough away from the origin that the second effect dominates.
The prior mean π always lies strictly below q i (β(0), 0) for σ small enough, and increases with σ when µ < 1/2 but decreases with σ when µ > 1/2. This behavior is a known property of the truncated normal distribution; in my setting, it implies that there is no gap between Proposition 2 and Proposition 3 whenever µ > 1/2. When µ < 1/2, it implies that eventually q i (β(0), 0) < π, producing a gap between the results.
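The behavior of π in σ is easy to verify directly from the standard formula for the mean of a truncated normal distribution; a minimal stdlib sketch (function name mine):

```python
import math

def trunc_normal_mean(mu, sigma):
    """Mean of N(mu, sigma^2) truncated to [0, 1], via the standard formula
    mu + sigma * (phi(a) - phi(b)) / (Phi(b) - Phi(a))."""
    phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    a, b = (0 - mu) / sigma, (1 - mu) / sigma
    return mu + sigma * (phi(a) - phi(b)) / (Phi(b) - Phi(a))

# The truncated mean moves toward the midpoint 1/2 as sigma grows:
print(trunc_normal_mean(0.3, 0.1), trunc_normal_mean(0.3, 0.5))  # increasing in sigma
print(trunc_normal_mean(0.7, 0.1), trunc_normal_mean(0.7, 0.5))  # decreasing in sigma
```

As σ grows, more of the untruncated mass falls outside [0, 1] on the side nearer to µ, pulling the truncated mean toward the midpoint of the interval; at µ = 1/2 the two truncation effects cancel by symmetry.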
C2: A Detailed Algorithm for Computing q_i(β(0), 0). In this section I describe in detail the algorithm for computing q_i(β(0), 0), along with notes on its key steps and implementation details.
1. Find the 0-optimal slope β(0).
2. Find the smallest interior intersection q_i(β(0), 0).

C4: Discussion of Numerical Results. Figure 6 shows clear patterns in the 0-optimal choice of β, β(0), across prior parameter values. Fixing µ, β(0) is monotonically decreasing in σ with a roughly exponential shape. As µ increases, the range of β(0) shrinks. Note that for large σ, the assumption f′(0) < 1 − 2π used in Section 5 to rule out the binary-state solution may be violated. In that case Proposition 2 does not apply, as β(0) = β_min may be a feasible solution; this occurs in panels 1 and 2 of the figure. A similar phenomenon occurs when µ is large, so that the prior F is concave (as described in Section 5.1), and appears in panels 5 and 6 of the figure.
The 0-optimal lower truncation length ℓ(0) behaves as expected given the results for β(0), and is shown in Figure 7. For fixed µ, ℓ(0) is monotonically decreasing in σ and appears to have a reverse-S shape (concave, then convex). Increasing µ shifts the curve up and flattens it. As with β(0), the results in the first two panels show some instability at high values of σ resulting from violations of the assumption f′(0) < 1 − 2π. In panels 5 and 6, where β(0) = β_min at high values of σ, the lower truncation length tracks directly with the prior mean π in order to maintain Bayes-plausibility of the distribution. Figure 7 also shows the prior mean π of the truncated normal prior. The effects of truncating the normal distribution are well known; I only note that π increases with σ for µ < 1/2 and decreases with σ for µ > 1/2 because µ = 1/2 is the midpoint of the truncation interval.
The key value of interest is the smallest interior point of intersection q_i(β(0), 0) between the 0-optimal DTU and the prior distribution; I abbreviate this value as q_i for the remainder of this discussion. Figure 8 illustrates some of the complex interactions between q_i and the shape of the prior, as well as some of the difficulties faced in the numerical approach. For small µ, q_i is inverse-U-shaped as a function of σ. As discussed earlier, for σ large enough a lower truncation region may not be necessary. In this case I set q_i = 0 to preserve continuity of q_i in σ and to reflect that the Bayes-plausibility constraint does not bind at any interior intersection (it trivially binds at a posterior mean of 0, and binds at q_i when a lower truncation is necessary).
This case appears in panel 1. As µ increases, the curve flattens and moves up, as seen in panels 2 and 3. For µ > 1/2, q i becomes monotonically increasing in σ, as shown in panel 4. In panels 5 and 6, because F is convex on [ε, 1] for small ε ≥ 0, all interior intersections between the 0-optimal DTU and F are either close to 0 or close to 1. The numerical algorithm thus becomes unstable and alternates between these two regions (as seen in panel 5) or chooses the default solution of q i = 0 (as seen in panel 6). Nevertheless, the shape of the curve for small σ suggests that the trend of monotonically increasing q i when µ > 1/2 is preserved.
With respect to the relationship between q_i and π, which determines the gap between Propositions 2 and 3, the numerical results show that the cutoff µ = 1/2 is key. For any µ < 1/2, there exists σ̄ large enough that π > q_i for all σ > σ̄. However, σ̄ may be so large that the assumption f′(0) < 1 − 2π is violated, in which case π < q_i for all valid choices of σ. For µ > 1/2, π < q_i for all σ; to verify this numerically, I spot-checked values σ ∈ {1, 10, 100} and manually verified the numerical integration.
Thus there is a gap between Propositions 2 and 3 only when µ < 1/2 and σ is large enough to exceed σ̄ but not so large as to violate f′(0) < 1 − 2π. For example, in panel 1, for σ ∈ (0.06, 0.165) there is a gap between the two propositions; for σ < 0.06, q_i > π, so there is no gap, and for σ > 0.165 the assumption f′(0) < 1 − 2π is violated and the propositions do not apply.