A framework for the analysis of self-confirming policies

This paper provides a general framework for analyzing self-confirming policies. We study self-confirming equilibria in recurrent decision problems with incomplete information about the true stochastic model. We characterize stationary monetary policies in a linear-quadratic setting.

We first develop a general analysis of self-confirming policies in recurrent decision problems with incomplete information about the true stochastic model. Next we apply and illustrate the theory with a characterization of stationary monetary policies in a linear-quadratic setting.
Consider an agent (she), either moderately patient or impatient, who makes recurrent decisions under uncertainty. In each period she takes an action a that, via a feedback function f, delivers an observable outcome, or message, m = f(a, s) that depends on an unobservable state of nature s. A fixed, unknown stochastic model σ* (that is, a probability measure over the states) determines an i.i.d. process of state realizations. The agent knows the feedback function f, but not the stochastic model σ*. Note that, for some action a, the same outcome m may result from multiple states, i.e., f(a, ·) need not be injective; in this case, the realized outcome does not reveal the realized underlying state, as exemplified below. There are no structural links between periods, but the agent observes the realized outcome in each period t and therefore updates her subjective belief μ_t about the fixed unknown model σ*. Over time, given a true model σ* and a prior belief, the intertemporal subjective expected utility maximizing strategy yields a convergent active learning process, i.e., a stochastic process of actions and updated beliefs (a_t, μ_t) that converges almost surely. 1 The realization (a*, μ*) of the stochastic limit almost surely satisfies the following two properties:
• Confirmed beliefs: μ* assigns probability 1 to the set of models σ that are observationally equivalent to the true model σ* given action a*; 2
• Subjective best reply: action a* maximizes the agent's one-period subjective expected utility given belief μ*.
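Writing σ* for the true model, μ* for the limit belief, f_{a*} for the section of the feedback function at action a*, Σ for the set of conceivable models, and R(a, σ) for the one-period expected payoff (notation introduced formally in later sections), the two properties can be sketched as follows; the hat denotes the pushforward of a model to a distribution over messages. This is our own schematic rendering, not a verbatim definition from the paper:

```latex
% Sketch of the two defining conditions of a self-confirming equilibrium
% (a^*, \mu^*); \hat{f}_{a^*}(\sigma) is the distribution of messages that
% model \sigma induces under action a^*.
\[
\mu^*\bigl(\{\sigma \in \Sigma : \hat{f}_{a^*}(\sigma) = \hat{f}_{a^*}(\sigma^*)\}\bigr) = 1
\quad\text{(confirmed beliefs)},
\]
\[
a^* \in \operatorname*{arg\,max}_{a \in A} \int_{\Sigma} R(a,\sigma)\, d\mu^*(\sigma)
\quad\text{(subjective best reply)}.
\]
```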
We take ''confirmed beliefs'' and ''subjective best reply'' to be the characterizing properties of stationary actions and beliefs. We call self-confirming equilibrium an action-belief pair (a*, μ*) with these properties. Indeed, conceptually this is a special case of the self-confirming equilibrium idea of Battigalli (1987) and Fudenberg and Levine (1993a, b), applied to one-person games with incomplete information about the probabilities of states. 3 The key observation is that the confirmed belief μ* need not assign probability one to the true stochastic model σ* and, therefore, action a* may differ from the objective best reply to σ*. In other words, although equilibrium beliefs are disciplined by long-run empirical frequencies of observations, they do not necessarily concentrate on the true model σ*, so the long-run action a* may be objectively sub-optimal. This can happen even if the decision maker is quite patient: in the learning phase a positive discount factor can induce experimentation with actions that do not maximize one-period subjective expected utility, but the option value of experimentation vanishes in the limit. 4 Consider the following heuristic example. A decision maker is asked to repeatedly bet on the color of a ball that will be drawn from an urn that contains 90 black, green, or yellow balls. After the draw, she is told whether she won (in which case she receives 1 euro) or not (in which case she receives 0 euros). Thus, there are three states, S = {B, G, Y}, three actions, A = {b, g, y}, and two monetary outcomes, M = {0, 1}. The feedback function attains value 1 when the action matches the state (f(B, b) = f(G, g) = f(Y, y) = 1) and value 0 otherwise. Thus, winning reveals the realized state, but losing only rules out one state out of three. Suppose the urn contains 45 black balls, 35 green balls, and 10 yellow balls, i.e., σ*(B) = 1/2, σ*(G) = 7/18, and σ*(Y) = 1/9.
The objective best reply is to bet on B, but the decision maker does not know this. Suppose she keeps betting on G. With high objective probability, she is going to win more than 1/3 of the times and she may come to deem it very likely that the urn contains more green than black or yellow balls. Indeed, betting on G infinitely many times she will almost certainly observe that the winning frequency is 7/18, and in the long run her limit belief μ∞ will assign probability 1 to the set of models {σ : σ(G) = 7/18}. As long as her limit belief μ∞ also assigns sufficiently high probability to models under which G is the most likely color, she will find it optimal to keep betting on G, which is an objectively sub-optimal choice. Indeed, betting on G with such beliefs is a self-confirming equilibrium given the true model σ*. Even if she initially experiments betting on B, sufficiently many unlucky outcomes will dissuade her from doing it again. In other words, the vagaries of the active learning process may well drive her into the trap of choosing forever the ''satisficing'' action g that wins more than 1/3 of the times rather than experimenting with b long enough to realize that it yields an even higher winning frequency.
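The trap described above is easy to reproduce numerically. The following sketch is our own illustration, not code from the paper (names such as `true_model` are hypothetical): it simulates the win/loss feedback from always betting on green and checks that models agreeing on σ(G) are observationally equivalent under that action.

```python
import random

random.seed(0)

# True urn: 45 black, 35 green, 10 yellow balls out of 90.
true_model = {"B": 45/90, "G": 35/90, "Y": 10/90}

def draw(model):
    """Draw one ball color according to the given probability model."""
    r, acc = random.random(), 0.0
    for color, p in model.items():
        acc += p
        if r < acc:
            return color
    return "Y"  # numerical guard

# Feedback from betting on g: only win (1) or loss (0) is observed.
n = 100_000
wins = sum(draw(true_model) == "G" for _ in range(n))
freq = wins / n
assert freq > 1/3                  # "satisficing": g wins more than 1/3 of the time
assert abs(freq - 7/18) < 0.01     # long-run data pin down sigma(G) = 7/18 only

# Observational equivalence under action g: same sigma(G), different rest.
other_model = {"B": 10/90, "G": 35/90, "Y": 45/90}
# Betting on g, both models induce the message distribution {1: 7/18, 0: 11/18},
# so no amount of feedback from g can distinguish them.
assert true_model["G"] == other_model["G"]
```

The long-run win frequency identifies σ(G) but nothing else, which is exactly the partial identification that sustains the equilibrium.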
Thus, in a self-confirming equilibrium, the decision maker may be best-replying to empirically confirmed but wrong views about the actual data generating model. She may thus get trapped in self-confirming behavior that differs substantially from the objectively optimal behavior postulated by rational expectations models. 5 This trap and the resulting welfare loss are, at the same time, especially relevant and disturbing for policy making. The trap is relevant when policy makers cannot obtain enough reliable evidence before choosing (e.g., with externally valid lab or field experiments), but instead have to rely on evidence that is a by-product of their actual policies; it is disturbing because welfare in self-confirming equilibria can be lower than in rational expectations equilibria. The main contribution of the present paper is to provide a formal steady-state framework in which this important policy issue can be rigorously studied. We then use this framework to illustrate the macroeconomic relevance of our analysis in the context of a 1970s U.S. policy debate about whether there is a trade-off between inflation and unemployment that can be systematically exploited by a benevolent policy maker.

4 One can easily prove also a kind of converse result: for every self-confirming equilibrium pair (a*, μ*) there are a prior belief, a discount factor, and a subjectively optimal strategy such that the resulting action process converges to a* almost surely. However, it may be necessary to allow for knife-edge cases, e.g., when a* is weakly dominated.

5 In order to remove pervasive inconsistencies of pre-rational-expectations models, rational-expectations models often assume that decision makers know the true data generating process, thus making decisions objectively optimal. The traditional Nash equilibrium concept shares this objective best-reply feature. The Bayesian Nash equilibrium concept of Harsanyi (1967) instead allows for subjective and incorrect beliefs about parameters, without imposing a confirmed-beliefs condition. Therefore, Bayesian Nash equilibrium does not refine self-confirming equilibrium.
Illustrative application We consider a stylized model economy in which a policy maker chooses average inflation a and observes an unemployment/inflation outcome (u, π) = f(a, s) that depends on the unobservable random state s of the economy. This model economy can be interpreted as reflecting an aggregate response function of a continuum of market agents. Assuming a quadratic loss function, we completely characterize the self-confirming equilibrium map that associates each conceivable model economy with a corresponding set of self-confirming beliefs and monetary policies. Given a fixed policy, the monetary authority infers from long-run data the first and second moments of the joint distribution of u and π, and hence the slope of the Phillips curve; but it cannot infer the true policy multiplier. We show that observing (in the long run) the distribution of (u, π) leaves the monetary authority with a residual one-dimensional uncertainty about the model economy, parameterized by the direct impact of policy on unemployment (i.e., neglecting the impact on u through π).
For example, even if the true model is a rational expectations augmented Phillips curve, in equilibrium the monetary authority may believe that its policy does not shift the Phillips curve and hence that there is an exploitable trade-off given by the slope of the Phillips regression; the (Keynesian) monetary policy is optimal given the (falsely) conjectured trade-off, the subjectively expected unemployment rate coincides with the natural rate, and average inflation is (objectively) excessive. But we do not take a stand on what the true model is, and so we also consider self-confirming equilibria where the monetary authority pushes average inflation to zero, falsely believing that there is no exploitable trade-off. Whatever the case, our analysis shows how partial identification may trap policy makers in inferior, yet self-confirming, policies that result in significant losses compared to the objectively optimal policies.
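A minimal numerical illustration of this kind of partial identification (our own sketch; the linear specification, the parameter values, and names such as `model_vertical` are illustrative assumptions, not the paper's model): two structural models that generate identical unemployment/inflation data at the current policy, yet disagree about the counterfactual effect of changing it.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 2.0                      # fixed average-inflation policy (hypothetical units)
u_nat, slope = 5.0, 0.5      # natural rate and Phillips slope (made-up values)

eps_u = rng.normal(0, 1, 100_000)   # unemployment shocks
eps_p = rng.normal(0, 1, 100_000)   # inflation shocks
pi = a + eps_p                      # realized inflation under the policy

def model_vertical(a_policy, pi, eps_u):
    # Expectations-augmented curve: the policy shifts the curve one-for-one,
    # so there is no exploitable long-run trade-off.
    return u_nat - slope * (pi - a_policy) + eps_u

def model_keynesian(a_policy, pi, eps_u):
    # Fixed-intercept curve calibrated to the data observed at policy a:
    # the authority believes the slope is an exploitable trade-off.
    return (u_nat + slope * a) - slope * pi + eps_u

u1 = model_vertical(a, pi, eps_u)
u2 = model_keynesian(a, pi, eps_u)
assert np.allclose(u1, u2)   # observationally identical at the current policy

# Counterfactual policy a' = 0: the two models now disagree about mean u,
# and long-run data gathered at policy a cannot settle the disagreement.
pi_cf = 0.0 + eps_p
m1 = model_vertical(0.0, pi_cf, eps_u).mean()
m2 = model_keynesian(0.0, pi_cf, eps_u).mean()
assert abs(m1 - m2) > 0.5
```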
Manifesto Partial identification pervades economic policy debates: despite the use of sophisticated econometric techniques, economists disagree about how the economy works. Therefore, at least some economists must be wrong, but all of them should hold beliefs consistent with the data, which indeed only partially identify the relevant unknowns. The agents that inhabit our models (in particular, policy makers) are in a similar position, but their partial identification problem is exacerbated because what they can infer about the relevant unknowns depends on their own behavior, so it is endogenous. Thus, different policies justified by different beliefs, and so ultimately by different (possibly conflicting) economic views, may be self-confirming. Such beliefs may even be dogmatic, for example because they assign probability one to a parameter vector resulting from observed long-run frequencies and untested, possibly false, identifying hypotheses.
To escape the partial identification trap more experimentation may be advisable. But we do not see an easy way out: large-scale social experiments can have huge costs, captured in our framework by the opportunity cost of not using a one-period subjective best reply, while small-scale ones may have little external validity.
Roadmap As anticipated, the first part of this paper (Sects. 2, 3, 4) develops an abstract analysis of self-confirming choices. The general contribution of this part is to provide a theoretical framework that is:
• broad enough to include the finite one-period setting in which self-confirming analysis was originally developed within game theory, as well as the infinite setup relevant for many economic applications, including macroeconomic policy analysis;
• specific enough to provide welfare implications for relevant policy questions against the backdrop of a neat learning foundation.
The main issues that we address at this abstract level concern the scope of partial identification (Sect. 3), equilibrium values, and the resulting effects on the decision maker's welfare (Sect. 4.2). The most novel results of the abstract part concern the latter topic. We show that if, of two policies justified by different self-confirming beliefs, one allows better identification of the true model, even if only partial, then that policy yields higher welfare. Similarly, self-confirming equilibria justified by sharper beliefs yield higher welfare. The theoretical concepts of this first part are illustrated and clarified by a running example of a monopolist facing an uncertain demand. The second part of the paper (Sect. 5) builds on the abstract analysis to gain a better perspective and novel results on the classical debate on the possibility of systematically exploiting unemployment/inflation trade-offs. In particular, the scope of partial identification is characterized in Sect. 5.2, while equilibria, their values, and the welfare effects of model uncertainty are analyzed in Sect. 5.3. Sections 5.4 and 5.5 illustrate the analysis by considering two important special cases. Section 6 offers some concluding remarks.
Appendix A collects some more technical material and all the formal proofs. 6

Related literature Our analysis contributes to and provides a bridge between two strands of literature, one in game theory and the other in macroeconomics, that are concerned with related issues but have so far proceeded with limited cross-fertilization and very different languages.
In the game-theoretic literature, a strategy profile that satisfies the properties of confirmed beliefs and subjective best reply has been called ''conjectural equilibrium'' (Battigalli, 1987; Battigalli & Guaitoli, 1988), ''self-confirming equilibrium'' (Fudenberg & Levine, 1993a), and ''subjective equilibrium'' (Kalai & Lehrer, 1993, 1995). Here we adopt the more self-explanatory terminology of Fudenberg and Levine. We refer the reader to Battigalli et al. (2015) for an up-to-date discussion of this literature. Here we point out that, although we focus on one-person decision problems with uncertainty, our abstract analysis extends seamlessly to n-person games, except for the aforementioned comparative results about equilibrium values. To our knowledge, papers in the extant literature either consider finite (one-period) games or games with no randomness. We extend the analysis of self-confirming equilibria to settings with inherent randomness and possibly infinite spaces of strategies and states of nature. Technically, this extension is not straightforward: it requires mathematical precision and care. We also point out that the learning foundation of the equilibrium concept is more solid in the one-person case analyzed here: while self-confirming equilibria of multi-person games represent the steady states of learning dynamics, convergence to a steady state is not guaranteed under general conditions, as is instead the case when the model can be effectively reduced to a one-person decision problem. Finally, note that Battigalli et al. (2015) is focused on the interaction between ambiguity aversion and self-confirming equilibria in games. Here instead we consider a decision maker who maximizes her subjective expected utility (i.e., she is ambiguity neutral). This simplifies the general analysis without affecting the illustrative examples and the application.
Indeed, they feature conditions under which the degree of ambiguity aversion does not affect the set of self-confirming equilibria (see Sect. 6 and Battigalli et al., 2021), although it may well affect the learning dynamics and the likelihood of being trapped, in the long run, in a self-confirming equilibrium with an objectively suboptimal choice (see Battigalli et al., 2019).
The macroeconomic literature focuses on policy making and learning dynamics. Sargent (1999) explains the rise and fall of US inflation assuming that the monetary authority sequentially estimates a Phillips curve, ignoring its impact on expectations, and best replies to updated beliefs. Standard OLS estimation leads to a Keynesian self-confirming equilibrium; but if instead recent observations are given more weight, because the monetary policy maker's decisions make the Phillips curve slowly shift and rotate over time, the process first approaches a neighborhood of this equilibrium, but then recurrently abandons it when the Phillips curve looks ''more vertical,'' leading the monetary policy maker to lower inflation. 7 Cho et al. (2002) and Sargent and Williams (2005) sharpen the theoretical analysis of such learning dynamics. 8 Cho and Kasa (2015) note that the low inflation outcome at the end of Sargent's (1999) narrative (according to the postulated learning model) cannot persist either; therefore, they consider an alternative stochastic learning dynamic in which the policy maker best responds to the current estimate of an aggregate supply model, out of a set of conceivable functional forms, as long as the model passes a statistical test; when the model is rejected, a new model is selected at random and the process is restarted. Also in their model the Keynesian self-confirming equilibrium cannot persist, because, in the very long run, the monetary authority adopts a vertical Phillips curve model. 9 In our paper, we focus only on the set of possible limit points of learning dynamics. Furthermore, in our monetary policy application, we follow Sargent (2008) and assume that the monetary authority may believe in the exploitability of a trade-off between unemployment and inflation. Unlike the papers we have mentioned, we do not take a stand on a true model economy. Thus, instead of assuming that the true model economy features a rational-expectations augmented Phillips curve, we characterize the self-confirming equilibria and values for many conceivable models.

7 See also Cogley and Sargent (2005), Sargent et al. (2006), and Cogley et al. (2007).

8 The phrase ''escaping Nash inflation'' in the title of Cho et al. (2002) deserves an explanation. When the decision model is interpreted as a game between the monetary authority and a representative agent, a self-confirming equilibrium outcome is also a (possibly subgame imperfect) Nash equilibrium outcome. Battigalli (1987) and Fudenberg and Levine (1993a) provide sufficient conditions for the realization-equivalence between Nash and self-confirming equilibrium. Such conditions are satisfied in the model of Cho et al.

9 In his work on rational belief equilibria, Kurz (1994a, 1994b) analyzes stochastic dynamics where agents' beliefs may be incorrect, but are eventually consistent with the long-run frequencies of observables, which is in the spirit of self-confirming equilibrium. The most important difference with our work is that, although Kurz analyzes multi-agent systems, he does not use a game-theoretic framework. Specifically, unlike game models, there is no function specifying how agents' actions and (possibly) exogenous variables determine outcomes and observables.

Other papers in the literature focus, like ours, mainly on self-confirming equilibrium policies rather than learning dynamics. In particular, Battigalli and Guaitoli (1988) analyze the self-confirming equilibria with rationalizable beliefs of a stylized policy game with incomplete information, showing that there are equilibria with Keynesian features and equilibria with new-classical features. Fudenberg and Levine (2009) discuss the Lucas critique through the analysis of refined self-confirming equilibria in some insightful illustrative examples; they emphasize the role of rationalizable beliefs and of robustness to experimentation. Unlike the foregoing papers, we formally analyze a one-agent framework, which makes the issue of the rationalizability of beliefs moot. Depending on the application, when the one-agent framework is interpreted as a reduced form of a multi-person game, the shape of the outcome/feedback function f may implicitly represent such rationalizability constraints, e.g., the decision maker is a leader and the outcome function captures the best-reply behavior of followers; this is clarified by our monopoly example. As for the monetary policy application, only a genuinely game-theoretic model of the economy would allow a thorough analysis of the rationalizability of self-confirming beliefs, but tackling such a difficult issue is beyond the scope of this article. In a series of papers, Saint-Paul (e.g., 2013, 2018) considers an expert who knows the true model and advises the policy maker while pursuing her own policy agenda; the policy maker and the agents in the market fully trust the expert as long as the data are consistent with her advice. With this, the expert manipulates the policy maker and market agents under a self-confirmation constraint.

Finally, Gaballo and Marimon (2021) analyze a directed search model of the credit market where lenders post excessively high interest rates because of confirmed pessimistic beliefs about returns on investments, but the monetary authority can break the spell by easing credit. The main difference with our monetary policy application is that we study the self-confirming actions and beliefs of the monetary authority, not of the agents in the market. 10

10 The model is not explicitly represented as a game. Therefore, the connection to the traditional self-confirming equilibrium concept is not immediate.

To the best of our knowledge, besides the novelty of several results, our paper is unique in integrating an abstract analysis of self-confirming policies with an economic application. There are under-appreciated complementarities between abstract theory and applications. The former allows one to focus on key concepts and properties uncluttered by specific modeling features; the latter helps to better understand the abstract theory and points to its relevance. Here we consider a monetary policy application, but the scope of our analysis goes well beyond that. For example, the difficulty of thorough experimentation and its consequences for welfare naturally arise in the context of environmental policies.

Mathematics
Differently from Battigalli et al. (2015), the Phillips curve exploitation model that motivates and illustrates this paper features infinite action, state, and consequence spaces as well as unbounded payoff functions. The necessary adaptation is conceptually natural, but technically nontrivial. In particular, it requires that the analysis be carried out within a standard Borel space (X, 𝒳), where X is a completely metrizable and separable topological space and 𝒳 is its Borel sigma algebra. The Borel sets B ∈ 𝒳 are themselves standard Borel spaces under the relative sigma algebra 𝒳 ∩ B. 11 When X is countable (i.e., finite or denumerable), standardness requires 𝒳 to be the power set of X (see Appendix A.1).
We denote by Δ(X) the collection of all probability measures on 𝒳, endowed with the natural sigma algebra, 12 which in turn makes Δ(X) a standard Borel space too. With this, the Borel subsets Σ of Δ(X), with their relative sigma algebras, are standard Borel spaces themselves. The meaning of Δ(Σ) is then obvious. Finally, δ : X → Δ(X) denotes the canonical Dirac embedding of X into Δ(X); that is, δ(x) is the probability measure on 𝒳 which assigns probability 1 to each Borel set containing x ∈ X. 13

Let (Y, 𝒴) be another standard Borel space. The Cartesian product X × Y is a standard Borel space with respect to the product sigma algebra. Moreover, each measurable function u : X → Y induces a measurable distribution map û : Δ(X) → Δ(Y) defined by û(ν)(B) = ν(u⁻¹(B)) for each probability measure ν ∈ Δ(X) and all sets B in 𝒴. 14

Lemma 1 Let u : X → Y be a measurable function. The following conditions are equivalent: (i) u is one-to-one; (ii) û is one-to-one; (iii) u generates the sigma algebra 𝒳.

11 See Kechris (2012) for the properties of standard Borel spaces that we use.
12 That is, the sigma algebra generated by the evaluation maps ν ↦ ν(B) for all B ∈ 𝒳.
13 The usual notation for the Dirac measure concentrated on x is δ_x.
14 See Appendix A.1. In the applied probability literature, û(ν)(B) is sometimes denoted by û(B | ν), interpreted as the probability of observing a realization in B given ν with ''measurement'' u.
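On a finite space, the distribution map û and the equivalence between (i) and (ii) in Lemma 1 can be illustrated directly. The helper `pushforward` below is our own sketch, with probability measures represented as dictionaries; it shows one instance of each direction rather than a proof.

```python
from collections import defaultdict

def pushforward(u, nu):
    """Image measure u_hat(nu) on Y: u_hat(nu)(B) = nu(u^{-1}(B))."""
    out = defaultdict(float)
    for x, p in nu.items():
        out[u(x)] += p
    return dict(out)

# Two distinct measures on X = {0, 1} (dyadic weights, so sums are exact).
nu1, nu2 = {0: 0.25, 1: 0.75}, {0: 0.5, 1: 0.5}

# An injective u keeps distinct distributions distinct (Lemma 1, (i) => (ii)).
u_inj = lambda x: 2 * x
assert pushforward(u_inj, nu1) != pushforward(u_inj, nu2)

# A non-injective (here constant) u collapses them: u_hat is not one-to-one.
u_const = lambda x: 0
assert pushforward(u_const, nu1) == pushforward(u_const, nu2)
```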
Interpreting x as a state and y = u(x) as an observable outcome, we can phrase these equivalent conditions as follows: (i) u reveals the state x in X; (ii) û reveals the distribution ν in Δ(X); (iii) u generates the sigma algebra 𝒳.
Finally, we say that X and Y are isomorphic, written X ≅ Y, if there is a bimeasurable bijection u : X → Y; that is, u is measurable and u⁻¹ : Y → X is a well-defined measurable function.

Classical subjective expected utility
Let S be a space of states of nature, A a space of actions available to the decision maker, C a space of consequences, and q : A × S → C a measurable consequence function that associates a consequence q(a, s) ∈ C with each pair (a, s) ∈ A × S of action and state. When consequences are monetary, C is a (Borel) subset of the real line.
The quartet

(A, S, C, q)                                                                 (1)

is the basic structure of the decision problem. The inherent randomness characterizing the realization of states (often called physical uncertainty) is described by probability models σ ∈ Δ(S) that can be regarded as possible generative mechanisms. For each probability model σ, actions a are evaluated through their expected utility

∫_S v(q(a, s)) dσ(s),

where v : C → ℝ is a measurable and bounded-above von Neumann-Morgenstern utility function. It is often convenient to write the criterion in the expected-payoff form

R(a, σ) = ∫_S r(a, s) dσ(s),   with payoff function r = v ∘ q.

Also the payoff function is easily seen to be measurable and bounded above. All our integrals are thus well defined, but may take value −∞. The decision maker may not know the true probability model σ* but is able to posit a (measurable) collection Σ ⊆ Δ(S) of probability models that contains the true one; that is, σ* ∈ Σ. We thus abstract from misspecification issues. We call structural the kind of information that allows the decision maker to posit the collection Σ. For example, if the problem is to bet on the color, white or black, of a ball drawn from a two-color urn, and it is only known that the urn contains n balls, then Σ has n + 1 elements and is isomorphic to the set {0, 1/n, ..., (n−1)/n, 1} of possible fractions of white balls. When Σ is a singleton, i.e., the true model is known, the decision maker confronts only risk. Otherwise, she faces model uncertainty. 15 We can also give Σ a somewhat different interpretation: it represents a backdrop theory accepted by the decision maker, which happens to be correct (i.e., such that σ* ∈ Σ).
In particular, as we assume that the same decision problem is faced infinitely often, representing uncertainty with Σ rests on the assumption that the process of states is i.i.d. 16 The decision maker ranks actions according to the classical subjective expected utility (SEU) criterion 17

∫_Σ R(a, σ) dμ(σ),                                                           (2)

where μ ∈ Δ(Σ) is a subjective prior probability over models that reflects the personal beliefs about models that the decision maker may have, in addition to the structural information behind Σ. 18 This representation admits the reduced form

∫_S r(a, s) dσ_μ(s),

where σ_μ(·) = ∫_Σ σ(·) dμ(σ) is the predictive probability induced by μ. This reduced form is the original representation of Savage (1954), who derived σ_μ from preferences over bets.
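On a finite state and model space, the reduction of criterion (2) to the predictive probability is a one-line computation, since the expected payoff is linear in the model. A sketch with made-up numbers (the names `payoff`, `mu`, and so on are ours, not the paper's):

```python
# Finite sketch: two states, two models; models sigma are dicts over states.
states = ["s1", "s2"]
payoff = {("a1", "s1"): 1.0, ("a1", "s2"): 0.0,
          ("a2", "s1"): 0.2, ("a2", "s2"): 0.8}   # r(a, s), made-up numbers

models = {"lo": {"s1": 0.2, "s2": 0.8},
          "hi": {"s1": 0.9, "s2": 0.1}}
mu = {"lo": 0.5, "hi": 0.5}                        # prior over the two models

def R(a, sigma):
    """Expected payoff R(a, sigma) = sum_s sigma(s) * r(a, s)."""
    return sum(sigma[s] * payoff[(a, s)] for s in states)

def seu(a):
    """SEU criterion (2): integral of R(a, sigma) against mu."""
    return sum(mu[k] * R(a, models[k]) for k in models)

# Predictive probability sigma_mu(s) = sum_k mu(k) * sigma_k(s).
predictive = {s: sum(mu[k] * models[k][s] for k in models) for s in states}

# Savage reduced form: SEU equals the expected payoff under sigma_mu.
for a in ("a1", "a2"):
    assert abs(seu(a) - R(a, predictive)) < 1e-12
```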
The decision problem can then be summarized by the sextet

D = (A, S, C, q, Σ, v),                                                      (3)

which combines the basic structure (1) with the information and taste traits Σ and v. A few special cases are noteworthy.
(i) When the support of μ is a singleton {σ}, that is, μ = δ(σ), the decision maker believes (maybe wrongly) that σ is the true model. The predictive probability trivially coincides with σ and criterion (2) reduces to the Savage expected payoff criterion R(a, σ). Being a predictive probability, σ here is a subjective probability measure, albeit one derived from a dogmatic belief.

(ii) When Σ is a singleton {σ*}, the decision maker has maximal structural information and, as a result, knows that σ* is the true model. In this case, there is only physical uncertainty, quantified by σ*, without any model uncertainty. Criterion (2) again reduces to the expected payoff criterion R(a, σ*), but now interpreted as a von Neumann-Morgenstern criterion. For instance, if the decision maker either observed infinitely many draws from a given urn or were just able to count the balls of each color, she would learn/know the urn composition and Σ would be a singleton.

15 Model uncertainty is also called model ambiguity (see Hansen & Marinacci, 2016).
16 If, instead, the process of states were assumed to be Markovian, probability models would be kernel functions (with finitely many states, transition probability matrices) rather than elements of Δ(S).
17 The integral is well defined, for each Σ and each μ ∈ Δ(Σ), because the expected payoff function R(a, ·) : Δ(S) → [−∞, +∞) is, for every action a, measurable and bounded above on Δ(S) and hence on Σ.
18 See Marinacci (2015) for a discussion of this setup. Classical SEU is proposed by Cerreia-Vioglio et al. (2013), where ''classical'' refers to the fact that a posited set of probability models is a basic feature of classical statistics.
(iii) When every model in Σ is degenerate, i.e., Σ ⊆ {δ(s) : s ∈ S}, there is no physical uncertainty, but only model uncertainty, quantified by μ. We can identify prior and predictive probabilities: with a slight abuse of notation, we can write μ ∈ Δ(S), and so (2) takes the form R(a, μ). 19

Throughout this part (Sects. 2, 3, 4), we illustrate the abstract theoretical concepts with a stripped-down monopoly example.
Example 1 (Monopoly: Unknown demand) A monopolist choosing output a ≥ a̲ faces an imperfectly known (state-dependent) inverse demand function a ↦ P(a, s). We interpret the lower bound a̲ ≥ 0, when strictly positive, as a pre-commitment to a minimum level of production. If a̲ = 0 there is no pre-commitment. The firm knows the slope, but not the intercept, which has a permanent component θ modified by an additive noise ε: the state is s = θ + ε, where θ ∈ [θ̲, θ̄] and ε ∈ [−ē, ē] is the realization of a random variable ε with known distribution g and 0 mean. 20 The firm has a known linear cost function, with average and marginal cost c > 0. To further simplify the analysis, we assume that price is certainly strictly positive on the relevant range of outputs, i.e., also for the largest subjective best reply across all possible beliefs. This is the case if 21

θ̲ − ē > (θ̄ − c)/2.

With this, we can ignore the 0-price floor, and the relevant inverse demand map becomes

P(a, s) = s − a.

We can parameterize Σ as follows: 22 each θ ∈ [θ̲, θ̄] corresponds to the model σ_θ given by the distribution of θ + ε. The consequence function (again, in the relevant range of outputs) is the profit function

q(a, s) = a(s − a − c).

19 See Corollary 4 in Appendix A.1.
20 We write random variables in boldface font and their realization in normal font.
21 Ignoring the 0-price floor, (θ̄ − c)/2 is the best reply to the most optimistic belief. The condition guarantees that, at this largest output, price is strictly positive even with the lowest inverse demand function. This implies that the standard first-order conditions identify a global subjective optimum.
Under risk neutrality, v is the identity on the range of q; thus, r = q. Given the parameterization of Σ, the objective expected payoff and subjective expected utility can be written as

R(a, σ_θ) = a(θ − a − c)   and   ∫ R(a, σ_θ) dμ(θ) = a(∫ θ dμ(θ) − a − c).

The example clarifies that decision problem D could be the reduced form of a multi-agent model where the unknown state s represents the behavior of other agents, such as buyers. Such behavior is unaffected by choice a, either literally, or because it represents a profile of strategies (decision functions) rather than actual actions. In the quantity-setting monopoly, s may be determined by a distribution of private valuations, with output a sold in a multi-unit uniform-price auction and with unit-demand buyers bidding their valuations, which is their dominant strategy. For a price-setting monopolist, valuations determine individual demand functions as optimal reactions to the set price. In these cases, the map a ↦ q(a, s) is determined by the rational behavior of the un-modeled agents. In an alternative interpretation, the firm is a monopolistic competitor of negligible size and s represents general market conditions.
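The subjective best reply implied by these expressions can be checked numerically: maximizing a(E_μ[θ] − a − c) gives a = (E_μ[θ] − c)/2, and a belief with the wrong mean intercept produces an objective profit loss. A sketch with hypothetical parameter values of our choosing:

```python
import numpy as np

c = 2.0            # marginal (and average) cost, hypothetical value
theta_true = 10.0  # true permanent demand intercept
theta_hat = 8.0    # subjective mean of theta under belief mu

def expected_profit(a, theta):
    # R(a, sigma_theta) = a * (theta - a - c): the noise eps has mean zero.
    return a * (theta - a - c)

# Subjective best reply: maximize a * (theta_hat - a - c) over a grid,
# and compare with the first-order-condition solution (theta_hat - c) / 2.
grid = np.linspace(0.0, 6.0, 600_001)
a_star = grid[np.argmax(expected_profit(grid, theta_hat))]
assert abs(a_star - (theta_hat - c) / 2) < 1e-4

# Objective loss from optimizing against the wrong intercept:
a_obj = (theta_true - c) / 2
loss = expected_profit(a_obj, theta_true) - expected_profit(a_star, theta_true)
assert loss > 0
```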

Feedback
The decision maker faces decision problem D recurrently in a stationary environment with an i.i.d. process of states determined by the unknown probability model σ*. To determine what actions and beliefs can be stable given σ*, we have to specify the information obtained ex post by the decision maker for each action a and state s. We model such information through a (measurable) feedback function

f : A × S → M,

where M is a space of messages. By selecting an action a ∈ A, the decision maker receives a message m = f_a(s) when s occurs. 23 The decision maker's (ex post) information about the state is thus endogenous. When M is finite, such endogenous information is represented by the partition {f_a⁻¹(m) : m ∈ M} of the state space S that the messages induce, which depends on the choice of action a. This partition generates the algebra of events whose probability can be inferred from the long-run frequencies of messages. When M is infinite, it may be the case that this collection of events cannot be recovered from the partition. Hence, it is technically convenient to represent information with the sigma algebra ℱ_a generated by f_a. A decision problem with feedback is described by the octet

(A, S, C, q, Σ, v, M, f),                                                    (4)

where a feedback function and a message space are added to the decision problem (3). When information does not depend on action a, we say that there is own-action independence of feedback about the state; formally, ℱ_a = ℱ_{a′} for all a, a′ ∈ A. The most important instance of own-action independence is perfect feedback, which occurs when each section f_a of the feedback function f generates 𝒮, that is, in view of Lemma 1, when f_a is one-to-one for each a ∈ A. In this case, messages reveal to the decision maker which state obtained, regardless of the chosen action. When this is not the case, feedback about the state is imperfect, maximally so when each section f_a is constant, so that ℱ_a = {∅, S} and all states return the same message.
An action a is fully revealing if f_a is one-to-one, that is, if it allows the decision maker to learn which state obtained. Under perfect feedback, all actions are fully revealing. The existence of fully revealing actions is a weak form of "endogenous" perfect feedback.
We assume throughout that consequences are observable. Formally, this amounts to assuming that, for each action a ∈ A, the section q_a of the consequence function q is F_a-measurable. The next result, which will play an important role in our analysis, characterizes this assumption within a decision problem with feedback (4).
Proposition 1 Consequences are observable if and only if, for each action a ∈ A, there exists a measurable function g_a : M → C such that q_a = g_a ∘ f_a. In this case, the payoff r_a = v ∘ q_a of each action a is F_a-measurable.
In words, messages encode consequences and so payoffs. In particular, when the consequences of the actions are the only observed messages, we have C = M and f = q. This is the most common and important case of feedback, and it is also featured by our macroeconomic application.
Example 2 (Monopoly: feedback) A natural assumption about feedback for the quantity-setting monopoly of Example 1 is that the firm observes the realized market price, that is, f(a, s) = P(a, s) (for the relevant range of outputs), and g_a is the affine map p ↦ a(p − c) from market price to profit. Note, however, that if the firm has a zero lower bound on output (a̲ = 0), assuming that a realized price can be observed even with 0 output to sell is contrived; indeed, the most plausible assumption is that nothing is observed. The same observability pattern occurs with an alternative assumption about feedback: the firm only observes its revenue, f(a, s) = q(a, s) + ca, e.g., because it is the grower of a unique variety of weed sold to a dealer who returns the proceeds from an auction. With 0 production, nothing can be observed; with positive production a > 0, the unit price and realized state can be backed out: p = q/a + c (per-unit revenue) and s = p + a. To sum up, both feedback functions satisfy observability of consequences, and each interior choice (a > 0) is revealing. Thus, own-action independence of feedback about the state holds if there is a strictly positive lower bound on production a̲ > 0. Absent this constraint (a̲ = 0), own-action independence of feedback about the state fails because 0 output reveals nothing, while positive output is revealing.
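The back-out argument in the example can be checked numerically. The sketch below assumes, consistently with the identities p = q/a + c and s = p + a above, a linear inverse demand p = s − a, together with hypothetical values a = 2, s = 10, c = 1:

```python
def revenue_message(a, s, c):
    """Feedback f(a, s) = q(a, s) + c*a: profit plus cost, i.e. revenue."""
    p = s - a                # realized price (linear inverse demand, assumed)
    profit = a * (p - c)     # q(a, s)
    return profit + c * a    # observed message: revenue a*p

def back_out(a, r, c):
    """For a > 0 the message is invertible: p = q/a + c and s = p + a."""
    profit = r - c * a
    p = profit / a + c       # per-unit revenue recovers the price
    s = p + a                # and the price recovers the realized state
    return p, s

a, s, c = 2.0, 10.0, 1.0
r = revenue_message(a, s, c)
p_hat, s_hat = back_out(a, r, c)
print(p_hat, s_hat)  # 8.0 10.0
```

With a = 0 the message is identically 0, so no such inversion exists, which is exactly the failure of own-action independence described above.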

Partial identification correspondence
In our steady state setting, a message distribution ν ∈ Δ(M) can be interpreted as a long-run empirical frequency of messages received by the decision maker. Specifically, for each Borel set B ⊆ M, ν(B) is the long-run empirical frequency with which messages m belong to B. For any action a ∈ A, consider the pushforward distribution f̂_a(ρ) ∈ Δ(M) induced by a model ρ, that is, f̂_a(ρ)(B) = ρ(f_a^(-1)(B)) for each B ⊆ M.24 Then f̂_a(ρ)(B) is the long-run empirical frequency with which the decision maker receives messages m in B, when action a is chosen and ρ is the true model. Conversely, f̂_a^(-1)(ν) is the set of models that are observationally equivalent given that action a is chosen infinitely often and that the frequency distribution ν of messages is observed in the long run conditional on a. In other words, f̂_a^(-1)(ν) is the collection of all probability models that may have generated ν given a.
If action a is fully revealing, then f̂_a is one-to-one, and so f̂_a^(-1)(ν) is at most a singleton for every ν. In this case the decision problem is identified under a, since different models generate different message distributions, which thus uniquely pin down models. Otherwise, when f̂_a^(-1)(ν) is nonsingleton for some ν, we have partial identification under action a. In the extreme case when f̂_a is constant, that is, when all models generate the same message distribution, the decision problem is completely unidentified under action a. Interestingly, f̂_a is constant if and only if f_a is constant, that is, all states generate the same message (see Lemma 6 in Appendix A.1). Now recall that the decision maker posits a set of models R determined by structural information or a backdrop theory. Upon observing ν, one can conclude that the data generating model belongs to f̂_a^(-1)(ν) ∩ R. For this reason, models ρ and ρ′ in R such that f̂_a(ρ) = f̂_a(ρ′) = ν are observationally equivalent under action a. Formally, given an action a, two models ρ, ρ′ ∈ R are observationally equivalent if f̂_a(ρ) = f̂_a(ρ′). We denote the class of models observationally equivalent to ρ given a by R̂_a(ρ). In other words, R̂_a(ρ) is the partially identified set of models given action a.25 We can thus regard the map R̂_a(·) : R ⇉ R, which associates to each element of R its observational equivalence class, as the partial identification correspondence determined by action a.
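When S, M, and the model set R are finite, the partial identification correspondence can be computed by brute force. The following sketch (hypothetical two-state models, not from the paper) pushes each model forward through f_a and groups models with equal message distributions into observational equivalence classes:

```python
def pushforward(rho, f_a, states):
    """Message distribution induced by model rho under section f_a."""
    dist = {}
    for s in states:
        m = f_a(s)
        dist[m] = dist.get(m, 0.0) + rho[s]
    return tuple(sorted(dist.items()))

def identification_classes(models, f_a, states):
    """Cells of the partition {R_hat_a(rho)}: models grouped by message distribution."""
    classes = {}
    for name, rho in models.items():
        classes.setdefault(pushforward(rho, f_a, states), []).append(name)
    return sorted(classes.values())

S = ["s1", "s2"]
R = {"rho": {"s1": 0.5, "s2": 0.5}, "rho'": {"s1": 0.8, "s2": 0.2}}

# Revealing action: f_a one-to-one, so each cell is a singleton (identification).
print(identification_classes(R, lambda s: s, S))     # [['rho'], ["rho'"]]
# Unrevealing action: f_a constant, so all models are observationally equivalent.
print(identification_classes(R, lambda s: "m0", S))  # [['rho', "rho'"]]
```

The second call illustrates complete non-identification: a constant section f_a collapses every model to the same message distribution.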
It is easy to see that R̂_a has convex values if the collection R is convex. Moreover, if f̂_a is one-to-one, then R̂_a is the identity: R̂_a(ρ) = {ρ} for all ρ ∈ R. In this case, message distributions identify the true model. In contrast, when R̂_a(ρ) is nonsingleton there is genuine partial identification.
Summing up, the collection {R̂_a(ρ)}_(ρ∈R) is a measurable partition of R, and its cells consist of probability models that are observationally equivalent under action a. Clearly, the dependence on a is lost under own-action independence of feedback about the state.
Example 3 (Monopoly: partial identification) Consider the quantity-setting monopoly of Example 1. If feedback is the realized price and output a > 0 is chosen infinitely often, the firm observes in the long run the distribution of realized prices, which reveals the model. With a zero lower bound (a̲ = 0), producing 0 instead reveals nothing, so that all models are observationally equivalent under the 0 action. Thus, given the parameterization of R, the partial identification correspondence is R̂_a(ρ) = {ρ} for a > 0 and R̂_0(ρ) = R.

25 We can also write R̂_a(ρ) = {ρ′ ∈ R : ρ′|F_a = ρ|F_a}, i.e., partial identification is determined by the information sigma-algebra F_a.

Comparative statics
The extent of partial identification depends, intuitively, on how informative the underlying feedback function is. To formalize this intuition, we need to compare feedback functions according to their informativeness. To this end, we say that a feedback function f′ is coarser (or less fine) than a feedback function f if, for each action a ∈ A, there is a measurable map h_a : M → M′ such that f′_a = h_a ∘ f_a. In the monopoly example, with a zero lower bound (a̲ = 0) and under the assumption that a notional realized price could be observed even at 0 output, realized revenue f′(a, s) = q(a, s) + ca is a coarser feedback than realized price f(a, s) = P(a, s), with h_a(p) = ap.
A coarser feedback function is less informative. Using this comparative notion, we show that a less informative feedback function aggravates the decision maker's partial identification problem, thus formalizing the previous intuition. Given feedback functions f and f′, we let R̂_a(·) and R̂′_a(·) denote the identification correspondences derived from f and f′, respectively.
Proposition 2 Fix feedback functions f and f′. If f′ is coarser than f, then R̂_a(ρ) ⊆ R̂′_a(ρ) for all a ∈ A and ρ ∈ R.
Coarser feedback functions thus determine, for each action, coarser observational equivalence relations: worse information translates into a lower degree of statistical identification. In particular, the assumption that consequences are observable makes the consequence function q the coarsest possible feedback. Perfect feedback is, instead, the finest.

Definition
Throughout this section, we fix a decision problem with feedback and observable consequences (A, S, C, q, R, v, M, f), where R contains the true model ρ* that generates the states. With this, we introduce a concept that is at the heart of our analysis and is motivated by the partial identification issues discussed in the previous section.

Definition 1 A pair (a*, μ*) ∈ A × Δ(R) is a self-confirming equilibrium given ρ* if

a* ∈ argmax_(a∈A) V(a, μ*)  (6)

and

μ* ∈ Δ(R̂_a*(ρ*)).  (7)

The definition relies on two pillars: the optimality condition (6), which ensures that action a* is subjectively optimal under belief μ*, and the belief confirmation condition (7), which guarantees that belief μ* is consistent with the data that action a* reveals in the long run.26 In fact, given model ρ*, action a* determines the message distribution ν* = f̂_a*(ρ*), which is the long-run evidence that disciplines the subjective belief μ*. In this respect, note that R̂_a*(ρ*) = f̂_a*^(-1)(ν*) ∩ R. Therefore, R̂_a*(ρ*) depends only on the induced message distribution ν*. Note also that condition (7) makes self-confirming equilibrium for decision problems with feedback a genuine equilibrium concept. Indeed, we already mentioned in the Introduction that it characterizes the steady states of learning dynamics in stochastic control problems. Relatedly, it is a fixed-point concept: suppose for simplicity that there is a unique best reply B(μ) for each belief μ; then, a self-confirming belief is a fixed point of the correspondence μ ↦ Δ(R̂_B(μ)(ρ*)). Finally, it is worth noting that a self-confirming belief may exclude the true model.27 We can indeed formulate the data confirmation condition (7) as follows: μ* must assign probability 0 to every measurable set of models disjoint from f̂_a*^(-1)(ν*) ∩ R. The equilibrium belief must thus exclude everything which is not consistent with either observations or structural information/theory, but it may exclude other models as well, including the true one.

26 Here, since R̂_a*(ρ*) is a measurable subset of R, the set Δ(R̂_a*(ρ*)) is identified with the family of elements of Δ(R) that assign probability 1 to R̂_a*(ρ*).
27 See the example in the introduction.
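The fixed-point flavor of the concept can be illustrated with a brute-force search over a toy problem with dogmatic (Dirac) beliefs. All payoffs, states, and models below are hypothetical; the example reproduces the point just made, namely that an equilibrium belief can exclude the true model when the chosen action is unrevealing:

```python
def sce_dogmatic(actions, states, models, true_model, payoff, feedback):
    """Pairs (a, model) satisfying subjective best reply and data confirmation."""
    def expected(a, rho):
        return sum(rho[s] * payoff(a, s) for s in states)

    def message_dist(a, rho):
        dist = {}
        for s in states:
            m = feedback(a, s)
            dist[m] = dist.get(m, 0.0) + rho[s]
        return dist

    equilibria = []
    for a in actions:
        for name, rho in models.items():
            best = max(actions, key=lambda b: expected(b, rho))
            confirmed = message_dist(a, rho) == message_dist(a, models[true_model])
            if a == best and confirmed:
                equilibria.append((a, name))
    return equilibria

A = [0, 1]
S = ["L", "H"]
R = {"L": {"L": 1.0, "H": 0.0}, "H": {"L": 0.0, "H": 1.0}}
payoff = lambda a, s: 0.0 if a == 0 else (1.0 if s == "H" else -1.0)
feedback = lambda a, s: s if a == 1 else "nothing"   # only action 1 is revealing

print(sce_dogmatic(A, S, R, "H", payoff, feedback))
# [(0, 'L'), (1, 'H')]: the first equilibrium's belief excludes the true model H
```

The unrevealing action 0 generates no evidence against the wrong model L, so the pair (0, L) is self-confirming even though the truth is H.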
Under own-action independence of feedback about the state, the data confirmation condition (7) becomes μ* ∈ Δ(R̂(ρ*)). We thus return to a traditional optimization notion with a purely exogenous data confirmation condition. In particular, under perfect feedback, and so full identification, the optimality condition (6) becomes

a* ∈ argmax_(a∈A) R(a, ρ*),  (10)

since condition (7) requires μ* = δ(ρ*). In this case, common in the rational expectations literature, the decision maker has a correct belief about the true model and confronts only risk.
We say that an action a* ∈ A is objectively optimal given ρ* if it satisfies the optimality condition (10). Objectively optimal actions are the ones that the decision maker would select if she knew the true model, that is, under full identification. As such, they provide an important benchmark against which to assess alternative courses of action, as the next welfare analysis will show.
That said, observe that a "rational-expectations" pair (a*, δ(ρ*)), where action a* is objectively optimal and belief δ(ρ*) is concentrated on the true model, is a self-confirming equilibrium. Indeed, ρ* ∈ R̂_a*(ρ*), and so δ(ρ*)(R̂_a*(ρ*)) = 1. Traditional rational-expectations analysis can thus be seen as the special case of ours that arises when the decision maker confronts only risk.
We close the section with a useful equivalence result. The optimality condition (6) can be written in predictive form as R(a*, ρ_μ*) ≥ R(a, ρ_μ*) for each a ∈ A. Relatedly, the data confirmation condition (7) implies that the predictive probability ρ_μ* belongs to R̂_a*(ρ*) if it belongs to R.28 In this case, (a*, δ(ρ_μ*)) is a self-confirming equilibrium too. Hence we have the following dogmatic equivalence principle.

Value and welfare
We now turn to a "welfare analysis," that is, we compare equilibrium values with objective expected payoffs, including the maximum expected payoff that could be attained by the decision maker if she knew the true model ρ*. We start with an important preliminary result: since we assume that consequences are observable, it follows that, for each action, observationally equivalent models yield the same objective expected payoff.29

Lemma 2 Let (a, ρ) ∈ A × Δ(S). If model ρ′ ∈ Δ(S) is observationally equivalent to model ρ under a, then R(a, ρ′) = R(a, ρ).
This has a noteworthy consequence for self-confirming equilibrium values.
Proposition 4 If (a*, μ*) is a self-confirming equilibrium given ρ*, then V(a*, μ*) = R(a*, ρ*).

Thus, the value of any self-confirming equilibrium (a*, μ*) coincides with the true expected payoff of a*, irrespective of the supporting belief μ*. As a result, because of the data confirmation condition, the optimality condition (6) amounts to assuming that the "true value" of the self-confirming equilibrium action is higher than the subjective value, under the equilibrium belief, of all alternative actions. This interplay of objective and subjective features shows the substantial bite of the data confirmation condition.
Lemma 2 has interesting comparative welfare implications. For our welfare analysis, it is convenient to focus on actions that are part of some self-confirming equilibrium, thus neglecting the supporting confirmed beliefs.
Definition 2 Action a* is a self-confirming (equilibrium) action given ρ* if there exists a belief μ* ∈ Δ(R) such that (a*, μ*) is a self-confirming equilibrium.
Since in this case V(a*, μ*) = R(a*, ρ*) ≤ sup_(a∈A) R(a, ρ*), the decision maker incurs the welfare loss ℓ(a*, ρ*) = sup_(a∈A) R(a, ρ*) − R(a*, ρ*) when she selects the self-confirming action a*. In particular, ℓ(a*, ρ*) = 0 if and only if a* is objectively optimal; the loss is caused by the decision maker's ignorance, which makes it possible to assign positive subjective probability to (neighborhoods of) models different from the true one. Our next result shows that self-confirming equilibria with sharper basic subjective assessments yield higher welfare (lower loss). Formally, μ* is sharper than ν* when μ* is absolutely continuous with respect to ν*. This means that μ* rules out more models than ν*; in particular, if R is finite, it means that supp μ* ⊆ supp ν*. Consider self-confirming equilibria (a*, μ*) and (b*, ν*) such that (i) a* yields better identification than b* (i.e., R̂_a*(ρ*) ⊆ R̂_b*(ρ*)), and (ii) μ* and ν* do not rule out any model consistent with the statistical evidence given a* and b*, respectively. Then we obtain a special case of Proposition 5 and we can conclude that ℓ(a*, ρ*) ≤ ℓ(b*, ρ*). The following result shows that we can dispense with condition (ii): independently of their justifying confirmed beliefs, self-confirming actions with better identification properties exhibit lower losses.
Proposition 6 Let a* and b* be self-confirming actions given ρ*. If R̂_a*(ρ*) ⊆ R̂_b*(ρ*), then ℓ(a*, ρ*) ≤ ℓ(b*, ρ*).

Propositions 5 and 6 are the only results in our analysis that depend on the one-person assumption in an essential way. In a multi-person game they hold only for the comparison of equilibria in which the strategies of all players but one are the same and the focus is on the welfare of the only agent playing a different strategy.
The next related result shows that an action with the best identification properties, and thus optimal from a purely statistical viewpoint, is self-confirming only when objectively optimal. Truth is ancillary to the decision maker's pursuit of her goals (and so of her happiness).
Proposition 7 An action a ∈ A such that R̂_a(ρ*) ⊆ R̂_a′(ρ*) for each a′ ∈ A is self-confirming given ρ* if and only if it is objectively optimal.
Under own-action independence of feedback about the state, R̂_a(ρ*) is independent of a. Therefore, Proposition 7 yields the following noteworthy implication.
Corollary 1 Under own-action independence of feedback about the state, every self-confirming action is objectively optimal.

Example 4 (Monopoly: self-confirming equilibrium) Under the assumptions of Example 1, certainty equivalence holds and the subjective best reply function of the monopolist is B(μ) = max{a̲, (E_μ(θ) − c)/2}, where θ parameterizes models according to the average intercept of the inverse demand function. Since any positive output is revealing (see Example 3), if the firm is pre-committed to a positive minimum output (a̲ > 0), own-action independence of feedback holds and the only self-confirming output is the objective best reply max{a̲, (θ* − c)/2}. Next, suppose that a̲ = 0 and, furthermore, that intercepts θ < c are deemed possible while θ* > c. Then own-action independence of feedback does not hold and there are two self-confirming actions: (i) the fully revealing action a* = (θ* − c)/2 > 0 is the objective best reply, thus illustrating Proposition 7, and (ii) b* = 0 is justified by any "pessimistic" belief μ such that E_μ(θ) < c, which is trivially consistent with long-run evidence because b* = 0 is fully un-revealing. The comparison of self-confirming actions a* and b* illustrates Proposition 6: indeed, a* is more revealing and exhibits a lower loss.

The example prompts the following question. We mentioned in the Introduction that self-confirming equilibria are limit steady states of active learning processes, which we do not model explicitly here. Suppose that the monopolist believes it is optimal in the short run to produce 0, but deems it possible that the objective best reply is positive, i.e., that θ > c. Should she not experiment with a positive output? This depends on several elements: her subjective belief, her degree of patience (discount factor), and the amount of noise. If the subjective probability μ(θ > c) is relatively small and price is noisy, it is dynamically optimal not to experiment even if the decision maker is moderately patient.
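A minimal numeric sketch of Example 4 (hypothetical values θ* = 10, c = 2, and a pessimistic conjecture E_μ(θ) = 1 < c; the best-reply formula matches the example's objective best reply max{a̲, (θ* − c)/2}):

```python
def best_reply(expected_theta, c, a_min=0.0):
    """Subjective best reply: B(mu) = max{a_min, (E_mu(theta) - c) / 2}."""
    return max(a_min, (expected_theta - c) / 2.0)

theta_star, c = 10.0, 2.0

# (i) Revealing equilibrium: belief concentrated on the true model.
a_star = best_reply(theta_star, c)   # objective best reply (theta* - c)/2
# (ii) Unrevealing equilibrium: a pessimistic belief with E_mu(theta) = 1 < c
# justifies 0 output, and 0 output generates no evidence against it.
b_star = best_reply(1.0, c)

print(a_star, b_star)  # 4.0 0.0

# With a positive pre-committed minimum output, any output is revealing and
# only the objective best reply survives.
print(best_reply(theta_star, c, a_min=0.5))  # 4.0
```

The two printed equilibrium outputs are exactly the pair (a*, b*) discussed in the example; only a* attains zero welfare loss.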
In particular, noise is important: only repeated experimentation with positive output can provide reliable evidence, and this has a high subjective opportunity cost.30 In sum, the decision maker is not just a statistician: she is not interested in discovering the true model per se, unless the action (played in the long run) that allows the discovery is subjectively optimal.
In this first part we expressed and analyzed the self-confirming equilibrium concept in an abstract framework amenable to policy applications. This requires allowing for an infinite action space (e.g., to use calculus) and for an infinite state space, and positing an objective probability model characterizing the data generating process. Technically, the latter calls for the use of standard Borel spaces. Many of the themes analyzed within the framework of the first part are illustrated in the second part by an application to monetary policy.

Phillips curve exploitation model
We now illustrate our machinery in the context of a 1970s U.S. policy debate about whether a trade-off between inflation and unemployment can be systematically exploited by a benevolent policy maker. We extend a formulation of Sargent (1999, 2008), who presents a self-confirming equilibrium in which a policy maker believes in a model asserting an exploitable trade-off between unemployment and inflation while the truth is that the trade-off is not exploitable.31

Steady state model economies
We study a class Θ of model economies θ at a (stochastic) steady state. We assume that unemployment u and inflation π, beyond depending on the unknown θ, are affected by random shocks w and ε with zero mean, and by a monetary policy variable a. Specifically, unemployment and inflation outcomes (u, π) are connected to the state of the economy s = (w, ε, θ) and the government action a according to

u = θ_0 + θ_1π π + θ_1a a + θ_2 w,  (11)
π = a + θ_3 ε.  (12)

The vector parameter θ = (θ_0, θ_1π, θ_1a, θ_2, θ_3) ∈ ℝ^5, that is, the last component of the state vector, specifies the structural coefficients of an aggregate supply equation (11) and an inflation determination equation (12). Coefficients θ_1π and θ_1a are slope responses of unemployment to actual and planned inflation,32 while the coefficients θ_2 and θ_3 quantify shock volatilities (see Sargent 2008, p. 18). Finally, the intercept θ_0 is the baseline rate of unemployment that would (systematically) prevail at a zero planned inflation policy a = 0.

30 On the other hand, without noise (ε = 0) a one-off experimentation with positive output would identify the true model, making the no-experimentation region in the belief-discount factor space very small.
31 Section 6 of the working paper version contains a more general analysis of self-confirming economic policies.
32 The economic interpretation is that planned inflation a affects agents' expectations to an extent parameterized by θ_1a.
Throughout the section we maintain the following assumption about structural coefficients.
Assumption 1 θ_0 > 0, θ_2 > 0, θ_3 > 0, and θ_1π < 0.

In words, we posit a strictly positive baseline rate of unemployment, as well as strictly positive shock coefficients (nontrivial, possibly asymmetric, shocks thus affect both the inflation and the unemployment equations; their unknown values form the first component (w, ε) of the state vector). Finally, we assume that, other things being equal, more inflation reduces unemployment.
The reduced form of each model economy is

u = θ_0 + (θ_1π + θ_1a) a + θ_1π θ_3 ε + θ_2 w,  (13)
π = a + θ_3 ε.  (14)

The coefficients of the reduced form are ξ = (ξ_1, ξ_2, ξ_3, ξ_4, ξ_5) = (θ_0, θ_1π + θ_1a, θ_1π θ_3, θ_2, θ_3) ∈ ℝ^5. Since θ_3 ≠ 0 (Assumption 1), it is easy to check that different structural parameter vectors θ ∈ Θ correspond to different reduced form parameter vectors ξ, that is, the map θ ↦ ξ is one-to-one. We assume that only realized unemployment and inflation are observable by the monetary authority. Thus, the reduced form above gives us the feedback function (u, π) = f(a, s) of the previous sections. Specifically, rewriting (13), the policy multiplier ξ_2 = θ_1π + θ_1a is the coefficient of planned inflation a, and |ξ_2| quantifies the impact of planned inflation on unemployment. It is the sum of the direct and indirect impact of planned inflation on unemployment quantified, respectively, by θ_1a and θ_1π. There is a systematic trade-off between unemployment and inflation when the multiplier is strictly negative, that is, ξ_2 < 0. If so, the model economy is Keynesian; otherwise, it is new-classical. In the rest of the section we make the following hypothesis on the multiplier.
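The structural-to-reduced-form map, and its injectivity when θ_3 ≠ 0, can be sketched as follows (hypothetical coefficient values; `structural` inverts `reduced_form`, which is exactly what one-to-oneness requires):

```python
def reduced_form(theta):
    """xi = (theta0, theta1pi + theta1a, theta1pi*theta3, theta2, theta3)."""
    t0, t1pi, t1a, t2, t3 = theta
    return (t0, t1pi + t1a, t1pi * t3, t2, t3)

def structural(xi):
    """Invert the map when xi5 = theta3 != 0, showing it is one-to-one."""
    x1, x2, x3, x4, x5 = xi
    t1pi = x3 / x5            # recover theta1pi from xi3/xi5
    return (x1, t1pi, x2 - t1pi, x4, x5)

theta = (5.0, -0.5, 0.2, 1.0, 1.5)   # hypothetical structural coefficients
assert structural(reduced_form(theta)) == theta
print(reduced_form(theta))
```

The round trip succeeds precisely because θ_3 ≠ 0 lets θ_1π, and hence θ_1a, be recovered from the reduced-form vector.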
Assumption 2 ξ_2 = θ_1π + θ_1a ≤ 0.

Thus, we assume that an increase in planned inflation never increases unemployment. A possible interpretation of the model is that θ_1a/|θ_1π| is the constant fraction of experienced/sophisticated agents in the economy who factor planned inflation into their expectations, and ξ_2/θ_1π is the fraction of inexperienced/naive agents. To sum up, the set of parameters is

Θ = {θ ∈ ℝ^5 : θ_0 > 0, θ_2 > 0, θ_3 > 0, θ_1π < 0, θ_1π + θ_1a ≤ 0}.

To clarify our language, we note that we keep using "model" in the same sense as in the previous sections, that is, a probability measure over states, or a specific parameter value that determines such a measure. Thus, a set of parameterized equations like (11)-(12) corresponds to a class of models. We will therefore refer to subclasses of models satisfying some restrictions as "kinds". With this, our analysis will pay special attention to the following two competing kinds of model economies.

Lucas-Sargent models
The first kind of model economy, based on Lucas (1972) and Sargent (1973), is the subclass with θ_1a = −θ_1π, so that ξ_2 = 0. In such new-classical models the policy multiplier ξ_2 is zero, and so the systematic part of inflation a has no effect on unemployment; only the unsystematic part θ_3 ε does.

Samuelson-Solow models
A second kind of model economy, based on Samuelson and Solow (1960), is the subclass with θ_1a = 0. In such Keynesian models, the policy multiplier ξ_2 = θ_1π is strictly negative: monetary policies affect, at steady state, unemployment rates.

Setup
The monetary authority chooses policy a. As anticipated, the state space is the Cartesian product S = W × E × Θ, which expresses that the monetary authority is uncertain about both shocks and permanent features of the economy, or models. The consequence space C consists of unemployment and inflation pairs c = (u, π), so we set C = U × Π ⊆ ℝ^2. The consequence function q : A × S → C maps each policy and state pair (a, s) to the outcome pair (u, π) determined by equations (11) and (12).

Factorization
As anticipated, we assume that the messages received by the monetary authority are the policy outcomes. Hence, a message m = (u, π) consists of an unemployment and inflation pair, and the feedback function corresponds to the reduced form of the model economy. When the monetary authority chooses policy a and in the long run observes a distribution over (u, π) pairs, it can partially infer the underlying stochastic model ρ. For example, if ρ has finite support, the induced probability of outcome (u, π) is ρ({s ∈ S : f(a, s) = (u, π)}).33 The partially identified set R̂_a(ρ) of stochastic models indistinguishable from ρ is the set of ρ′ that induce the same joint distribution on unemployment/inflation outcomes given a.
At this point, it is convenient to add structure to this setup to provide a sharp characterization of the partially identified set corresponding to each policy a and model ρ. Within a state s = (w, ε, θ), the pair (w, ε) represents random shocks and θ parameterizes a model economy. This suggests factorizing the probability models as

ρ = q × δ(θ),  (17)

where the true marginal distribution of shocks q ∈ Δ(W × E) is assumed to be known and δ(θ) ∈ Δ(Θ) is a Dirac probability measure concentrated on a given economic model θ ∈ Θ, a permanent feature of the environment. We thus parameterize probability models with θ and write ρ_θ.
The simplifying assumption that, at a steady state, the distribution q of shocks is known is common in the rational expectations literature since Lucas and Prescott (1971) and Lucas (1972). The resulting factorization (17) has two modeling consequences: (i) it establishes a one-to-one correspondence between model economies and probability models (in particular, a true economic model θ* corresponds to a true probability model ρ_θ*); (ii) since q is known, it allows us to identify R with Θ via the relation θ ↦ ρ_θ = q × δ(θ), and so to define the prior μ on Θ.34 A first dividend of the factorization is that the objective function (2) can be written as

R(a, θ) = ∫ r(a, w, ε, θ) dq(w, ε).  (18)

We also assume that E(w) = E(ε) = E(wε) = 0 and E(w^2) = E(ε^2) = 1. In words, shocks are uncorrelated and normalized.

33 In the general case, for any measurable set of outcomes B, the induced probability is f̂_a(ρ)(B) = ρ(f_a^(-1)(B)).

Identification
In this "factorized" setup, we can shift our focus from observationally equivalent probability models ρ to observationally equivalent model economies θ. The partially identified set becomes

R̂_a(θ) = {θ′ ∈ Θ : ρ_θ′ is observationally equivalent to ρ_θ given a}.

With this, a sharp identification result holds.
Proposition 8 The partial identification correspondence R̂_a : Θ ⇉ Θ satisfies

R̂_a(θ) = {θ′ ∈ Θ : θ′_1π = θ_1π, θ′_2 = θ_2, θ′_3 = θ_3, θ′_0 + θ′_1a a = θ_0 + θ_1a a}.  (19)

Given the true model θ, the shock coefficients θ_2 and θ_3 are thus identified, along with the slope θ_1π of the Phillips curve, independently of the chosen policy a. As we discuss below, the intercept of the curve is also identified, but it depends on the maintained policy a through the unidentified parameter θ_1a. This important identification result is made possible by some moment conditions, formally spelled out in the proof. We can, however, heuristically describe them via the bivariate random variable (u_a, π_a) : W × E × Θ → U × Π that, for a given policy a, represents the unemployment and inflation rates determined by the state s = (w, ε, θ).36 The monetary authority infers the following moments from the long-run distribution of outcomes: E(π_a) = a, Var(π_a) = θ_3^2, Cov(u_a, π_a) = θ_1π θ_3^2, and Var(u_a) = θ_1π^2 θ_3^2 + θ_2^2. Therefore, θ_1π = Cov(u_a, π_a)/Var(π_a) is the beta coefficient of the Phillips regression of unemployment on inflation,37 θ_2^2 = Var(u_a) − Cov(u_a, π_a)^2/Var(π_a) is the residual variance of u_a (unexplained by the regression), and θ_3 is the standard deviation of inflation. Finally, though the two structural coefficients θ_0 and θ_1a remain unidentified even in the long run, they satisfy

θ_0 + θ_1a a = E(u_a) − θ_1π E(π_a),

where the right side is the alpha coefficient of the Phillips regression. In the long run, the alpha coefficient is observed by the monetary authority, but what is observed depends on the policy a that the authority chooses.

34 The map θ ↦ q × δ(θ) is bijective and measurable. See Corollary 3 in the appendix.
35 Whenever convenient, in what follows we will use the shorthand notation E for integrals.
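These moment conditions can be checked by simulation. The sketch below (hypothetical coefficients) draws a large sample from the reduced form at a maintained policy a and recovers θ_1π as the Phillips-regression beta, θ_0 + θ_1a a as the alpha, θ_2 as the residual standard deviation, and θ_3 as the standard deviation of inflation:

```python
import random

random.seed(0)
theta0, theta1pi, theta1a, theta2, theta3 = 5.0, -0.5, 0.2, 1.0, 1.5
a = 2.0          # maintained policy
n = 200_000      # large-sample stand-in for the long run

u, pi = [], []
for _ in range(n):
    w, eps = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    p = a + theta3 * eps                                  # inflation outcome
    pi.append(p)
    u.append(theta0 + theta1pi * p + theta1a * a + theta2 * w)

mean = lambda xs: sum(xs) / len(xs)
mu_u, mu_pi = mean(u), mean(pi)
var_pi = mean([(p - mu_pi) ** 2 for p in pi])
cov = mean([(x - mu_u) * (p - mu_pi) for x, p in zip(u, pi)])

beta_hat = cov / var_pi                  # identifies theta1pi
alpha_hat = mu_u - beta_hat * mu_pi      # identifies theta0 + theta1a * a
resid_sd = mean([(x - alpha_hat - beta_hat * p) ** 2
                 for x, p in zip(u, pi)]) ** 0.5   # identifies theta2
sigma_pi = var_pi ** 0.5                 # identifies theta3

print(round(beta_hat, 1), round(alpha_hat, 1), round(resid_sd, 1), round(sigma_pi, 1))
# -0.5 5.4 1.0 1.5
```

Note that the recovered alpha equals θ_0 + θ_1a a = 5 + 0.2 × 2 = 5.4: what is observed depends on the maintained policy, exactly as stated above.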

Estimated model economy
As an approximation of a situation in which the dataset is large and the sample variance is small, we take the idealized perspective of a monetary authority (or its econometrician) who can rely on an infinite dataset and therefore can perfectly estimate the identifiable parameters by observing some moments of the true distribution, as specified in Proposition 8. The moments that identify the three coefficients θ_1π, θ_2, and θ_3 do not depend on the chosen policy a, but only on the true model θ. To emphasize this key feature, we denote by β̂ the beta regression coefficient that identifies θ_1π,38 by σ̂_u|π the residual standard deviation that identifies θ_2, and by σ̂_π the standard deviation of inflation that identifies θ_3. In contrast, the alpha regression coefficient that identifies the sum θ_0 + θ_1a a depends on policy a; we denote it by α̂(a). With this, we can write β̂ = θ_1π, σ̂_u|π = θ_2, and σ̂_π = θ_3. As a result, the long-run estimated version of the model economy (11), (12) that the monetary authority considers is

u = α̂(a) + β̂ π + σ̂_u|π w,  (22)
π = a + σ̂_π ε,  (23)

where

α̂(a) = θ_0 + θ_1a a.  (24)

In particular, (22) is the estimated aggregate supply equation and (23) is the estimated inflation equation. The intercept of the former equation depends on policy a via eq. (24), which only partly identifies the two coefficients θ_0 and θ_1a. In turn, this makes the policy multiplier ξ_2 = β̂ + θ_1a unidentified. We will momentarily address this key partial identification issue.

36 Formally, u_a and π_a are the sections u(a, ·) and π(a, ·) at policy a of the random variables u and π, respectively.
37 The Phillips regression u = α + βπ is run by the monetary authority using long-run data.
38 By Assumption 2, the beta coefficient of the Phillips regression is negative, that is, β̂ < 0. This negative sign will be tacitly assumed when interpreting our findings.

Partial identification line
The monetary authority cannot identify, even in the long run, the two structural coefficients θ_0 and θ_1a. The former is the average unemployment at zero planned inflation, θ_0 = E_θ(u_0); the latter is the "direct" impact of policy on unemployment. The parameter space of the estimated model economy (22), (24) reduces to Θ = Θ̃ × {(β̂, σ̂_u|π, σ̂_π)}, where Θ̃ = ℝ_++ × (−∞, −β̂] is the collection of all possible values (θ_0, θ_1a) of the two remaining unidentified coefficients and {(β̂, σ̂_u|π, σ̂_π)} is the singleton containing the identified vector (θ_1π, θ_2, θ_3). To ease notation, in what follows we will consider directly Θ̃ as the parameter space. As a result, the parameter space is now a subset of the plane. By (19), the partial identification correspondence R̂_a : Θ̃ → 2^Θ̃ becomes

R̂_a(θ) = {θ′ ∈ Θ̃ : θ′_0 + θ′_1a a = θ_0 + θ_1a a}.  (25)

In words, R̂_a(θ) is a straight line in the plane, with slope −a and intercept θ_0 + θ_1a a (determined by the policy a and by the true economic model θ). We thus have a partial identification line that defines a linear relationship between the two unidentified coefficients, given the true model. In other words, partial identification is unidimensional.
Given the true model θ = (θ_0, θ_1a), the collection {R̂_a(θ) : a ∈ A} of partial identification lines is the family of all straight lines in the plane that pass through the true model (θ_0, θ_1a) and have slope −1/a. On each such line there is a unique Lucas-Sargent model, characterized by θ′_1a = −β̂, as well as a unique Samuelson-Solow model, characterized by θ′_1a = 0. In other words, partial identification lines feature a unique specimen of each kind of model. Figure 1 illustrates the previous analysis. In particular, LS stands for Lucas-Sargent model and SS for Samuelson-Solow model, while the red (resp., blue) line is the partial identification line that corresponds to policy a = 0 (resp., a > 0).
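The line and its two specimens are easy to compute. In the sketch below (hypothetical true model and policy), the partial identification line is parameterized by θ_1a, and the unique Lucas-Sargent and Samuelson-Solow models on it are obtained by plugging in θ_1a = −β̂ and θ_1a = 0:

```python
beta_hat = -0.5                       # identified Phillips slope (negative)
theta0_star, theta1a_star = 5.0, 0.2  # hypothetical true (theta0, theta1a)

def line_point(a, theta1a):
    """theta0 on the partial identification line theta0 + theta1a*a = const."""
    const = theta0_star + theta1a_star * a
    return const - theta1a * a

a = 2.0
ls = (line_point(a, -beta_hat), -beta_hat)  # Lucas-Sargent: theta1a = -beta_hat
ss = (line_point(a, 0.0), 0.0)              # Samuelson-Solow: theta1a = 0

print(ls)  # (4.4, 0.5)
print(ss)  # (5.4, 0.0)

# The true model itself always lies on its own partial identification line.
assert line_point(a, theta1a_star) == theta0_star
```

Both specimens are observationally equivalent to the true model under the maintained policy a, even though they imply very different policy multipliers.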

The policy problem: value, equilibria and welfare

Value and equilibrium
As in much of the literature, we assume a quadratic von Neumann-Morgenstern utility function v : C → ℝ given by v(u, π) = −u^2 − π^2, so that the reward function r : A × S → ℝ becomes r(a, w, ε, θ) = −u^2(a, w, ε, θ) − π^2(a, w, ε, θ). The linear model economy and quadratic utility together form a classic linear-quadratic policy framework.
Lemma 3 For every (θ, a) ∈ Θ̃ × A, we have R(a, θ) = v(E_θ(u_a), E_θ(π_a)) − σ̂_u|π^2 − (1 + β̂^2) σ̂_π^2.

The linear-quadratic framework thus allows us to express the expected reward as the utility of expectations, up to a constant. As a result, the objective function (18) becomes

R(a, θ) = −(θ_0 + (β̂ + θ_1a) a)^2 − a^2 − σ̂_u|π^2 − (1 + β̂^2) σ̂_π^2.  (26)

As for self-confirming equilibria, we begin with a piece of notation: throughout the rest of this section we fix a true model economy θ* = (θ*_0, θ*_1a) in Θ̃, while θ = (θ_0, θ_1a) denotes a generic element of Θ̃. With this notation, the partial identification line is

R̂_a(θ*) = {θ ∈ Θ̃ : θ_0 + θ_1a a = θ*_0 + θ*_1a a}.

Hence, a policy and belief pair (a*, μ*) ∈ A × Δ(Θ̃) is self-confirming if and only if a* maximizes the subjective value V(·, μ*) and μ*(R̂_a*(θ*)) = 1. Next we characterize self-confirming equilibria of the estimated model economy (22), (23), (24). In both equilibrium conditions, the true multiplier ξ*_2 = β̂* + θ*_1a and its conjectured value E_μ*(ξ_2) = β̂* + E_μ*(θ_1a) play a key role.39

Proposition 9 A policy and belief pair (a*, μ*) ∈ A × Δ(Θ̃) is a self-confirming equilibrium given θ* if and only if

a* (1 + ξ*_2 E_μ*(ξ_2)) = −θ*_0 E_μ*(ξ_2)  (27)

and

μ*({θ ∈ Θ̃ : θ_0 + θ_1a a* = θ*_0 + θ*_1a a*}) = 1.  (28)
Based on the estimated model economy (22, 23, 24), a dogmatic authority conjectures that, given the chosen policy $a$, the expected values of inflation and unemployment are constrained by a linear equation. This conjectured constraint is the version of the estimated aggregate supply equation (22) that the authority expects to face systematically given its dogmatic belief. The authority's decision problem is then to maximize its objective subject to this conjectured constraint; writing the Lagrangian and solving the first-order conditions yields the best reply. Since $E_{\bar{\theta}}[\pi(a)] = a$, the monetary authority's best reply is the policy $a = B(\bar{\theta})$. As a result, a policy and belief pair $(a^*, \delta_{\bar{\theta}})$ is a self-confirming equilibrium if and only if the confirmation condition holds; simple algebra shows that this is the case if and only if two relations hold, which are the equilibrium relations (27) and (28) in the case of dogmatic beliefs.^40

Figure 2 illustrates the previous heuristic argument when the true model is of the Lucas-Sargent kind, so that $\theta_0^*$ is the natural rate of unemployment and $\theta_{1a}^* = -\hat{\beta}^*$ (and so the true policy multiplier $\xi_2^*$ is zero). Under this true model, policy $a$ induces average unemployment $E_{\theta^*}[u(a)] = \theta_0^*$ and average inflation $E_{\theta^*}[\pi(a)] = a$. But a monetary authority with dogmatic belief $\delta_{\bar{\theta}}$ expects to observe the pair of long-run averages $(E_{\bar{\theta}}[u(a)], a)$. This dogmatic belief is confirmed, and so condition (30) is satisfied, if $E_{\bar{\theta}}[u(a)] = \theta_0^*$, that is, if the pair of average unemployment and average inflation lies on the vertical partial identification line with abscissa $\theta_0^*$.

Fig. 2 Self-confirming equilibrium in a new-classical world

40 Note that, with the dogmatic value $\bar{\theta}_{1a}$ of $\theta_{1a}$ in place of its expectation $E_{\mu^*}(\theta_{1a})$, the dogmatic equilibrium relations are identical to the general ones. This is a consequence of the certainty equivalence principle stated in Proposition 3.
The subjective best reply condition (29) is represented by the tangency between the (red) indifference curve and the (green) conjectured constraint, according to which an increase $\Delta a$ in average inflation yields a decrease $-\bar{\xi}_2 \Delta a$ in average unemployment, where $\bar{\xi}_2 = \hat{\beta}^* + \bar{\theta}_{1a}$ is the conjectured multiplier. When the dogmatic belief is such that $\bar{\theta}_{1a} = 0$, so that the conjectured multiplier becomes $\bar{\xi}_2 = \hat{\beta}^*$, the monetary authority is an ''orthodox'' Keynesian. See Fig. 3.
Its slope is the beta coefficient of the Phillips regression, which represents the trade-off between inflation and unemployment that the Keynesian authority believes to be systematically exploitable.

Policy activism and welfare
To complete our equilibrium analysis we need to compare the self-confirming equilibrium action with the objectively optimal one and to compute the resulting welfare loss.
To this end we consider the estimated policy multiplier $\xi_2 = \hat{\beta} + \theta_{1a}$. The authority underestimates the multiplier when $E_{\mu^*}(\xi_2) > \xi_2^*$ and overestimates it when $E_{\mu^*}(\xi_2) < \xi_2^*$.^41 In structural terms, $E_{\mu^*}(\xi_2) \gtrless \xi_2^*$ if and only if $E_{\mu^*}(\theta_{1a}) \gtrless \theta_{1a}^*$. For instance, when $\theta_{1a}^*$ and $E_{\mu^*}(\theta_{1a})$ are positive, this means that the multiplier is under/overestimated if and only if the direct impact of planned inflation on unemployment is over/underestimated.
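Since both multipliers are negative, the ordering of signed values reverses the ordering of magnitudes, so "underestimation" means conjecturing a weaker trade-off. A two-line check with illustrative numbers of ours:

```python
# With negative multipliers, a conjectured value closer to zero means a
# smaller perceived trade-off, i.e., underestimation (illustrative numbers).
xi_true = -0.8          # true multiplier xi2* = beta* + theta1a*
xi_conjectured = -0.3   # conjectured value E_mu(xi2)

underestimates = xi_conjectured > xi_true   # -0.3 > -0.8
assert underestimates
assert abs(xi_conjectured) < abs(xi_true)   # equivalent magnitude comparison
```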
The objectively optimal policy can be derived in closed form, and it is immediate to see that the equilibrium action is objectively optimal when the monetary authority has a correct expected value of the estimated policy multiplier $\xi_2$. More generally, we next show that policy hyperactivism characterizes authorities that overestimate the policy multiplier, while hypoactivism characterizes authorities that underestimate it.^42

Proposition 10 Given a true model $\theta^*$, for every self-confirming equilibrium $(a^*, \mu^*)$, the equilibrium policy is hyperactive, objectively optimal, or hypoactive according to whether the authority overestimates, correctly estimates, or underestimates the policy multiplier; in particular, it is zero-target-inflation ($a^* = 0$) when the conjectured multiplier is zero.

41 Both $\xi_2^*$ and $E_{\mu^*}(\xi_2)$ are negative (Assumption 2), and so $E_{\mu^*}(\xi_2) \gtrless \xi_2^*$ if and only if $|E_{\mu^*}(\xi_2)| \lessgtr |\xi_2^*|$.

42 Since $\xi_2^* < 0$ (Assumption 2), the cases considered in the proposition exhaust all possibilities. Also note that $E_{\mu^*}(\xi_2) = \xi_2^*$ if and only if policy $a^*$ is objectively optimal, i.e., $a^* = a^o$, and $E_{\mu^*}(\xi_2) = 0$ if and only if policy $a^*$ is zero-target-inflation, i.e., $a^* = 0$.
For the monetary authority, both kinds of deviation from objective optimality, hyperactivism and hypoactivism, cause the same welfare loss: the loss is a symmetric (quadratic) function of the deviation from the objectively optimal action. In the next section we illustrate this result with a few examples.
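The symmetry can be seen in a minimal numeric sketch. For illustration we assume a Lucas-Sargent case in which the objectively optimal policy is zero and the loss is the squared deviation from it, consistent with the loss formulas reported below.

```python
def welfare_loss(a, a_opt=0.0):
    """Squared deviation from the objectively optimal policy (our assumption)."""
    return (a - a_opt) ** 2

# Hyperactive (a_opt + delta) and hypoactive (a_opt - delta) deviations
# of equal size cause the same welfare loss:
delta = 1.5
assert welfare_loss(0.0 + delta) == welfare_loss(0.0 - delta)
```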

Equilibria
Assume that the monetary authority has dogmatic beliefs. A policy and belief pair $(a^*, \delta_{\bar{\theta}}) \in A \times \Delta(\Theta)$ is then self-confirming if and only if it satisfies relations (31) and (32). Two special cases are noteworthy.
New-classical authority Suppose the monetary authority believes that the policy multiplier is zero, i.e., $\bar{\theta}_{1a} = -\bar{\theta}_{1\pi}$. Since in equilibrium $\bar{\theta}_{1\pi}$ is identified by the slope of the Phillips regression, the conjectured constraint is vertical at the baseline unemployment rate $\theta_0^*$: the new-classical authority does not believe in any systematically exploitable trade-off between inflation and unemployment. A zero-target-inflation equilibrium policy results (Proposition 10-(iv)).
Keynesian authority Suppose the monetary authority believes that there is a fully exploitable trade-off between inflation and unemployment, i.e., $\bar{\theta}_{1a} = 0$. Then, in equilibrium, the conjectured policy multiplier $\bar{\xi}_2 = \hat{\beta}^*$ is strictly negative, and a positive-target-inflation equilibrium policy results. By Proposition 10, such a policy is hyperactive whenever the authority thereby overestimates the policy multiplier, and objectively optimal if $\theta_{1a}^* = 0$. To sum up, the two equilibria feature new-classical nonintervention à la Friedman-Hayek and Keynesian activism, respectively. Regardless of the true model economy, such policy prescriptions emerge under suitable dogmatic beliefs.
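The two dogmatic best replies can be sketched as follows. We take the dogmatic equilibrium policy to be $a^* = -\theta_0^* \bar{\xi}_2 = -\theta_0^*(\hat{\beta}^* + \bar{\theta}_{1a})$; this closed form is our reconstruction, chosen to be consistent with the welfare-loss formula $\ell(a^*, \theta^*) = \theta_0^{*2}(\hat{\beta}^* + \bar{\theta}_{1a})^2$ reported below for the Lucas-Sargent world. All numbers are illustrative.

```python
theta0 = 5.0        # baseline (natural) unemployment rate, illustrative
beta_star = -0.5    # Phillips regression coefficient, strictly negative

def dogmatic_policy(theta1a_bar):
    """Best-reply policy a* = -theta0 * (beta* + theta1a_bar), our reconstruction."""
    conjectured_multiplier = beta_star + theta1a_bar
    return -theta0 * conjectured_multiplier

# New-classical authority: theta1a_bar = -beta*  ->  zero conjectured multiplier,
# zero-target-inflation policy.
assert dogmatic_policy(-beta_star) == 0.0
# Keynesian authority: theta1a_bar = 0  ->  multiplier beta* < 0,
# positive-target-inflation policy.
assert dogmatic_policy(0.0) == -theta0 * beta_star > 0
```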

A new-classical world
So far we did not fix a specific economic model. Now, by way of example, assume that a Lucas-Sargent model economy $\theta^* = (\theta_0^*, -\hat{\beta}^*) \in \Theta$ is the true model, with no systematically exploitable trade-off between inflation and unemployment. The resulting policy and belief pair is the dogmatic self-confirming equilibrium in a Lucas-Sargent model economy. By Proposition 10, policy $a^*$ is hyperactive when $\bar{\theta}_{1a} < \theta_{1a}^*$ and objectively optimal when $\bar{\theta}_{1a} = \theta_{1a}^*$. The welfare loss is $\ell(a^*, \theta^*) = \theta_0^{*2}(\hat{\beta}^* + \bar{\theta}_{1a})^2$. Next we consider two different equilibria in this new-classical world, according to the monetary authority's dogmatic beliefs.
New-classical authority Suppose the monetary authority correctly believes that there is no exploitable trade-off between inflation and unemployment, that is, $\bar{\theta}_{1a} = -\hat{\beta}^*$. The resulting pair is the new-classical self-confirming equilibrium. It features a zero-target-inflation policy, which is the objectively optimal policy (so there is no welfare loss) as well as the fully revealing one that allows the authority to learn, in the long run, the true coefficient $\theta_0^*$.

Keynesian authority Suppose the monetary authority wrongly believes that there is a fully exploitable trade-off between inflation and unemployment, with, say, $\bar{\theta}_{1a} = 0$. The resulting policy and belief pair is thus a Keynesian self-confirming equilibrium. It features a hyperactive positive-target-inflation policy. Since it is not the objectively optimal policy, the monetary authority suffers a welfare loss $\ell(a^*, \theta^*) = (\theta_0^* \hat{\beta}^*)^2$.
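The two losses follow from the formula $\ell(a^*, \theta^*) = \theta_0^{*2}(\hat{\beta}^* + \bar{\theta}_{1a})^2$ stated above; a quick check with illustrative numbers:

```python
theta0_star, beta_star = 5.0, -0.5  # illustrative values

def loss(theta1a_bar):
    """Welfare loss theta0*^2 (beta* + theta1a_bar)^2 in the Lucas-Sargent world."""
    return theta0_star**2 * (beta_star + theta1a_bar) ** 2

assert loss(-beta_star) == 0.0                       # new-classical: no loss
assert loss(0.0) == (theta0_star * beta_star) ** 2   # Keynesian: (theta0* beta*)^2
```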

A Keynesian world
What we noted above can be reversed if we consider a Keynesian model economy, where the policy multiplier $\xi_2$ is different from zero, i.e., the monetary authority can systematically reduce average unemployment. To consider a stark (although implausible) example, suppose that $\theta^* = (\theta_0^*, 0) \in \Theta$ is the true model; that is, there is a fully and systematically exploitable trade-off between inflation and unemployment because monetary policy does not affect expectations ($\theta_{1a}^* = 0$). A Keynesian authority makes the objectively optimal positive-inflation choice in equilibrium. A new-classical authority chooses zero inflation, an inferior outcome.

Welfare consequences
What are the welfare implications of incorrect beliefs under dogmatism? By way of example, we consider a new-classical authority in a Keynesian economy, as well as a Keynesian authority in a new-classical economy. The loss of a new-classical zero-inflation policy in a Keynesian economy, with $\theta_{1a}^* = 0$, is $(\theta_0^* \hat{\beta}^*)^2$. It is the same as the loss of a Keynesian nonzero-inflation policy (36) in a new-classical economy: a mistaken new-classical authority suffers the same welfare loss as a mistaken Keynesian one.

Equilibria
Suppose that the monetary authority is not dogmatic but instead holds a two-model belief.^43 Specifically, she is uncertain whether the true model is of the Lucas-Sargent or Samuelson-Solow kind, and her self-confirming subjective belief $\mu^*$ assigns positive probability mass to just one specimen of each kind, so that the (subjective) support consists of two points: a Lucas-Sargent model $(\theta_0^{ls}(\mu^*), -\hat{\beta}^*)$ and a Samuelson-Solow (Keynesian) model $(\theta_0^{ss}(\mu^*), 0)$. Denoting by $\mu_k^* \in [0, 1]$ the subjective weight of the latter model, we can write belief $\mu^*$ accordingly. The pair $(a^*, \mu^*)$ is then a self-confirming equilibrium if and only if the corresponding confirmation and best reply conditions hold. As a result, in this case, a pair of the appropriate form is a self-confirming equilibrium for every $\mu_k^* \in [0, 1]$. We thus have a continuum of equilibria parameterized by the subjective weight $\mu_k^*$ of the model of the Samuelson-Solow kind (and so by the expected multiplier $\mu_k^* \hat{\beta}^*$). In particular, the equilibrium policy $a^*$ is increasing in $\mu_k^*$: the higher the weight of the Keynesian model, the higher the planned inflation. If $\mu_k^* = 0$ we get back the dogmatic new-classical equilibrium, while if $\mu_k^* = 1$ we get back the dogmatic Keynesian equilibrium (Sect. 5.4.1).
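The continuum of equilibria can be sketched numerically. We take the equilibrium policy to be $a^*(\mu_k^*) = -\mu_k^* \hat{\beta}^* \theta_0^*$; this closed form is our reconstruction (consistent with the loss $(\theta_0^* \hat{\beta}^* \mu_k^*)^2$ reported in the new-classical world below), not a formula quoted from the paper, and all numbers are illustrative.

```python
theta0_star, beta_star = 5.0, -0.5  # illustrative values

def equilibrium_policy(mu_k):
    """Reconstructed equilibrium policy a*(mu_k) = -mu_k * beta* * theta0*."""
    return -mu_k * beta_star * theta0_star

policies = [equilibrium_policy(m / 10) for m in range(11)]
assert policies == sorted(policies)                         # increasing in mu_k
assert equilibrium_policy(0.0) == 0.0                       # dogmatic new-classical
assert equilibrium_policy(1.0) == -beta_star * theta0_star  # dogmatic Keynesian
```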
In equilibrium, the coefficients (39) of the models of the Lucas-Sargent and Samuelson-Solow kind depend on the authority's subjective weight $\mu_k^*$: different weights correspond to different Lucas-Sargent and Samuelson-Solow equilibrium specifications. Though the support of the equilibrium belief (37) always contains a specimen of both classes of model economies, that specimen changes as the weight $\mu_k^*$ changes. Finally, the welfare loss follows in closed form. This interplay between the models deemed possible and the weight on each kind of model is our main finding for the two-model self-confirming belief; we further clarify it in a prominent special case.

A new-classical world
Assume that a Lucas-Sargent model economy $\theta^* = (\theta_0^*, -\hat{\beta}^*)$ is the true model. If so, by (38) and (39) the pair $(a^*, \mu^*)$ is a self-confirming equilibrium if and only if the corresponding conditions hold; hence, in this case, such a pair is a self-confirming equilibrium for every subjective weight $\mu_k^* \in [0, 1]$. The welfare loss is $\ell(a^*, \theta^*) = (\theta_0^* \hat{\beta}^* \mu_k^*)^2$. As implied by the analysis of Sect. 5.5.1, we have a continuum of equilibria parameterized by the weight $\mu_k^*$ of the model of the Keynesian (Samuelson-Solow) kind: if $\mu_k^* > 0$ the equilibrium policy is hyperactive; if $\mu_k^* = 0$ we get the dogmatic new-classical equilibrium (35); moreover, if $\mu_k^* = 1$ we get back the dogmatic Keynesian equilibrium (36). Now, however, the equilibrium coefficient $\theta_0^{ls}(\mu^*)$ is pinned down by the true natural rate of unemployment $\theta_0^*$: the monetary authority understands that, if the true model were of the Lucas-Sargent kind, average unemployment and baseline unemployment at zero planned inflation would coincide; furthermore, in the case under consideration, the average rate of unemployment must be the natural rate. In contrast, the subjective equilibrium coefficient $\theta_0^{ss}(\mu_k^*)$ still depends on the weight $\mu_k^*$: a higher subjective weight of the Samuelson-Solow specification corresponds to higher planned inflation in equilibrium, hence to a higher Phillips regression line, whose horizontal intercept is $\theta_0^{ss}(\mu^*)$. Thus, the support of the equilibrium belief always contains a specimen of the Samuelson-Solow model; that specimen, however, changes as $\mu_k^*$ changes. More generally, a two-model belief is determined by its (subjective) support and the relative likelihoods of the two models in the support. The self-confirming equilibrium conditions jointly discipline these two aspects of the belief. Figure 4 illustrates: the monetary authority is uncertain about the true economic constraint, either the vertical line at the natural rate of unemployment or the Phillips regression line.
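The welfare loss $\ell(a^*, \theta^*) = (\theta_0^* \hat{\beta}^* \mu_k^*)^2$ just stated is increasing in the weight on the wrong (Keynesian) model, anticipating the comparative result below; a quick check with illustrative numbers:

```python
theta0_star, beta_star = 5.0, -0.5  # illustrative values

def loss(mu_k):
    """Welfare loss (theta0* beta* mu_k)^2 from the text."""
    return (theta0_star * beta_star * mu_k) ** 2

losses = [loss(m / 10) for m in range(11)]
assert losses == sorted(losses)   # less weight on the wrong model -> lower loss
assert loss(0.0) == 0.0                               # dogmatic new-classical
assert loss(1.0) == (theta0_star * beta_star) ** 2    # dogmatic Keynesian
```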
Since the true model is of the Lucas-Sargent kind, at a self-confirming equilibrium the average unemployment expected by the monetary authority must be the natural rate $\theta_0^*$; the subjective best reply condition is expressed by the tangency between the (red) indifference curve and a (green) line describing the expected constraint, whose slope is intermediate between those of the vertical line at the natural rate $\theta_0^*$ and of the Phillips regression line (which, in turn, depends on the weight $\mu_k^*$ via the equilibrium relations). Comparing the dogmatic new-classical equilibrium with the represented self-confirming equilibrium determined by $\mu_k^* > 0$, one can see that the latter features higher planned inflation, $a^* > 0$, and a higher horizontal intercept, $\theta_0^{ss}(\mu^*) > \theta_0^*$.

Figure 5 gives an alternative geometrical representation. Fix the true model $\theta^*$ and an alternative model $\theta$. Every policy $a$ induces a pair of objective expected rewards: the reward under model $\theta^*$, $R(a, \theta^*)$, and the reward under model $\theta$, $R(a, \theta)$. By varying $a$ one obtains the locus of possible pairs of rewards. If $R(a, \theta^*) \neq R(a, \theta)$, the monetary authority can infer which of the two models is true from the observed long-run average payoff. Therefore, the partial identification condition is $R(a, \theta^*) = R(a, \theta)$. At a self-confirming equilibrium $(a^*, \mu^*)$ with $\mathrm{supp}\,\mu^* = \{\theta^*, \theta\}$, this belief-confirmation condition must hold; therefore, the equilibrium point $(R(a^*, \theta^*), R(a^*, \theta))$ is at the intersection of the main diagonal in the $(R(\cdot, \theta^*), R(\cdot, \theta))$-space, the ''partial identification line,'' with the locus of feasible pairs $\{(R(a, \theta^*), R(a, \theta)) : a \in A\}$, the constraint. At this intersection point, the constraint curve must be tangent to the constant-SEU line with slope $(1 - \mu_k^*)/\mu_k^*$. Recall that $B(\cdot)$ denotes the best reply function. With this notation, the objective expected reward is constant on the support of the self-confirming belief $\mu^*$ (see Lemma 2 and Proposition 4), and the correct-belief equilibrium features sharper beliefs than $(B(\mu^*), \mu^*)$.
Therefore, this is an instance of Proposition 5: self-confirming equilibria with sharper beliefs yield higher values and lower losses.

Concluding remarks
While applied theorists and economists more generally can benefit from seeing the self-confirming equilibrium concept in action, we think it is important to frame such applications within the context of an abstract analysis. Indeed, this allows us to better understand key concepts like partial identification given the equilibrium choice, endogeneity of feedback about the state, and the role of observability of consequences. In this paper we put forward an abstract framework for the analysis of self-confirming policies amenable to economic applications, hence featuring both intrinsic randomness and (possibly) infinite spaces of actions and states. All the concepts and techniques can be extended to n-person games, but we focus on decision problems with uncertainty (i.e., one-person games with incomplete information) for several reasons. First, the analysis is simpler, and it suffices for our examples and the monetary policy application. Second, it clarifies that self-confirming equilibrium is a genuine equilibrium concept also in a one-person setting, because equilibrium beliefs are disciplined by choice-dependent evidence. This should be contrasted with Harsanyi's (1967) Bayesian equilibrium, whereby subjective beliefs about unknown parameters are not disciplined by evidence; thus, in one-person settings Bayesian equilibrium just requires that the decision maker best reply to her subjective belief. Finally, we are not aware of simple and interesting n-person generalizations of our new comparative welfare results, Propositions 5 and 7. Our monetary policy application illustrates the abstract framework and extends previous work in several ways. In particular, it takes a more neutral perspective on the true model economy and it considers general beliefs rather than dogmatic ones. Besides the n-person case, several other extensions of the self-confirming equilibrium idea are conceivable. Here we consider a few that we find worth exploring.
Ambiguity aversion It is possible to allow for non-neutral attitudes toward perceived ambiguity,^44 e.g., by considering the smooth ambiguity model of Klibanoff et al. (2005) or the maxmin model of Gilboa and Schmeidler (1989). This is done in a companion paper (Battigalli et al., 2021). Here we give a hint of why such an extension is immaterial in the examples and application of this paper. Go back to Fig. 5.b. Choices are represented as vectors of objective expected rewards. The best-reply condition requires that the set of feasible vectors be separated from the upper-contour set of vectors preferred to the chosen one, which under ambiguity aversion is convex. The key observation is that, in our examples and application, every undominated feasible vector is on the ''efficient'' boundary of the convex hull of the feasible vectors, i.e., it is not dominated by convex combinations of feasible vectors. By an intuitive application of the separating hyperplane theorem, this means that if a feasible vector is a best reply under ambiguity aversion, then it is also a best reply under ambiguity neutrality (subjective expected utility maximization), with the upper half-space delimited by the separating hyperplane as upper-contour set.

44 That is, lack of certainty about the objective probabilities of consequences of choices.
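The separating-hyperplane point can be illustrated with a toy set of reward vectors of our own: a vector strictly inside the convex hull is dominated by a mixture and is never the maximizer of any linear (SEU) criterion.

```python
# Toy feasible set of reward pairs (R(a, theta*), R(a, theta)) -- our numbers.
feasible = [(-1.0, -4.0), (-2.0, -1.0), (-3.0, -3.0)]

mu = 0.6  # subjective weight on theta
best = max(feasible, key=lambda v: (1 - mu) * v[0] + mu * v[1])

# (-3, -3) is dominated by the mixture 0.5*(-1,-4) + 0.5*(-2,-1) = (-1.5, -2.5),
# so it is never a linear-criterion maximizer for any interior weight:
for m in range(1, 10):
    w = m / 10
    assert max(feasible, key=lambda v: (1 - w) * v[0] + w * v[1]) != (-3.0, -3.0)
```

Conversely, the two undominated vectors each maximize the linear criterion for some range of weights, which is the ambiguity-neutral counterpart of being a best reply under ambiguity aversion.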
Prospect Theory It would also be natural to extend the self-confirming equilibrium idea and its applications to prospect theory models à la Kahneman and Tversky (1979); see Wakker (2010) for an extensive treatment. The exercise is natural, but also challenging. On one hand, the equilibrium payoff is a natural (endogenous) reference point for the prospect theory analysis of self-confirming equilibria. On the other hand, including the long-run empirical information represented by the partially identified set $\hat{R}_a(\sigma)$ in a prospect theory model is less immediate than doing it in a smooth ambiguity or in a maxmin model. A possibility is to require the distortion functions featured by prospect theory (for gains and losses) to affect a specific model in $\hat{R}_a(\sigma)$. An alternative route is to consider ''smooth-ambiguity-like'' versions of prospect theory à la Vinogradov (2013), and require the equilibrium prior to be supported on $\hat{R}_a(\sigma)$ (this yields the previous approach if the decision maker's prior is a Dirac measure at some point in $\hat{R}_a(\sigma)$). In any case, the problem definitely deserves more attention, and presents an avenue for future research. The works of Peter Wakker on prospect theory provide a starting point for this intriguing companion quest.
Motivated beliefs Kunda (1990) wrote an influential paper on how motivation influences reasoning. Since then, ''motivated beliefs'' have become an important topic in psychology and also in economics, as exemplified by the Introduction by Epley and Gilovich to an interesting symposium on this topic in the 2016 summer issue of the Journal of Economic Perspectives. In their contribution, Benabou and Tirole (2016) cite reports of how agents neglect negative information, distort it, or choose not to obtain important information at little or no cost. Such behavior is explained in economic models where agents' utility directly depends on their posterior beliefs and agents take this into account in forming their (action-dependent) beliefs. The self-confirming equilibrium (SCE) idea instead posits agents who take information more seriously and exploit all the information they obtain, given their choices. Despite such clear differences, SCE can be combined with belief-dependent motivations to explain an important stylized fact studied by the motivated-beliefs literature, i.e., the reluctance to acquire materially useful and cheap information (see Mannahan, 2021). Consider a decision maker (DM) with a prior belief $\mu$ over probabilistic models parameterized by an unknown personal trait $\theta \in \Theta \subseteq \mathbb{R}$, such as her intelligence, general ability, or health. Let $\mu'$ denote her realized posterior belief, conditional on the received message (material outcome) given her action. To fix ideas, let the DM's ''psychological utility'' (Battigalli and Dufwenberg, forthcoming) be the sum of a standard utility function and an ego-utility component that depends on a posterior estimate of the unknown trait: $v(m, a, \mu') = v(m, a) + e(E_{\mu'}(\theta))$, where $e(\cdot)$ is an increasing function.
The decision maker can either choose a status-quo action $a^*$ that yields a known (or learned) distribution of material outcomes, or an alternative action (e.g., taking a test) that yields a $\theta$-dependent distribution of material outcomes and that would teach her about her trait. It may well be the case that, absent the ego-utility component, the best choice (possibly the dominant one) would be to take the test. But if the function $e(\cdot)$ is concave, the expected variability of the posterior estimate $E_{\mu'}(\theta)$ may make taking the test too ''ego-risky'' for the DM. In this case, the status-quo action $a^*$ would be an SCE action. This is somewhat similar to the preference for the status quo in an SCE under smooth-ambiguity aversion (Battigalli et al. 2015), with concavity of the ''second-order utility'' replaced by concavity of ego-utility.
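The ego-risk mechanism is just Jensen's inequality applied to a concave $e(\cdot)$. A toy two-outcome example of ours, with a fully revealing test:

```python
import math

# Toy example (all numbers ours): trait theta in {0, 1} with prior weight 0.5.
# Taking the test reveals theta, so the posterior estimate is 0 or 1 with equal
# probability; the status quo keeps the estimate at the prior mean 0.5.
e = math.sqrt  # a concave, increasing ego-utility of the estimate

ego_status_quo = e(0.5)                 # estimate stays at the prior mean
ego_test = 0.5 * e(0.0) + 0.5 * e(1.0)  # expected ego-utility after the test

# By Jensen's inequality, the concave ego-utility makes the test "ego-risky":
assert ego_test < ego_status_quo

# So a small material advantage of the test can be outweighed,
# sustaining the status quo as the SCE action:
material_gain_from_test = 0.1
assert material_gain_from_test + ego_test < ego_status_quo
```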
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Note that this lemma does not require the measurable spaces $(X, \mathcal{X})$ or $(Y, \mathcal{Y})$ to be standard Borel, but rather hinges on the choice of the natural sigma-algebras on $\Delta(X)$ and $\Delta(Y)$.
Lemma 5 Let X and Y be standard Borel spaces and let $\varphi : X \to Y$ be measurable. If $\varphi$ is one-to-one, then $\hat{\varphi}$ is one-to-one. In this case: • $\varphi^{-1}(\mathcal{B}_Y) = \mathcal{B}_X$, that is, $\varphi$ generates $\mathcal{X}$; • $\hat{\varphi} : \Delta(X) \to \Delta(\varphi(X))$ is a measurable isomorphism (under the identification of $\hat{\varphi}(\nu)$ on Y with its restriction to $\varphi(X)$).
In particular, the following three statements are equivalent: (i) $\varphi : X \to Y$ is one-to-one; (ii) $\hat{\varphi} : \Delta(X) \to \Delta(Y)$ is one-to-one; (iii) $\varphi$ generates $\mathcal{X}$.
MMT If $\varphi : X \to Y$ is an injective and measurable map between two standard Borel spaces X and Y, then $\varphi(X) \in \mathcal{B}_Y$ and $\varphi$ is a measurable isomorphism between $(X, \mathcal{B}_X)$ and $(\varphi(X), \mathcal{B}_Y \cap \varphi(X))$. In particular, $\varphi(B) \in \mathcal{B}_Y$ for all $B \in \mathcal{B}_X$.
By MMT, for each $B \in \mathcal{B}_X$, since $\varphi(B) \in \mathcal{B}_Y$, then $B = \varphi^{-1}(\varphi(B)) \in \varphi^{-1}(\mathcal{B}_Y)$, whence $\mathcal{B}_X \subseteq \varphi^{-1}(\mathcal{B}_Y)$; the converse inclusion follows from the measurability of $\varphi : X \to Y$. Thus $\varphi$ generates $\mathcal{B}_X$. This proves the first three points of the statement.
If $\nu, \nu' \in \Delta(X)$, then $\hat{\varphi}(\nu) = \hat{\varphi}(\nu')$ if and only if $\nu(\varphi^{-1}(C)) = \nu'(\varphi^{-1}(C))$ for all $C \in \mathcal{B}_Y$, that is, if and only if $\nu$ and $\nu'$ coincide on the sigma-algebra $\varphi^{-1}(\mathcal{B}_Y)$ generated by $\varphi$. But $\varphi^{-1}(\mathcal{B}_Y) = \mathcal{B}_X$, so $\nu(B) = \nu'(B)$ for all $B \in \mathcal{B}_X$, thus $\nu = \nu'$. Therefore, $\hat{\varphi}$ is one-to-one.
If $\hat{\varphi} : \Delta(X) \to \Delta(Y)$ is one-to-one, since it is measurable and the spaces are standard Borel, it follows by MMT that $\hat{\varphi}(\Delta(X))$ is a Borel subset of $\Delta(Y)$ and $\hat{\varphi}$ is a measurable isomorphism between $\Delta(X)$ and $\hat{\varphi}(\Delta(X))$, as wanted. So far, we have shown that, if $\varphi$ is injective, then $\hat{\varphi}$ is an isomorphism of standard Borel spaces, and that for all $\lambda \in \Delta(\varphi(X))$, the set function $\lambda \circ \varphi$, defined by $(\lambda \circ \varphi)(B) = \lambda(\varphi(B))$ for all $B \in \mathcal{B}_X$, belongs to $\Delta(X)$ and satisfies $\lambda = \hat{\varphi}(\lambda \circ \varphi)$. This proves the fourth point of the statement. Finally, we already proved that if $\varphi$ is injective, then $\varphi$ generates $\mathcal{B}_X$, and that if $\varphi$ generates $\mathcal{B}_X$, then $\hat{\varphi}$ is injective. The proof is concluded by showing that, if $\hat{\varphi}$ is injective, so is $\varphi$. We will actually prove the contrapositive statement.
Recall that $\delta : X \to \Delta(X)$ is the Dirac embedding of X into $\Delta(X)$. For each $x \in X$, we have $\hat{\varphi}(\delta_x) = \delta_{\varphi(x)}$. Therefore, if $\varphi$ is not one-to-one, $\hat{\varphi}$ is not one-to-one: if there are $x \neq z$ in X such that $\varphi(x) = \varphi(z)$, then $\hat{\varphi}(\delta_x) = \delta_{\varphi(x)} = \delta_{\varphi(z)} = \hat{\varphi}(\delta_z)$ while $\delta_x \neq \delta_z$. At the opposite end of the spectrum we have the case of constant functions.
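The Dirac argument can be mimicked with discrete measures: the pushforward of $\delta_x$ under $\varphi$ is $\delta_{\varphi(x)}$, so a non-injective $\varphi$ yields a non-injective pushforward map. A small sketch (discrete measures as dictionaries, our own construction):

```python
from collections import Counter

def pushforward(measure, phi):
    """Pushforward of a discrete measure {point: mass} under phi."""
    out = Counter()
    for x, mass in measure.items():
        out[phi(x)] += mass
    return dict(out)

phi = lambda x: x % 2  # not one-to-one: phi(0) == phi(2)
delta_0, delta_2 = {0: 1.0}, {2: 1.0}

# Distinct Dirac measures with identical pushforwards:
assert delta_0 != delta_2
assert pushforward(delta_0, phi) == pushforward(delta_2, phi) == {0: 1.0}
```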
Lemma 6 Let X and Y be standard Borel spaces and let $\varphi : X \to Y$ be measurable. Then $\varphi$ is constant if and only if $\hat{\varphi}$ is constant.
Proof If $\varphi \equiv \bar{y}$ is constant, then given any $\sigma \in \Delta(X)$ and any $C \in \mathcal{Y}$, $\hat{\varphi}(\sigma)(C) = \sigma(\varphi^{-1}(C))$ equals 1 if $\bar{y} \in C$ and 0 otherwise, so $\hat{\varphi}(\sigma) = \delta_{\bar{y}}$ for every $\sigma$, and $\hat{\varphi}$ is constant. The converse is proved by contraposition. If $\varphi$ is not constant, then there exist $x \neq y$ such that $\varphi(x) \neq \varphi(y)$, whence $\hat{\varphi}(\delta_x) = \delta_{\varphi(x)} \neq \delta_{\varphi(y)} = \hat{\varphi}(\delta_y)$, where the two external equalities follow from (41) and the internal inequality from the fact that $\varphi(x) \neq \varphi(y)$ and singletons are measurable. h

Let $(X, \mathcal{X})$, $(Y, \mathcal{Y})$, and $(Z, \mathcal{Z})$ be measurable spaces, $f : X \to Y$, $g : Y \to Z$, and $h : X \to Z$. It is convenient to denote by $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{H}$ the sigma-algebras generated by f, g, and h, respectively. Moreover, we will say that h is f-measurable if it is $\mathcal{F}$-$\mathcal{Z}$-measurable.
Lemma 7 Let $(X, \mathcal{X})$ be a measurable space, and let $(Y, \mathcal{Y})$ and $(Z, \mathcal{Z})$ be standard Borel spaces. Then the following conditions are equivalent for two measurable functions $f : X \to Y$ and $h : X \to Z$.

Finally, the following results relate self-confirming equilibrium actions to objectively optimal actions: Corollary 5 A fully revealing action is self-confirming if and only if it is objectively optimal.
Under own-action independence of feedback about the state, we have a stronger result. Eq. (5) and Lemma 2 imply: Corollary 6 Under own-action independence of feedback about the state, an action is self-confirming if and only if it is objectively optimal.
Thus, from a decision perspective, own-action independence of feedback is equivalent to perfect feedback.

Proof of Lemma 2 Fix $a \in A$. Observability of consequences implies that $\rho_a(s) = g_a(f_a(s))$ for each $s \in S$, where $g_a : M \to C$ is $\mathcal{B}_M$-$\mathcal{B}_C$-measurable; as $f_a : S \to M$ is $\mathcal{F}_a$-$\mathcal{B}_M$-measurable, then $\rho_a : S \to C$ is $\mathcal{F}_a$-$\mathcal{B}_C$-measurable. Moreover, $v : C \to \mathbb{R}$ is $\mathcal{B}_C$-$\mathcal{B}_{\mathbb{R}}$-measurable and bounded above, and so $r_a = v \circ \rho_a : S \to \mathbb{R}$ is $\mathcal{F}_a$-$\mathcal{B}_{\mathbb{R}}$-measurable and bounded above. Thus, for all $\sigma \in \Delta(S)$,
$$R(a, \sigma) = \int_S r_a \, d\sigma = \int_S r_a \, d\sigma_{|\mathcal{F}_a}.$$
In particular, if $\sigma$ is a model and $\sigma' \in \hat{R}_a(\sigma)$, then $R(a, \sigma) = \int_S r_a \, d\sigma_{|\mathcal{F}_a} = \int_S r_a \, d\sigma'_{|\mathcal{F}_a} = R(a, \sigma')$.
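Lemma 2's conclusion is easy to mimic in a finite setting: under observability of consequences, two models that induce the same distribution of messages given an action (i.e., are observationally equivalent given that action) yield the same expected reward. The following discrete sketch is our own construction, with a deliberately non-injective feedback map.

```python
# Finite state space, feedback f_a, consequence-of-message map g_a, utility v.
f_a = {"s1": "m", "s2": "m", "s3": "m_prime"}  # feedback: not injective
g_a = {"m": 0.0, "m_prime": 1.0}               # consequence of each message
v = lambda c: -c**2                            # utility of consequences

# Two models that agree on the distribution of messages given a
# (masses chosen to be exactly representable in binary floating point):
sigma = {"s1": 0.5, "s2": 0.25, "s3": 0.25}
sigma_prime = {"s1": 0.25, "s2": 0.5, "s3": 0.25}

def message_dist(model):
    d = {}
    for s, p in model.items():
        d[f_a[s]] = d.get(f_a[s], 0.0) + p
    return d

def expected_reward(model):
    return sum(p * v(g_a[f_a[s]]) for s, p in model.items())

assert message_dist(sigma) == message_dist(sigma_prime)        # same feedback law
assert expected_reward(sigma) == expected_reward(sigma_prime)  # same expected reward
```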
As a result,R a Ã h Ã ð Þ is equal to