Forgetful updating and stubborn decision-makers

A decision-maker receives an informative signal each period and is randomly required to make a terminal action based on the signals received so far. The decision-maker is restricted to use a (stochastic) finite automaton no larger than a given size to process information. In contrast to the existing literature that focuses on very low probability of termination, I consider information structures with a (nearly) revealing signal, in which analytical solutions are available for all probability values of termination. Results from that model reveal two robust predictions regarding constrained optimal behaviour. First, it is optimal to ignore small (in terms of informativeness) signals. Second, when deterministic schemes are optimal, big signals with similar strengths should be treated similarly; otherwise, randomization takes a lexicographic order according to the strengths of the signals. I also identify a new behavioural bias, information stubbornness, according to which the decision maker does not respond to further informative signals after seeing a nearly revealing signal. As a result, the decision-maker can persistently choose the wrong action even after an unbounded number of informative signals.


Introduction
Recently there has been a surge of interest in behavioural biases in economic agents' decision-making processes. One way to understand such biases is to introduce frictions or capacity constraints in the agent's ability to process information, and to derive behavioural biases as the endogenous responses to those constraints. In particular, Hellman and Cover (1970) and its recent incarnation, Wilson (2014), derive constrained optimal decision rules under limited memory, modelled by finite automata with a given number of memory states. Wilson (2014) shows that a constrained decision-maker (DM) responds only to extreme signals and ignores all others, and that a positive extreme signal moves the memory state one step up and a negative one moves it one step down (although the movements can be randomized), regardless of their relative strengths. However, these results are obtained in the limit case where the expected time horizon for a decision is infinitely far away relative to the memory capacity, and it is not obvious whether, or which of, these results are robust when the expected time horizon is bounded away from that limit.
I am grateful to Kalyan Chatterjee for discussions that benefited the paper in its early stages.
In this paper, I revisit these results for a class of information structures that allow for tractable analysis. I adopt the same framework as Wilson (2014), which features two possible states of nature, H and L. In each period the DM receives an informative signal, and there is a chance that she has to make a terminal decision. Without memory constraints, the optimal decision rule is to take the optimal action with respect to the posterior computed from the signals received so far according to Bayes' rule. Instead, a memory constraint is imposed so that only decision rules that are implementable by a finite automaton of a given size are allowed. Such an automaton consists of finitely many memory states and a transition rule that governs its evolution by specifying the next state to go to (which can be randomized) conditional upon the signal received. Each memory state is associated with an action, which is taken if a terminal action is called for when the DM happens to be in that memory state. Wilson (2014) focuses on the limit case where the probability of terminal action, η, vanishes. Assuming that the distribution of signals has full support, she proves two salient features of the optimal finite automaton of a given size for η sufficiently small. First, small signals are ignored in the sense that they do not trigger a transition to a different memory state. Surprisingly, however, all signals but the most informative signal in either direction are small signals, and hence the optimal automaton only reacts to extreme signals. Second, the memory states can be ranked in the sense that the extreme signal that increases the posterior on H (the high signal) causes a transition to a higher memory state, and the extreme signal that decreases it (the low signal) causes a downward transition.
Here comes the second surprise: any transition can only occur between adjacent memory states (although sometimes with different probabilities), even though the high signal and the low signal can have very different relative strengths in terms of Bayesian updating.
In contrast to Wilson (2014), I consider arbitrary termination probabilities for a class of information structures that allow for tractable analysis. First I show that the decision problem in Wilson (2014) with memory constraints implies intrinsic biases: no finite automaton can implement the unconstrained optimal decision rule under the full-support condition. This makes the model intractable, and only asymptotic results are available as η vanishes. In contrast, if one of the signals fully reveals one of the states of nature, then the unconstrained optimum is implementable with a finite automaton, whose size depends on the prior belief. A two-signal information structure with one of them being fully revealing is called a "model of breakthroughs," with the revealing signal being the "breakthrough" signal. In the model of breakthroughs I obtain analytical solutions for all values of η. In particular, in any optimal finite automaton, the revealing signal, say the low signal, causes a transition to the lowest memory state, which is absorbing. I then show that this feature remains for the constrained optimal finite automaton when the low signal is sufficiently informative but not fully revealing. This new behavioural bias, called information stubbornness, does not appear in the results of Wilson (2014), although I also maintain the full-support condition. This feature is proved using results from the model of breakthroughs and the modified multi-self consistency, a concept invented by Piccione and Rubinstein (1997) and then extended by Wilson (2014) to the framework here. Multi-self consistency implies that in the optimal finite automaton, each memory state is associated with a posterior (potentially biased) belief, and both the optimal action and optimal transitions are determined by a version of "sequential rationality" according to those beliefs.
This necessary condition for optimality then traces biases in behaviour back to the biases in the associated beliefs. A key observation here is that these beliefs are continuous in the underlying parameters governing the information structure. Since it is strictly optimal to transit to the lowest memory state when the low signal is fully revealing, the main part of the proof shows that it remains so when the low signal is sufficiently informative.
Information stubbornness highlights a new behavioural bias implied by limited memory. Without memory constraints, the DM would revert to the action corresponding to state of nature H when she receives sufficiently many high signals. In contrast, the constrained optimal rule dictates the action corresponding to L once a low signal is received, without responding to any signals thereafter, despite the fact that they are informative. Furthermore, the result also differs significantly from those in Wilson (2014), who shows that any transition can only occur between adjacent memory states at the limit where η is arbitrarily close to zero. In contrast, information stubbornness implies that the transition "jumps" from any memory state to the lowest one when a low signal is received, and it stays there thereafter.
Besides information stubbornness, I obtain two other analytical features of the constrained optimal rule by adding a third signal to the model of breakthroughs. First, I consider the situation where the third signal is sufficiently uninformative, and show that it is optimal to completely ignore it in the constrained optimal rule. This is consistent with the result of Wilson (2014) for η close to zero, but here it is generalized to all η in my environment. Thus, besides information stubbornness, the bias of "sticky information" identified in Wilson (2014) still holds here. Second, I consider the case where the two unrevealing signals have similar strengths in the model of breakthroughs. In contrast to Wilson (2014), the DM treats signals of similar strengths similarly, albeit in a lexicographical order. When randomization is optimal, the stronger signal is more fluid: it moves the memory state to the next one until the weaker one is completely sticky. However, both signals can be fluid even for η arbitrarily small.

The model
The model is essentially that of Wilson (2014). There are two states of nature, θ ∈ {H, L}, and the prior probability of θ = H is P_0(H) = p_0. The model has an unbounded number of periods, and in each period, with probability η, the DM has to choose between two actions, a ∈ A = {a_H, a_L}, with utility function u(a, θ) given by
u(a_H, H) = u_H, u(a_L, L) = u_L, and u(a_H, L) = u(a_L, H) = 0, with u_H, u_L > 0.
Once the action is chosen the game is over. With probability 1 − η the game continues. Note that the chance to choose the terminal action is exogenously given.
The state of nature, however, is not observable to the DM. Instead, the DM can observe a sequence of signals, which is i.i.d. conditional on the state of nature. Here I focus on information structures with two signals, S = {h, ℓ}. A decision rule is a function D : S^* → A, where S^* = ⋃_{t=0}^{∞} S^t is the set of all partial histories of signal realizations, including the empty one. The decision rule maps each possible partial history of signal realizations to the terminal action taken if the DM is called upon to make one. The unconstrained optimal rule can be fully characterized by the posterior, p: the optimal decision is to take a_H whenever p > p^* ≡ u_L/(u_H + u_L) and to take a_L whenever p < p^*. The posterior is computed according to Bayes' rule, for which it is convenient to work with likelihood ratios. Let μ^θ_s denote the probability of signal s in state θ, and define
ξ(s) ≡ μ^H_s/μ^L_s for s ∈ S, ρ_0 ≡ p_0/(1 − p_0), and ρ^* ≡ p^*/(1 − p^*). (1)
I normalize the labels so that ξ(h) > 1 > ξ(ℓ). I use ρ to denote the likelihood ratio p/(1 − p) for a generic posterior p on H, and from ρ and signal s, the likelihood ratio for the new posterior is ρ′ = ρξ(s). Hence, h increases the posterior on H while ℓ increases the posterior on L.
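The likelihood-ratio form of Bayes' rule lends itself to a short numerical illustration. The Python sketch below is only an illustration under stated assumptions: the signal labels "h" and "l" (standing for the low signal), the function names, and the numerical values are hypothetical, and a tie at ρ = ρ^* is resolved in favour of a_H, which the text leaves open.

```python
def posterior_ratio(rho0, xi, history):
    """Likelihood ratio p/(1-p) after a history of signals, starting from rho0."""
    rho = rho0
    for s in history:
        rho *= xi[s]  # one Bayesian update: rho' = rho * xi(s)
    return rho


def unconstrained_action(rho0, rho_star, xi, history):
    """Threshold rule; a tie at rho == rho_star goes to a_H (an assumption)."""
    return "a_H" if posterior_ratio(rho0, xi, history) >= rho_star else "a_L"


# Hypothetical numbers: xi(h) = 2 > 1 > xi(l) = 1/2, and u_H = u_L so rho_star = 1.
xi = {"h": 2.0, "l": 0.5}
print(posterior_ratio(1.0, xi, "hhl"))           # two h-signals and one l-signal
print(unconstrained_action(0.5, 1.0, xi, "hl"))  # back at the prior, below rho_star
```

Because only the product of the ξ(s) matters, the order of signals within a history is irrelevant to the unconstrained rule; the memory constraint is what makes the order matter.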

Finite automata and implementation
Given the set of signals, S, a stochastic finite-state automaton (SFSA) consists of a list M = ⟨Q, τ, d, g⟩, where Q is the set of memory states, τ : Q × S → Δ(Q) is the transition rule, d : Q → Δ(A) is the action rule, and g ∈ Δ(Q) is the distribution over initial states. I use τ(q, s; q′) to denote the transition probability from q to q′ when receiving s, and d(a; q) to denote the probability of taking action a at memory state q when called upon for terminal action. When the finite automaton is deterministic (abbreviated as DFSA), I use τ(q, s) = q′ to denote the transition rule and d(q) = a to denote the decision rule. The following result fully characterizes the signal structures for which a finite automaton can implement the unconstrained optimum.
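To make the objects Q, τ, d and g concrete, the following Python sketch represents an SFSA with plain dictionaries and computes the exact distribution over memory states after a given signal history. The two-state automaton and all names in it are hypothetical and serve only to show the data structure; it is not one of the automata studied in the paper.

```python
def state_distribution(states, tau, g, history):
    """Exact distribution over memory states after a signal history.

    tau[(q, s)] is a dict {q_next: probability}; g is the initial distribution.
    """
    dist = {q: g.get(q, 0.0) for q in states}
    for s in history:
        new = {q: 0.0 for q in states}
        for q, mass in dist.items():
            for q_next, prob in tau[(q, s)].items():
                new[q_next] += mass * prob
        dist = new
    return dist


def action_probability(dist, d, action):
    """Probability of `action` if a terminal decision is called for now."""
    return sum(mass * d[q].get(action, 0.0) for q, mass in dist.items())


# Hypothetical two-state automaton: an h-signal moves q1 up with probability 1/2.
states = ["q1", "q2"]
tau = {
    ("q1", "h"): {"q1": 0.5, "q2": 0.5},
    ("q1", "l"): {"q1": 1.0},
    ("q2", "h"): {"q2": 1.0},
    ("q2", "l"): {"q1": 1.0},
}
d = {"q1": {"a_L": 1.0}, "q2": {"a_H": 1.0}}
g = {"q1": 1.0}
dist = state_distribution(states, tau, g, "hh")
print(dist, action_probability(dist, d, "a_H"))
```

A DFSA is the special case in which every dictionary tau[(q, s)] and d[q] puts probability one on a single entry.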
Proposition 2.1 is proved by the use of the Myhill-Nerode Theorem (Nerode 1958), which gives a full characterization of what finite automata of a given size can do in terms of partitions of the partial histories. For self-containment I give the relevant version of the theorem in the Appendix. Proposition 2.1 then shows that, except for the case where one of the two signals fully reveals the state of nature, no finite automaton can implement the unconstrained optimum, and hence the constrained optimal behaviour must exhibit some biases relative to the full-rationality benchmark.
The impossibility result can be easily extended to more than two signals, as more signals can only make the problem more complicated. As a benchmark, in the next section I study a model where the ℓ-signal does fully reveal the state of nature, and in Section 4 I use the results from there to characterize the constrained optimal behaviour when the ℓ-signal is not fully revealing but only nearly so.

A model of breakthroughs
Here I consider the case where μ^H_h = 1, that is, ξ(ℓ) = 0. Later I will extend the analysis to allow for μ^H_h < 1. To simplify notation, denote μ^L_h ≡ μ with μ ∈ (0, 1). A low signal, ℓ, fully reveals the state of nature L (hence a breakthrough), while a high signal, h, only gradually increases the posterior on H. Without a memory constraint, the optimal rule dictates that action a_L be taken once an ℓ-signal appears and thereafter, while a_H is taken only after sufficiently many h-signals without any ℓ-signal. As shown later, this can be implemented by a finite automaton with sufficiently many memory states. I will also characterize the constrained optimal rule when the memory constraint binds.
First, suppose that ρ_0 ≥ ρ^*, where the likelihood ratios are defined in (1). In this case, the unconstrained optimum can be implemented by a two-state DFSA with Q = {q_H, q_L}, τ(q_H, h) = q_H, τ(q_H, ℓ) = τ(q_L, h) = τ(q_L, ℓ) = q_L, and d(q_H) = a_H and d(q_L) = a_L, as depicted in Fig. 1. Thus, the memory state q_L is absorbing for both signals, and the action there is a_L; q_H is absorbing only for signal h, and the action there is a_H. This DFSA, labelled M^b_2, with q_H as the initial state, implements the unconstrained optimum: since p_0 ≥ p^*, it is optimal to take a_H as long as only high signals are received, and it is optimal to switch to a_L and stay with it once a low signal is received. Note that this implementation does not depend on η.
Now suppose that ρ_0 < ρ^*. In this case, the unconstrained optimum can still be implemented by a DFSA, but its size depends on the distance between ln ρ_0 and ln ρ^* relative to ln ξ(h). In particular, let
N(ρ_0) ≡ ⌈(ln ρ^* − ln ρ_0)/ln ξ(h)⌉. (2)
Then, the optimal DFSA requires N + 2 states, given by Fig. 2, with q_1 as the initial memory state. The action rule is given by d(q_H) = a_H and d(q) = a_L for all q ≠ q_H. This DFSA, M^b_{N+2}, implements the unconstrained optimum for all η ∈ (0, 1). When N = 0, this is equivalent to M^b_2, and hence we can set N(ρ_0) = 0 in (2) for ρ_0 ≥ ρ^*.
These results then show that whether the memory constraint, |Q| ≤ K, binds or not depends only on ρ_0 but not on η, and it binds if and only if K < N(ρ_0) + 2. Conversely, for a given K, let ρ^K_0 be the smallest ρ_0 such that N(ρ_0) + 2 ≤ K. Then, the memory constraint is binding if and only if ρ_0 < ρ^K_0. The following proposition shows that strict randomization is optimal if and only if the memory constraint is binding and the optimal SFSA is not trivial. In fact, it shows that the following SFSA, M^b_K(α), is optimal, which deviates from M^b_K only in the transitions after an h-signal: for all i = 1, …, K − 2, with q_{K−1} ≡ q_H,
τ(q_i, h; q_i) = α and τ(q_i, h; q_{i+1}) = 1 − α. (3)
Finally, define M_L as the finite automaton with one memory state that takes action a_L all the time.
Moreover, ρ̲_0 converges to zero as η goes to zero.
Proposition 3.1 gives a full characterization of optimal SFSA in the model of breakthroughs. The optimal SFSA identified uniquely determines the constrained optimal decision rule in each case (except at ρ_0 = ρ̲_0), and is the smallest SFSA to implement the corresponding rule. It shows that strict randomization is optimal if and only if the memory constraint binds and the prior is not too low, that is, ρ_0 ∈ (ρ̲_0, ρ^K_0). For lower priors, the optimal rule is to take a_L all the time, which reflects the fact that randomization cannot fully substitute for efficient learning. The lower bound for priors under which randomization is optimal, ρ̲_0, does depend on η, although its upper bound, ρ^K_0, does not. At the threshold ρ̲_0, M^b_K(α^*) under the optimal α^* and M_L give exactly the same ex ante payoffs. However, as η converges to zero, the payoff from M^b_K(α) for any α < 1 converges to p_0 u_H + (1 − p_0)u_L, the highest payoff possible, and this implies that ρ̲_0 converges to zero. To my knowledge this is the first complete characterization of constrained optimal rules for a class of information structures for all η and for all priors.
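The deterministic benchmark behind this characterization can be checked numerically. The Python sketch below assumes that N(ρ_0) is the smallest nonnegative integer n with ρ_0 ξ(h)^n ≥ ρ^* (my reading of the construction, with ties resolved toward a_H), builds the corresponding (N + 2)-state DFSA as a capped counter of h-signals with an absorbing low state, and compares it with the unconstrained rule on all histories up to a given length. The function names, the label "l" for the low signal, and the numerical values are hypothetical.

```python
from itertools import product


def states_beyond_two(rho0, rho_star, xi_h):
    """Smallest n >= 0 with rho0 * xi_h**n >= rho_star (assumed reading of N)."""
    n, rho = 0, rho0
    while rho < rho_star:
        rho *= xi_h
        n += 1
    return n


def dfsa_action(n, history):
    """Action of the (n+2)-state DFSA: count h-signals up to n; 'l' is absorbing."""
    count = 0  # count = 0 is the initial state; count = n is q_H
    for s in history:
        if s == "l":
            return "a_L"  # q_L is absorbing, so later signals are irrelevant
        count = min(count + 1, n)
    return "a_H" if count == n else "a_L"


def bayes_action(rho0, rho_star, xi_h, history):
    """Unconstrained rule when an l-signal reveals L, i.e. xi(l) = 0."""
    if "l" in history:
        return "a_L"
    return "a_H" if rho0 * xi_h ** len(history) >= rho_star else "a_L"


rho0, rho_star, xi_h = 0.1, 1.0, 2.0  # hypothetical values
n = states_beyond_two(rho0, rho_star, xi_h)
mismatches = [
    "".join(hist)
    for t in range(8)
    for hist in product("hl", repeat=t)
    if dfsa_action(n, hist) != bayes_action(rho0, rho_star, xi_h, hist)
]
print(n + 2, mismatches)  # number of memory states used, and any disagreements
```

With a memory bound K below n + 2 the counter cannot be represented, which is the situation in which the proposition calls for randomized transitions.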

Information stubbornness
In this section I show information stubbornness when ℓ is sufficiently informative, but not fully revealing: it is optimal for the DM to ignore any further signals once an ℓ-signal is received, hence the stubbornness. To this end, I first need some structural results from Wilson (2014) to characterize the optimal rule under the memory constraint. Given a state of nature θ and a memory state q ∈ Q, the expected payoff accumulated from q when called to act conditional on θ is then
where
As noted in Wilson (2014), f(q|θ) is the stationary distribution under the transition probability from q to q′ given by
Wilson (2014), extending Piccione and Rubinstein (1997), then defines the "belief" associated with memory state q ∈ Q as
and p(q, s), with the associated likelihood ratios given by
To characterize optimal SFSA, I use V_q(θ) to denote the continuation value when the current memory state is q conditional on the state of nature θ, which is characterized by the following recursive equations:
Following Wilson (2014), two memory states q and q′ are called
Proposition 4.1 Suppose that M is an optimal SFSA with memory constraint K and without equivalent states. Then, we rank the memory states in M according to
Proposition 4.1 follows directly from Corollary 1 and Lemma 1 in Wilson (2014). For self-containment, however, a full proof of Proposition 4.1 can be found in the Online Appendix. As in Wilson (2014), the proof is mainly based on an extended version of the multi-self consistency proposed by Piccione and Rubinstein (1997), and I prove that version in the Online Appendix as well. Proposition 4.1 states that in the optimal SFSA, the DM essentially uses the beliefs ρ(q) to decide the optimal transitions and the optimal actions if called upon. Note that Proposition 4.1 applies to all information structures, including the model of breakthroughs and information structures that feature full support.
Now I turn to information stubbornness and consider the case where μ^H_h < 1 but close to one, and hence the ℓ-signal is no longer a breakthrough but still a strong signal for state of nature L. Recall the threshold ρ̲_0 given by Proposition 3.1.
, any optimal SFSA transits to the lowest memory state after ℓ from any memory state, which is absorbing.
According to Proposition 4.2, for any given η ∈ (0, 1) and ρ_0 > ρ̲_0, for a range of information structures that satisfy full support, when receiving signal ℓ it is optimal to transit all the way back to q_L from all memory states in any optimal SFSA. This is in sharp contrast to the conclusion in Wilson (2014), where transitions only occur between adjacent memory states for small η. The difference lies in the order in which the limits are taken: in Wilson's case one fixes μ^H_h < 1 and takes η to zero, whereas in my case one fixes η (no matter how small) and considers μ^H_h close to one. Proposition 4.2 also implies that the DM's behaviour exhibits biases relative to the full-rationality benchmark. In particular, consider a situation where the DM receives an ℓ-signal at an early stage and hence is stuck in the memory state q_L. Her belief then stays the same, regardless of the signals she receives afterwards. In contrast, a Bayesian DM would shift her beliefs upwards if more h-signals come. In fact, if the underlying state of nature is H, then the constrained DM would almost surely get stuck with a biased belief, fully convinced of L, while the Bayesian DM's belief would almost surely converge to H. Note that the result holds for any given K. This finding also highlights a difference between modelling limited memory by finite automata and by bounded recall. In models of bounded recall, only the most recent experiences count. In contrast, here the ℓ-signal can have occurred long ago while the DM remains stubborn.
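The contrast between the stubborn DM and the Bayesian DM when the state is H can be illustrated with two elementary computations. The Python sketch below is an illustration only: it conditions on the decision problem not having terminated, takes as given that the first low signal sends the automaton to the absorbing state q_L, and uses hypothetical values for the signal probabilities. The function names are mine.

```python
import math


def absorbed_by(t, mu_H_h):
    """Probability that at least one l-signal arrived in t periods when the state is H."""
    return 1.0 - mu_H_h ** t


def bayesian_drift(mu_H_h, mu_L_h):
    """Expected per-period change in ln(rho) for a Bayesian DM when the state is H."""
    mu_H_l, mu_L_l = 1.0 - mu_H_h, 1.0 - mu_L_h
    return mu_H_h * math.log(mu_H_h / mu_L_h) + mu_H_l * math.log(mu_H_l / mu_L_l)


mu_H_h, mu_L_h = 0.99, 0.5  # hypothetical: l is rare under H, so it is nearly revealing
for t in (10, 100, 1000):
    print(t, round(absorbed_by(t, mu_H_h), 4))  # chance the stubborn DM is stuck at q_L
print(bayesian_drift(mu_H_h, mu_L_h) > 0)       # Bayesian log-odds drift toward H
```

The first quantity tends to one as t grows for any μ^H_h < 1, while the second is a Kullback-Leibler divergence and hence strictly positive whenever the two signal distributions differ, which is the sense in which the two DMs end up with opposite convictions.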
The result is proved by showing that the optimal SFSA takes the form given by (3) in the relevant range of priors. Since Proposition 3.1 also ensures uniqueness, by continuity one only needs to check against small deviations in the transition probabilities of the optimal SFSA. This is guaranteed essentially by the continuity of the beliefs identified in Proposition 4.1 applied to the model of breakthroughs. The proof verifies that when μ^H_h = 1, it is strictly optimal to transit to q_L from all memory states, and this optimality is maintained when μ^H_h is close to one by continuity.

Three signals
Here I extend the results to information structures with three signals, where S = {h, h′, ℓ}. For expositional simplicity I only consider the case where μ^H_h + μ^H_{h′} = 1, i.e., the ℓ-signal fully reveals state of nature L as in the model of breakthroughs, but signals h and h′ both increase the posterior on H. However, all the results here can be extended to the case where the ℓ-signal is not fully revealing but nearly so, as in Proposition 4.2. I use two benchmark cases to illustrate how different strengths of the signals h and h′ affect optimal randomization. I normalize the parameters so that ξ(h) > ξ(h′), that is, the h-signal is a stronger signal than h′. When η is close to zero, Wilson (2014) shows that transitions can only happen when receiving h but not h′. Here I show that this is true for all η when ξ(h′) is sufficiently close to one, that is, when the h′-signal is very uninformative. In contrast to her result, however, for ξ(h′) close to ξ(h), I show that it is optimal not to ignore h′, but randomization may occur. I also characterize how the randomization is shared between h and h′.
To state these results, I fix ξ(h) but vary ξ(h ) to be between 1 and ξ(h), and I do so by parametrizing the conditional probabilities as follows.
In the two extreme cases the optimal SFSA is characterized in almost exactly the same way as in Proposition 3.1. When ξ(h′) = 1, Proposition 3.1 directly applies by amending the SFSA so that τ(q, h′) = q for all q, that is, the h′-signal is completely ignored. In the other extreme, where ξ(h′) = ξ(h), the result still holds but with a slightly more subtle modification: randomization at q_i under the optimal SFSA, M^b_K(α), can be split arbitrarily between the two signals as long as
for all i = 1, …, K − 2, with q_{K−1} = q_H. That is, the average probability of staying at q_i is equal to α across the signals h and h′. Note that when α = 0 is optimal this 1) implies that h′-signal is ignored everywhere. Note also that, since ξ(h) is the same in both extreme cases, the unconstrained optimal decision rule is implementable if and only if ρ_0 ≥ ρ^K_0, where ρ^K_0 is determined in exactly the same way as in Section 3 (that is, ρ^K_0 is the lowest ρ_0 such that N(ρ_0) ≤ K − 2, with N(ρ_0) given by (2)). Interestingly, however, for μ^L_{h′} slightly below ν, the unconstrained optimal rule is not implementable even for ρ_0 ∈ (ρ^K_0, ρ^*), since for ξ(h′) very close to one, it will require more than K h′-signals to cross ρ^* from ρ_0. For ρ_0 < ρ^K_0, the constraint K will be binding for both extreme cases. Note that we still have the threshold ρ̲_0 below which taking a_L all the time is the constrained optimal rule, but that threshold would depend on μ^L_{h′}. We have the following proposition.

Proposition 4.3 Suppose that μ
Proposition 4.3 only considers priors in the range (ρ̲_0, ρ^*). For ρ_0 ≥ ρ^*, M^b_2 is still optimal. For ρ_0 < ρ̲_0, it is still optimal to use M_L. For priors in between, Proposition 4.3 (1) shows that it is optimal to ignore signal h′ when it is sufficiently uninformative, i.e., when ξ(h′) is close to unity. Note that this implies a different kind of bias when the memory constraint is binding. Indeed, when ρ_0 > ρ^K_0 but close to it, and when ξ(h′) > 1 but close to one, a rational DM will take action a_H after seeing sufficiently many h′-signals, but not the constrained DM, who would stay at q_1 and take action a_L. This result extends Wilson's (2014) result that small signals should be ignored, but here it is shown for all η in information structures that are close to the model of breakthroughs. Proposition 4.3 (2) deals with the case where ξ(h′) is close to ξ(h) but slightly lower. Note that for ρ_0 ≥ ρ^K_0, the unconstrained optimal rule is still implementable when ξ(h′) is sufficiently close to ξ(h). According to Proposition 4.3 (2), for lower ρ_0, randomization is optimal and it would first occur when receiving h′ before it occurs when receiving h; in other words, randomization between staying in the current memory state and moving to the next when receiving the more informative signal h can happen only when it is optimal to ignore the less informative signal, h′. Note that this last point essentially follows from the characterization result, Proposition 4.1.
Proposition 4.3 only considers the two extreme cases. For the case where K = 3, the model becomes tractable and we have the following proposition.

Proposition 4.4 Suppose that μ
According to Proposition 4.4, the optimal SFSA is M^b_K(β, β′) in the three-signal environment under the model of breakthroughs whenever the memory constraint is binding (for priors that are not too low; otherwise M_L is optimal). Note that when ρ_0 ≥ ρ^*/ξ(h′) the unconstrained optimal rule is implementable. For lower priors, the second line in (12) shows that at q_1 both h and h′ are fluid signals, and neither is ignored. This is in contrast to Wilson's (2014) result where only extreme signals are fluid. Note that for this result I do not really need ξ(ℓ) = 0 but only need the ℓ-signal to be sufficiently informative, as in Proposition 4.2.

Conclusion
An important point in Wilson (2014) is to establish that potential behavioural biases are in fact rational responses to decision-makers' imperfections in information processing abilities. Here I have demonstrated that these implications depend not only on the constraints imposed by such imperfections, but also on the underlying environment, such as the probability of terminal actions. My results suggest that two behavioural implications are robust: first, it is almost universally true that the constrained optimal rule ignores small signals; second, the DM should treat big signals according to their informativeness and, when it comes to optimal randomization, use a lexicographical order.
Moreover, a novel behavioural prediction is identified here, information stubbornness, and I have shown that this is a robust feature of the constrained optimal rule when there is a sufficiently informative signal, regardless of the expected horizon of the decision problem. This bias shows that a single experience can lock in a memory-constrained decision-maker's belief without responding to future informative signals.
Finally, the results in the paper are proved mainly on the basis of the extended principle of modified multi-self consistency proposed by Piccione and Rubinstein (1997). Hence, I suspect that most of them would survive in environments where that principle also holds true. For example, the result that small signals are ignored only depends on the fact that the belief associated with a memory state according to the modified multi-self consistency lies strictly within the boundaries given by Proposition 4.1, which is a direct consequence of multi-self consistency under binary states of nature.

Appendix: Proofs
Before the proof of Proposition 2.1, I first give a version of the Myhill-Nerode theorem adapted to the framework in the paper. To do so, I need to introduce some concepts first. A relation R over the set of partial histories of signal realizations is right-invariant if x R y implies (x • z) R (y • z) for all partial histories z, where • denotes concatenation. Given a decision rule D, L^D_θ denotes the set of partial histories under which action a_θ is taken, θ = H, L.

Theorem 5.1 (Myhill-Nerode Theorem) The rule D is implementable by a DFSA iff there is a right-invariant equivalence relation R with finitely many equivalence classes such that L^D_θ is a union of some of those equivalence classes for both θ = H, L.
The equivalence classes correspond to the memory states in the corresponding DFSA, and the equivalence classes that make up L^D_θ consist of those action states where action a_θ is taken, for both θ = H, L. Thus, the DFSA gives a finite partition of partial histories that captures the finiteness of the DM's memory capacity. The right-invariance condition captures the fact that if the DFSA enters the same memory state after two different partial histories, then it will end up in the same memory state (although not necessarily the same as the original one) after any subsequent partial history.

Proof of Proposition 2.1
Note that the "if" part is proved in the analysis of the model of breakthroughs. For the "only if" part, under the assumption of two signals with ξ(h) ∈ (1, ∞) and ξ(ℓ) ∈ (0, 1), I show that under the unconstrained optimal rule D, any right-invariant equivalence relation R such that L^D_θ is a union of some of its equivalence classes for both θ = H, L must have infinitely many equivalence classes. The "only if" part then follows from the Myhill-Nerode theorem. Now, let R be such an equivalence relation. Without loss of generality I assume that ρ_0 ≤ ρ^*. I construct an infinite sequence of partial histories, x_1, x_2, …, x_n, …, and show that each has to belong to a different equivalence class. The construction is simple: each x_n consists of m_n h-signals, with the sequence {m_n}_{n=1}^{∞} determined as follows. Let m_1 be the smallest integer m ≥ 1 such that
Given m_n, m_{n+1} is the smallest integer m such that
This implies that m_n strictly increases with n. Now we show that, for each n ≥ 2, x_n must belong to an equivalence class different from the one x_i belongs to, for all i < n. To see this, let k be the smallest integer such that
This implies that D(x_i • y) = a_L for all i < n, where y consists of k ℓ-signals. However, by (14),
and hence D(x_n • y) = a_H. Thus, by right-invariance, x_n must belong to a different cell than x_i for all i < n.
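The separation argument in this proof can be explored numerically. The Python sketch below is a brute-force illustration rather than the construction of the sequence {m_n}: it searches for a suffix of low signals that separates two histories of high signals under the unconstrained rule, and greedily collects histories that are pairwise separated. By the Myhill-Nerode theorem, the size of such a collection is a lower bound on the number of memory states of any DFSA implementing the rule. Function names, the tie-breaking toward a_H, and the numerical values are assumptions of the sketch.

```python
def takes_a_H(rho0, rho_star, xi_h, xi_l, n_h, n_l):
    """Unconstrained action after n_h h-signals followed by n_l l-signals (True = a_H)."""
    return rho0 * xi_h ** n_h * xi_l ** n_l >= rho_star


def distinguishing_suffix(a, b, rho0, rho_star, xi_h, xi_l, max_k):
    """Smallest k <= max_k such that appending k l-signals separates h^a from h^b."""
    for k in range(max_k + 1):
        if takes_a_H(rho0, rho_star, xi_h, xi_l, a, k) != takes_a_H(
            rho0, rho_star, xi_h, xi_l, b, k
        ):
            return k
    return None


def pairwise_distinguishable(max_len, rho0, rho_star, xi_h, xi_l):
    """Greedy set of lengths m such that the histories h^m are pairwise separated."""
    reps = []
    for m in range(max_len + 1):
        if all(
            distinguishing_suffix(r, m, rho0, rho_star, xi_h, xi_l, max_len + 1) is not None
            for r in reps
        ):
            reps.append(m)
    return reps


# Hypothetical full-support case: xi(h) = 2, xi(l) = 1/2, rho_0 = rho_star = 1.
print(len(pairwise_distinguishable(10, 1.0, 1.0, 2.0, 0.5)))
# Breakthrough case xi(l) = 0: suffixes of l-signals alone separate nothing here.
print(pairwise_distinguishable(10, 1.0, 1.0, 2.0, 0.0))
```

In the full-support example the collection grows with the maximal history length considered, in line with the impossibility result; in the breakthrough example it does not, because any low signal leads to a_L regardless of what came before.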

Proof of Proposition 3.1
We first show that any optimal SFSA takes the form of M^b_K(α_1, …, α_{K−2}), in which τ(q_i, h; q_i) = α_i and τ(q_i, h; q_{i+1}) = 1 − α_i for i = 1, …, K − 2, with other transitions the same as M^b_K. Note that M^b_K(α) is a special case where α_1 = ⋯ = α_{K−2} = α. We may assume that there are no redundant states. By Proposition 4.1, we can rank the memory states. For notational consistency, I use q_L to denote the one with the lowest associated belief and q_H the highest, and rank the others by ρ(q_1) ≤ ρ(q_2) ≤ ⋯ ≤ ρ(q_{K−2}). Now, since an ℓ-signal sends the posterior to zero, it follows that ρ(q_L) = 0, and for q_L it is optimal to stay at q_L after any signal. For any other memory state, since receiving h can only trigger a transition to go up or to stay, V_{q_i}(θ) only depends on V_{q_j}(θ) for j > i, while ρ(q_i) only depends on transitions from q_j with j < i. Thus, we can take ρ(q_i) as the prior and look for the optimal SFSA with K − i memory states, and use the necessary conditions in Proposition 4.1 to characterize the optimal SFSA. Note that at q_H, the highest memory state, it is optimal to stay there and take action a_H after seeing an h-signal.
This allows for an induction argument starting from i = K − 2 and then working backwards. We first show by induction that in the optimal SFSA the transitions from q_i, …, q_{K−2} take the form M^{b,K−i+1}(α_i, …, α_{K−2}) (part (a)), and then we show that the optimal SFSA has α_1 = α_2 = ⋯ = α_{K−2} = α ∈ (0, 1) (part (b)).
(a) For i = K − 2 this is immediate, as the only possible transition is either staying in q_{K−2} or going to q_H after seeing an h-signal. Suppose that it holds for i + 1 ≤ K − 2, so that in the optimal SFSA the transitions from q_{i+1}, …, q_{K−2} take the form M^{b,K−i}(α_{i+1}, …, α_{K−2}). Below in (b) we prove that it is optimal to have α_{i+1} = ⋯ = α_{K−2}. This, by a simple computation from the corresponding value functions, implies that for all j = i + 1, …, K − 3, ρ̄_{j+1}/ρ̄_j is constant and , that is, α_{i+1} = 0 and, by (15), this implies that the transitions follow the DFSA M^{b,K−i} and, furthermore, ρ(q_{i+1}, h) = ρ̄_{i+1} ξ(h) = ρ̄_{i+2}. This last equality also implies that from q_{i+1} following signal h there is indifference between moving to q_{i+2} and q_{i+3}, implying that there is a redundant updating state. This concludes that ρ , by Proposition 4.1, from q_i and an h-signal, randomization can only occur between q_i and q_{i+1}, or between q_{i+1} and q_{i+2}, but not both. Now we show that if the latter happens, then we can eliminate q_{i+1} without affecting the ex ante expected payoff and there is a redundant state; as a result, only the former matters and hence the optimal SFSA takes the form M^{b,K−i+1}(α_i, …, α_{K−2}) from q_i on. To see this, suppose, by contradiction, that the optimal SFSA randomizes between q_{i+1} and q_{i+2} from q_i after h. This implies that ρ(q_i) < ρ(q_{i+1}), and that, by Proposition 4.1, ρ(q_i, h) = ρ̄_{i+1}, which in turn implies that ρ(q_{i+1}, h) > ρ(q_i, h) = ρ̄_{i+1}. By Proposition 4.1 the last inequality implies that α_{i+1} = 0, and by the symmetry noted above, in M^{b,K−i}(α_{i+1}, …, α_{K−2}) we have α_{i+1} = ⋯ = α_{K−2} = 0. That is, we have a deterministic scheme from q_{i+1} on. Moreover, this also implies that at prior ρ(q_i), M^{b,K−i} and M^{b,K−i−1} give exactly the same payoff, and hence q_{i+1} is a redundant state.
(b) Here I show that it is optimal to set $\alpha_1 = \dots = \alpha_{K-2} \in (0, 1)$. First we compute the value functions under $M^b_K(\alpha_1, \dots, \alpha_{K-2})$: Note that this implies that the ex ante payoff, $p_0 V_{q_1}(H) + (1 - p_0) V_{q_1}(L)$, is symmetric in $(\alpha_1, \dots, \alpha_{K-2})$ and supermodular, and hence it is optimal to set $\alpha_i = \alpha$ for all $i$. By doing so, the ex ante payoff is Thus, Thus, $F'(\alpha) > 0$ at $\alpha = 0$ if and only if that is, if and only if the memory constraint is binding. Since the best DFSA is $M^b_K = M^b_K(0)$ (other than taking $a_L$ all the time), this shows that strict randomization is better. Note that this also shows that $M^b_K(\alpha)$ is the optimal SFSA for a range of priors below $\rho^K_0$.
Finally, I show the existence of a threshold $\underline{\rho}_0$ above which taking $a_L$ all the time is not optimal. For each $\rho_0 \le \rho^K_0$, let $W(\rho_0)$ be the payoff from $M^b_K(\alpha)$ with the optimal $\alpha$. By the Theorem of the Maximum, $W(\rho_0)$ is continuous in $\rho_0$. Now, at $\rho_0 = \rho^K_0$, since $M^b_K$ implements the unconstrained optimum, we have $W(\rho^K_0) > (1 - p_0)u_L$, the latter being the payoff from always taking $a_L$. Now, and by (19), this is equivalent to But we also know that $\alpha^*$ satisfies the FOC with $F'(\alpha^*) = 0$, and by (20), this is then equivalent to Since $\alpha^*$ strictly decreases with $\rho_0$, there exists a unique $\underline{\rho}_0 < \rho^K_0$ such that $W(\rho_0) > (1 - p_0)u_L$ for all $\rho_0 \in (\underline{\rho}_0, \rho^K_0]$. Finally, from (19) it is straightforward to verify that for any $\alpha \in [0, 1)$ and hence $\underline{\rho}_0$ converges to zero as $\eta$ approaches zero.
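
The ex ante payoff $p_0 V_{q_1}(H) + (1 - p_0) V_{q_1}(L)$ discussed above can be evaluated numerically by back-substitution, since transitions only move upward or to $q_L$. The following Python sketch is illustrative only and rests on several assumptions that are not taken from the text: each period a signal arrives first and the decision is then required with probability `eta`; the low signal has probability `eps` in state $H$ and `mu` in state $L$; after $h$ the DM stays at $q_i$ with probability $\alpha_i$ and otherwise moves up one state; the initial state is $q_1$; a correct action pays `u_H` or `u_L` and a wrong one pays zero. The function names are hypothetical.

```python
def state_values(alphas, theta, eta, mu, eps, u_H, u_L):
    """Return [V_{q_1}, ..., V_{q_{K-2}}, V_{q_H}] conditional on state theta.

    Assumed timing (not from the paper): signal, transition, then the
    decision is required with probability eta.
    """
    p_low = eps if theta == "H" else mu      # assumed Pr(low signal | theta)
    p_h = 1.0 - p_low                        # Pr(h | theta)
    pay_aH = u_H if theta == "H" else 0.0    # payoff from action a_H
    pay_aL = u_L if theta == "L" else 0.0    # payoff from action a_L
    keep = 1.0 - eta                         # probability the problem continues

    v_qL = pay_aL                            # q_L is absorbing and plays a_L
    low_branch = p_low * (eta * pay_aL + keep * v_qL)

    # q_H: stay after h, fall to q_L after the low signal.
    v_qH = (p_h * eta * pay_aH + low_branch) / (1.0 - p_h * keep)

    values = [v_qH]
    next_value, next_pay = v_qH, pay_aH
    for a in reversed(alphas):               # q_{K-2} first, then downwards
        stay = a * eta * pay_aL              # stay part, excluding continuation
        move = (1.0 - a) * (eta * next_pay + keep * next_value)
        v = (p_h * (stay + move) + low_branch) / (1.0 - p_h * a * keep)
        values.append(v)
        next_value, next_pay = v, pay_aL     # states below q_H play a_L
    values.reverse()
    return values


def ex_ante_payoff(alphas, p0, eta, mu, eps=0.0, u_H=1.0, u_L=1.0):
    """p0 * V_{q_1}(H) + (1 - p0) * V_{q_1}(L) under the assumptions above."""
    v_H = state_values(alphas, "H", eta, mu, eps, u_H, u_L)[0]
    v_L = state_values(alphas, "L", eta, mu, eps, u_H, u_L)[0]
    return p0 * v_H + (1.0 - p0) * v_L


if __name__ == "__main__":
    # Coarse grid search over a common alpha with K = 5 memory states.
    grid = [k / 20 for k in range(20)]
    best = max(grid, key=lambda a: ex_ante_payoff([a] * 3, 0.1, 0.05, 0.3))
    print("payoff-maximizing alpha on the grid:", best)
```

Setting every entry of `alphas` to one makes the DM never leave the states that play $a_L$, so the function returns $(1 - p_0)u_L$, the benchmark used in the threshold argument above.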

Proof of Proposition 4.2
Let $\eta \in (0, 1)$ be given. When $\mu^H_h = 1$, from Proposition 3.1 for any $\rho > \rho_0$ the Note that in the latter case, when $K > N(\rho_0) + 2$, there can be multiple optimal SFSA up to redundant memory states (i.e., one can split $q_i$ into $q_{i,1}$ and $q_{i,2}$ with the same transition and action rules as $q_i$, and hence with $V_{q_{i,1}}(\theta) = V_{q_{i,2}}(\theta)$ for both $\theta = H, L$). I prove the results in two parts, first for $\rho_0 < \rho^K_0$ and second for $\rho_0 > \rho^K_0$.
Part (i) Here I show that if $\rho_0 < \rho^K_0$, then for a range of $\mu^H_h$ below one the optimal SFSA still takes the form of $M^b_K(\alpha)$. To prove this, I first show that small deviations from $M^b_K(\alpha)$ are not optimal. More precisely, consider each SFSA (with $K$ memory states) as a vector of all transition probabilities and action probabilities, and define the distance between two SFSA as the maximum difference across those probabilities. I show that there exists $\delta_0 > 0$ such that the optimal SFSA takes the form of $M^b_K(\alpha)$ among the SFSA that lie within distance $\delta_0$ of that form.
To simplify notation, denote $\mu^H_h$ by $1 - \varepsilon$ and $\mu^L_\ell$ by $\mu$. Since the beliefs $\rho(q_i)$ and the value functions $V_{q_i}(\theta)$ are all continuous in $\mu^H_h$ and since, when $\mu^H_h = 1$, in $M^b_K(\alpha)$ we have $\rho(q_L) = 0$, it follows that for small deviations of the transition probabilities, we still have $\rho(q_i, \ell) < q_0$ for $\mu^H_h$ slightly below one. Thus, by Proposition 4.1, the optimal SFSA has to take the form of $M^b_K(\alpha_1, \dots, \alpha_{K-2})$. We can then compute the continuation values for By the same arguments as before, it is optimal to set $\alpha_i = \alpha$ for all $i$, and hence The function $F(\alpha)$ is quasi-concave in $\alpha \in [0, 1]$, and hence its local optimum is continuous in $\varepsilon$. To see this, note that, disregarding some constants, $F'(\alpha)$ is proportional to and it can be verified that the second term in the last equation strictly decreases with $\alpha$ whenever $\varepsilon < \mu$, and hence there can be at most one solution to $F'(\alpha) = 0$ besides $\alpha = 1$. Note that $\alpha = 1$ is excluded by the fact that at $\varepsilon = 0$, the optimal SFSA is $M^b_K(\alpha^*)$ with $\alpha^* \in [0, 1)$, and hence it cannot be optimal for $\varepsilon$ small.
Now, let $\varepsilon$ be small so that we have local optimality. To ensure global optimality, consider the set of SFSA $M$ that do not take the form $M^b_K(\alpha_1, \dots, \alpha_{K-2})$ and are at least $\delta_0$-distance from any of those, denoted by $D$. This set is compact, and, when $\mu^H_h = 1$, By continuity again, there exists $\varepsilon_1 \in (0, \varepsilon]$ such that for any $\mu^H_h \in [1 - \varepsilon_1, 1]$, (27) still holds.
Part (ii) Here I show that if $\rho_0 > \rho^K_0$, then the optimal SFSA takes the form $M^b_{N+2}$ with $N = N(\rho_0)$ for a range of $\mu^H_h$ below one. The proof follows the same outline as in (i), but now I need to handle replicated memory states, as $K > N(\rho_0) + 2$. To do that, notice that in any replica of $M^b_{N+2}$ the beliefs for any replica of $q_L$ and of $q_i$, $i = 1, \dots, N$, remain the same as those of their counterparts, but the belief for a replica of $q_H$ can be higher. However, whatever that belief may be, it is finite and $\rho(q_H, \ell) = 0$ when $\mu^H_h = 1$, and hence it is still optimal to transit to $q_L$ for a range of $\mu^H_h$ below one.
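
The distance between two SFSA used in this proof, the maximum difference across all transition and action probabilities, is straightforward to compute once each automaton is written as a flat vector. The Python sketch below illustrates this under an assumed encoding: the dictionary keys, the state labels and the helper `build_mbk` are hypothetical, and the transition structure (stay with probability $\alpha_i$ after $h$, otherwise move up; go to $q_L$ after the low signal) is an assumed reading of $M^b_K(\alpha_1, \dots, \alpha_{K-2})$ at $\mu^H_h = 1$.

```python
def build_mbk(alphas):
    """Flat probability vector for an assumed M^b_K(alpha_1, ..., alpha_{K-2})."""
    names = [f"q{i}" for i in range(1, len(alphas) + 1)] + ["qH"]
    probs = {}
    for i, a in enumerate(alphas):
        probs[("move", names[i], "h", names[i])] = a            # stay after h
        probs[("move", names[i], "h", names[i + 1])] = 1.0 - a  # move up after h
        probs[("move", names[i], "l", "qL")] = 1.0              # low signal
        probs[("act", names[i], "aH")] = 0.0                    # plays a_L
    probs[("move", "qH", "h", "qH")] = 1.0
    probs[("move", "qH", "l", "qL")] = 1.0
    probs[("act", "qH", "aH")] = 1.0
    probs[("move", "qL", "h", "qL")] = 1.0                      # q_L absorbing
    probs[("move", "qL", "l", "qL")] = 1.0
    probs[("act", "qL", "aH")] = 0.0
    return probs


def sfsa_distance(m1, m2):
    """Maximum absolute difference across all listed probabilities.

    A probability missing from one automaton is treated as zero.
    """
    keys = m1.keys() | m2.keys()
    return max(abs(m1.get(k, 0.0) - m2.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    print(sfsa_distance(build_mbk([0.2, 0.2]), build_mbk([0.5, 0.2])))
```

Under this encoding, two automata of the form $M^b_K(\cdot)$ differ by at most the largest gap between their $\alpha_i$, which is the sense in which a "small deviation" is measured here.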

Proof of Proposition 4.3
Proof of (1). Here I consider $\mu^L_{h'} = \nu - \varepsilon$ for $\varepsilon$ small. Let $\rho_0 \in (\underline{\rho}_0, \rho^*)$ and $K$ be given. When $\varepsilon = 0$, the optimal SFSA is $M^b_{\min\{N(\rho_0)+2,\,K\}}(\beta, 1)$ for an appropriate $\beta$. One can then compute the value functions and associated beliefs, which are very similar to those computed in the Proof of Proposition 3.1, and in the optimal SFSA, $\rho(q_i) < \bar{\rho}_i$ for all $i$. Using the same arguments as in the Proof of Proposition 4.2 and continuity of the beliefs and value functions, we can restrict attention to local deviations in the same sense as in the Proof of Proposition 4.2. But this then implies that $\rho(q_i, h') < \bar{\rho}_i$ under any local deviation and for $\varepsilon$ small; as a result, by Proposition 4.1, the optimal SFSA has the form $M^b_K(\beta, \beta')$ with $\beta' = 1$. The fact that it is optimal to have symmetric transition probabilities across memory states follows from the same arguments as in Proposition 3.1.
Proof of (2). Here I consider $\mu^L_{h'} = \frac{\nu(1-\mu)}{1-\nu} + \varepsilon$ for $\varepsilon$ small. Let $\rho_0 \in (\underline{\rho}_0, \rho^K_0)$ be given. When $\varepsilon = 0$, the optimal SFSA takes the form $M^b_K(\beta, \beta')$. Note that when $\varepsilon = 0$, the optimal $\beta$ and $\beta'$ are not determined individually; only the joint transition probabilities given by (11) are uniquely determined. Using the same arguments as in the Proof of Proposition 4.2 and continuity of the beliefs and value functions, for $\varepsilon$ small, by Proposition 4.1 we can restrict attention to SFSA of the form $M^b_K(\beta, \beta')$. The fact that it is optimal to have symmetric transition probabilities across memory states follows from the same arguments as in Proposition 3.1.
Now I study the optimal $\beta$ and $\beta'$ for $\varepsilon$ small, which, as we will see below, are uniquely determined for $\varepsilon > 0$. Let Now, the continuation values can be computed in the same way. In particular, $V_{q_i}(H)$ still takes the form of (16), but with $\alpha$ replaced by $\alpha_H$, and $V_{q_1}(L)$ still takes the form of (18), but with $\alpha$ replaced by $\alpha_L$. Thus, the expected payoff from $M^b_K(\beta, \beta')$ is now Taking derivatives, we have $(1 - \mu)$, Now, for any $\varepsilon > 0$, since the first terms in both (28) and (29) are negative but the second terms are positive and since $\partial F/\partial \beta' = 0$ implies that $\partial F/\partial \beta < 0$, that is, if the optimal $\beta' < 1$, then the optimal $\beta = 0$. Moreover, this shows that whenever $\rho_0 \le \rho^K_0$ and hence at $\alpha_H = \alpha_L = 0$, $\partial F/\partial \beta \ge 0$ when $\varepsilon = 0$. Thus, for any $\varepsilon > 0$, $\partial F/\partial \beta' > 0$ under $\beta = \beta' = 0$ and hence the optimal $\beta' > 0$.

Proof of Proposition 4.4
Without loss of generality, rank the memory states so that $\rho(q_L) < \rho(q_1) < \rho(q_H)$, and assume that from some memory state the SFSA transits to $q_L$ after seeing an $\ell$-signal, with $d(q_L) = a_L$. This implies that $\rho(q_L) = 0$, and by Proposition 4.1 it follows that it is optimal to transit to $q_L$ from any memory state after an $\ell$-signal. Also, in $q_1$, after seeing signal $h$ or $h'$ the transition cannot go down. Moreover, $d(q_H) = a_H$, for otherwise there is no variation in actions. This also implies that $q_o = q_1$. If $d(q_1) = a_H$, then $q_1$ and $q_H$ become equivalent memory states, so $d(q_1) = a_L$. It then follows that the optimal SFSA takes the form $M^b_3(\beta, \beta')$.
For $\rho_0 \in [\rho^*/\xi(h'), \rho^*)$, one $h'$-signal is sufficient to convince the DM to take action $a_H$, and hence the unconstrained optimal rule can be implemented by $M^b_3(0, 0)$. So consider $\rho_0 < \rho^*/\xi(h')$. We can then compute the ex ante expected payoff from $M^b_3(\beta, \beta')$ for any $\mu^L_{h'} \in \left(\frac{\nu(1-\mu)}{1-\nu}, \nu\right)$, which is given by where $\alpha_H = (1 - \nu)\beta + \nu\beta'$ and $\alpha_L = [(1 - \mu)\beta + \mu^L_{h'}\beta']/(1 - \mu + \mu^L_{h'})$. Thus, and Now, let $\tilde{\rho}_0$ where the inequality follows from the fact that $\mu^L_{h'} < \nu$.
Now, it is straightforward to verify, using (30) and (31), that for all $\rho_0 > \tilde{\rho}_0$, the only solution to the FOCs for $\beta$ and $\beta'$ has $\beta = 0$ and $\beta' > 0$, and that the optimal $\beta'$ strictly increases as $\rho_0$ decreases from $\rho^3_0$ and hits one before $\rho_0$ reaches $\tilde{\rho}_0$. Similarly, for $\rho_0 < \tilde{\rho}_0$, the optimal $\beta > 0$ and $\beta' = 1$. Finally, $\underline{\rho}_0$ is determined in a similar way as in the proof of Proposition 3.1, and, in case $\underline{\rho}_0 < \tilde{\rho}_0$ given by (32), take $\tilde{\rho}_0 = \underline{\rho}_0$.
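
The joint transition probabilities $\alpha_H$ and $\alpha_L$ in this proof are simple weighted averages of $\beta$ and $\beta'$ and can be tabulated directly. The Python sketch below evaluates them exactly as written in the proof and also checks the condition $\rho_0 \in [\rho^*/\xi(h'), \rho^*)$ under which one $h'$-signal suffices. The function and argument names are hypothetical, and taking $\xi(h') = \nu/\mu^L_{h'}$ (the likelihood ratio of $h'$) is an assumption made for illustration.

```python
def joint_stay_probs(beta, beta_p, mu, nu, mu_L_hp):
    """Return (alpha_H, alpha_L) for M^b_3(beta, beta').

    beta    : probability of staying after h
    beta_p  : probability of staying after h'
    mu_L_hp : probability of h' in state L (written mu^L_{h'} in the text)
    """
    alpha_H = (1.0 - nu) * beta + nu * beta_p
    alpha_L = ((1.0 - mu) * beta + mu_L_hp * beta_p) / (1.0 - mu + mu_L_hp)
    return alpha_H, alpha_L


def one_weak_signal_suffices(rho0, rho_star, nu, mu_L_hp):
    """Check rho0 in [rho*/xi(h'), rho*), assuming xi(h') = nu / mu^L_{h'}."""
    xi_hp = nu / mu_L_hp
    return rho_star / xi_hp <= rho0 < rho_star


if __name__ == "__main__":
    mu, nu, mu_L_hp = 0.5, 0.2, 0.15   # satisfies nu(1-mu)/(1-nu) < mu_L_hp < nu
    for beta, beta_p in [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0)]:
        print(beta, beta_p, joint_stay_probs(beta, beta_p, mu, nu, mu_L_hp))
```

The pairs listed in the usage example follow the lexicographic pattern described in the proof: $\beta'$ is raised to one before $\beta$ becomes positive.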
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.