Abstract
This paper explores the value of memory in decision making in dynamic environments. We examine the decision problem faced by an agent with bounded memory who receives a sequence of signals from a partially observable Markov decision process. We characterize environments in which the optimal memory consists of only two states. In addition, we show that the marginal value of additional memory states need not be positive and may even be negative in the absence of free disposal.
Notes
Unlike a decision maker with bounded recall (see, among others, Lehrer 1988; Aumann and Sorin 1989; or Alós-Ferrer and Shi 2012), who knows only a finite truncation of the history, a decision maker with bounded memory has a finite number of states that summarize all of her information. Such models have been studied extensively in repeated-game settings: Neyman (1985), Rubinstein (1986) and Kalai and Stanford (1988) are some of the early contributions to this literature, while Romero (2011), Compte and Postlewaite (2012a, b) and Monte (2012) are more recent. Closely related is the literature on “dynastic” games, as in Anderlini and Lagunoff (2005) and Anderlini et al. (2008).
Kalai and Solan (2003) also consider a model of dynamic decision making with bounded memory, but focus on the role and value of simplicity and randomization.
With discounting, the optimal bounded memory system will be somewhat present-biased, with distortions that depend on the decision maker’s initial prior. Lemma 1 of Kocer (2010) suggests, however, that discounting and the limit-of-means criterion are “close”: the payoff to the discounted-optimal memory system converges, as the discount rate goes to zero, to the payoff to the limit-of-means-optimal memory system.
Therefore, this decision problem is very different from a multi-armed bandit problem and departs from the optimal experimentation literature. See Kocer (2010) for a model of experimentation with bounded memory.
Recall that Bayesian updating in this environment is symmetric, with \(\rho _{t+1}^{H}(\rho )+\rho _{t+1}^{L}(1-\rho )=1\) for all \(\rho \in [0,1]\).
Quantifying the loss from bounded memory (relative to an unbounded Bayesian decision maker) is certainly a natural avenue for further inquiry. Such an attempt is complicated, however, by the difficulty of analytically characterizing the general solution to a partially observable Markov decision problem such as our own and is thus beyond the scope of the present work.
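The symmetry of Bayesian updating noted above, \(\rho _{t+1}^{H}(\rho )+\rho _{t+1}^{L}(1-\rho )=1\), can be checked numerically. The sketch below assumes one standard parameterization of the symmetric environment (not spelled out in this excerpt): the state switches each period with probability `alpha`, and signals match the current state with probability `gamma`.

```python
# Numerical check of the symmetry rho^H(rho) + rho^L(1 - rho) = 1 for the
# Bayesian filter of a symmetric two-state hidden Markov model.
# Assumed parameterization (not taken verbatim from the text): the state
# switches with probability alpha each period, and the signal matches the
# current state with probability gamma.

def predict(rho, alpha):
    """Prior probability of state H after one transition, given belief rho."""
    return (1 - alpha) * rho + alpha * (1 - rho)

def update_H(rho, alpha, gamma):
    """Posterior on H after observing a high signal."""
    m = predict(rho, alpha)
    return gamma * m / (gamma * m + (1 - gamma) * (1 - m))

def update_L(rho, alpha, gamma):
    """Posterior on H after observing a low signal."""
    m = predict(rho, alpha)
    return (1 - gamma) * m / ((1 - gamma) * m + gamma * (1 - m))

if __name__ == "__main__":
    alpha, gamma = 0.1, 0.8
    for rho in [0.0, 0.25, 0.5, 0.9, 1.0]:
        total = update_H(rho, alpha, gamma) + update_L(1 - rho, alpha, gamma)
        assert abs(total - 1.0) < 1e-12
```

The symmetry follows because relabeling states and signals maps a belief \(\rho \) into \(1-\rho \), so the high-signal update from \(\rho \) and the low-signal update from \(1-\rho \) are mirror images.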
References
Alós-Ferrer, C., Shi, F.: Imitation with asymmetric memory. Econ. Theory 49(1), 193–215 (2012)
Anderlini, L., Gerardi, D., Lagunoff, R.: A “super” folk theorem for dynastic repeated games. Econ. Theory 37(3), 357–394 (2008)
Anderlini, L., Lagunoff, R.: Communication in dynastic repeated games: ‘whitewashes’ and ‘coverups’. Econ. Theory 26(2), 265–299 (2005)
Aumann, R.J., Sorin, S.: Cooperation and bounded recall. Games Econ. Behav. 1(1), 5–39 (1989)
Compte, O., Postlewaite, A.: Belief formation. Unpublished manuscript, University of Pennsylvania (2012a)
Compte, O., Postlewaite, A.: Plausible cooperation. Unpublished manuscript, University of Pennsylvania (2012b)
Cover, T., Hellman, M.: On memory saved by randomization. Ann. Math. Stat. 42(3), 1075–1078 (1971)
De Grauwe, P.: Animal spirits and monetary policy. Econ. Theory 47(2–3), 423–457 (2011)
Güth, S., Ludwig, S.: How helpful is a long memory on financial markets? Econ. Theory 16(1), 107–134 (2000)
Hellman, M., Cover, T.: Learning with finite memory. Ann. Math. Stat. 41(3), 765–782 (1970)
Kalai, E., Solan, E.: Randomization and simplification in dynamic decision-making. J. Econ. Theory 111(2), 251–264 (2003)
Kalai, E., Stanford, W.: Finite rationality and interpersonal complexity in repeated games. Econometrica 56(2), 397–410 (1988)
Kaneko, M., Kline, J.J.: Partial memories, inductively derived views, and their interactions with behavior. Econ. Theory 53(1), 27–59 (2013)
Kocer, Y.: Endogenous learning with bounded memory. Unpublished manuscript, Princeton University (2010)
Lehrer, E.: Repeated games with stationary bounded recall strategies. J. Econ. Theory 46(1), 130–144 (1988)
Lipman, B.L.: Information processing and bounded rationality: a survey. Can. J. Econ. 28(1), 42–67 (1995)
Miller, D.A., Rozen, K.: Optimally empty promises and endogenous supervision. Unpublished manuscript, Yale University (2012)
Monte, D.: Learning with bounded memory in games. Unpublished manuscript, Sao Paulo School of Economics (2012)
Mullainathan, S.: A memory-based model of bounded rationality. Q. J. Econ. 117(3), 735–774 (2002)
Neyman, A.: Bounded complexity justifies cooperation in the finitely repeated prisoners’ dilemma. Econ. Lett. 19(3), 227–229 (1985)
Romero, J.: Finite automata in undiscounted repeated games with private monitoring. Unpublished manuscript, Purdue University (2011)
Rubinstein, A.: Finite automata play the repeated prisoner’s dilemma. J. Econ. Theory 39(1), 83–96 (1986)
Rubinstein, A.: Modeling Bounded Rationality. MIT Press, Cambridge (1998)
Stokey, N.L., Lucas, R.E. Jr.: Recursive Methods in Economic Dynamics. Harvard University Press, Cambridge (1989)
Wilson, A.: Bounded memory and biases in information processing. Unpublished manuscript, Harvard University (2004)
Additional information
This paper supersedes an earlier working paper circulated as “Learning in Hidden Markov Models with Bounded Memory.” We thank the editor, an anonymous referee, Dirk Bergemann, Dino Gerardi, Bernardo Guimaraes, Johannes Hörner, Abraham Neyman, and Ben Polak, as well as seminar participants at Yale University and Simon Fraser University for their helpful advice and comments.
Appendix
Proof of Theorem 2
Note that symmetry implies \(\mu _{(1,L)}=\mu _{(2,H)}\) and \(\mu _{(2,L)}=\mu _{(1,H)}\); therefore, the decision maker solves
We may write the steady-state condition in Eq. (3) for state \((2,H)\) as
where the second equality follows from symmetry. Recalling that \(\varphi _{1,1}^{H}=1-\varphi _{1,2}^{H}\), and that monotonicity implies \(\varphi _{2,1}^{H}=0\), this may be written as
Combining this expression with the observation from Eq. (2) that \(\mu _{(1,H)}=\frac{1}{2}-\mu _{(2,H)}\), we can then solve for \(\mu _{(2,H)}\). In particular, we must have
With this in hand, we may write the decision maker’s payoff as
Differentiating with respect to \(\varphi _{1,2} ^{H}\) yields
Since \(\alpha \in (0,\frac{1}{2})\) and \(\gamma \in (\frac{1}{2},1)\), this expression is strictly positive for all \(\varphi _{1,2}^{H}\in [0,1]\); therefore, the maximum is achieved when \(\varphi _{1,2}^{H}=1\), yielding a payoff of \(U_{2}(1)=1-2\gamma (1-\gamma )\). \(\square \)
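The closed-form payoff \(U_{2}(1)=1-2\gamma (1-\gamma )\) can be verified by direct computation on the joint Markov chain over (world state, memory state). The sketch below is a numerical check under an assumed timing (not stated in this excerpt): each period the world switches with probability `alpha`, a signal of accuracy `gamma` is drawn, the memory moves to the state indicated by the signal (since \(\varphi _{1,2}^{H}=1\)), and the per-period reward is \(\gamma \) when the occupied memory state matches the true state and \(1-\gamma \) otherwise.

```python
import numpy as np

# Sanity check of the two-state payoff U_2(1) = 1 - 2*gamma*(1 - gamma).
# Assumed timing (an assumption, not taken from the text): transition,
# then signal, then memory update, then reward.

def two_state_payoff(alpha, gamma):
    # Joint states (w, m) with w, m in {0, 1}; flat index = 2 * w + m.
    P = np.zeros((4, 4))
    for w in range(2):
        for m in range(2):
            for w2 in range(2):
                t = 1 - alpha if w2 == w else alpha      # world transition
                for m2 in range(2):
                    # Memory tracks the last signal, which matches the new
                    # world state with probability gamma.
                    q = gamma if m2 == w2 else 1 - gamma
                    P[2 * w + m, 2 * w2 + m2] = t * q
    # Stationary distribution by power iteration.
    pi = np.full(4, 0.25)
    for _ in range(10_000):
        pi = pi @ P
    reward = np.array([gamma if (i // 2) == (i % 2) else 1 - gamma
                       for i in range(4)])
    return float(pi @ reward)

if __name__ == "__main__":
    for alpha in (0.05, 0.2, 0.4):
        for gamma in (0.6, 0.8, 0.95):
            u = two_state_payoff(alpha, gamma)
            assert abs(u - (1 - 2 * gamma * (1 - gamma))) < 1e-9
```

Under this timing the memory matches the current state with probability \(\gamma \) each period, so the payoff \(\gamma ^{2}+(1-\gamma )^{2}=1-2\gamma (1-\gamma )\) is independent of \(\alpha \), consistent with the expression in the proof.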
Proof of Theorem 3
Notice first that symmetry implies that \(\mu _{(1,L)}=\mu _{(3,H)}, \mu _{(2,L)}=\mu _{(2,H)}\), and \(\mu _{(3,L)}=\mu _{(1,H)}\). Thus, \(\frac{\mu _{(2,H)}}{\mu _{(2,L)}+\mu _{(2,H)}}=\frac{1}{2}\), implying that the expected payoff, conditional on being in state 2, is \(\frac{1}{2}\gamma +\frac{1}{2}(1-\gamma )=\frac{1}{2}\). Therefore, the agent solves
We begin by writing the steady-state condition for states \((1,H)\) and \((3,H)\) from Eq. (3) as
Imposing symmetry and monotonicity, these may be written as
Combining the two equations above with the observation in Eq. (2) that
we can solve for \(\mu _{(1,H)}\) and \(\mu _{(3,H)}\). In particular, we have
Furthermore, note that the decision maker’s expected payoff is
where we have again substituted for \(\mu _{(2,H)}\) using Eq. (2). This implies that the decision maker maximizes
Note first that, if \(\varphi _{1,2}^{H}=0\) (and, by symmetry, \(\varphi _{3,2}^{L}=0\)), then the “middle” memory state (state 2) is effectively redundant—the memory system only makes use of the two extremal states. Applying Theorem 2, the optimal memory, conditional on \(\varphi _{1,2}^{H}=0\), must have \(\varphi _{1,3}^{H}=1\). As in the optimal two-state memory from Theorem 2, this memory yields an expected payoff of
Clearly, the value of \(\varphi _{2,3}^{H}\) is irrelevant in this case. However, in order to ensure that there is only a single recurrent communicating class, we simply set \(\varphi _{2,3}^{H}=1\) when \(\varphi _{1,2}^{H}=0\).
Suppose instead that \(\varphi _{1,2}^{H}>0\). Then differentiating the payoff in Eq. (5) with respect to \(\varphi _{2,3}^{H}\) yields
Clearly, the denominator is positive. Moreover, \(\varphi _{1,2}^{H}>0\) implies that the numerator is positive. Thus, it is without loss of generality to set \(\varphi _{2,3}^{H}=1\) whenever \(\varphi _{1,2}^{H}>0\).
With this in mind, we consider two cases. We first assume that \(\varphi _{1,2}^{H}>0\) and \(\varphi _{1,2}^{H}+\varphi _{1,3}^{H}=1\). In this case, the decision maker’s payoff is \(U_{3}(\varphi _{1,2}^{H},1-\varphi _{1,2}^{H},1)\). Note, however, that
where we define \(\kappa :=(1-2\alpha )\gamma (1-\gamma )\). Since \(\alpha \in (0,\frac{1}{2})\) and \(\gamma \in (\frac{1}{2},1)\), we must have \(\kappa \in (0,1)\). Thus, \(\varphi _{1,2}^{H}\in [0,1]\) implies that
In addition, we can write
Note that \((2\kappa \varphi _{1,2}^{H}-\alpha )(\kappa \varphi _{1,2}^{H}-\alpha )\) is negative if, and only if, \(2\kappa \varphi _{1,2}^{H}-\alpha >0>\kappa \varphi _{1,2}^{H}-\alpha \). But \(2\kappa \varphi _{1,2}^{H}-\alpha <2\kappa \) and \(\kappa \varphi _{1,2}^{H}-\alpha >-\alpha \), implying that
Thus, \(U_{3}(\varphi _{1,2}^{H},1-\varphi _{1,2}^{H},1)\) is a convex function of \(\varphi _{1,2}^{H}\), and is therefore maximized either when \(\varphi _{1,2}^{H}=0\) or \(\varphi _{1,2}^{H}=1\). The decision maker’s expected payoff in each of these cases is
Then we have
Recalling the definition of \(\kappa \), we may then conclude that \(U_{3}(1,0,1)>U_{3}(0,1,1)\) if, and only if,
Turning to our second case, suppose that \(\varphi _{1,2}^{H}>0\) and \(\varphi _{1,2}^{H}+\varphi _{1,3}^{H}<1\). In addition, assume that \(\varphi _{1,3}^{H}>0\). Therefore, the first-order conditions for both \(\varphi _{1,2}^{H}\) and \(\varphi _{1,3}^{H}\) must hold; that is, we have
Solving these two equations yields
Note, however, that these two expressions sum to more than 1, a contradiction. Thus, we must have \(\varphi _{1,3}^{H}=0\), and only the first of the FOCs above can hold. This implies that
(Of course, this is less than 1 if, and only if, \(\frac{2\alpha }{(1-2\alpha )}<\gamma (1-\gamma )\); otherwise, we are at the corner solution where \(\varphi _{1,2}^{H}=1\).) Note, however, that
if, and only if, \(\alpha <\kappa \varphi _{1,2}^{H}\). Since \(\varphi _{1,2}^{H}=\sqrt{2\alpha /\kappa }\), this implies that \(U_{3}(\sqrt{2\alpha /\kappa },0,1)>U_{3}(0,1,1)\) if, and only if, \(\frac{\alpha }{2(1-2\alpha )}<\gamma (1-\gamma )\). \(\square \)
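The two threshold conditions appearing in this proof are straightforward algebra given \(\kappa =(1-2\alpha )\gamma (1-\gamma )\) and \(\varphi _{1,2}^{H}=\sqrt{2\alpha /\kappa }\): the interior solution lies below 1 if, and only if, \(\frac{2\alpha }{1-2\alpha }<\gamma (1-\gamma )\), and \(\alpha <\kappa \varphi _{1,2}^{H}\) if, and only if, \(\frac{\alpha }{2(1-2\alpha )}<\gamma (1-\gamma )\). The sketch below checks these equivalences numerically over a parameter grid; it tests only the algebra, not the underlying payoff comparison.

```python
import math

# Numerical check of the two threshold equivalences in the proof of
# Theorem 3, with kappa = (1 - 2*alpha) * gamma * (1 - gamma) and the
# interior solution phi = sqrt(2 * alpha / kappa).

def checks(alpha, gamma):
    kappa = (1 - 2 * alpha) * gamma * (1 - gamma)
    phi = math.sqrt(2 * alpha / kappa)
    # phi < 1  iff  2*alpha / (1 - 2*alpha) < gamma * (1 - gamma).
    assert (phi < 1) == (2 * alpha / (1 - 2 * alpha) < gamma * (1 - gamma))
    # alpha < kappa * phi  iff  alpha / (2 * (1 - 2*alpha)) < gamma * (1 - gamma),
    # since kappa * phi = sqrt(2 * alpha * kappa).
    assert (alpha < kappa * phi) == \
        (alpha / (2 * (1 - 2 * alpha)) < gamma * (1 - gamma))

if __name__ == "__main__":
    for a in [0.01, 0.05, 0.1, 0.2, 0.3, 0.45]:
        for g in [0.55, 0.7, 0.85, 0.99]:
            checks(a, g)
```

Both equivalences reduce to squaring: \(\sqrt{2\alpha /\kappa }<1\iff 2\alpha <\kappa \), and \(\alpha <\sqrt{2\alpha \kappa }\iff \alpha <2\kappa \).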
Lemma 4
The expected payoff of the four-state memory system depicted in Fig. 4 is
Proof
Note that we can write the Eq. (3) steady-state condition for states \((1,H), (2,H)\), and \((3,H)\), for the case of the four-state memory in Fig. 4, as
where we have made use of the fact that the memory transition rule is given by
Symmetry also implies that \(\mu _{(1,L)}=\mu _{(4,H)}, \mu _{(2,L)}=\mu _{(3,H)}, \mu _{(3,L)}=\mu _{(2,H)}\), and \(\mu _{(4,L)}=\mu _{(1,H)}\); therefore, we may write
In addition, recall from Eq. (2) that \(\mu _{(1,H)}+\mu _{(2,H)}+\mu _{(3,H)}+\mu _{(4,H)}=\frac{1}{2}\). Solving this system of four equations in four unknowns yields the stationary distribution of this memory system, which is given by
Therefore, the expected payoff of this memory system is
\(\square \)
Cite this article
Monte, D., Said, M. The value of (bounded) memory in a changing world. Econ Theory 56, 59–82 (2014). https://doi.org/10.1007/s00199-013-0771-1