Pack Light on the Move: Exploitation and Exploration in a Dynamic Environment
This paper revisits a recent study by Posen and Levinthal (Manag Sci 58:587–601, 2012) on the exploration/exploitation tradeoff in a multi-armed bandit problem whose reward probabilities undergo random shocks. We show that their analysis suffers from two shortcomings: it assumes that learning is based on stale evidence, and it overlooks the steady state. We let the learning rule endogenously discard stale evidence, and we carry out the long-run analysis. The comparative study demonstrates that some of their conclusions must be qualified.
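The setting above can be illustrated with a minimal sketch: a two-armed restless bandit in which each arm's reward probability is occasionally redrawn (a random shock), paired with an ε-greedy learner that uses a constant step size so that old, pre-shock evidence decays instead of accumulating. All parameter names and values here (`alpha`, `eps`, `shock_prob`) are illustrative assumptions, not the model actually analyzed in the paper.

```python
import random

def simulate(alpha=0.1, eps=0.1, shock_prob=0.02, n_steps=5000, seed=0):
    """Two-armed restless bandit with an epsilon-greedy learner.

    alpha: constant step size, so recent evidence outweighs stale evidence
    shock_prob: per-step chance that an arm's reward probability is redrawn
    (illustrative sketch only; not the paper's actual specification)
    """
    rng = random.Random(seed)
    p = [rng.random(), rng.random()]   # latent reward probabilities
    q = [0.5, 0.5]                     # learner's value estimates
    total = 0
    for _ in range(n_steps):
        # environmental turbulence: random shocks redraw arm probabilities
        for i in range(2):
            if rng.random() < shock_prob:
                p[i] = rng.random()
        # epsilon-greedy: explore with prob. eps, otherwise exploit
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[i])
        r = 1 if rng.random() < p[a] else 0
        total += r
        # constant step size exponentially discounts old observations,
        # so evidence gathered before a shock is gradually discarded
        q[a] += alpha * (r - q[a])
    return total / n_steps
```

A sample-average learner (step size 1/n) would weight pre-shock observations equally with recent ones; the constant-α update is the standard recency-weighted alternative for nonstationary problems (Sutton and Barto 1998).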
Keywords: Dynamic Environment · Learning Model · Turbulence Level · Search Intensity · Bandit Problem