Tug-of-War Model for Multi-armed Bandit Problem
We propose a model – the “tug-of-war (TOW) model” – for conducting unique parallel searches with many nonlocally correlated search agents. The model is based on a property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. This conservation law entails a “nonlocal correlation” among the branches: a volume increment in one branch is immediately compensated by volume decrements in the other branches. Such nonlocal correlation has been shown to be useful for decision making under a dilemma. The multi-armed bandit problem asks for the optimal strategy for maximizing the total reward under the incompatible demands of exploring for better options and exploiting known good ones. Our model efficiently manages this “exploration–exploitation dilemma” and exhibits good performance: its average accuracy rate is higher than those of well-known algorithms such as the modified ε-greedy algorithm and the modified softmax algorithm.
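The abstract does not spell out the model's update rule, but the core conservation idea can be sketched for a two-armed Bernoulli bandit: any displacement added to one branch is immediately subtracted from the other, so the branches are anticorrelated by construction. The function below, including the punishment weight `omega`, is a hypothetical illustration of this principle, not the paper's actual algorithm.

```python
import random

def tow_bandit(probs, steps, omega=1.0, seed=0):
    """Illustrative tug-of-war-style play on two Bernoulli arms.

    Captures only the conservation idea from the abstract: every
    increment to one branch's displacement is mirrored by an equal
    decrement to the other, so x[0] + x[1] stays constant (here 0).
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]  # branch displacements; their sum is conserved
    rewards = 0
    for _ in range(steps):
        # Play the arm whose branch is longer; break ties randomly.
        if x[0] == x[1]:
            k = rng.randrange(2)
        else:
            k = 0 if x[0] > x[1] else 1
        hit = rng.random() < probs[k]
        delta = 1.0 if hit else -omega  # reward pulls, punishment pushes back
        x[k] += delta
        x[1 - k] -= delta  # nonlocal correlation via resource conservation
        rewards += hit
    return rewards / steps
```

With arms of reward probability 0.8 and 0.2, for example, `tow_bandit([0.8, 0.2], 1000)` quickly locks onto the better arm, since the better branch's displacement drifts upward while the worse one's drifts down.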
Keywords: multi-armed bandit problem, reinforcement learning, bio-inspired computation, amoeba-based computing