New Generation Computing, Volume 34, Issue 3, pp 291–306

Interactive Restless Multi-armed Bandit Game and Swarm Intelligence Effect



We obtain the conditions for the emergence of the swarm intelligence effect in an interactive game of the restless multi-armed bandit (rMAB). A player competes with multiple agents. The payoff of each bandit changes with probability $p_c$ per round. The agents and the player choose one of three options: (1) Exploit (a good bandit), (2) Innovate (asocial learning: search for a good bandit among $n_I$ randomly chosen bandits), and (3) Observe (social learning of a good bandit). Each agent has two parameters $(c, p_{\mathrm{obs}})$ that specify its decisions: (i) $c$, the threshold value for Exploit, and (ii) $p_{\mathrm{obs}}$, the probability of choosing Observe when learning. The parameters $(c, p_{\mathrm{obs}})$ are uniformly distributed among the agents. We determine the optimal strategies for the player using complete knowledge of the rMAB. We show which of social and asocial learning is more advantageous in the $(p_c, n_I)$ space and define the swarm intelligence effect. In a laboratory experiment with 67 subjects, we observe the swarm intelligence effect only if $(p_c, n_I)$ are chosen so that social learning is far more advantageous than asocial learning.
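To make the agent side of the game concrete, here is a minimal Python sketch. It is not the authors' simulation. The number of bandits, the uniform [0, 1) payoff distribution, the [0, 1) ranges of c and p_obs, the rule that Observe copies a bandit some agent exploited in the previous round, the rule that Innovate adopts the best of the sampled bandits, and every name (Agent, step_environment, play_round, simulate) are hypothetical assumptions chosen for illustration. The decision rule reads the abstract literally: exploit the known bandit if its remembered payoff is at least c; otherwise learn, choosing Observe with probability p_obs and Innovate otherwise.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    c: float                   # Exploit if the remembered payoff is at least c
    p_obs: float               # probability of Observe (vs. Innovate) when learning
    known_bandit: int = -1     # -1 means no bandit is known yet (assumption)
    known_payoff: float = 0.0  # last payoff seen for known_bandit; may be stale
    total: float = 0.0         # payoff accumulated through Exploit


def step_environment(payoffs, p_c, rng):
    """Restless dynamics: each payoff is redrawn with probability p_c per round.

    The U[0, 1) redraw distribution is a hypothetical choice.
    """
    return [rng.random() if rng.random() < p_c else x for x in payoffs]


def play_round(agents, payoffs, prev_exploited, n_i, rng):
    """One round of Exploit / Innovate / Observe; returns bandits exploited now."""
    exploited = []
    for a in agents:
        if a.known_bandit >= 0 and a.known_payoff >= a.c:
            # Exploit: earn the bandit's current payoff, which may have changed.
            a.known_payoff = payoffs[a.known_bandit]
            a.total += a.known_payoff
            exploited.append(a.known_bandit)
        elif prev_exploited and rng.random() < a.p_obs:
            # Observe (social learning): copy a bandit exploited last round
            # (hypothetical rule); falls through to Innovate if none was.
            b = rng.choice(prev_exploited)
            a.known_bandit, a.known_payoff = b, payoffs[b]
        else:
            # Innovate (asocial learning): best of n_i randomly chosen bandits.
            sample = rng.sample(range(len(payoffs)), n_i)
            b = max(sample, key=lambda k: payoffs[k])
            a.known_bandit, a.known_payoff = b, payoffs[b]
    return exploited


def simulate(n_agents=10, n_bandits=100, n_i=2, p_c=0.1, rounds=50, seed=0):
    """Run the toy game with agents only (no human player); return their totals."""
    rng = random.Random(seed)
    payoffs = [rng.random() for _ in range(n_bandits)]  # hypothetical U[0, 1)
    # (c, p_obs) uniformly distributed as in the abstract; ranges are assumptions.
    agents = [Agent(c=rng.random(), p_obs=rng.random()) for _ in range(n_agents)]
    exploited = []
    for _ in range(rounds):
        exploited = play_round(agents, payoffs, exploited, n_i, rng)
        payoffs = step_environment(payoffs, p_c, rng)
    return [a.total for a in agents]
```

For example, comparing `sum(simulate(p_c=0.05, n_i=2))` with `sum(simulate(p_c=0.05, n_i=20))` is one way to explore how the balance between social and asocial learning shifts across the $(p_c, n_I)$ space in this toy version. The human player and the optimal strategy computed in the paper are not modeled here.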


Keywords: Multi-armed Bandit · Swarm Intelligence · Interactive Game · Experiment · Optimal Strategy





Copyright information

© Ohmsha and Springer Japan 2016

Authors and Affiliations

  • Shunsuke Yoshida (1)
  • Masato Hisakado (2)
  • Shintaro Mori (1)

  1. Kitasato University, Sagamihara, Japan
  2. Financial Services Agency, Chiyoda-ku, Japan
