Tug-of-War Model for Multi-armed Bandit Problem

  • Conference paper
Unconventional Computation (UC 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6079)

Abstract

We propose a model – the “tug-of-war (TOW) model” – to conduct unique parallel searches using many nonlocally correlated search agents. The model is based on the behavior of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a “nonlocal correlation” among the branches: a volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making when the organism faces a dilemma. The multi-armed bandit problem is to determine the strategy that maximizes the total reward obtained from arms with unknown reward probabilities, a task that imposes the incompatible demands of exploring for better arms and exploiting the best arm found so far. Our model efficiently manages this “exploration–exploitation dilemma” and exhibits good performance: its average accuracy rate is higher than those of well-known algorithms such as the modified ε-greedy algorithm and the modified softmax algorithm.
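The conservation mechanism described above can be illustrated with a minimal sketch. The code below is not the authors' exact TOW dynamics (the full update rules are given in the paper itself); it is a toy two-armed player in which a single displacement variable `x` plays the role of the conserved resource, so any gain for one arm is automatically a loss for the other. The step sizes `reward_delta` and `penalty_delta` are illustrative parameters, not values from the paper.

```python
import random

def tow_bandit(probs, steps=1000, reward_delta=1.0, penalty_delta=1.0, seed=0):
    """Toy tug-of-war-style player for a two-armed bandit.

    probs: reward probability of each arm.
    A single displacement x embodies the conservation law: arm 0 is
    favoured when x >= 0, arm 1 when x < 0, so pushing the resource
    toward one arm necessarily pulls it away from the other.
    """
    rng = random.Random(seed)
    x = 0.0              # shared "resource displacement" (conserved quantity)
    total_reward = 0
    pulls = [0, 0]
    for _ in range(steps):
        arm = 0 if x >= 0 else 1
        pulls[arm] += 1
        rewarded = rng.random() < probs[arm]
        total_reward += rewarded
        # Shift the displacement toward the arm that paid off and away
        # from the arm that did not; fluctuations in x provide the
        # exploration, drift provides the exploitation.
        sign = 1 if arm == 0 else -1
        x += sign * (reward_delta if rewarded else -penalty_delta)
    return total_reward, pulls

reward, pulls = tow_bandit([0.7, 0.3], steps=5000)
```

Because the expected drift of `x` points toward the arm with the higher reward probability, the player spends most of its pulls on the better arm while the random fluctuations keep occasionally sampling the other one.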




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, SJ., Aono, M., Hara, M. (2010). Tug-of-War Model for Multi-armed Bandit Problem. In: Calude, C.S., Hagiya, M., Morita, K., Rozenberg, G., Timmis, J. (eds) Unconventional Computation. UC 2010. Lecture Notes in Computer Science, vol 6079. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13523-1_10

  • DOI: https://doi.org/10.1007/978-3-642-13523-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13522-4

  • Online ISBN: 978-3-642-13523-1

  • eBook Packages: Computer Science (R0)
