Abstract
The exploration–exploitation trade-off is one of the central problems in reinforcement learning, and many policies have been proposed to resolve it optimally. We propose a redesigned version of the classical multi-arm bandit problem. The new environment formulates the multi-arm bandit problem as an episodic task that can terminate partway through an episode. This task tests the agent's ability to explore the environment, since the states change significantly as it proceeds. We also propose a policy—Segmented \(\varepsilon \)-Greedy—that allows the agent to pass through the environment while maximizing its return along the way. We compare this policy with existing policies on the proposed environment.
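The abstract does not give the full specification of either the environment or the policy, so the following Python sketch is only one plausible reading: an episodic bandit whose arm-reward distributions change from segment to segment, with a chance of terminating mid-episode, paired with a segmented \(\varepsilon \)-Greedy policy that resets its exploration rate whenever the segment changes. All names and parameters here (`SegmentedBandit`, `n_segments`, `termination_prob`, `eps_decay`) and the reset-and-decay schedule are illustrative assumptions, not the authors' exact design.

```python
import random


class SegmentedBandit:
    """Hypothetical sketch of the redesigned multi-arm bandit environment:
    an episodic task split into segments whose arm-reward distributions
    change as the agent proceeds, with a chance of the episode terminating
    mid-way. Parameters are assumptions, not the paper's specification."""

    def __init__(self, n_arms=10, n_segments=5, termination_prob=0.01, seed=None):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.n_segments = n_segments
        self.termination_prob = termination_prob
        # Each segment gets its own arm means, so value estimates learned
        # in one segment do not transfer to the next.
        self.means = [[self.rng.gauss(0, 1) for _ in range(n_arms)]
                      for _ in range(n_segments)]
        self.segment = 0
        self.done = False

    def step(self, arm):
        """Pull an arm in the current segment; may terminate the episode."""
        reward = self.rng.gauss(self.means[self.segment][arm], 1.0)
        if self.rng.random() < self.termination_prob:
            self.done = True  # mid-episode termination
        return reward

    def advance(self):
        """Move to the next segment; the arm distributions change."""
        self.segment += 1
        if self.segment >= self.n_segments:
            self.done = True


def segmented_epsilon_greedy(env, pulls_per_segment=100,
                             eps_start=1.0, eps_decay=0.99):
    """Illustrative segmented epsilon-greedy: epsilon is reset at the start
    of each segment and decayed within it, so the agent re-explores
    whenever the state (segment) changes significantly."""
    total = 0.0
    while not env.done:
        q = [0.0] * env.n_arms   # fresh per-segment value estimates
        counts = [0] * env.n_arms
        eps = eps_start          # re-explore in the new segment
        for _ in range(pulls_per_segment):
            if env.done:
                return total
            if env.rng.random() < eps:
                arm = env.rng.randrange(env.n_arms)       # explore
            else:
                arm = max(range(env.n_arms), key=lambda a: q[a])  # exploit
            r = env.step(arm)
            counts[arm] += 1
            q[arm] += (r - q[arm]) / counts[arm]  # incremental sample mean
            total += r
            eps *= eps_decay
        env.advance()
    return total


if __name__ == "__main__":
    env = SegmentedBandit(seed=0)
    print(segmented_epsilon_greedy(env))  # total return over one episode
```

The per-segment reset of \(\varepsilon \) is the design choice this sketch assumes: because the arm distributions change at each segment boundary, exploitation based on stale estimates is wasteful, so exploration restarts high and decays again within each segment.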
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shankar, A., Diwan, M., Marathe, A., Takalikar, M. (2023). Segmented \(\varepsilon \)-Greedy for Solving a Redesigned Multi-arm Bandit Environment. In: Das, S., Saha, S., Coello Coello, C.A., Bansal, J.C. (eds) Advances in Data-Driven Computing and Intelligent Systems. ADCIS 2022. Lecture Notes in Networks and Systems, vol 698. Springer, Singapore. https://doi.org/10.1007/978-981-99-3250-4_22
DOI: https://doi.org/10.1007/978-981-99-3250-4_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3249-8
Online ISBN: 978-981-99-3250-4
eBook Packages: Engineering, Engineering (R0)