Abstract
Combinatorial optimization (CO) problems are at the heart of both practical and theoretical research. Due to their complexity, many CO problems cannot be solved by exact methods in reasonable time; hence, we resort to heuristic solution methods. In recent years, machine learning (ML) has brought immense benefits to many research areas, including heuristic solution methods for CO problems. Among ML methods, reinforcement learning (RL) appears to be the most promising for finding good solutions to CO problems. In this work, we investigate an RL framework, whose agent is based on self-attention, to compute solutions for the knapsack problem, a classic CO problem. Our algorithm finds close-to-optimal solutions for instances of up to one hundred items, which leads us to conjecture that RL and self-attention may become major building blocks of future state-of-the-art heuristics for other CO problems.
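For reference, the 0/1 knapsack problem targeted by the paper can be solved exactly for small instances by dynamic programming. The following is a standard textbook sketch used here only to define the problem; it is not the paper's RL method:

```python
def knapsack(values, weights, capacity):
    """Exact 0/1 knapsack via dynamic programming, O(n * capacity).

    Each item may be taken at most once; we maximize total value
    subject to the total weight not exceeding `capacity`.
    """
    # dp[c] = best achievable value with remaining capacity c
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]


# Small illustrative instance: optimal value is 220 (items 2 and 3).
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # → 220
```

An RL heuristic such as the one studied in the paper trades this exactness for scalability, learning a policy that selects items sequentially.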
Notes
- 1. Transitions do not have to be consecutive, but, by chance, they could be.
- 2.
- 3. For details, see https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html. Many optional parameters were kept at their default values; for example, the feedforward dimension was set to 512 and the dropout probability to 0.1.
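The encoder configuration described in the note above could be instantiated roughly as follows. Only `dim_feedforward=512` and `dropout=0.1` come from the note; `d_model=128`, `nhead=8`, and `num_layers=3` are illustrative assumptions, not values stated in the paper:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: d_model, nhead and num_layers are assumed values;
# only dim_feedforward=512 and dropout=0.1 are taken from the note.
layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, dim_feedforward=512, dropout=0.1
)
encoder = nn.TransformerEncoder(layer, num_layers=3)

# One instance of 100 item embeddings, shape (sequence, batch, d_model).
x = torch.rand(100, 1, 128)
out = encoder(x)  # output keeps the input shape
```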
Acknowledgements
This research was supported in part by Ahold Delhaize. All content represents the opinion of the author(s), which is not necessarily shared or endorsed by their respective employers and/or sponsors.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Pierotti, J., Kronmueller, M., Alonso-Mora, J., van Essen, J.T., Böhmer, W. (2021). Reinforcement Learning for the Knapsack Problem. In: Masone, A., Dal Sasso, V., Morandi, V. (eds) Optimization and Data Science: Trends and Applications. AIRO Springer Series, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-030-86286-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86285-5
Online ISBN: 978-3-030-86286-2
eBook Packages: Mathematics and Statistics (R0)