Abstract
Combinatorial optimization (CO) problems are at the heart of both practical and theoretical research. Due to their complexity, many CO problems cannot be solved by exact methods in reasonable time; hence, we resort to heuristic solution methods. In recent years, machine learning (ML) has brought immense benefits to many research areas, including heuristic solution methods for CO problems. Among ML methods, reinforcement learning (RL) appears to be the most promising for finding good solutions to CO problems. In this work, we investigate an RL framework, whose agent is based on self-attention, to compute solutions for the knapsack problem, a classic CO problem. Our algorithm finds close-to-optimal solutions for instances of up to one hundred items, which leads us to conjecture that RL and self-attention may become major building blocks of future state-of-the-art heuristics for other CO problems.
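For reference, the 0/1 knapsack problem targeted by the paper can be solved exactly for small instances by dynamic programming. The following is a standard textbook sketch used here only to define the problem; it is not the paper's RL method:

```python
def knapsack(values, weights, capacity):
    """Exact 0/1 knapsack via dynamic programming, O(n * capacity).

    Each item may be taken at most once; we maximize total value
    subject to the total weight not exceeding `capacity`.
    """
    # dp[c] = best achievable value with remaining capacity c
    dp = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]


# Small illustrative instance: optimal value is 220 (items 2 and 3).
print(knapsack([60, 100, 120], [10, 20, 30], 50))  # → 220
```

An RL heuristic such as the one studied in the paper trades this exactness for scalability, learning a policy that selects items sequentially.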
Notes
- 1. Transitions do not have to be consecutive, but, by chance, they could be.
- 2.
- 3. For details, see https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html. Many optional parameters were kept at their default values; for example, the feedforward dimension was set to 512 and the dropout probability to 0.1.
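The encoder configuration described in the note above could be instantiated roughly as follows. Only `dim_feedforward=512` and `dropout=0.1` come from the note; `d_model=128`, `nhead=8`, and `num_layers=3` are illustrative assumptions, not values stated in the paper:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: d_model, nhead and num_layers are assumed values;
# only dim_feedforward=512 and dropout=0.1 are taken from the note.
layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, dim_feedforward=512, dropout=0.1
)
encoder = nn.TransformerEncoder(layer, num_layers=3)

# One instance of 100 item embeddings, shape (sequence, batch, d_model).
x = torch.rand(100, 1, 128)
out = encoder(x)  # output keeps the input shape
```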
Acknowledgements
This research was supported in part by Ahold Delhaize. All content represents the opinion of the author(s), which is not necessarily shared or endorsed by their respective employers and/or sponsors.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Pierotti, J., Kronmueller, M., Alonso-Mora, J., van Essen, J.T., Böhmer, W. (2021). Reinforcement Learning for the Knapsack Problem. In: Masone, A., Dal Sasso, V., Morandi, V. (eds) Optimization and Data Science: Trends and Applications. AIRO Springer Series, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-030-86286-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86285-5
Online ISBN: 978-3-030-86286-2
eBook Packages: Mathematics and Statistics (R0)