Abstract
Beam search (BS) is a popular incomplete breadth-first search widely used to find near-optimal solutions to hard combinatorial optimization problems in limited time. Its central component is an evaluation function that estimates the quality of the nodes encountered on each level of the search tree. While this function is usually crafted manually for the problem at hand, we propose a Policy-Based Learning Beam Search (P-LBS) that learns a policy for selecting the most promising nodes at each level. The policy is trained offline on representative random problem instances in a reinforcement learning manner. In contrast to an earlier learning beam search, the policy function is realized by a neural network (NN) that is applied to all the expanded nodes of the current level together and does not rely on predicting actual node values. We consider several loss functions that were suggested for beam-aware training in earlier work but analyzed there only theoretically, and we evaluate them in practice on the well-studied Longest Common Subsequence (LCS) problem. To keep P-LBS scalable to larger problem instances, a bootstrapping approach is further proposed for training. Results on established sets of LCS benchmark instances show that P-LBS with the loss functions “upper bound” and “cost-sensitive margin beam” is able to learn suitable policies for BS, yielding results that are highly competitive with the state of the art.
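To make the role of the node-selection policy concrete, the following is a minimal Python sketch of a beam search for the LCS problem in which a single policy call scores all expanded nodes of a level together. It is an illustration under stated assumptions, not the implementation from the paper: the names `expand`, `min_remaining_policy`, and `beam_search_lcs` are hypothetical, and the handcrafted scoring rule merely stands in for the trained neural network.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A node is (position vector, partial solution); positions[i] is the index
# in strings[i] from which the remaining suffix starts.
Node = Tuple[Tuple[int, ...], str]
Policy = Callable[[List[Node], Sequence[str]], List[float]]


def expand(node: Node, strings: Sequence[str], alphabet: str) -> List[Node]:
    """Append each letter that still occurs in every remaining suffix."""
    positions, partial = node
    children = []
    for letter in alphabet:
        # Earliest next occurrence of the letter in each string (-1 if none).
        nxt = [s.find(letter, p) for s, p in zip(strings, positions)]
        if all(i >= 0 for i in nxt):
            children.append((tuple(i + 1 for i in nxt), partial + letter))
    return children


def min_remaining_policy(level: List[Node], strings: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a learned policy: scores the whole level at
    once, here by the length of the shortest remaining suffix."""
    return [
        float(min(len(s) - p for s, p in zip(strings, positions)))
        for positions, _ in level
    ]


def beam_search_lcs(
    strings: Sequence[str],
    beam_width: int,
    policy: Policy = min_remaining_policy,
) -> str:
    """Level-wise beam search that keeps the beam_width best-scored nodes."""
    alphabet = "".join(sorted(set(strings[0])))
    beam: List[Node] = [(tuple(0 for _ in strings), "")]
    best = ""
    while beam:
        # Expand all beam nodes; keep one node per distinct position vector.
        level_by_pos: Dict[Tuple[int, ...], Node] = {}
        for node in beam:
            for child in expand(node, strings, alphabet):
                level_by_pos.setdefault(child[0], child)
        level = list(level_by_pos.values())
        if not level:
            break
        # One policy call for the entire level, then keep the top nodes.
        scores = policy(level, strings)
        ranked = sorted(zip(scores, level), key=lambda pair: pair[0], reverse=True)
        beam = [node for _, node in ranked[:beam_width]]
        best = beam[0][1]  # all nodes on one level have the same length
    return best


if __name__ == "__main__":
    print(beam_search_lcs(["ABCBDAB", "BDCABA"], beam_width=2))
```

Because the policy receives the whole level in one call, a learned model with the same signature could be passed as the `policy` argument without changing the search loop. Since every node on a level has the same partial solution length, only a relative ranking is needed for selection; absolute node values are not.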
Keywords
- Beam Search
- Machine Learning
- Reinforcement Learning
- Longest Common Subsequence Problem
This project is partially funded by the Doctoral Program “Vienna Graduate School on Computational Optimization”, Austrian Science Foundation (FWF), grant W1260-N35.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ettrich, R., Huber, M., Raidl, G.R. (2023). A Policy-Based Learning Beam Search for Combinatorial Optimization. In: Pérez Cáceres, L., Stützle, T. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2023. Lecture Notes in Computer Science, vol 13987. Springer, Cham. https://doi.org/10.1007/978-3-031-30035-6_9
DOI: https://doi.org/10.1007/978-3-031-30035-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30034-9
Online ISBN: 978-3-031-30035-6
eBook Packages: Computer Science, Computer Science (R0)