A Policy-Based Learning Beam Search for Combinatorial Optimization

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13987)


Beam search (BS) is a popular incomplete breadth-first search widely used to find near-optimal solutions to hard combinatorial optimization problems in limited time. Its central component is an evaluation function that estimates the quality of the nodes encountered on each level of the search tree. While this function is usually crafted manually for the problem at hand, we propose a Policy-Based Learning Beam Search (P-LBS) that learns a policy for selecting the most promising nodes at each level. The policy is trained offline on representative random problem instances in a reinforcement learning manner. In contrast to an earlier learning beam search, the policy function is realized by a neural network (NN) that is applied jointly to all expanded nodes of the current level and does not rely on predicting actual node values. Different loss functions that were suggested for beam-aware training in earlier work, but analyzed there only theoretically, are evaluated in practice on the well-studied Longest Common Subsequence (LCS) problem. To keep P-LBS scalable to larger problem instances, a bootstrapping approach is further proposed for training. Results on established sets of LCS benchmark instances show that P-LBS with the loss functions “upper bound” and “cost-sensitive margin beam” learns suitable policies for BS, yielding results highly competitive with the state of the art.
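For readers unfamiliar with the baseline, the following is a minimal sketch of a classical beam search for the LCS problem with a manually crafted evaluation function, the component that P-LBS replaces with a learned policy. It is not the authors' implementation. The names `remaining_bound`, `expand`, and `beam_search_lcs`, as well as the choice of guide, are assumptions made for illustration, and the guide is unrelated to the “upper bound” loss function named in the abstract.

```python
from collections import Counter


def remaining_bound(strings, pos):
    """Hand-crafted guide (illustrative assumption): for each character, the
    minimum number of remaining occurrences over all strings bounds how often
    that character can still be appended to a common subsequence."""
    counts = [Counter(s[p:]) for s, p in zip(strings, pos)]
    return sum(min(c[ch] for c in counts) for ch in counts[0])


def expand(strings, pos):
    """Yield (character, new positions) for each feasible one-letter extension,
    always jumping to the leftmost next occurrence in every string."""
    for ch in sorted(set(strings[0][pos[0]:])):
        nxt = []
        for s, p in zip(strings, pos):
            i = s.find(ch, p)
            if i < 0:
                break  # character missing in one string: infeasible
            nxt.append(i + 1)
        else:
            yield ch, tuple(nxt)


def beam_search_lcs(strings, beam_width, evaluate=remaining_bound):
    """Level-wise search that keeps only the beam_width best nodes per level."""
    beam = [("", (0,) * len(strings))]
    best = ""
    while beam:
        children = {}
        for seq, pos in beam:
            for ch, new_pos in expand(strings, pos):
                # Nodes with identical positions are equivalent; keep one.
                children.setdefault(new_pos, seq + ch)
        if not children:
            break
        ranked = sorted(children.items(),
                        key=lambda kv: evaluate(strings, kv[0]),
                        reverse=True)
        beam = [(seq, pos) for pos, seq in ranked[:beam_width]]
        best = beam[0][0]  # all nodes on one level have the same length
    return best
```

Calling `beam_search_lcs(["ABCBDAB", "BDCABA"], beam_width=100)` explores every state of this small instance, so it returns a common subsequence of optimal length 4. With a small `beam_width` the search becomes a heuristic whose quality depends on the `evaluate` function. In this sketch, a learned policy would take over the ranking step and score all children of a level together.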


  • Beam Search
  • Machine Learning
  • Reinforcement Learning
  • Longest Common Subsequence Problem

This project is partially funded by the Doctoral Program “Vienna Graduate School on Computational Optimization”, Austrian Science Foundation (FWF), grant W1260-N35.



Corresponding author

Correspondence to Marc Huber.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ettrich, R., Huber, M., Raidl, G.R. (2023). A Policy-Based Learning Beam Search for Combinatorial Optimization. In: Pérez Cáceres, L., Stützle, T. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2023. Lecture Notes in Computer Science, vol 13987. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30034-9

  • Online ISBN: 978-3-031-30035-6

  • eBook Packages: Computer Science (R0)