Reinforcement learning of simplex pivot rules: a proof of concept

  • Short Communication
  • Published in Optimization Letters

Abstract

At each iteration of the simplex method there are typically many candidate entering columns. We use deep value-based reinforcement learning to choose dynamically between two popular pivoting rules. We consider LP relaxations of the Miller–Tucker–Zemlin (MTZ) formulation of non-Euclidean travelling salesman problems (TSPs) with five cities, and obtain a 20–50% speed-up on these very small instances. Although our method is not remotely competitive or viable on large instances, our results indicate that there may be scope to substantially accelerate current LP solvers by augmenting them with a learned pivoting strategy.
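The idea in the abstract can be illustrated with a much-simplified sketch: a textbook tableau simplex with two classical entering-column rules (Dantzig's most-negative reduced cost, and a steepest-edge-style rule that scales reduced costs by current column norms), with the choice between them made by a learned selector. The paper uses a deep value network; here a state-free epsilon-greedy bandit stands in for it, and all names (`simplex`, `RuleSelector`, the toy LP) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def simplex(c, A, b, choose_rule):
    """Tableau simplex for max c^T x s.t. Ax <= b, x >= 0 (with b >= 0).

    choose_rule(reduced_costs) returns 0 (Dantzig rule) or 1 (a
    steepest-edge-style rule). Returns (objective value, pivot count)."""
    m, n = A.shape
    T = np.hstack([A, np.eye(m), b.reshape(-1, 1)]).astype(float)
    cost = np.concatenate([-c, np.zeros(m + 1)])  # objective row; cost[-1] tracks z
    iters = 0
    while True:
        rc = cost[:-1]
        if np.all(rc >= -1e-9):                   # optimal: no improving column
            return cost[-1], iters
        if choose_rule(rc) == 0:
            j = int(np.argmin(rc))                # Dantzig: most negative reduced cost
        else:
            norms = np.linalg.norm(T[:, :-1], axis=0) + 1.0
            j = int(np.argmin(rc / norms))        # steepest-edge-style scaling
        col = T[:, j]
        ratios = np.where(col > 1e-9,
                          T[:, -1] / np.where(col > 1e-9, col, 1.0), np.inf)
        if np.all(np.isinf(ratios)):
            raise ValueError("LP is unbounded")
        i = int(np.argmin(ratios))                # ratio test picks the leaving row
        T[i] /= T[i, j]
        cost = cost - cost[j] * T[i]
        for r in range(m):
            if r != i:
                T[r] -= T[r, j] * T[i]
        iters += 1

class RuleSelector:
    """Stand-in for the paper's deep value network: a state-free
    epsilon-greedy bandit that learns which rule tends to need fewer pivots."""
    def __init__(self, eps=0.2, seed=0):
        self.q, self.n = np.zeros(2), np.zeros(2)
        self.eps, self.rng = eps, np.random.default_rng(seed)
    def pick(self):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(2))      # explore
        return int(np.argmax(self.q))             # exploit
    def update(self, rule, reward):
        self.n[rule] += 1
        self.q[rule] += (reward - self.q[rule]) / self.n[rule]  # running mean

# Toy LP: max 3x + 2y s.t. x + y <= 4, x + 3y <= 6 (optimum 12 at x = 4, y = 0).
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])

sel = RuleSelector()
for _ in range(20):                               # tiny training loop
    used = []
    def chooser(rc):
        used.append(sel.pick())
        return used[-1]
    obj, iters = simplex(c, A, b, chooser)
    for r in used:
        sel.update(r, -iters)                     # reward: fewer pivots is better
obj, iters = simplex(c, A, b, lambda rc: int(np.argmax(sel.q)))
print(round(obj, 6), iters)
```

On a five-city MTZ relaxation the selector would see per-iteration state features rather than acting as a context-free bandit, but the control flow — query the learned value function at each pivot, apply the preferred rule, reward shorter solve paths — is the same.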

Acknowledgements

Tavaslıoğlu and Schaefer were partially supported by National Science Foundation grant CMMI-1933373.

Author information

Corresponding author

Correspondence to Andrew J. Schaefer.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Suriyanarayana, V., Tavaslıoğlu, O., Patel, A.B. et al. Reinforcement learning of simplex pivot rules: a proof of concept. Optim Lett 16, 2513–2525 (2022). https://doi.org/10.1007/s11590-022-01880-y
