Abstract
Interior-point or barrier methods handle nonlinear programs by sequentially solving barrier subprograms with a decreasing sequence of barrier parameters. The specific barrier update rule strongly influences the theoretical convergence properties as well as the practical efficiency. While many global and local convergence analyses consider a monotone update that decreases the barrier parameter for every approximately solved subprogram, computational studies show superior performance of more adaptive strategies. In this paper we interpret the adaptive barrier update as a reinforcement learning task. A deep Q-learning agent is trained by both imitation and random action selection. Numerical results based on an implementation within the nonlinear programming solver WORHP show that the agent successfully learns to steer the barrier parameter and additionally improves WORHP's performance on the CUTEst test set.
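The idea of treating the barrier update as a reinforcement learning task can be illustrated with a toy sketch: a tabular epsilon-greedy Q-learning agent that, after each approximately solved subproblem, picks a multiplicative update factor for the barrier parameter \(\mu\). Everything below — the state discretization, the action set, and the surrogate reward — is a hypothetical stand-in for the paper's actual deep Q-network setup, not its implementation; the projection of \(\mu\) into \([10^{-12}, 10^{-1}]\) follows the note below.

```python
import numpy as np

# Candidate multiplicative updates for mu (hypothetical action set).
ACTIONS = [0.9, 0.5, 0.1]

def discretize(mu):
    # Crude state: order of magnitude of the barrier parameter (0..12).
    return int(np.clip(-np.log10(mu), 0, 12))

def run_episode(Q, alpha=0.1, gamma=0.95, eps=0.2, rng=None):
    """One toy episode: repeatedly update mu and learn Q by TD(0)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mu = 1e-1
    total_reward = 0.0
    for _ in range(50):
        s = discretize(mu)
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[s]))
        # Apply update and project mu back into [1e-12, 1e-1].
        mu = float(np.clip(mu * ACTIONS[a], 1e-12, 1e-1))
        # Surrogate reward: a smaller mu stands in for solver progress.
        r = -np.log10(mu) / 12.0
        s2 = discretize(mu)
        # Standard Q-learning update.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        total_reward += r
    return total_reward

# 13 discrete states (orders of magnitude of mu), 3 actions.
Q = np.zeros((13, len(ACTIONS)))
rewards = [run_episode(Q, rng=np.random.default_rng(i)) for i in range(200)]
```

In the paper's setting, the state would instead be an observation vector from the solver and Q would be a neural network; the sketch only shows the control loop in which an agent steers \(\mu\) between subproblem solves.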
Notes
The set of solutions of (1.2) for all \(\mu > 0\).
A step computed by (2.4) with \(\mu =0\).
In this work, we project the barrier parameter back into the interval \([10^{-12}, 10^{-1}]\).
CUTEst version of February 19, 2018 (git commit: 6c7af0a). All programs are used with standard configuration.
Training stability refers to stability of cumulative rewards over training episodes.
IPOPT and WORHP have been chosen because both are freely available for academics, implement all barrier update strategies considered in the paper and are among the most efficient nonlinear programming solvers [34].
Training set: MPC3, MPC4, MPC5, MPC7, MPC13. Testing set: MPC1, MPC2, MPC6, MPC8, MPC9, MPC10, MPC11, MPC12, MPC14, MPC15, MPC16.
Training set: ACOPP57, AVGASA, DISC2, HS268, LEUVEN7, OBSTCLAE, PALMER1A, PFIT1, QPNSTAIR, READING6. Testing set: A0ESDNDL, A5NSSNSM, BIGBANK, CB2, HET-Z, HIE1327D, HS16, HS59, HS71, HUES-MOD, LINVERSE, LOTSCHD, MINMAXBD, MISTAKE, POLAK2, PORTFL3, SINEALI, TRAINH, TRY-B, UBH5.
Training set: A0ESDNDL, A5NSSNSM, ACOPP57, AVGASA, BIGBANK, DISC2, HET-Z, HS16, HS268, HS59, HUES-MOD, LEUVEN7, LINVERSE, MINMAXBD, OBSTCLAE, PALMER1A, PFIT1, POLAK2, PORTFL3, QPNSTAIR, READING6, SINEALI, TRAINH, TRY-B, UBH5. Testing set: A0NNDNSL, AUG2DQP, CANTILVR, CB2, CONCON, DALLASS, GOFFIN, HATFLDB, HIE1327D, HIMMELP1, HS108, HS110, HS117, HS60, HS70, HS71, LOTSCHD, MISTAKE, OPTPRLOC, PFIT2LS, POLAK5, QPCBLEND, SIMPLLPA, STEENBRB, TORSION3.
Training set: A0ESDNDL, A0NNDNSL, A5NSSNSM, ACOPP57, AUG2DQP, AVGASA, BIGBANK, CANTILVR, CB2, CONCON, DALLASS, DISC2, GOFFIN, HATFLDB, HET-Z, HIE1327D, HIMMELP1, HS108, HS110, HS117, HS16, HS268, HS59, HS60, HS70, HS71, HUES-MOD, LEUVEN7, LINVERSE, LOTSCHD, MINMAXBD, MISTAKE, OBSTCLAE, OPTPRLOC, PALMER1A, PFIT1, PFIT2LS, POLAK2, POLAK5, PORTFL3, QPCBLEND, QPNSTAIR, READING6, SIMPLLPA, SINEALI, STEENBRB, TORSION3, TRAINH, TRY-B, UBH5. Testing set: A0ENSNDL, A5ESINDL, ACOPR57, ALJAZZAF, BQPGAUSS, EXTRASIM, FCCU, GMNCASE3, GRIDNETG, HS105, HS11, HS119, HS2, HS96, LISWET1, LSQFIT, LUKVLI13, NET2, NGONE, OET3, POLAK4, STEERING, TORSIONA, UBH1, WALL10.
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. OSDI 16:265–283
Armand P, Benoist J (2008) A local convergence property of primal-dual methods for nonlinear programming. Math Program 115(2):199–222. https://doi.org/10.1007/s10107-007-0136-2
Armand P, Benoist J, Orban D (2008a) Dynamic updates of the barrier parameter in primal-dual methods for nonlinear programming. Comput Optim Appl 41(1):1–25. https://doi.org/10.1007/s10589-007-9095-z
Armand P, Orban D, Benoist J (2008b) Global convergence of primal-dual methods for nonlinear programming. In: Technical report, Laboratoire XLIM et Université de Limoges
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2):235–256. https://doi.org/10.1023/A:1013689704352
Balcan MF, Dick T, Sandholm T, Vitercik E (2018) Learning to branch. arXiv preprint arXiv:1803.10150
Byrd RH, Liu G, Nocedal J (1997) On the local behavior of an interior point method for nonlinear programming. Numer Anal 1997:37–56
Büskens C, Wassel D (2013) The ESA NLP solver WORHP. In: Fasano G, Pintér JD. (eds.) Modeling and optimization in space engineering, Springer optimization and its applications, vol 73, pp 85–110. Springer, New York. https://doi.org/10.1007/978-1-4614-4469-5_4
Chen SY, Yu Y, Da Q, Tan J, Huang HK, Tang HH (2018) Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18, pp 1187–1196. ACM, New York, NY, USA. https://doi.org/10.1145/3219819.3220122
Curtis FE (2012) A penalty-interior-point algorithm for nonlinear constrained optimization. Math Program Comput 4(2):181–209. https://doi.org/10.1007/s12532-012-0041-4
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213. https://doi.org/10.1007/s101070100263
El-Bakry AS, Tapia RA, Tsuchiya T, Zhang Y (1996) On the formulation and theory of the Newton interior-point method for nonlinear programming. J Optim Theory Appl 89(3):507–541. https://doi.org/10.1007/BF02275347
Fiacco AV, McCormick GP (1990) Nonlinear programming: sequential unconstrained minimization techniques, vol 4. SIAM
Forsgren A, Gill PE, Wright MH (2002) Interior methods for nonlinear optimization. SIAM Rev 44(4):525–597. https://doi.org/10.1137/S0036144502414942
Geffken S, Büskens C (2016) WORHP multi-core interface, parallelisation approaches for an NLP solver. In: Proceedings of the 6th international conference on astrodynamics tools and techniques, Darmstadt, Germany
Gertz EM, Wright SJ (2003) Object-oriented software for quadratic programming. ACM Trans Math Softw 29(1):58–81. https://doi.org/10.1145/641876.641880
Gondzio J, Grothey A (2008) A new unblocking technique to warmstart interior point methods based on sensitivity analysis. SIAM J Optim 19(3):1184–1210. https://doi.org/10.1137/060678129
Gould NIM, Orban D, Toint PL (2015) CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput Optim Appl 60(3):545–557. https://doi.org/10.1007/s10589-014-9687-3
Gould NIM, Toint PL (2006) Global convergence of a non-monotone trust-region filter algorithm for nonlinear programming, pp 125–150. Springer US, Boston, MA. https://doi.org/10.1007/0-387-29550-X_5
Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. In: AAAI fall symposium on sequential decision making for intelligent agents
Hendel G, Miltenberger M, Witzig J (2018) Adaptive algorithmic behavior for solving mixed integer programs using bandit algorithms. Technical report 18-36, ZIB
Hoos HH (2012) Automated algorithm configuration and parameter tuning, pp 37–71. Springer Berlin Heidelberg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21434-9_3
Hutter F, Hoos HH, Leyton-Brown K (2010) Automated configuration of mixed integer programming solvers. In: Lodi A, Milano M, Toth P (eds) Integration of AI and OR techniques in constraint programming for combinatorial optimization problems. Springer, Berlin, pp 186–202
Kadioglu S, Malitsky Y, Sellmann M, Tierney K (2010) ISAC – instance-specific algorithm configuration. In: Proceedings of the 2010 conference on ECAI 2010: 19th European conference on artificial intelligence, pp 751–756. IOS Press, Amsterdam
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1):99–134. https://doi.org/10.1016/S0004-3702(98)00023-X
Khalil EB, Le Bodic P, Song L, Nemhauser G, Dilkina B (2016) Learning to branch in mixed integer programming. In: 30th AAAI conference on artificial intelligence
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kruber M, Lübbecke ME, Parmentier A (2017) Learning when to use a decomposition. In: Salvagnin D, Lombardi M. (eds.) Integration of AI and OR techniques in constraint programming, pp 202–210. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-59776-8_16
Kuhlmann R (2018) A primal-dual augmented Lagrangian penalty-interior-point algorithm for nonlinear programming. PhD thesis, Universität Bremen
Kuhlmann R, Büskens C (2018) A primal-dual augmented Lagrangian penalty-interior-point filter line search algorithm. Math Methods Oper Res 87(3):451–483. https://doi.org/10.1007/s00186-017-0625-x
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Lodi A, Zarpellon G (2017) On learning and branching: a survey. TOP 25(2):207–236. https://doi.org/10.1007/s11750-017-0451-6
Mehrotra S (1992) On the implementation of a primal-dual interior point method. SIAM J Optim 2(4):575–601. https://doi.org/10.1137/0802028
Mittelmann H. Benchmarks for optimization software. http://plato.asu.edu/ftp/ampl-nlp.html. Accessed 15 April 2019
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602
Morales JL, Nocedal J, Waltz RA, Liu G, Goux JP (2003) Assessing the potential of interior methods for nonlinear optimization. In: Biegler LT, Heinkenschloss M, Ghattas O, van Bloemen Waanders B. (eds.) Large-Scale PDE-Constrained Optimization. In: Lecture Notes in Computational Science and Engineering, vol 30, pp 167–183. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-55508-4_10
Nocedal J, Wächter A, Waltz RA (2009) Adaptive barrier update strategies for nonlinear interior methods. SIAM J Optim 19(4):1674–1693. https://doi.org/10.1137/060649513
Baltean-Lugojan R, Bonami P, Misener R, Tramontani A (2018) Selecting cutting planes for quadratic semidefinite outer-approximation via trained neural networks. In: Technical report, Imperial College London
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
Shen C, Leyffer S, Fletcher R (2011) A nonmonotone filter method for nonlinear optimization. Comput Optim Appl 52(3):583–607. https://doi.org/10.1007/s10589-011-9430-2
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Tits AL, Wächter A, Bakhtiari S, Urban TJ, Lawrence CT (2003) A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM J Optim 14(1):173–199. https://doi.org/10.1137/S1052623401392123
Ulbrich M, Ulbrich S, Vicente NL (2004) A globally convergent primal-dual interior-point filter method for nonlinear programming. Math Program 100(2):379–410. https://doi.org/10.1007/s10107-003-0477-4
Vanderbei RJ, Shanno DF (1999) An interior-point algorithm for nonconvex nonlinear programming. Comput Optim Appl 13(1–3):231–252. https://doi.org/10.1023/A:1008677427361
Waltz R, Morales J, Nocedal J, Orban D (2006) An interior algorithm for nonlinear optimization that combines line search and trust region steps. Math Program 107(3):391–408. https://doi.org/10.1007/s10107-004-0560-5
Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581
Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279–292. https://doi.org/10.1007/BF00992698
Wächter A, Biegler LT (2006) On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math Program 106(1):25–57. https://doi.org/10.1007/s10107-004-0559-y
Appendix A: Additional numerical results
The appendix provides additional numerical results for the setups of Sect. 4.
Table 5 and Fig. 9 compare the performance of the deep Q-learning agent using the most successful configurations of Sect. 4.2, i.e., \(\mathcal {O}_3\) and \(\mathcal {O}_3\) without regret update, to the configurations \(\mathcal {O}_1\) and \(\mathcal {O}_1\) without regret update, since the latter observation space also yields good performance in Sect. 4.1. The results, however, show that the observation space \(\mathcal {O}_3\), i.e., the addition of the extrapolation approach of Sect. 2.2.3, generally provides better results.
If the number of nonlinear programming instances in the training set is increased, the learning task becomes more complex and may require more episodes. This effect is illustrated in Table 6 and Fig. 10, which consider, in addition to the training and testing set of Sect. 4.2, one with 25 training and 25 testing instances (Footnote 9) and one with 50 training and 25 testing instances (Footnote 10). The mean performance drops significantly for the larger training sets, and in two replications the agent even yields a worse mean performance over the last 1000 episodes than the monotone Fiacco–McCormick update of Sect. 2.2.1. Note that in the current implementation the testing set is only run if the training performance is above the baseline, which is why the testing performance stays constant for the first run of the 50/25 sets (cf. Fig. 10, right). Comparing the learning performance across the different training and testing sets is difficult, however, because each added nonlinear program may introduce additional complexity on its own, i.e., it may be hard to solve and/or very sensitive to barrier parameter updates. These results should therefore be understood as an outlook on future research: they indicate that large training sets with very different nonlinear programs remain challenging and may require many more episodes to learn.
Finally, Tables 7 and 8 list the detailed numerical results of Sects. 4.1 and 4.2, respectively, for all considered barrier update strategies of IPOPT and WORHP. This includes the solver termination status, number of iterations, CPU time, objective function values and number of function evaluations for all nonlinear programming instances of the considered training and testing sets. For WORHP IP learned, the reported CPU time does not include the Python overhead, e.g., starting the Python interpreter within the CUTEst C interface of WORHP. The results given for WORHP IP learned have the global convergence framework enabled and list the number of switches to the monotone strategy.
Kuhlmann, R. Learning to steer nonlinear interior-point methods. EURO J Comput Optim 7, 381–419 (2019). https://doi.org/10.1007/s13675-019-00118-4
Keywords
- Nonlinear programming
- Constrained optimization
- Interior-point algorithm
- Reinforcement learning
- Deep Q-learning