Learning to steer nonlinear interior-point methods

  • Original Paper
EURO Journal on Computational Optimization

Abstract

Interior-point or barrier methods handle nonlinear programs by sequentially solving barrier subprograms with a decreasing sequence of barrier parameters. The specific barrier update rule strongly influences the theoretical convergence properties as well as the practical efficiency. While many global and local convergence analyses consider a monotone update that decreases the barrier parameter for every approximately solved subprogram, computational studies show superior performance of more adaptive strategies. In this paper we interpret the adaptive barrier update as a reinforcement learning task. A deep Q-learning agent is trained by both imitation and random action selection. Numerical results based on an implementation within the nonlinear programming solver WORHP show that the agent successfully learns to steer the barrier parameter and additionally improves WORHP’s performance on the CUTEst test set.
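
To make the framing concrete, the following minimal Python sketch (an illustration, not the paper's implementation) shows how an \(\varepsilon \)-greedy Q-learning agent can steer the barrier parameter inside an interior-point loop: after each approximately solved barrier subprogram, the agent observes simple features of the current iterate and selects a multiplicative update of the barrier parameter. The feature vector, the action set and the callables q_net, solve_barrier_subproblem and kkt_error are assumptions made for illustration; only the projection interval for the barrier parameter follows Note 3 below.

```python
import numpy as np

# Illustrative sketch only: the observation features, the action set of
# multiplicative factors and the callables below are assumptions, not the
# paper's implementation (which uses a deep Q-network trained by imitation
# and random action selection).

ACTIONS = np.array([0.1, 0.5, 0.9, 1.0])  # hypothetical multiplicative updates of mu
MU_MIN, MU_MAX = 1e-12, 1e-1              # projection interval for mu (cf. Note 3)

def choose_action(q_net, obs, epsilon, rng):
    """Epsilon-greedy selection over the discrete update factors."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))  # random exploration
    return int(np.argmax(q_net(obs)))           # greedy w.r.t. predicted Q-values

def interior_point_with_agent(solve_barrier_subproblem, kkt_error, q_net,
                              mu=1e-1, tol=1e-8, epsilon=0.1, max_iter=200):
    """Outer interior-point loop in which the agent steers the barrier parameter."""
    rng = np.random.default_rng(0)
    x = None
    for _ in range(max_iter):
        x = solve_barrier_subproblem(mu, x)   # approximately solve the subprogram
        err = kkt_error(x)                    # optimality error of the original NLP
        if err <= tol:
            break                             # converged
        obs = np.array([np.log10(mu), np.log10(err)])  # assumed observation features
        action = choose_action(q_net, obs, epsilon, rng)
        mu = float(np.clip(ACTIONS[action] * mu, MU_MIN, MU_MAX))  # apply and project
    return x, mu
```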

Notes

  1. The set of solutions of (1.2) for all \(\mu > 0\).

  2. A step computed by (2.4) with \(\mu =0\).

  3. In this work, we project the barrier parameter back into the interval \([10^{-12}, 10^{-1}]\).

  4. CUTEst version of February 19, 2018 (git commit: 6c7af0a). All programs are used with standard configuration.

  5. Training stability refers to the stability of cumulative rewards over training episodes.

  6. IPOPT and WORHP have been chosen because both are freely available for academics, implement all barrier update strategies considered in the paper and are among the most efficient nonlinear programming solvers [34].

  7. Training set: MPC3, MPC4, MPC5, MPC7, MPC13. Testing set: MPC1, MPC2, MPC6, MPC8, MPC9, MPC10, MPC11, MPC12, MPC14, MPC15, MPC16.

  8. Training set: ACOPP57, AVGASA, DISC2, HS268, LEUVEN7, OBSTCLAE, PALMER1A, PFIT1, QPNSTAIR, READING6. Testing set: A0ESDNDL, A5NSSNSM, BIGBANK, CB2, HET-Z, HIE1327D, HS16, HS59, HS71, HUES-MOD, LINVERSE, LOTSCHD, MINMAXBD, MISTAKE, POLAK2, PORTFL3, SINEALI, TRAINH, TRY-B, UBH5.

  9. Training set: A0ESDNDL, A5NSSNSM, ACOPP57, AVGASA, BIGBANK, DISC2, HET-Z, HS16, HS268, HS59, HUES-MOD, LEUVEN7, LINVERSE, MINMAXBD, OBSTCLAE, PALMER1A, PFIT1, POLAK2, PORTFL3, QPNSTAIR, READING6, SINEALI, TRAINH, TRY-B, UBH5. Testing set: A0NNDNSL, AUG2DQP, CANTILVR, CB2, CONCON, DALLASS, GOFFIN, HATFLDB, HIE1327D, HIMMELP1, HS108, HS110, HS117, HS60, HS70, HS71, LOTSCHD, MISTAKE, OPTPRLOC, PFIT2LS, POLAK5, QPCBLEND, SIMPLLPA, STEENBRB, TORSION3.

  10. Training set: A0ESDNDL, A0NNDNSL, A5NSSNSM, ACOPP57, AUG2DQP, AVGASA, BIGBANK, CANTILVR, CB2, CONCON, DALLASS, DISC2, GOFFIN, HATFLDB, HET-Z, HIE1327D, HIMMELP1, HS108, HS110, HS117, HS16, HS268, HS59, HS60, HS70, HS71, HUES-MOD, LEUVEN7, LINVERSE, LOTSCHD, MINMAXBD, MISTAKE, OBSTCLAE, OPTPRLOC, PALMER1A, PFIT1, PFIT2LS, POLAK2, POLAK5, PORTFL3, QPCBLEND, QPNSTAIR, READING6, SIMPLLPA, SINEALI, STEENBRB, TORSION3, TRAINH, TRY-B, UBH5. Testing set: A0ENSNDL, A5ESINDL, ACOPR57, ALJAZZAF, BQPGAUSS, EXTRASIM, FCCU, GMNCASE3, GRIDNETG, HS105, HS11, HS119, HS2, HS96, LISWET1, LSQFIT, LUKVLI13, NET2, NGONE, OET3, POLAK4, STEERING, TORSIONA, UBH1, WALL10.

References

  • Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. OSDI 16:265–283

  • Armand P, Benoist J (2008) A local convergence property of primal-dual methods for nonlinear programming. Math Program 115(2):199–222. https://doi.org/10.1007/s10107-007-0136-2

  • Armand P, Benoist J, Orban D (2008a) Dynamic updates of the barrier parameter in primal-dual methods for nonlinear programming. Comput Optim Appl 41(1):1–25. https://doi.org/10.1007/s10589-007-9095-z

  • Armand P, Orban D, Benoist J (2008b) Global convergence of primal-dual methods for nonlinear programming. Technical report, Laboratoire XLIM and Université de Limoges

  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2):235–256. https://doi.org/10.1023/A:1013689704352

  • Balcan MF, Dick T, Sandholm T, Vitercik E (2018) Learning to branch. arXiv preprint arXiv:1803.10150

  • Byrd RH, Liu G, Nocedal J (1997) On the local behavior of an interior point method for nonlinear programming. Numer Anal 1997:37–56

  • Büskens C, Wassel D (2013) The ESA NLP solver WORHP. In: Fasano G, Pintér JD (eds) Modeling and optimization in space engineering, Springer optimization and its applications, vol 73, pp 85–110. Springer, New York. https://doi.org/10.1007/978-1-4614-4469-5_4

  • Chen SY, Yu Y, Da Q, Tan J, Huang HK, Tang HH (2018) Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18, pp 1187–1196. ACM, New York, NY, USA. https://doi.org/10.1145/3219819.3220122

  • Curtis FE (2012) A penalty-interior-point algorithm for nonlinear constrained optimization. Math Program Comput 4(2):181–209. https://doi.org/10.1007/s12532-012-0041-4

  • Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213. https://doi.org/10.1007/s101070100263

  • El-Bakry AS, Tapia RA, Tsuchiya T, Zhang Y (1996) On the formulation and theory of the Newton interior-point method for nonlinear programming. J Optim Theory Appl 89(3):507–541. https://doi.org/10.1007/BF02275347

  • Fiacco AV, McCormick GP (1990) Nonlinear programming: sequential unconstrained minimization techniques, vol 4. SIAM

  • Forsgren A, Gill PE, Wright MH (2002) Interior methods for nonlinear optimization. SIAM Rev 44(4):525–597. https://doi.org/10.1137/S0036144502414942

  • Geffken S, Büskens C (2016) WORHP multi-core interface, parallelisation approaches for an NLP solver. In: Proceedings of the 6th international conference on astrodynamics tools and techniques, Darmstadt, Germany

  • Gertz EM, Wright SJ (2003) Object-oriented software for quadratic programming. ACM Trans Math Softw 29(1):58–81. https://doi.org/10.1145/641876.641880

  • Gondzio J, Grothey A (2008) A new unblocking technique to warmstart interior point methods based on sensitivity analysis. SIAM J Optim 19(3):1184–1210. https://doi.org/10.1137/060678129

  • Gould NIM, Orban D, Toint PL (2015) CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput Optim Appl 60(3):545–557. https://doi.org/10.1007/s10589-014-9687-3

  • Gould NIM, Toint PL (2006) Global convergence of a non-monotone trust-region filter algorithm for nonlinear programming, pp 125–150. Springer US, Boston, MA. https://doi.org/10.1007/0-387-29550-X_5

  • Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. In: AAAI fall symposium on sequential decision making for intelligent agents

  • Hendel G, Miltenberger M, Witzig J (2018) Adaptive algorithmic behavior for solving mixed integer programs using bandit algorithms. Technical report 18-36, ZIB

  • Hoos HH (2012) Automated algorithm configuration and parameter tuning, pp 37–71. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21434-9_3

  • Hutter F, Hoos HH, Leyton-Brown K (2010) Automated configuration of mixed integer programming solvers. In: Lodi A, Milano M, Toth P (eds) Integration of AI and OR techniques in constraint programming for combinatorial optimization problems. Springer, Berlin, pp 186–202

  • Kadioglu S, Malitsky Y, Sellmann M, Tierney K (2010) ISAC – instance-specific algorithm configuration. In: Proceedings of the 2010 conference on ECAI 2010: 19th European conference on artificial intelligence, pp 751–756. IOS Press, Amsterdam

  • Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1):99–134. https://doi.org/10.1016/S0004-3702(98)00023-X

  • Khalil EB, Le Bodic P, Song L, Nemhauser G, Dilkina B (2016) Learning to branch in mixed integer programming. In: 30th AAAI conference on artificial intelligence

  • Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980

  • Kruber M, Lübbecke ME, Parmentier A (2017) Learning when to use a decomposition. In: Salvagnin D, Lombardi M (eds) Integration of AI and OR techniques in constraint programming, pp 202–210. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-59776-8_16

  • Kuhlmann R (2018) A primal-dual augmented Lagrangian penalty-interior-point algorithm for nonlinear programming. PhD thesis, Universität Bremen

  • Kuhlmann R, Büskens C (2018) A primal-dual augmented Lagrangian penalty-interior-point filter line search algorithm. Math Methods Oper Res 87(3):451–483. https://doi.org/10.1007/s00186-017-0625-x

  • Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971

  • Lodi A, Zarpellon G (2017) On learning and branching: a survey. TOP 25(2):207–236. https://doi.org/10.1007/s11750-017-0451-6

  • Mehrotra S (1992) On the implementation of a primal-dual interior point method. SIAM J Optim 2(4):575–601. https://doi.org/10.1137/0802028

  • Mittelmann H. Benchmarks for optimization software. http://plato.asu.edu/ftp/ampl-nlp.html. Accessed 15 April 2019

  • Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602

  • Morales JL, Nocedal J, Waltz RA, Liu G, Goux JP (2003) Assessing the potential of interior methods for nonlinear optimization. In: Biegler LT, Heinkenschloss M, Ghattas O, van Bloemen Waanders B (eds) Large-scale PDE-constrained optimization, Lecture notes in computational science and engineering, vol 30, pp 167–183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55508-4_10

  • Nocedal J, Wächter A, Waltz RA (2009) Adaptive barrier update strategies for nonlinear interior methods. SIAM J Optim 19(4):1674–1693. https://doi.org/10.1137/060649513

  • Baltean-Lugojan R, Bonami P, Misener R, Tramontani A (2018) Selecting cutting planes for quadratic semidefinite outer-approximation via trained neural networks. Technical report, Imperial College London

  • Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  • Shen C, Leyffer S, Fletcher R (2011) A nonmonotone filter method for nonlinear optimization. Comput Optim Appl 52(3):583–607. https://doi.org/10.1007/s10589-011-9430-2

  • Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Tits AL, Wächter A, Bakhtiari S, Urban TJ, Lawrence CT (2003) A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM J Optim 14(1):173–199. https://doi.org/10.1137/S1052623401392123

  • Ulbrich M, Ulbrich S, Vicente NL (2004) A globally convergent primal-dual interior-point filter method for nonlinear programming. Math Program 100(2):379–410. https://doi.org/10.1007/s10107-003-0477-4

  • Vanderbei RJ, Shanno DF (1999) An interior-point algorithm for nonconvex nonlinear programming. Comput Optim Appl 13(1–3):231–252. https://doi.org/10.1023/A:1008677427361

  • Waltz R, Morales J, Nocedal J, Orban D (2006) An interior algorithm for nonlinear optimization that combines line search and trust region steps. Math Program 107(3):391–408. https://doi.org/10.1007/s10107-004-0560-5

  • Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581

  • Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279–292. https://doi.org/10.1007/BF00992698

  • Wächter A, Biegler LT (2006) On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math Program 106(1):25–57. https://doi.org/10.1007/s10107-004-0559-y

Author information

Correspondence to Renke Kuhlmann.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Additional numerical results

This appendix provides additional numerical results for the setups of Sect. 4.

Table 5 Overall training iterations and CPU time together with mean \(\overline{{\hat{\varphi }}}\) and standard deviation \(s({\hat{\varphi }})\) over the last 1000 episodes of five different training runs with parallel testing on the training and testing set with general nonlinear programming instances

Table 5 and Fig. 9 compare the performance of the deep Q-learning agent using the most successful configurations of Sect. 4.2, i.e., \(\mathcal {O}_3\) and \(\mathcal {O}_3\) without regret update, with the configurations \(\mathcal {O}_1\) and \(\mathcal {O}_1\) without regret update, since the latter observation space also yields good performance in Sect. 4.1. The results, however, show that the observation space \(\mathcal {O}_3\), i.e., the addition of the extrapolation approach of Sect. 2.2.3, generally provides better results.

Fig. 9

Performance measure \({\hat{\varphi }}\) for training (training set) and parallel testing (testing set) with general nonlinear programming instances. The considered configurations are \(\mathcal {O}_1\) and \(\mathcal {O}_1\) without regret update, both trained by \(\varepsilon \)-greedy action selection. For further explanations, see Fig. 1

Table 6 Overall training iterations and CPU time together with mean \(\overline{{\hat{\varphi }}}\) and standard deviation \(s({\hat{\varphi }})\) over the last 1000 episodes of five different training runs with parallel testing on the training and testing set with general nonlinear programming instances
Fig. 10

Performance measure \({\hat{\varphi }}\) for training (training set) and parallel testing (testing set) with general nonlinear programming instances. The considered configurations are \(\mathcal {O}_3\) with 10/20, 25/25 and 50/25 training/testing sets, trained by \(\varepsilon \)-greedy action selection. For further explanations, see Fig. 1

If the number of nonlinear programming instances in the training set is increased, the learning task becomes more complex and may require more episodes. This effect is illustrated in Table 6 and Fig. 10, which consider, in addition to the training and testing set of Sect. 4.2, one setup with 25 training and 25 testing instances (see Note 9) and one with 50 training and 25 testing instances (see Note 10). The mean performance drops significantly for the larger training sets and in two replications is even worse over the last 1000 episodes than with the monotone Fiacco–McCormick update of Sect. 2.2.1. Note that in the current implementation the testing set is only run if the training performance is above the baseline, which is why the testing performance remains steady for the first run of the 50/25 sets (cf. Fig. 10, right); a minimal sketch of this gating is given below. Comparing the learning performance across the different training and testing sets is difficult, however, because each added nonlinear program may introduce additional complexity on its own, i.e., it may be hard to solve and/or very sensitive to barrier parameter updates. These results should therefore be understood as an outlook on what future research can address: they indicate that large training sets with very different nonlinear programs remain challenging and may require many more episodes to learn.
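
The fragment below is a minimal sketch of this gating, under the assumption that a scalar performance measure of the current training episode is compared against the performance of the baseline barrier update; the function and argument names are illustrative and not taken from the solver interface.

```python
def maybe_run_testing(train_phi, baseline_phi, run_testing_set, last_test_phi):
    """Evaluate the testing set only when training beats the baseline.

    train_phi and baseline_phi are performance measures of the current training
    episode and of the baseline barrier update; run_testing_set is a callable
    that evaluates the agent on the testing set (all names are illustrative).
    """
    if train_phi > baseline_phi:
        return run_testing_set()  # re-evaluate, so the testing curve can change
    return last_test_phi          # skip, so the testing curve stays flat
```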

Table 7 Comparison of the barrier update strategies monotone Fiacco–McCormick (mono), LOQO rule (loqo) and extrapolation approach (quality) for IPOPT and WORHP with the reinforcement learning-based strategy (WORHP IP learned) on the training and testing set of Sect. 4.1
Table 8 Comparison of the barrier update strategies monotone Fiacco–McCormick (mono), LOQO rule (loqo) and extrapolation approach (quality) for IPOPT and WORHP with the reinforcement learning-based strategy (WORHP IP learned) on the training and testing set of Sect. 4.2

Finally, Tables 7 and 8 list the detailed numerical results of Sects. 4.1 and 4.2, respectively, for all considered barrier update strategies of IPOPT and WORHP. This includes the solver termination status, the number of iterations, the CPU time, the objective function values and the number of function evaluations for all nonlinear programming instances of the considered training and testing sets. For WORHP IP learned, the CPU time does not include the Python overhead, e.g., starting the Python interpreter within the CUTEst C interface of WORHP. The results given for WORHP IP learned have the global convergence framework enabled and list the number of switches to the monotone strategy.

About this article

Cite this article

Kuhlmann, R. Learning to steer nonlinear interior-point methods. EURO J Comput Optim 7, 381–419 (2019). https://doi.org/10.1007/s13675-019-00118-4
