Abstract
Interior-point or barrier methods handle nonlinear programs by sequentially solving barrier subprograms with a decreasing sequence of barrier parameters. The specific barrier update rule strongly influences the theoretical convergence properties as well as the practical efficiency. While many global and local convergence analyses consider a monotone update that decreases the barrier parameter for every approximately solved subprogram, computational studies show superior performance of more adaptive strategies. In this paper we interpret the adaptive barrier update as a reinforcement learning task. A deep Q-learning agent is trained by both imitation and random action selection. Numerical results based on an implementation within the nonlinear programming solver WORHP show that the agent successfully learns to steer the barrier parameter and additionally improves WORHP's performance on the CUTEst test set.
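The idea of treating the barrier update as a reinforcement learning task can be illustrated with a toy sketch: a tabular epsilon-greedy Q-learning agent that, after each approximately solved subproblem, picks a multiplicative update factor for the barrier parameter \(\mu\). Everything below — the state discretization, the action set, and the surrogate reward — is a hypothetical stand-in for the paper's actual deep Q-network setup, not its implementation; the projection of \(\mu\) into \([10^{-12}, 10^{-1}]\) follows the note below.

```python
import numpy as np

# Candidate multiplicative updates for mu (hypothetical action set).
ACTIONS = [0.9, 0.5, 0.1]

def discretize(mu):
    # Crude state: order of magnitude of the barrier parameter (0..12).
    return int(np.clip(-np.log10(mu), 0, 12))

def run_episode(Q, alpha=0.1, gamma=0.95, eps=0.2, rng=None):
    """One toy episode: repeatedly update mu and learn Q by TD(0)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mu = 1e-1
    total_reward = 0.0
    for _ in range(50):
        s = discretize(mu)
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(np.argmax(Q[s]))
        # Apply update and project mu back into [1e-12, 1e-1].
        mu = float(np.clip(mu * ACTIONS[a], 1e-12, 1e-1))
        # Surrogate reward: a smaller mu stands in for solver progress.
        r = -np.log10(mu) / 12.0
        s2 = discretize(mu)
        # Standard Q-learning update.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        total_reward += r
    return total_reward

# 13 discrete states (orders of magnitude of mu), 3 actions.
Q = np.zeros((13, len(ACTIONS)))
rewards = [run_episode(Q, rng=np.random.default_rng(i)) for i in range(200)]
```

In the paper's setting, the state would instead be an observation vector from the solver and Q would be a neural network; the sketch only shows the control loop in which an agent steers \(\mu\) between subproblem solves.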
Notes
The set of solutions of (1.2) for all \(\mu > 0\).
A step computed by (2.4) with \(\mu =0\).
In this work, we project the barrier parameter back into the interval \([10^{-12}, 10^{-1}]\).
CUTEst version of February 19, 2018 (git commit: 6c7af0a). All programs are used with standard configuration.
Training stability refers to stability of cumulative rewards over training episodes.
IPOPT and WORHP have been chosen because both are freely available for academics, implement all barrier update strategies considered in the paper and are among the most efficient nonlinear programming solvers [34].
Training set: MPC3, MPC4, MPC5, MPC7, MPC13. Testing set: MPC1, MPC2, MPC6, MPC8, MPC9, MPC10, MPC11, MPC12, MPC14, MPC15, MPC16.
Training set: ACOPP57, AVGASA, DISC2, HS268, LEUVEN7, OBSTCLAE, PALMER1A, PFIT1, QPNSTAIR, READING6. Testing set: A0ESDNDL, A5NSSNSM, BIGBANK, CB2, HET-Z, HIE1327D, HS16, HS59, HS71, HUES-MOD, LINVERSE, LOTSCHD, MINMAXBD, MISTAKE, POLAK2, PORTFL3, SINEALI, TRAINH, TRY-B, UBH5.
Training set: A0ESDNDL, A5NSSNSM, ACOPP57, AVGASA, BIGBANK, DISC2, HET-Z, HS16, HS268, HS59, HUES-MOD, LEUVEN7, LINVERSE, MINMAXBD, OBSTCLAE, PALMER1A, PFIT1, POLAK2, PORTFL3, QPNSTAIR, READING6, SINEALI, TRAINH, TRY-B, UBH5. Testing set: A0NNDNSL, AUG2DQP, CANTILVR, CB2, CONCON, DALLASS, GOFFIN, HATFLDB, HIE1327D, HIMMELP1, HS108, HS110, HS117, HS60, HS70, HS71, LOTSCHD, MISTAKE, OPTPRLOC, PFIT2LS, POLAK5, QPCBLEND, SIMPLLPA, STEENBRB, TORSION3.
Training set: A0ESDNDL, A0NNDNSL, A5NSSNSM, ACOPP57, AUG2DQP, AVGASA, BIGBANK, CANTILVR, CB2, CONCON, DALLASS, DISC2, GOFFIN, HATFLDB, HET-Z, HIE1327D, HIMMELP1, HS108, HS110, HS117, HS16, HS268, HS59, HS60, HS70, HS71, HUES-MOD, LEUVEN7, LINVERSE, LOTSCHD, MINMAXBD, MISTAKE, OBSTCLAE, OPTPRLOC, PALMER1A, PFIT1, PFIT2LS, POLAK2, POLAK5, PORTFL3, QPCBLEND, QPNSTAIR, READING6, SIMPLLPA, SINEALI, STEENBRB, TORSION3, TRAINH, TRY-B, UBH5. Testing set: A0ENSNDL, A5ESINDL, ACOPR57, ALJAZZAF, BQPGAUSS, EXTRASIM, FCCU, GMNCASE3, GRIDNETG, HS105, HS11, HS119, HS2, HS96, LISWET1, LSQFIT, LUKVLI13, NET2, NGONE, OET3, POLAK4, STEERING, TORSIONA, UBH1, WALL10.
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. OSDI 16:265–283
Armand P, Benoist J (2008) A local convergence property of primal-dual methods for nonlinear programming. Math Program 115(2):199–222. https://doi.org/10.1007/s10107-007-0136-2
Armand P, Benoist J, Orban D (2008a) Dynamic updates of the barrier parameter in primal-dual methods for nonlinear programming. Comput Optim Appl 41(1):1–25. https://doi.org/10.1007/s10589-007-9095-z
Armand P, Orban D, Benoist J (2008b) Global convergence of primal-dual methods for nonlinear programming. In: Technical report, Laboratoire XLIM et Université de Limoges
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Mach Learn 47(2):235–256. https://doi.org/10.1023/A:1013689704352
Balcan MF, Dick T, Sandholm T, Vitercik E (2018) Learning to branch. arXiv preprint arXiv:1803.10150
Byrd RH, Liu G, Nocedal J (1997) On the local behavior of an interior point method for nonlinear programming. Numer Anal 1997:37–56
Büskens C, Wassel D (2013) The ESA NLP solver WORHP. In: Fasano G, Pintér JD. (eds.) Modeling and optimization in space engineering, Springer optimization and its applications, vol 73, pp 85–110. Springer, New York. https://doi.org/10.1007/978-1-4614-4469-5_4
Chen SY, Yu Y, Da Q, Tan J, Huang HK, Tang HH (2018) Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18, pp 1187–1196. ACM, New York, NY, USA. https://doi.org/10.1145/3219819.3220122
Curtis FE (2012) A penalty-interior-point algorithm for nonlinear constrained optimization. Math Program Comput 4(2):181–209. https://doi.org/10.1007/s12532-012-0041-4
Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Program 91(2):201–213. https://doi.org/10.1007/s101070100263
El-Bakry AS, Tapia RA, Tsuchiya T, Zhang Y (1996) On the formulation and theory of the Newton interior-point method for nonlinear programming. J Optim Theory Appl 89(3):507–541. https://doi.org/10.1007/BF02275347
Fiacco AV, McCormick GP (1990) Nonlinear programming: sequential unconstrained minimization techniques, vol 4. SIAM
Forsgren A, Gill PE, Wright MH (2002) Interior methods for nonlinear optimization. SIAM Rev 44(4):525–597. https://doi.org/10.1137/S0036144502414942
Geffken S, Büskens C (2016) WORHP multi-core interface, parallelisation approaches for an NLP solver. In: Proceedings of the 6th international conference on astrodynamics tools and techniques, Darmstadt, Germany
Gertz EM, Wright SJ (2003) Object-oriented software for quadratic programming. ACM Trans Math Softw 29(1):58–81. https://doi.org/10.1145/641876.641880
Gondzio J, Grothey A (2008) A new unblocking technique to warmstart interior point methods based on sensitivity analysis. SIAM J Optim 19(3):1184–1210. https://doi.org/10.1137/060678129
Gould NIM, Orban D, Toint PL (2015) CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput Optim Appl 60(3):545–557. https://doi.org/10.1007/s10589-014-9687-3
Gould NIM, Toint PL (2006) Global convergence of a non-monotone trust-region filter algorithm for nonlinear programming, pp 125–150. Springer US, Boston, MA. https://doi.org/10.1007/0-387-29550-X_5
Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. In: AAAI fall symposium on sequential decision making for intelligent agents
Hendel G, Miltenberger M, Witzig J (2018) Adaptive algorithmic behavior for solving mixed integer programs using bandit algorithms. Technical report 18-36, ZIB
Hoos HH (2012) Automated algorithm configuration and parameter tuning, pp 37–71. Springer Berlin Heidelberg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21434-9_3
Hutter F, Hoos HH, Leyton-Brown K (2010) Automated configuration of mixed integer programming solvers. In: Lodi A, Milano M, Toth P (eds) Integration of AI and OR techniques in constraint programming for combinatorial optimization problems. Springer, Berlin, pp 186–202
Kadioglu S, Malitsky Y, Sellmann M, Tierney K (2010) ISAC – instance-specific algorithm configuration. In: Proceedings of the 2010 conference on ECAI 2010: 19th European conference on artificial intelligence, pp 751–756. IOS Press, Amsterdam
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1):99–134. https://doi.org/10.1016/S0004-3702(98)00023-X
Khalil EB, Le Bodic P, Song L, Nemhauser G, Dilkina B (2016) Learning to branch in mixed integer programming. In: 30th AAAI conference on artificial intelligence
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Kruber M, Lübbecke ME, Parmentier A (2017) Learning when to use a decomposition. In: Salvagnin D, Lombardi M. (eds.) Integration of AI and OR techniques in constraint programming, pp 202–210. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-59776-8_16
Kuhlmann R (2018) A primal-dual augmented Lagrangian penalty-interior-point algorithm for nonlinear programming. PhD thesis, Universität Bremen
Kuhlmann R, Büskens C (2018) A primal-dual augmented Lagrangian penalty-interior-point filter line search algorithm. Math Methods Oper Res 87(3):451–483. https://doi.org/10.1007/s00186-017-0625-x
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
Lodi A, Zarpellon G (2017) On learning and branching: a survey. TOP 25(2):207–236. https://doi.org/10.1007/s11750-017-0451-6
Mehrotra S (1992) On the implementation of a primal-dual interior point method. SIAM J Optim 2(4):575–601. https://doi.org/10.1137/0802028
Mittelmann H. Benchmarks for optimization software. http://plato.asu.edu/ftp/ampl-nlp.html. Accessed 15 April 2019
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602
Morales JL, Nocedal J, Waltz RA, Liu G, Goux JP (2003) Assessing the potential of interior methods for nonlinear optimization. In: Biegler LT, Heinkenschloss M, Ghattas O, van Bloemen Waanders B. (eds.) Large-Scale PDE-Constrained Optimization. In: Lecture Notes in Computational Science and Engineering, vol 30, pp 167–183. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-55508-4_10
Nocedal J, Wächter A, Waltz RA (2009) Adaptive barrier update strategies for nonlinear interior methods. SIAM J Optim 19(4):1674–1693. https://doi.org/10.1137/060649513
Baltean-Lugojan R, Bonami P, Misener R, Tramontani A (2018) Selecting cutting planes for quadratic semidefinite outer-approximation via trained neural networks. In: Technical report, Imperial College London
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
Shen C, Leyffer S, Fletcher R (2011) A nonmonotone filter method for nonlinear optimization. Comput Optim Appl 52(3):583–607. https://doi.org/10.1007/s10589-011-9430-2
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Tits AL, Wächter A, Bakhtiari S, Urban TJ, Lawrence CT (2003) A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties. SIAM J Optim 14(1):173–199. https://doi.org/10.1137/S1052623401392123
Ulbrich M, Ulbrich S, Vicente NL (2004) A globally convergent primal-dual interior-point filter method for nonlinear programming. Math Program 100(2):379–410. https://doi.org/10.1007/s10107-003-0477-4
Vanderbei RJ, Shanno DF (1999) An interior-point algorithm for nonconvex nonlinear programming. Comput Optim Appl 13(1–3):231–252. https://doi.org/10.1023/A:1008677427361
Waltz R, Morales J, Nocedal J, Orban D (2006) An interior algorithm for nonlinear optimization that combines line search and trust region steps. Math Program 107(3):391–408. https://doi.org/10.1007/s10107-004-0560-5
Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581
Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3):279–292. https://doi.org/10.1007/BF00992698
Wächter A, Biegler LT (2006) On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math Program 106(1):25–57. https://doi.org/10.1007/s10107-004-0559-y
Appendix A: Additional numerical results
The appendix provides additional numerical results for the setups of Sect. 4.
Table 5 and Fig. 9 compare the performance of the deep Q-learning agent using the most successful configurations of Sect. 4.2, i.e., \(\mathcal {O}_3\) and \(\mathcal {O}_3\) without regret update, to the configurations \(\mathcal {O}_1\) and \(\mathcal {O}_1\) without regret update, since the latter observation space also yields good performance in Sect. 4.1. The results, however, show that the observation space \(\mathcal {O}_3\), i.e., the addition of the extrapolation approach of Sect. 2.2.3, generally provides better results.
If the number of nonlinear programming instances in the training set is increased, the learning task becomes more complex and may require more episodes. This effect is illustrated in Table 6 and Fig. 10, which consider, in addition to the training and testing set of Sect. 4.2, one with 25 training and 25 testing instances (Footnote 9) and one with 50 training and 25 testing instances (Footnote 10). The mean performance drops significantly for the larger training sets, and in two replications the agent even yields a worse mean performance over the last 1000 episodes than the monotone Fiacco–McCormick update of Sect. 2.2.1. Note that in the current implementation the testing set is only run if the training performance is above the baseline, which is why the testing performance stays constant for the first run of the 50/25 sets (cf. Fig. 10, right). Comparing the learning performance across the different training and testing sets is difficult, however, because each added nonlinear program may introduce additional complexity on its own, i.e., it may be hard to solve and/or very sensitive to barrier parameter updates. These results should therefore be understood as an outlook on future research: they indicate that large training sets with very different nonlinear programs remain challenging and may require many more episodes to learn.
Finally, Tables 7 and 8 list the detailed numerical results of Sects. 4.1 and 4.2, respectively, for all considered barrier update strategies of IPOPT and WORHP. This includes the solver termination status, number of iterations, CPU time, objective function values and number of function evaluations for all nonlinear programming instances of the considered training and testing sets. For WORHP IP learned, the reported CPU time does not include the Python overhead, e.g., starting the Python interpreter within the CUTEst C interface of WORHP. The results given for WORHP IP learned have the global convergence framework enabled and list the number of switches to the monotone strategy.
Kuhlmann, R. Learning to steer nonlinear interior-point methods. EURO J Comput Optim 7, 381–419 (2019). https://doi.org/10.1007/s13675-019-00118-4
Keywords
- Nonlinear programming
- Constrained optimization
- Interior-point algorithm
- Reinforcement learning
- Deep Q-learning