Abstract
We consider nonlinear optimization problems that involve surrogate models represented by neural networks. We demonstrate first how to directly embed neural network evaluation into optimization models, highlight a difficulty with this approach that can prevent convergence, and then characterize stationarity of such models. We then present two alternative formulations of these problems in the specific case of feedforward neural networks with ReLU activation: as a mixed-integer optimization problem and as a mathematical program with complementarity constraints. For the latter formulation we prove that stationarity at a point for this problem corresponds to stationarity of the embedded formulation. Each of these formulations may be solved with state-of-the-art optimization methods, and we show how to obtain good initial feasible solutions for these methods. We compare our formulations on three practical applications arising in the design and control of combustion engines, in the generation of adversarial attacks on classifier networks, and in the determination of optimal flows in an oil well network.
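To make the two exact ReLU encodings mentioned in the abstract concrete, the sketch below (in our own notation, not the paper's formulations) checks feasibility of both for a single neuron output y = ReLU(v): the big-M mixed-integer encoding with a binary activity indicator z, and the complementarity (MPCC) encoding. The helper names and the bound M are illustrative assumptions.

```python
def relu(v):
    """Exact ReLU activation."""
    return max(0.0, v)

def bigm_feasible(v, y, z, M, tol=1e-9):
    """Check the standard big-M MIP encoding of y = ReLU(v).

    z is a binary activity indicator (1 if the neuron is active) and
    M is a valid bound on |v|:
        y >= v,  y >= 0,  y <= v + M*(1 - z),  y <= M*z
    """
    return (y >= v - tol and y >= -tol
            and y <= v + M * (1 - z) + tol
            and y <= M * z + tol)

def mpcc_feasible(v, y, tol=1e-9):
    """Check the complementarity encoding of y = ReLU(v):
        y >= 0,  y - v >= 0,  y * (y - v) = 0
    """
    return y >= -tol and y - v >= -tol and abs(y * (y - v)) <= tol

# The true ReLU output satisfies both encodings exactly.
for v in (-2.0, 0.0, 3.5):
    y = relu(v)
    z = 1 if v > 0 else 0
    assert bigm_feasible(v, y, z, M=10.0)
    assert mpcc_feasible(v, y)
```

Either set of constraints can replace the nonsmooth max in an optimization model, which is the trade-off the paper studies: the MIP encoding adds binaries but admits global methods, while the MPCC encoding stays continuous but requires care about stationarity.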
Data availability
The engine design and adversarial attack generation datasets analyzed in Sections 5.1 and 5.2 are available from the corresponding author upon request. The oil well dataset analyzed in Section 5.3 was provided by the authors of [28] and can be found at the following repository: https://github.com/bgrimstad/relu-opt-public.
References
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283 (2016)
Aithal, SM., Balaprakash, P.: MaLTESE: Large-scale simulation-driven machine learning for transient driving cycles. In: High Performance Computing, 186–205. Springer, Cham (2019)
Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, JP.: Strong mixed-integer programming formulations for trained neural networks. In: International Conference on Integer Programming and Combinatorial Optimization, 27–42 (2019)
Baumrucker, B., Renfro, J., Biegler, L.: MPEC problem formulations and solution strategies with chemical engineering applications. Comput. Chem. Eng. 32(12), 2903–2913 (2008)
Belotti, P.: Couenne: A user’s manual. Technical report, FICO (2020)
Bergman, D., Huang, T., Brooks, P., Lodi, A., Raghunathan, AU.: JANOS: an integrated predictive and prescriptive modeling framework. INFORMS J. Comput. (2021)
Bolte, J., Pauwels, E.: Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Program. 1–33 (2020)
Bonami, P., Lee, J.: BONMIN user’s manual. Numer. Math. 4, 1–32 (2007)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), 39–57 (2017)
Cheng, CH., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: International Symposium on Automated Technology for Verification and Analysis, 251–268. Springer, (2017)
Cheon, MS.: An outer-approximation guided optimization approach for constrained neural network inverse problems. Math. Program. 1–30 (2021)
Clarke, L., Linderoth, J., Johnson, E., Nemhauser, G., Bhagavan, R., Jordan, M.: Using OSL to improve the computational results of a MIP logistics model. EKKNEWS 16 (1996)
Delarue, A., Anderson, R., Tjandraatmadja, C.: Reinforcement learning with combinatorial actions: an application to vehicle routing. Adv. Neural Inf. Process. Syst. 33 (2020)
Du, SS., Zhai, X., Poczos, B., Singh, A.: Gradient descent provably optimizes over-parameterized neural networks. In: International Conference on Learning Representations, (2018)
Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
Duran, M., Grossmann, I.: A mixed-integer nonlinear programming algorithm for process systems synthesis. AIChE J. 32(4), 592–606 (1986)
Duran, M.A., Grossmann, I.: An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36, 307–339 (1986)
Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural networks. In: NASA Formal Methods Symposium, pp. 121–138. Springer, (2018)
Fischetti, M., Jo, J.: Deep neural networks and mixed integer linear optimization. Constraints 23(3), 296–309 (2018)
Fletcher, R., Leyffer, S.: Solving mathematical programs with complementarity constraints as nonlinear programs. Optim. Methods Softw. 19(1), 15–40 (2004)
Fletcher, R., Leyffer, S., Ralph, D., Scholtes, S.: Local convergence of SQP methods for mathematical programs with equilibrium constraints. SIAM J. Optim. 17(1), 259–286 (2006)
Fourer, R., Gay, DM., Kernighan, BW.: AMPL: A Modeling Language for Mathematical Programming. The Scientific Press (1993)
Gale, D.: Neighborly and cyclic polytopes. In: Proc. Sympos. Pure Math 7, pp. 225–232 (1963)
Gleixner, A.M., Berthold, T., Müller, B., Weltge, S.: Three enhancements for optimization-based bound tightening. J. Global Optim. 67(4), 731–757 (2017)
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 315–323. JMLR Workshop and Conference Proceedings, (2011)
Goodfellow, I., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations, (2015)
Goodfellow, IJ., Vinyals, O., Saxe, AM.: Qualitatively characterizing neural network optimization problems. arXiv preprint arXiv:1412.6544 (2014)
Grimstad, B., Andersson, H.: ReLU networks as surrogate models in mixed-integer linear programs. Comput. Chem. Eng. 131, 106580 (2019)
Gurobi Optimization, Inc.: Gurobi Optimizer Reference Manual, version 5.0 (2012)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)
IBM Corp.: IBM ILOG CPLEX V12.1: User's Manual for CPLEX (2009)
Katz, G., Barrett, C., Dill, DL., Julian, K., Kochenderfer, MJ.: Reluplex: An efficient SMT solver for verifying deep neural networks. In: International Conference on Computer Aided Verification, pp. 97–117. Springer, (2017)
Khalil, EB., Gupta, A., Dilkina, B.: Combinatorial attacks on binarized neural networks. In: International Conference on Learning Representations, (2018)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural. Inf. Process. Syst. 25, 1097–1105 (2012)
Kurakin, A., Goodfellow, IJ., Bengio, S.: Adversarial examples in the physical world. In: Artificial Intelligence Safety and Security, pp. 99–112. Chapman and Hall/CRC, (2018)
LeCun, Y.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, (1998)
Leyffer, S.: Mathematical programs with complementarity constraints. SIAG/OPT Views News 14(1), 15–18 (2003)
Leyffer, S., Lopez-Calva, G., Nocedal, J.: Interior methods for mathematical programs with complementarity constraints. SIAM J. Optim. 17(1), 52–77 (2006)
Li, Y., Yuan, Y.: Convergence analysis of two-layer neural networks with ReLU activation. Adv. Neural. Inf. Process. Syst. 30, 597–607 (2017)
Lombardi, M., Milano, M., Bartolini, A.: Empirical decision model learning. Artif. Intell. 244, 343–367 (2017)
Mahajan, A., Leyffer, S., Linderoth, J., Luedtke, J., Munson, T.: MINOTAUR: a toolkit for solving mixed-integer nonlinear optimization. Wiki page (2011). http://wiki.mcs.anl.gov/minotaur
Montufar, G.F., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks. Adv. Neural. Inf. Process. Syst. 27, 2924–2932 (2014)
Papalexopoulos, T., Tjandraatmadja, C., Anderson, R., Vielma, JP., Belanger, D.: Constrained discrete black-box optimization using mixed-integer programming. arXiv preprint arXiv:2110.09569 (2021)
Pascanu, R., Montúfar, G., Bengio, Y.: On the number of response regions of deep feed forward networks with piece-wise linear activations. In: International Conference on Learning Representations, (2014)
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. (2017)
Powell, M.: A method for nonlinear constraints in minimization problems in optimization. In: Fletcher R. (ed.) Optimization. Academic Press, (1969)
Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.: Surrogate-based analysis and optimization. Prog. Aerosp. Sci. 41(1), 1–28 (2005)
Raghunathan, A., Biegler, L.T.: An interior point method for mathematical programs with complementarity constraints (MPCCs). SIAM J. Optim. 15(3), 720–750 (2005)
Ramachandran, P., Zoph, B., Le, QV.: Searching for activation functions. arXiv preprint arXiv:1710.05941 (2017)
Ryu, M., Chow, Y., Anderson, R., Tjandraatmadja, C., Boutilier, C.: CAQL: Continuous Action Q-Learning. In: International Conference on Learning Representations, (2019)
Sahinidis, N.V.: BARON: a general purpose global optimization software package. J. Global Optim. 8(2), 201–205 (1996)
Scheel, H., Scholtes, S.: Mathematical programs with complementarity constraints: Stationarity, optimality and sensitivity. Math. Oper. Res. 25, 1–22 (2000)
Schweidtmann, A.M., Mitsos, A.: Deterministic global optimization with artificial neural networks embedded. J. Optim. Theory Appl. 180(3), 925–948 (2019)
Serra, T., Ramalingam, S.: Empirical bounds on linear regions of deep rectifier networks. In: AAAI, pp. 5628–5635 (2020)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations, (2014)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: 2nd International Conference on Learning Representations, (2014)
Tawarmalani, M., Sahinidis, N.V.: Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer Academic Publishers, Boston MA (2002)
Tjeng, V., Xiao, K., Tedrake, R.: Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356 (2017)
Tsay, C., Kronqvist, J., Thebelt, A., Misener, R.: Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. Adv. Neural Inf. Process. Syst. 34 (2021)
Wächter, A., Biegler, L.T.: On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006)
Zaslavsky, T.: Facing up to arrangements: Face-count formulas for partitions of space by hyperplanes, vol. 154. American Mathematical Soc. (1975)
Zhang, Z., Brand, M.: Convergent block coordinate descent for training Tikhonov regularized deep neural networks (2017)
Acknowledgements
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This work was also supported by the U.S. Department of Energy through grant DE-FG02-05ER25694. The first author was also supported through an NSF-MSGI fellowship.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (“Argonne”). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan: http://energy.gov/downloads/doe-public-access-plan.
A Additional engine design problem results
In this appendix, we include the fully tabulated results for selected figures from Section 5.1. In Table 7, we include the data from Fig. 9 which records the average time until each solver found the best solution to its respective formulation. In Table 8, we include the data from Fig. 10 which records the average percentage gap between the optimal solution for each formulation and the best-known solution.
We also include results recording the full experiment solve times for the engine design problem. A bar plot depicting the average time over all 10 runs, for each number of hidden layers and number of timesteps on each formulation, is presented in Fig. 13. The results are also tabulated in Table 9. These results are distinct from those presented in the main section of the paper because they include time spent after the solver finds its optimal iterate, which may be large due to convergence issues. This is especially the case for the embedded formulation, for which the solver often reaches the iteration limit after finding a good solution. This happens because Ipopt will jump away from this solution when it cannot prove it to be locally optimal (which occurs if the solution is in the neighborhood of a nondifferentiable point) and will continue the solve in a different region. The MIP solves may also spend a large amount of time after finding their optimal solutions; this extra time is spent exploring the remainder of the search tree to verify that no feasible solution is better than the best found so far. The MPCC times presented here are largely the same as those in the main section because, when Ipopt finds its optimal iterate in these settings, it is usually able to determine that the iterate is locally optimal and terminate.
We also remark that the full experiment time for the embedded formulation is very similar between the warmstarted and non-warmstarted solves. This similarity can be explained by the fact that, in most cases, the solver reaches the iteration limit on both problems and fails to converge. The warmstarted problem generally finds a good solution faster, but in neither setting is Ipopt able to verify that the solutions found are locally optimal.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, D., Balaprakash, P. & Leyffer, S. Modeling design and control problems involving neural network surrogates. Comput Optim Appl 83, 759–800 (2022). https://doi.org/10.1007/s10589-022-00404-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10589-022-00404-9