Deterministic Global Optimization with Artificial Neural Networks Embedded

Abstract

Artificial neural networks are used in various applications for data-driven black-box modeling and subsequent optimization. Herein, we present an efficient method for the deterministic global optimization of problems with artificial neural networks embedded. The proposed method is based on relaxations of algorithms using McCormick relaxations in a reduced space (Mitsos et al. in SIAM J Optim 20(2):573–601, 2009), employing the convex and concave envelopes of the nonlinear activation function. The optimization problems are solved using our in-house deterministic global solver. The performance of the proposed method is demonstrated in four optimization examples: an illustrative function, a fermentation process, a compressor plant and a chemical process. The results show that the computational solution time is favorable compared to a state-of-the-art general-purpose global optimization solver.

References

  1. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
  2. Gasteiger, J., Zupan, J.: Neural networks in chemistry. Angew. Chem. Int. Edn. Engl. 32(4), 503–527 (1993). https://doi.org/10.1002/anie.199305031
  3. Azlan Hussain, M.: Review of the applications of neural networks in chemical process control—simulation and online implementation. Artif. Intell. Eng. 13(1), 55–68 (1999). https://doi.org/10.1016/S0954-1810(98)00011-9
  4. Agatonovic-Kustrin, S., Beresford, R.: Basic concepts of artificial neural network modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 22(5), 717–727 (2000). https://doi.org/10.1016/S0731-7085(99)00272-1
  5. Witek-Krowiak, A., Chojnacka, K., Podstawczyk, D., Dawiec, A., Pokomeda, K.: Application of response surface methodology and artificial neural network methods in modelling and optimization of biosorption process. Bioresour. Technol. 160, 150–160 (2014). https://doi.org/10.1016/j.biortech.2014.01.021
  6. Meireles, M., Almeida, P., Simoes, M.G.: A comprehensive review for industrial applicability of artificial neural networks. IEEE Trans. Ind. Electron. 50(3), 585–601 (2003). https://doi.org/10.1109/TIE.2003.812470
  7. Del Rio-Chanona, E.A., Fiorelli, F., Zhang, D., Ahmed, N.R., Jing, K., Shah, N.: An efficient model construction strategy to simulate microalgal lutein photo-production dynamic process. Biotechnol. Bioeng. 114(11), 2518–2527 (2017). https://doi.org/10.1002/bit.26373
  8. Cheema, J.J.S., Sankpal, N.V., Tambe, S.S., Kulkarni, B.D.: Genetic programming assisted stochastic optimization strategies for optimization of glucose to gluconic acid fermentation. Biotechnol. Progr. 18(6), 1356–1365 (2002). https://doi.org/10.1021/bp015509s
  9. Desai, K.M., Survase, S.A., Saudagar, P.S., Lele, S.S., Singhal, R.S.: Comparison of artificial neural network and response surface methodology in fermentation media optimization: case study of fermentative production of scleroglucan. Biochem. Eng. J. 41(3), 266–273 (2008). https://doi.org/10.1016/j.bej.2008.05.009
  10. Nagata, Y., Chu, K.H.: Optimization of a fermentation medium using neural networks and genetic algorithms. Biotechnol. Lett. 25(21), 1837–1842 (2003). https://doi.org/10.1023/A:1026225526558
  11. Fahmi, I., Cremaschi, S.: Process synthesis of biodiesel production plant using artificial neural networks as the surrogate models. Comput. Chem. Eng. 46, 105–123 (2012). https://doi.org/10.1016/j.compchemeng.2012.06.006
  12. Nascimento, C.A.O., Giudici, R., Guardani, R.: Neural network based approach for optimization of industrial chemical processes. Comput. Chem. Eng. 24(9–10), 2303–2314 (2000). https://doi.org/10.1016/S0098-1354(00)00587-1
  13. Nascimento, C.A.O., Giudici, R.: Neural network based approach for optimisation applied to an industrial nylon-6,6 polymerisation process. Comput. Chem. Eng. 22(Suppl.), S595–S600 (1998). https://doi.org/10.1016/S0098-1354(98)00105-7
  14. Chambers, M., Mount-Campbell, C.A.: Process optimization via neural network metamodeling. Int. J. Prod. Econ. 79(2), 93–100 (2002). https://doi.org/10.1016/S0925-5273(00)00188-2
  15. Henao, C.A., Maravelias, C.T.: Surrogate-based process synthesis. In: Pierucci, S., Ferraris, G.B. (eds.) 20th European Symposium on Computer Aided Process Engineering, Computer Aided Chemical Engineering, vol. 28, pp. 1129–1134. Elsevier, Milan, Italy (2010). https://doi.org/10.1016/S1570-7946(10)28189-0
  16. Henao, C.A., Maravelias, C.T.: Surrogate-based superstructure optimization framework. AIChE J. 57(5), 1216–1232 (2011). https://doi.org/10.1002/aic.12341
  17. Sant Anna, H.R., Barreto, A.G., Tavares, F.W., de Souza, M.B.: Machine learning model and optimization of a PSA unit for methane–nitrogen separation. Comput. Chem. Eng. 104, 377–391 (2017). https://doi.org/10.1016/j.compchemeng.2017.05.006
  18. Smith, J.D., Neto, A.A., Cremaschi, S., Crunkleton, D.W.: CFD-based optimization of a flooded bed algae bioreactor. Ind. Eng. Chem. Res. 52(22), 7181–7188 (2013). https://doi.org/10.1021/ie302478d
  19. Henao, C.A.: A superstructure modeling framework for process synthesis using surrogate models. Dissertation, University of Wisconsin, Madison (2012)
  20. Kajero, O.T., Chen, T., Yao, Y., Chuang, Y.C., Wong, D.S.H.: Meta-modelling in chemical process system engineering. J. Taiwan Inst. Chem. Eng. 73, 135–145 (2017). https://doi.org/10.1016/j.jtice.2016.10.042
  21. Lewandowski, J., Lemcoff, N.O., Palosaari, S.: Use of neural networks in the simulation and optimization of pressure swing adsorption processes. Chem. Eng. Technol. 21(7), 593–597 (1998). https://doi.org/10.1002/(SICI)1521-4125(199807)21:7<593::AID-CEAT593>3.0.CO;2-U
  22. Gutiérrez-Antonio, C.: Multiobjective stochastic optimization of dividing-wall distillation columns using a surrogate model based on neural networks. Chem. Biochem. Eng. Q. 29(4), 491–504 (2016). https://doi.org/10.15255/CABEQ.2014.2132
  23. Chen, C.R., Ramaswamy, H.S.: Modeling and optimization of variable retort temperature thermal processing using coupled neural networks and genetic algorithms. J. Food Eng. 53(3), 209–220 (2002). https://doi.org/10.1016/S0260-8774(01)00159-5
  24. Dornier, M., Decloux, M., Trystram, G., Lebert, A.: Interest of neural networks for the optimization of the crossflow filtration process. LWT-Food Sci. Technol. 28(3), 300–309 (1995). https://doi.org/10.1016/S0023-6438(95)94364-1
  25. Fernandes, F.A.N.: Optimization of Fischer–Tropsch synthesis using neural networks. Chem. Eng. Technol. 29(4), 449–453 (2006). https://doi.org/10.1002/ceat.200500310
  26. Grossmann, I.E., Viswanathan, J., Vecchietti, A., Raman, R., Kalvelagen, E.: GAMS/DICOPT: A discrete continuous optimization package. GAMS Corporation Inc, Cary (2002)
  27. Drud, A.S.: CONOPT—a large-scale GRG code. ORSA J. Comput. 6(2), 207–216 (1994). https://doi.org/10.1287/ijoc.6.2.207
  28. Nandi, S., Ghosh, S., Tambe, S.S., Kulkarni, B.D.: Artificial neural-network-assisted stochastic process optimization strategies. AIChE J. 47(1), 126–141 (2001). https://doi.org/10.1002/aic.690470113
  29. Tawarmalani, M., Sahinidis, N.V.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249 (2005). https://doi.org/10.1007/s10107-005-0581-8
  30. de Weerdt, E., Chu, Q.P., Mulder, J.A.: Neural network output optimization using interval analysis. IEEE Trans. Neural Netw. 20(4), 638–653 (2009). https://doi.org/10.1109/TNN.2008.2011267
  31. Moore, R.E., Bierbaum, F.: Methods and Applications of Interval Analysis, 2nd edn. SIAM Studies in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia (1979). https://doi.org/10.1137/1.9781611970906
  32. Misener, R., Floudas, C.A.: ANTIGONE: Algorithms for continuous/integer global optimization of nonlinear equations. J. Glob. Optim. 59(2), 503–526 (2014). https://doi.org/10.1007/s10898-014-0166-2
  33. Maher, S.J., Fischer, T., Gally, T., Gamrath, G., Gleixner, A., Gottwald, R.L., Hendel, G., Koch, T., Lübbecke, M.E., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schenker, S., Schwarz, R., Serrano, F., Shinano, Y., Weninger, D., Witt, J.T., Witzig, J.: The SCIP optimization suite (version 4.0)
  34. Epperly, T.G.W., Pistikopoulos, E.N.: A reduced space branch and bound algorithm for global optimization. J. Glob. Optim. 11(3), 287–311 (1997). https://doi.org/10.1023/A:1008212418949
  35. Mitsos, A., Chachuat, B., Barton, P.I.: McCormick-based relaxations of algorithms. SIAM J. Optim. 20(2), 573–601 (2009). https://doi.org/10.1137/080717341
  36. Scott, J.K., Stuber, M.D., Barton, P.I.: Generalized McCormick relaxations. J. Glob. Optim. 51(4), 569–606 (2011). https://doi.org/10.1007/s10898-011-9664-7
  37. Bongartz, D., Mitsos, A.: Deterministic global optimization of process flowsheets in a reduced space using McCormick relaxations. J. Glob. Optim. 69(4), 761–796 (2017). https://doi.org/10.1007/s10898-017-0547-4
  38. Huster, W.R., Bongartz, D., Mitsos, A.: Deterministic global optimization of the design of a geothermal organic Rankine cycle. Energy Proc. 129, 50–57 (2017). https://doi.org/10.1016/j.egypro.2017.09.181
  39. McCormick, G.P.: Computability of global solutions to factorable nonconvex programs: part I—convex underestimating problems. Math. Program. 10(1), 147–175 (1976). https://doi.org/10.1007/BF01580665
  40. Bompadre, A., Mitsos, A.: Convergence rate of McCormick relaxations. J. Glob. Optim. 52(1), 1–28 (2012). https://doi.org/10.1007/s10898-011-9685-2
  41. Najman, J., Mitsos, A.: Convergence analysis of multivariate McCormick relaxations. J. Glob. Optim. 66(4), 597–628 (2016). https://doi.org/10.1007/s10898-016-0408-6
  42. Tsoukalas, A., Mitsos, A.: Multivariate McCormick relaxations. J. Glob. Optim. 59(2–3), 633–662 (2014). https://doi.org/10.1007/s10898-014-0176-0
  43. Najman, J., Bongartz, D., Tsoukalas, A., Mitsos, A.: Erratum to: Multivariate McCormick relaxations. J. Glob. Optim. 68(1), 219–225 (2017). https://doi.org/10.1007/s10898-016-0470-0
  44. Khan, K.A., Watson, H.A.J., Barton, P.I.: Differentiable McCormick relaxations. J. Glob. Optim. 67(4), 687–729 (2017). https://doi.org/10.1007/s10898-016-0440-6
  45. Khan, K.A., Wilhelm, M., Stuber, M.D., Cao, H., Watson, H.A.J., Barton, P.I.: Corrections to: Differentiable McCormick relaxations. J. Glob. Optim. 70(3), 705–706 (2018). https://doi.org/10.1007/s10898-017-0601-2
  46. Bongartz, D., Mitsos, A.: Infeasible path global flowsheet optimization using McCormick relaxations. In: Espuna, A. (ed.) 27th European Symposium on Computer Aided Process Engineering, Computer Aided Chemical Engineering, vol. 40. Elsevier, San Diego (2017). https://doi.org/10.1016/B978-0-444-63965-3.50107-0
  47. Wechsung, A., Scott, J.K., Watson, H.A.J., Barton, P.I.: Reverse propagation of McCormick relaxations. J. Glob. Optim. 63(1), 1–36 (2015). https://doi.org/10.1007/s10898-015-0303-6
  48. Stuber, M.D., Scott, J.K., Barton, P.I.: Convex and concave relaxations of implicit functions. Optim. Methods Softw. 30(3), 424–460 (2015). https://doi.org/10.1080/10556788.2014.924514
  49. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics, 8th edn. Springer, New York (2009)
  50. Bertsekas, D.P., Nedic, A., Ozdaglar, A.E.: Convex Analysis and Optimization, Athena Scientific Optimization and Computation Series, vol. 1. Athena Scientific, Belmont (2003)
  51. Bongartz, D., Najman, J., Sass, S., Mitsos, A.: MAiNGO: McCormick based Algorithm for mixed integer Nonlinear Global Optimization. Technical report (2018)
  52. Chachuat, B.: MC++ (version 2.0): A toolkit for bounding factorable functions (2014)
  53. Chachuat, B., Houska, B., Paulen, R., Perić, N., Rajyaguru, J., Villanueva, M.E.: Set-theoretic approaches in analysis, estimation and control of nonlinear systems. IFAC-PapersOnLine 48(8), 981–995 (2015). https://doi.org/10.1016/j.ifacol.2015.09.097
  54. Hofschuster, W., Krämer, W.: FILIB++ Interval Library (version 3.0.2) (1998)
  55. International Business Machines: IBM ILOG CPLEX (version 12.1) (2009)
  56. Gleixner, A.M., Berthold, T., Müller, B., Weltge, S.: Three enhancements for optimization-based bound tightening. J. Glob. Optim. 67(4), 731–757 (2017). https://doi.org/10.1007/s10898-016-0450-4
  57. Ryoo, H.S., Sahinidis, N.V.: Global optimization of nonconvex NLPs and MINLPs with applications in process design. Comput. Chem. Eng. 19(5), 551–566 (1995). https://doi.org/10.1016/0098-1354(94)00097-2
  58. Locatelli, M., Schoen, F. (eds.): Global Optimization: Theory, Algorithms, and Applications. MOS-SIAM Series on Optimization. Mathematical Programming Society, Philadelphia, PA (2013). https://doi.org/10.1137/1.9781611972672
  59. Kraft, D.: A software package for sequential quadratic programming. Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt Köln: Forschungsbericht. Wiss. Berichtswesen d. DFVLR, Köln (1988)
  60. Johnson, S.G.: The NLopt nonlinear-optimization package (version 2.4.2) (2016)
  61. Bendtsen, C., Stauning, O.: FADBAD++ (version 2.1): a flexible C++ package for automatic differentiation (2012)
  62. Najman, J., Mitsos, A.: Tighter McCormick relaxations through subgradient propagation. Optimization Online. http://www.optimization-online.org/DB_FILE/2017/10/6296.pdf (2017)
  63. Ghorbanian, K., Gholamrezaei, M.: An artificial neural network approach to compressor performance prediction. Appl. Energy 86(7–8), 1210–1221 (2009). https://doi.org/10.1016/j.apenergy.2008.06.006
  64. Luyben, W.L.: Design and control of the cumene process. Ind. Eng. Chem. Res. 49(2), 719–734 (2010). https://doi.org/10.1021/ie9011535
  65. Schultz, E.S., Trierweiler, J.O., Farenzena, M.: The importance of nominal operating point selection in self-optimizing control. Ind. Eng. Chem. Res. 55(27), 7381–7393 (2016). https://doi.org/10.1021/acs.iecr.5b02044
  66. Lee, U., Burre, J., Caspari, A., Kleinekorte, J., Schweidtmann, A.M., Mitsos, A.: Techno-economic optimization of a green-field post-combustion CO\(_2\) capture process using superstructure and rate-based models. Ind. Eng. Chem. Res. 55(46), 12014–12026 (2016). https://doi.org/10.1021/acs.iecr.6b01668
  67. Helmdach, D., Yaseneva, P., Heer, P.K., Schweidtmann, A.M., Lapkin, A.A.: A multiobjective optimization including results of life cycle assessment in developing biorenewables-based processes. ChemSusChem 10(18), 3632–3643 (2017). https://doi.org/10.1002/cssc.201700927

Acknowledgements

The authors gratefully acknowledge the financial support of the Kopernikus project SynErgie by the Federal Ministry of Education and Research (BMBF) and the project supervision by the project management organization Projektträger Jülich (PtJ). We are grateful to Jaromił Najman, Dominik Bongartz and Susanne Sass for their work on MAiNGO and Benoît Chachuat for providing MC++. We thank Eduardo Schultz for providing the model of the cumene process, Adrian Caspari and Pascal Schäfer for helpful discussions and Linus Netze and Nils Graß for implementing case studies. Finally, we thank the associate editor and the anonymous reviewers for their valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Alexander Mitsos.

Additional information

Communicated by Benoît Chachuat.

Appendices

Appendix A: Convex and Concave Envelopes of the Hyperbolic Tangent Activation Function

In this subsection, the envelopes of the hyperbolic tangent function are derived on a compact interval \(D = [x^{\mathrm{L}}, x^{\mathrm{U}}]\). As the hyperbolic tangent function is univariate, McCormick [39] gives a method to construct its envelopes. More specifically, as the hyperbolic tangent function is convex on \(]-\infty ,0]\) and concave on \([0,+\infty [\), its convex envelope, \(F^{\mathrm{cv}}: \mathbb {R} \rightarrow \mathbb {R}\), and concave envelope, \(F^{\mathrm{cc}}: \mathbb {R} \rightarrow \mathbb {R}\), are given by:

$$\begin{aligned} F^{\mathrm{cv}}(x)= & {} {\left\{ \begin{array}{ll} \tanh (x), &{}\quad x^{\mathrm{U}} \leqslant 0, \\ {{\mathrm{sct}}}(x), &{}\quad 0 \leqslant x^{\mathrm{L}}, \\ F^{\mathrm{cv}}_3(x), &{}\quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(2)
$$\begin{aligned} F^{\mathrm{cc}}(x)= & {} {\left\{ \begin{array}{ll} {{\mathrm{sct}}}(x), &{}\quad x^{\mathrm{U}} \leqslant 0, \\ \tanh (x), &{}\quad 0 \leqslant x^{\mathrm{L}}, \\ F^{\mathrm{cc}}_3(x), &{}\quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(3)

where the secant is given as \({{\mathrm{sct}}}(x)=\frac{\tanh (x^{\mathrm{U}}) - \tanh (x^{\mathrm{L}})}{x^{\mathrm{U}} - x^{\mathrm{L}}} x + \frac{x^{\mathrm{U}} \tanh (x^{\mathrm{L}}) - x^{\mathrm{L}} \tanh (x^{\mathrm{U}})}{x^{\mathrm{U}} - x^{\mathrm{L}}}\). For \(x^\mathrm{L}< 0 < x^\mathrm{U} \), the hyperbolic tangent function is nonconvex and nonconcave. The convex envelope, \(F^{\mathrm{cv}}_3: \mathbb {R} \rightarrow \mathbb {R}\), for this case is:

$$\begin{aligned} F^{\mathrm{cv}}_3(x)= {\left\{ \begin{array}{ll} \tanh (x), &{}\quad x \leqslant x^{\mathrm{u}}_{\mathrm{c}}, \\ \frac{\tanh (x^{\mathrm{U}})-\tanh (x^{\mathrm{u}}_{\mathrm{c}})}{x^{\mathrm{U}}-x^{\mathrm{u}}_{\mathrm{c}}} \cdot (x-x^{\mathrm{u}}_{\mathrm{c}}) + \tanh (x^{\mathrm{u}}_{\mathrm{c}}), &{}\quad x > x^{\mathrm{u}}_{\mathrm{c}}, \end{array}\right. } \end{aligned}$$
(4)

where \(x^{\mathrm{u}}_{\mathrm{c}} = \max (x^{\mathrm{u}*}_{\mathrm{c}},x^{\mathrm{L}})\) and \(x^{\mathrm{u}*}_{\mathrm{c}}\) is the solution of:

$$\begin{aligned} 1 - \tanh ^2(x)=\frac{\tanh (x^{\mathrm{U}})-\tanh (x)}{x^{\mathrm{U}}-x} , \quad x \leqslant 0 \end{aligned}$$
(5)

Similarly, \(F^{\mathrm{cc}}_3: \mathbb {R} \rightarrow \mathbb {R}\) is given by:

$$\begin{aligned} F^{\mathrm{cc}}_3(x)= {\left\{ \begin{array}{ll} \frac{\tanh (x^\mathrm{o}_{\mathrm{c}})- \tanh (x^{\mathrm{L}})}{x^\mathrm{o}_{\mathrm{c}}-x^{\mathrm{L}}} \cdot (x-x^{\mathrm{L}}) + \tanh (x^{\mathrm{L}}), &{}\quad x < x^\mathrm{o}_{\mathrm{c}}, \\ \tanh (x), &{}\quad x \geqslant x^\mathrm{o}_{\mathrm{c}}, \end{array}\right. } \end{aligned}$$
(6)

where \(x^\mathrm{o}_{\mathrm{c}} = \min (x^{\mathrm{o}*}_{\mathrm{c}},x^{\mathrm{U}})\) and \(x^{\mathrm{o}*}_{\mathrm{c}}\) is the solution of:

$$\begin{aligned} 1 - \tanh ^2(x)=\frac{\tanh (x)-\tanh (x^{\mathrm{L}})}{x-x^{\mathrm{L}}} , \quad x \geqslant 0 \end{aligned}$$
(7)
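To make the construction concrete, the following sketch evaluates the envelopes of Eqs. (2)–(7) numerically. It is an illustration only, not the authors' MAiNGO implementation: the tangency conditions (5) and (7) are solved here by simple bisection, which is one of several reasonable choices, since the text does not prescribe a root-finding method.

```python
import math

def _bisect(g, lo, hi, tol=1e-12):
    """Bisection for a root of g on [lo, hi]; g(lo) and g(hi) must differ in sign."""
    glo = g(lo)
    for _ in range(100):
        if hi - lo < tol:
            break
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if (glo < 0.0) == (gm < 0.0):
            lo, glo = mid, gm
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tanh_envelopes(xL, xU):
    """Convex and concave envelopes of tanh on [xL, xU] following Eqs. (2)-(7)."""
    assert xL < xU
    t = math.tanh

    def sct(x):  # secant through (xL, tanh(xL)) and (xU, tanh(xU))
        return (t(xU) - t(xL)) / (xU - xL) * x + (xU * t(xL) - xL * t(xU)) / (xU - xL)

    if xU <= 0.0:        # tanh is convex on [xL, xU]
        return t, sct
    if xL >= 0.0:        # tanh is concave on [xL, xU]
        return sct, t

    # Mixed case xL < 0 < xU: tangency condition of Eq. (5), solved on [xL, 0] ...
    g = lambda x: (1.0 - t(x) ** 2) * (xU - x) - (t(xU) - t(x))
    xcu = xL if g(xL) >= 0.0 else _bisect(g, xL, 0.0)    # x_c^u = max(x_c^{u*}, xL)
    # ... and of Eq. (7), solved on [0, xU]
    h = lambda x: (1.0 - t(x) ** 2) * (x - xL) - (t(x) - t(xL))
    xco = xU if h(xU) >= 0.0 else _bisect(h, 0.0, xU)    # x_c^o = min(x_c^{o*}, xU)

    def F_cv(x):  # Eq. (4): tanh up to x_c^u, then the tangent line to (xU, tanh(xU))
        if x <= xcu:
            return t(x)
        return (t(xU) - t(xcu)) / (xU - xcu) * (x - xcu) + t(xcu)

    def F_cc(x):  # Eq. (6): tangent line from (xL, tanh(xL)) up to x_c^o, then tanh
        if x >= xco:
            return t(x)
        return (t(xco) - t(xL)) / (xco - xL) * (x - xL) + t(xL)

    return F_cv, F_cc

# Example: on [-2, 3], the envelopes sandwich tanh everywhere.
F_cv, F_cc = tanh_envelopes(-2.0, 3.0)
for x in (-2.0, -1.0, 0.0, 1.0, 3.0):
    assert F_cv(x) - 1e-9 <= math.tanh(x) <= F_cc(x) + 1e-9
```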

In the following, we show that the convex and concave envelopes of the hyperbolic tangent function are smooth (\(C^1\)) and strictly monotonically increasing.

Proposition A.1

(Smoothness of hyperbolic tangent relaxations) The convex and concave envelopes of the hyperbolic tangent function, \(F^{\mathrm{cv}}(x)\) and \(F^{\mathrm{cc}}(x)\), are once continuously differentiable (\(C^1\)) and in general not \(C^2\).

Proof

The envelopes are at least \(C^1\) because, by construction, the derivative of the linear part matches the derivative of the hyperbolic tangent function at the switching points \(x^{\mathrm{u}}_{\mathrm{c}}\) and \(x^{\mathrm{o}}_{\mathrm{c}}\) (cf. Eqs. (5) and (7)). They are at most \(C^1\) because \( \frac{ \text {d}^2(F_3^{\mathrm{cv}})}{\text {d} x^2} |_{x} = -2 {{\mathrm{sech}}}^2(x) \tanh (x) \ne 0\) for \(x < x^{\mathrm{u}}_{\mathrm{c}}\) and \(\left. \frac{ \text {d}^2(F_3^{\mathrm{cc}})}{\text {d} x^2} \right| _{x} = -2 {{\mathrm{sech}}}^2(x) \tanh (x) \ne 0\) for \(x \geqslant x^\mathrm{o}_{\mathrm{c}}\), where \({{\mathrm{sech}}}(x)\) is the hyperbolic secant function. \(\square \)

As shown in the proof of Proposition A.1, the first derivative of the convex and concave envelopes of the hyperbolic tangent function, \(\frac{ \text {d}(F^{\mathrm{cv}})}{\text {d} x}\) and \(\frac{ \text {d}(F^{\mathrm{cc}})}{\text {d} x}\), is continuous but not continuously differentiable. The following proposition shows that this first derivative is nevertheless Lipschitz continuous.

Proposition A.2

(Lipschitz continuity of first derivative of hyperbolic tangent relaxations) The second derivative of the convex and concave envelopes of the hyperbolic tangent function is bounded. This implies that the first derivative of the convex and concave envelopes, \(\frac{ \text {d}(F^{\mathrm{cv}})}{\text {d} x}\) and \(\frac{ \text {d}(F^{\mathrm{cc}})}{\text {d} x}\), is Lipschitz continuous.

Proof

For \(x^\mathrm{U} \leqslant 0, F^{\mathrm{cv}}(x) = \tanh (x)\) and \(F^{\mathrm{cc}}(x)={{\mathrm{sct}}}(x)\) which are \(C^\infty \). Similarly, for \(0 \leqslant x^\mathrm{L}, F^{\mathrm{cc}}(x) = \tanh (x)\) and \(F^{\mathrm{cv}}(x)={{\mathrm{sct}}}(x)\) which are \(C^\infty \). For \(x^\mathrm{L}< 0 < x^\mathrm{U}\), the second derivative of the convex and concave envelopes of the hyperbolic tangent function can be bounded by

$$\begin{aligned} \left| \frac{ \text {d}^2(F_3^{\mathrm{cv}}(x))}{\text {d} x^2} \right| \leqslant 2 \left| {{\mathrm{sech}}}^2(\tilde{x}^{\mathrm{cv}}) \tanh (\tilde{x}^{\mathrm{cv}}) \right| = 2\left| \frac{\sinh (\tilde{x}^{\mathrm{cv}})}{\cosh ^3(\tilde{x}^{\mathrm{cv}})} \right| \leqslant 2 \sinh \left( \left| x^\mathrm{L} \right| \right) \end{aligned}$$
(8)

with \(x^\mathrm{L} \leqslant x \leqslant x^\mathrm{U}, x^\mathrm{L} \leqslant \tilde{x}^{\mathrm{cv}} \leqslant x_{\mathrm{c}}^{\mathrm{u}}\), and \(\cosh (x) \geqslant 1\). Thus, \(\frac{ \text {d}(F^{\mathrm{cv}})}{\text {d} x}\) is at least Lipschitz continuous with a Lipschitz constant of at most \(L^{\mathrm{cv}} = 2 \sinh (\left| x^\mathrm{L} \right| )\). Similarly, it holds that \(\frac{ \text {d}(F^{\mathrm{cc}})}{\text {d} x}\) is at least Lipschitz continuous with a Lipschitz constant of at most \(L^{\mathrm{cc}} = 2 \sinh (\left| x^\mathrm{U} \right| )\). \(\square \)

Proposition A.3

(Monotonicity of hyperbolic tangent relaxations) The convex and concave envelopes of the hyperbolic tangent function are strictly monotonically increasing.

Proof

For \(x^\mathrm{U} \leqslant 0, F^{\mathrm{cv}}(x) = \tanh (x)\) and \({F^{\mathrm{cc}}(x)={{\mathrm{sct}}}(x)}\). Similarly, for \(0 \leqslant x^\mathrm{L}, F^{\mathrm{cc}}(x) = \tanh (x)\) and \(F^{\mathrm{cv}}(x)={{\mathrm{sct}}}(x)\). As \(\frac{ \text {d}(\tanh (x))}{\text {d} x} = 1 - \tanh ^2(x) > 0, \tanh (x)\) is strictly monotonically increasing. As \(x^\mathrm{U} > x^\mathrm{L}, {{\mathrm{sct}}}(x)\) is strictly monotonically increasing. For \(x^\mathrm{L}< 0 < x^\mathrm{U}\), the envelopes are given by (4) and (6). These are again strictly monotonically increasing because \(x_{\mathrm{c}}^\mathrm{o} > x^\mathrm{L}\) and \(x^\mathrm{U} > x_{\mathrm{c}}^{\mathrm{u}}\).\(\square \)

Appendix B: Detailed Formulation of Peaks Case Study

The FS formulation of the peaks case study is given in Eqs. (9)–(17).

$$\begin{aligned}&\underset{\mathbf{x \in D, \mathbf z \in Z}}{\min }&y \end{aligned}$$
(9)
$$\begin{aligned}&\text {s.t.}&&\end{aligned}$$
(10)
$$\begin{aligned}&z^{(1)}_{1}&= \frac{x_{1} - x_{1}^{\mathrm{L}}}{x_{1}^{\mathrm{U}}-x_{1}^{\mathrm{L}}} \cdot 2 -1 \end{aligned}$$
(11)
$$\begin{aligned}&z^{(1)}_{2}&= \frac{x_{2} - x_{2}^{\mathrm{L}}}{x_{2}^{\mathrm{U}}-x_{2}^{\mathrm{L}}} \cdot 2 -1 \end{aligned}$$
(12)
$$\begin{aligned}&v^{(1)}_{i}&= \sum _{j = 1}^{2} (w^{(1)}_{j,i} z^{(1)}_{j}) + b^{(1)}_{i}&\forall i = 1,2,\ldots ,47 \end{aligned}$$
(13)
$$\begin{aligned}&z^{(2)}_{i}&= \tanh \left( v^{(1)}_{i} \right)&\forall i = 1,2,\ldots ,47 \end{aligned}$$
(14)
$$\begin{aligned}&v^{(2)}_{1}&= \sum _{j = 1}^{47} (w^{(2)}_{j,1} z^{(2)}_{j}) + b^{(2)}_{1} \end{aligned}$$
(15)
$$\begin{aligned}&z^{(3)}_{1}&= v^{(2)}_{1}&\end{aligned}$$
(16)
$$\begin{aligned}&y&= (z^{(3)}_{1} +1) \cdot \frac{y^{\mathrm{U}}-y^{\mathrm{L}} }{2} + y^{\mathrm{L}}&\end{aligned}$$
(17)

Herein, the objective (Eq. 9) is to minimize the output of the MLP, i.e., y. Equations (11) and (12) scale the DoFs (\(\mathbf x =(x_1,x_2)\)) onto \([-1,1]\). This is necessary because MLPs are usually trained on scaled data. Equation (13) computes the argument of the activation function (\(v^{(1)}_{i}\)) for each neuron in layer 2, i.e., the hidden layer. Equation (14) computes the output of each neuron in layer 2 (\(z^{(2)}_{i}\)). Equation (15) computes the argument of the activation function (\(v^{(2)}_{1}\)) for the neuron in layer 3, i.e., the output layer. Equation (16) computes the output of the neuron in the output layer (\(z^{(3)}_{1}\)). In this example, the equation simplifies because the identity output activation function is used; in the more general case, Eq. (16) includes a nonlinear transformation. Finally, Eq. (17) scales the output of the neuron (\(z^{(3)}_{1}\)) back to the actual domain of the training data. The FS optimization problem comprises 99 equality constraints. The original DoFs (\(\mathbf x =(x_1,x_2)\)) have the dimension \(n_x=2\). The additional optimization variables \(\mathbf z =(z^{(1)}_{1},z^{(1)}_{2},v^{(1)}_{1},v^{(1)}_{2},\ldots ,v^{(1)}_{47},z^{(2)}_{1},z^{(2)}_{2},\ldots , z^{(2)}_{47},v^{(2)}_{1},z^{(3)}_{1},y )\) have the dimension \(n_z=99\).
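As an illustration of Eqs. (11)–(17), the following sketch evaluates the same single-hidden-layer MLP (2 inputs, 47 hidden tanh neurons, 1 linear output neuron) as an explicit forward pass. The weights, biases and scaling bounds are hypothetical placeholders, since the trained parameters of the peaks network are not reproduced here; in the RS formulation of Eq. (18) below, it is exactly this composite function that the B&B algorithm receives as the objective \(\hat{y}(\mathbf x )\).

```python
import math

# Hypothetical stand-ins for the trained parameters of the peaks MLP.
N_HIDDEN = 47
W1 = [[0.1] * N_HIDDEN, [-0.1] * N_HIDDEN]  # w^(1)_{j,i} for inputs j = 1, 2
B1 = [0.0] * N_HIDDEN                       # b^(1)_i
W2 = [0.05] * N_HIDDEN                      # w^(2)_{j,1}
B2 = 0.0                                    # b^(2)_1
X_BOUNDS = [(-3.0, 3.0), (-3.0, 3.0)]       # (x^L, x^U) per input, placeholder
Y_BOUNDS = (-6.5, 8.1)                      # (y^L, y^U), placeholder

def mlp_peaks(x1, x2):
    """Forward pass implementing Eqs. (11)-(17), i.e., the RS objective."""
    # Eqs. (11)-(12): scale the DoFs onto [-1, 1]
    z1 = [(x - lo) / (hi - lo) * 2.0 - 1.0
          for x, (lo, hi) in zip((x1, x2), X_BOUNDS)]
    # Eq. (13): arguments of the hidden-layer activation functions
    v1 = [sum(W1[j][i] * z1[j] for j in range(2)) + B1[i] for i in range(N_HIDDEN)]
    # Eq. (14): hidden-layer outputs
    z2 = [math.tanh(v) for v in v1]
    # Eq. (15): argument of the output-layer activation function
    v2 = sum(W2[j] * z2[j] for j in range(N_HIDDEN)) + B2
    # Eq. (16): identity output activation
    z3 = v2
    # Eq. (17): scale the output back to the domain of the training data
    yL, yU = Y_BOUNDS
    return (z3 + 1.0) * (yU - yL) / 2.0 + yL

print(mlp_peaks(0.0, 0.0))
```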

The RS formulation of the peaks case study is given in Eq. (18).

$$\begin{aligned} \underset{\mathbf{x \in D}}{\min } \qquad \hat{y}(\mathbf x ) \end{aligned}$$
(18)

Herein, the objective (\(\hat{y}\)) is a function of the DoFs (\(\mathbf x \)). The objective function contains the complete MLP as well as the scaling of the inputs and outputs, which are thus effectively hidden from the B&B algorithm. The original DoFs (\(\mathbf x =(x_1,x_2)\)) have the dimension \(n_x=2\). The additional optimization variables \(\mathbf z =(\emptyset )\) have the dimension \(n_z=0\).

Finally, the bounds on \(\mathbf x \) and \(\mathbf z \) have to be provided for the FS formulation (see Table 8). For the RS formulation, only bounds on \(\mathbf x \) have to be provided. The variables \(\mathbf x \) and their bounds usually have a physical meaning. In contrast, the additional optimization variables \(\mathbf z \) usually do not. The bounds on \(\mathbf z \) in Table 8 are rather loose because we intend to use the same bounds for all optimization problems. Tighter bounds can be derived by natural interval extensions, as sketched below, and are further shrunk by the bound-tightening techniques in MAiNGO and BARON.

Table 8 Bounds on the variables of the peaks case study
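A minimal sketch of how such natural interval extensions propagate bounds through one affine layer and the tanh activation; because tanh is monotonically increasing, applying it to the interval endpoints is exact per neuron. The weights are illustrative placeholders, and the code is independent of the actual bound-tightening routines in MAiNGO and BARON.

```python
import math

def affine_bounds(z_bounds, w_col, b):
    """Interval extension of v = sum_j w_j * z_j + b given bounds on the z_j."""
    lo = hi = b
    for (zl, zu), w in zip(z_bounds, w_col):
        lo += w * zl if w >= 0.0 else w * zu
        hi += w * zu if w >= 0.0 else w * zl
    return lo, hi

def tanh_bounds(v_bounds):
    """tanh is monotonically increasing, so its interval extension is exact."""
    vl, vu = v_bounds
    return math.tanh(vl), math.tanh(vu)

# Example: the scaled inputs z^(1) lie in [-1, 1]; propagate to one hidden neuron.
z1_bounds = [(-1.0, 1.0), (-1.0, 1.0)]
v_bounds = affine_bounds(z1_bounds, [0.1, -0.1], 0.0)
print(tanh_bounds(v_bounds))  # noticeably tighter than the generic range (-1, 1)
```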

About this article

Cite this article

Schweidtmann, A.M., Mitsos, A. Deterministic Global Optimization with Artificial Neural Networks Embedded. J Optim Theory Appl 180, 925–948 (2019). https://doi.org/10.1007/s10957-018-1396-0


Keywords

  • Surrogate-based optimization
  • Multilayer perceptron
  • McCormick relaxations
  • Machine learning
  • MAiNGO

Mathematics Subject Classification

  • 90C26
  • 90C30
  • 90C90
  • 68T01