
Strong Mixed-Integer Programming Formulations for Trained Neural Networks

  • Conference paper

Integer Programming and Combinatorial Optimization (IPCO 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11480)

Abstract

We present an ideal mixed-integer programming (MIP) formulation for a rectified linear unit (ReLU) appearing in a trained neural network. Our formulation requires a single binary variable and no additional continuous variables beyond the input and output variables of the ReLU. We contrast it with an ideal “extended” formulation with a linear number of additional continuous variables, derived through standard techniques. An apparent drawback of our formulation is that it requires an exponential number of inequality constraints, but we provide a routine to separate the inequalities in linear time. We also prove that these exponentially-many constraints are facet-defining under mild conditions. Finally, we study network verification problems and observe that dynamically separating from the exponential inequalities (1) is much more computationally efficient and scalable than the extended formulation, (2) decreases the solve time of a state-of-the-art MIP solver by a factor of 7 on smaller instances, and (3) nearly matches the dual bounds of a state-of-the-art MIP solver on harder instances, after just a few rounds of separation and in orders of magnitude less time.
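
For readers less familiar with MIP models of ReLU units, the display below recalls the standard big-M formulation of a single ReLU \(y = \max \{0, w \cdot x + b\}\) over a box input domain \(x \in [L,U]\). It is the textbook baseline rather than the formulation proposed in this paper, and it assumes the unit is not fixed by its bounds (i.e. \(M^- < 0 < M^+\)):

$$\begin{aligned} y \geqslant w \cdot x + b, \quad y \geqslant 0, \quad y \leqslant w \cdot x + b - M^-(1-z), \quad y \leqslant M^+ z, \quad z \in \{0,1\}, \quad x \in [L,U], \end{aligned}$$

where \(M^+ = \max _{x \in [L,U]} w \cdot x + b\) and \(M^- = \min _{x \in [L,U]} w \cdot x + b\). The formulation studied in this paper uses the same single binary variable \(z\), but its LP relaxation is ideal, i.e. tighter than that of the big-M model in general.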


Notes

  1. Further analysis of the interactions between neurons can be found in the full-length version of this extended abstract [2].

  2. We use cut callbacks in Gurobi to inject separated inequalities into the cut loop. While this offers little control over when the separation procedure is run, it allows us to take advantage of Gurobi’s sophisticated cut management implementation.
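
For concreteness, the sketch below shows the shape of such a cut callback in gurobipy. Only the callback mechanism itself (GRB.Callback.MIPNODE, cbGet, cbGetNodeRel, cbCut) is Gurobi API; the variables, the separate_most_violated helper, and its return convention are hypothetical placeholders rather than part of the authors' tf.opt implementation.

import gurobipy as gp
from gurobipy import GRB

def make_cut_callback(x_vars, y_var, z_var, separate_most_violated):
    # Build a callback that, at nodes whose LP relaxation solved to optimality,
    # asks a user-supplied separation routine for one violated inequality and
    # injects it into Gurobi's cut loop.
    def callback(model, where):
        if where != GRB.Callback.MIPNODE:
            return
        if model.cbGet(GRB.Callback.MIPNODE_STATUS) != GRB.OPTIMAL:
            return
        x_hat = model.cbGetNodeRel(x_vars)  # LP-relaxation values at this node
        y_hat = model.cbGetNodeRel(y_var)
        z_hat = model.cbGetNodeRel(z_var)
        cut = separate_most_violated(x_hat, y_hat, z_hat)
        if cut is None:
            return
        # Hypothetical convention: the cut reads y <= x_coefs . x + z_coef * z + const.
        x_coefs, z_coef, const = cut
        expr = gp.quicksum(c * v for c, v in zip(x_coefs, x_vars)) + z_coef * z_var + const
        model.cbCut(y_var <= expr)
    return callback

# Usage sketch: model.optimize(make_cut_callback(x_vars, y_var, z_var, separator))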

References

  1. Amos, B., Xu, L., Kolter, J.Z.: Input convex neural networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 146–155. PMLR, International Convention Centre, Sydney, Australia, 06–11 August 2017


  2. Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P.: Strong convex relaxations and mixed-integer programming formulations for trained neural networks (2018). https://arxiv.org/abs/1811.01988

  3. Arulkumaran, K., Deisenroth, M.P., Brundage, M., Bharath, A.A.: Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34(6), 26–38 (2017)


  4. Atamtürk, A., Gómez, A.: Strong formulations for quadratic optimization with M-matrices and indicator variables. Math. Program. 170, 141–176 (2018)


  5. Balas, E.: Disjunctive programming and a hierarchy of relaxations for discrete optimization problems. SIAM J. Algorithmic Discret. Methods 6(3), 466–486 (1985)


  6. Balas, E.: Disjunctive programming: properties of the convex hull of feasible points. Discret. Appl. Math. 89, 3–44 (1998)


  7. Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Neuron constraints to model complex real-world problems. In: Lee, J. (ed.) CP 2011. LNCS, vol. 6876, pp. 115–129. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23786-7_11


  8. Bartolini, A., Lombardi, M., Milano, M., Benini, L.: Optimization and controlled systems: a case study on thermal aware workload dispatching. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, pp. 427–433 (2012)


  9. Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A.V., Criminisi, A.: Measuring neural net robustness with constraints. In: Advances in Neural Information Processing Systems, pp. 2613–2621 (2016)


  10. Belotti, P., et al.: On handling indicator constraints in mixed integer programming. Comput. Optim. Appl. 65(3), 545–566 (2016)


  11. Bertsimas, D., Kallus, N.: From predictive to prescriptive analytics. Management Science (2018). https://arxiv.org/abs/1402.5481

  12. Biggs, M., Hariss, R.: Optimizing objective functions determined from random forests (2017). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2986630

  13. Bonami, P., Lodi, A., Tramontani, A., Wiese, S.: On mathematical programming with indicator constraints. Math. Program. 151(1), 191–223 (2015)


  14. Bunel, R., Turkaslan, I., Torr, P.H., Kohli, P., Kumar, M.P.: A unified view of piecewise linear neural network verification. In: Advances in Neural Information Processing Systems (2018)


  15. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57 (2017)


  16. Cheng, C.-H., Nührenberg, G., Ruess, H.: Maximum resilience of artificial neural networks. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 251–268. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_18


  17. Deng, Y., Liu, J., Sen, S.: Coalescing data and decision sciences for analytics. In: Recent Advances in Optimization and Modeling of Contemporary Problems. INFORMS (2018)


  18. Donti, P., Amos, B., Kolter, J.Z.: Task-based end-to-end model learning in stochastic optimization. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30, pp. 5484–5494. Curran Associates, Inc. (2017)


  19. Dulac-Arnold, G., et al.: Deep reinforcement learning in large discrete action spaces (2015). https://arxiv.org/abs/1512.07679

  20. Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep feedforward neural networks. In: Dutle, A., Muñoz, C., Narkawicz, A. (eds.) NFM 2018. LNCS, vol. 10811, pp. 121–138. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77935-5_9


  21. Dvijotham, K., et al.: Training verified learners with learned verifiers (2018). https://arxiv.org/abs/1805.10265

  22. Dvijotham, K., Stanforth, R., Gowal, S., Mann, T., Kohli, P.: A dual approach to scalable verification of deep networks. In: Thirty-Fourth Annual Conference on Uncertainty in Artificial Intelligence (2018)


  23. Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: D’Souza, D., Narayan Kumar, K. (eds.) ATVA 2017. LNCS, vol. 10482, pp. 269–286. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68167-2_19


  24. Elmachtoub, A.N., Grigas, P.: Smart "Predict, then Optimize" (2017). https://arxiv.org/abs/1710.08005

  25. Fischetti, M., Jo, J.: Deep neural networks and mixed integer linear optimization. Constraints 23, 296–309 (2018)


  26. Gatys, L.A., Ecker, A.S., Bethge, M.: A neural algorithm of artistic style (2015). https://arxiv.org/abs/1508.06576

  27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)


  28. den Hertog, D., Postek, K.: Bridging the gap between predictive and prescriptive analytics - new optimization methodology needed (2016). http://www.optimization-online.org/DB_HTML/2016/12/5779.html

  29. Hijazi, H., Bonami, P., Cornuéjols, G., Ouorou, A.: Mixed-integer nonlinear programs featuring “on/off” constraints. Comput. Optim. Appl. 52(2), 537–558 (2012)


  30. Hijazi, H., Bonami, P., Ouorou, A.: A note on linear on/off constraints (2014). http://www.optimization-online.org/DB_FILE/2014/04/4309.pdf

  31. Hooker, J.: Logic-Based Methods for Optimization: Combining Optimization and Constraint Satisfaction. Wiley, Hoboken (2011)


  32. Huchette, J.: Advanced mixed-integer programming formulations: methodology, computation, and application. Ph.D. thesis, Massachusetts Institute of Technology (June 2018)


  33. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5


  34. Khalil, E.B., Gupta, A., Dilkina, B.: Combinatorial attacks on binarized neural networks. In: International Conference on Learning Representations (2019)


  35. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)


  36. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)


  37. Lombardi, M., Gualandi, S.: A Lagrangian propagator for artificial neural networks in constraint programming. Constraints 21(4), 435–462 (2016)


  38. Lomuscio, A., Maganti, L.: An approach to reachability analysis for feed-forward ReLU neural networks (2017). https://arxiv.org/abs/1706.07351

  39. Mišić, V.V.: Optimization of tree ensembles (2017). https://arxiv.org/abs/1705.10883

  40. Mladenov, M., Boutilier, C., Schuurmans, D., Elidan, G., Meshi, O., Lu, T.: Approximate linear programming for logistic Markov decision processes. In: Proceedings of the Twenty-sixth International Joint Conference on Artificial Intelligence (IJCAI 2017), pp. 2486–2493, Melbourne, Australia (2017)


  41. Mordvintsev, A., Olah, C., Tyka, M.: Inceptionism: going deeper into neural networks (2015). https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html

  42. Olah, C., Mordvintsev, A., Schubert, L.: Feature Visualization. Distill (2017). https://distill.pub/2017/feature-visualization

  43. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: IEEE European Symposium on Security and Privacy, pp. 372–387, March 2016


  44. Say, B., Wu, G., Zhou, Y.Q., Sanner, S.: Nonlinear hybrid planning with deep net learned transition models and mixed-integer linear programming. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, pp. 750–756 (2017)


  45. Schweidtmann, A.M., Mitsos, A.: Global deterministic optimization with artificial neural networks embedded. J. Optim. Theory Appl. 180(3), 925–948 (2019)


  46. Serra, T., Ramalingam, S.: Empirical bounds on linear regions of deep rectifier networks (2018). https://arxiv.org/abs/1810.03370

  47. Serra, T., Tjandraatmadja, C., Ramalingam, S.: Bounding and counting linear regions of deep neural networks. In: Thirty-Fifth International Conference on Machine Learning (2018)


  48. Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014)


  49. Tjeng, V., Xiao, K., Tedrake, R.: Verifying neural networks with mixed integer programming. In: International Conference on Learning Representations (2019)


  50. Vielma, J.P.: Mixed integer linear programming formulation techniques. SIAM Rev. 57(1), 3–57 (2015)


  51. Vielma, J.P.: Small and strong formulations for unions of convex sets from the Cayley embedding. Math. Program. (2018)


  52. Vielma, J.P., Nemhauser, G.: Modeling disjunctive constraints with a logarithmic number of binary variables and constraints. Math. Program. 128(1–2), 49–72 (2011)


  53. Wong, E., Kolter, J.Z.: Provable defenses against adversarial examples via the convex outer adversarial polytope. In: International Conference on Machine Learning (2018)


  54. Wong, E., Schmidt, F., Metzen, J.H., Kolter, J.Z.: Scaling provable adversarial defenses. In: 32nd Conference on Neural Information Processing Systems (2018)


  55. Wu, G., Say, B., Sanner, S.: Scalable planning with Tensorflow for hybrid nonlinear domains. In: Advances in Neural Information Processing Systems, pp. 6276–6286 (2017)


  56. Xiao, K.Y., Tjeng, V., Shafiullah, N.M., Madry, A.: Training for faster adversarial robustness verification via inducing ReLU stability. In: International Conference on Learning Representations (2019)



Acknowledgement

The authors gratefully acknowledge Yeesian Ng and Ondřej Sýkora for many discussions on the topic of this paper, and for their work on the development of the tf.opt package used in the computational experiments.

Author information

Corresponding author

Correspondence to Joey Huchette.

A Deferred Proofs

A.1 Proof of Proposition 1

Proof

The result follows from applying Fourier–Motzkin elimination to (5a)–(5f) to project out the \(x^0\), \(x^1\), \(y^0\), and \(y^1\) variables; see [31, Chap. 13] for an explanation of the approach. We start by eliminating \(x^1\), \(y^0\), and \(y^1\) using the equations in (5a), (5b), and (5c), respectively, leaving only \(x^0\).

First, if there is some input component i with \(w_i=0\), then \(x^0_i\) only appears in the constraints (5d)–(5e), and so the elimination step produces only the bound constraints \(L_i \leqslant x_i \leqslant U_i\) appearing in (7d).

Second, if there is some i with \(w_i < 0\), then we introduce an auxiliary variable \(\tilde{x}_i\) with the equation \(\tilde{x}_i = -x_i\). We then replace \(w_i \leftarrow |w_i|\), \(L_i \leftarrow -U_i\), and \(U_i \leftarrow -L_i\), and proceed as follows under the assumption that \(w > 0\).

Applying the Fourier–Motzkin procedure to eliminate \(x^0_1\) gives the inequalities

along with the existing inequalities in (5a)–(5f) where the \(x^0_1\) coefficient is zero. Repeating this procedure for each remaining component of \(x^0\) yields the linear system

$$\begin{aligned} y \geqslant w \cdot x + b \end{aligned}$$
(7a)
$$\begin{aligned} y \leqslant \sum _{i \in I} w_i x_i - \sum _{i \in I} w_iL_i(1-z) + \left( b + \sum _{i \not \in I} w_iU_i\right) z \quad \forall I \subseteq {\text {supp}}(w) \end{aligned}$$
(7b)
$$\begin{aligned} y \geqslant \sum _{i \in I} w_i x_i - \sum _{i \in I} w_iU_i(1-z) + \left( b + \sum _{i \not \in I} w_iL_i\right) z \quad \forall I \subseteq {\text {supp}}(w) \end{aligned}$$
(7c)
$$\begin{aligned} (x,y,z) \in [L,U] \times \mathbb {R}_{\geqslant 0} \times [0,1]. \end{aligned}$$
(7d)

Moreover, we can show that the family of inequalities (7c) is redundant, and can therefore be removed. Fix some \(I \subseteq {\text {supp}}(w)\), and take . If \(h(\llbracket \eta \rrbracket \setminus I) \geqslant 0\), we can express the inequality in (7c) corresponding to the set I as a conic combination of the remaining constraints as:

Alternatively, if \(h(\llbracket \eta \rrbracket \setminus I) < 0\), we can express the inequality in (7c) corresponding to the set I as a conic combination of the remaining constraints as:

To complete the proof, for any components i where we introduced an auxiliary variable \(\tilde{x}_i\), we use the corresponding equation \(\tilde{x}_i = -x_i\) to eliminate \(x_i\) and replace it with \(\tilde{x}_i\), giving the result.    \(\square \)
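
Because the right-hand side of (7b) separates across coordinates, a most violated member of the exponential family can be identified with a single pass over the coordinates. The Python sketch below illustrates this idea under the same assumption \(w > 0\) used in the proof; the function name, argument conventions, and tolerance are illustrative, and the general routine additionally restricts attention to \({\text {supp}}(w)\) and handles negative weights via the sign-flip substitution described above.

def most_violated_cut(w, b, L, U, x_hat, y_hat, z_hat, tol=1e-6):
    # Return (I, rhs) for the index set I minimizing the right-hand side of (7b)
    # at the point (x_hat, y_hat, z_hat), or None if no inequality in the family
    # is violated. Assumes w > 0 componentwise, as in the proof above.
    I = []
    rhs = b * z_hat
    for i in range(len(w)):
        inside = w[i] * (x_hat[i] - L[i] * (1.0 - z_hat))  # contribution if i is placed in I
        outside = w[i] * U[i] * z_hat                      # contribution if i is left out of I
        if inside < outside:
            I.append(i)
            rhs += inside
        else:
            rhs += outside
    return (I, rhs) if y_hat > rhs + tol else None

Checking violation against the single index set returned here suffices, since that set minimizes the right-hand side of (7b) over all choices of I.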

A.2 Proof of Proposition 2

Proof

We fix \(I = \{\kappa +1,\ldots ,\eta \}\) for some \(\kappa \); this is without loss of generality by permuting the rows of the matrices presented below. Additionally, we presume that \(w > 0\), which allows us to infer that \(\breve{L} = L\) and \(\breve{U} = U\). This is also without loss of generality by appropriately interchanging \(+\) and \(-\) in the definition of the \(\tilde{p}^k\) below. In the following, references to (6b) are taken to be references to the inequality in (6b) corresponding to the subset I.

Take the two points \(p^0 = (x,y,z) = (L, 0, 0)\) and \(p^1 = (U, f(U), 1)\). Each point is feasible with respect to (6a)–(6c) and satisfies (6b) at equality. Then for some \(\epsilon > 0\) and for each \(i \in \llbracket \eta \rrbracket \backslash I\), take \(\tilde{p}^i = (x,y,z) = (L + \epsilon \mathbf{e}^i, 0, 0)\). Similarly, for each \(i \in I\), take \(\tilde{p}^i = (x,y,z) = (U - \epsilon \mathbf{e}^i, f(U - \epsilon \mathbf{e}^i), 1)\). From the strict activity assumption, there exists some \(\epsilon > 0\) sufficiently small such that each \(\tilde{p}^k\) is feasible with respect to (6a)–(6c) and satisfies (6b) at equality.

This leaves us with \(\eta + 2\) feasible points satisfying (6b) at equality; the result then follows by showing that the points are affinely independent. Take the matrix

$$ \begin{pmatrix} p^1 - p^0 \\ \tilde{p}^1 - p^0 \\ \vdots \\ \tilde{p}^\kappa - p^0 \\ \tilde{p}^{\kappa +1} - p^0 \\ \vdots \\ \tilde{p}^\eta - p^0 \end{pmatrix} = \begin{pmatrix} U-L & f(U) & 1 \\ \epsilon \mathbf{e}^1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ \epsilon \mathbf{e}^\kappa & 0 & 0 \\ U - L - \epsilon \mathbf{e}^{\kappa +1} & f(U-\epsilon \mathbf{e}^{\kappa +1}) & 1 \\ \vdots & \vdots & \vdots \\ U - L - \epsilon \mathbf{e}^{\eta } & f(U-\epsilon \mathbf{e}^{\eta }) & 1 \end{pmatrix} \cong \begin{pmatrix} U - L & f(U) & 1 \\ \epsilon \mathbf{e}^1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ \epsilon \mathbf{e}^\kappa & 0 & 0 \\ -\epsilon \mathbf{e}^{\kappa +1} & -w_{\kappa +1}\epsilon & 0 \\ \vdots & \vdots & \vdots \\ -\epsilon \mathbf{e}^{\eta } & -w_\eta \epsilon & 0 \end{pmatrix}, $$

where the third matrix is constructed by subtracting the first row from each of rows \(\kappa +2\) to \(\eta +1\) (i.e. those corresponding to \(\tilde{p}^i-p^0\) for \(i > \kappa \)), and \(\cong \) is taken to mean congruence with respect to elementary row operations. If we permute the last column (corresponding to the z variable) to the first column, we observe that the resulting matrix is upper triangular with a nonzero diagonal, and so has full row rank. Therefore, the starting matrix also has full row rank, as we only applied elementary row operations, and therefore the \(\eta +2\) points are affinely independent, giving the result.    \(\square \)
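
As a numerical sanity check on the rank argument above, the short numpy script below builds the \(\eta + 2\) points from the proof for a small hypothetical instance (\(\eta = 3\), \(w > 0\), \(I = \{3\}\), so \(\kappa = 2\)) and verifies that the difference vectors have full row rank \(\eta + 1\). The instance data are invented for illustration and are not taken from the paper.

import numpy as np

# Hypothetical instance: eta = 3, w > 0, box [L, U], I = {3} (index 2 with 0-based indexing).
w = np.array([1.0, 2.0, 3.0])
b = -2.0
L = np.full(3, -1.0)
U = np.full(3, 1.0)
eps = 0.1

def relu_out(x):
    # Output of the unit; plays the role of f in the proof.
    return max(0.0, float(w @ x) + b)

def point(x, z):
    return np.concatenate([x, [relu_out(x), float(z)]])

pts = [point(L, 0), point(U, 1)]                 # p^0 and p^1
for i in range(2):                               # i not in I: perturb around L (unit stays inactive)
    pts.append(point(L + eps * np.eye(3)[i], 0))
pts.append(point(U - eps * np.eye(3)[2], 1))     # i in I: perturb around U (unit stays active)

# The eta + 2 points are affinely independent iff the differences to p^0 have rank eta + 1.
diffs = np.vstack([p - pts[0] for p in pts[1:]])
print(np.linalg.matrix_rank(diffs))              # prints 4 = eta + 1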


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Anderson, R., Huchette, J., Tjandraatmadja, C., Vielma, J.P. (2019). Strong Mixed-Integer Programming Formulations for Trained Neural Networks. In: Lodi, A., Nagarajan, V. (eds) Integer Programming and Combinatorial Optimization. IPCO 2019. Lecture Notes in Computer Science, vol 11480. Springer, Cham. https://doi.org/10.1007/978-3-030-17953-3_3


  • DOI: https://doi.org/10.1007/978-3-030-17953-3_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17952-6

  • Online ISBN: 978-3-030-17953-3

