Constraints, Volume 23, Issue 3, pp 296–309

Deep neural networks and mixed integer linear optimization

  • Matteo Fischetti
  • Jason Jo
Part of the following topical collection:
  1. Topical Collection on Integration of Constraint Programming, Artificial Intelligence, and Operations Research

Abstract

Deep Neural Networks (DNNs) are very popular these days, and are the subject of intense investigation. A DNN is made up of layers of internal units (or neurons), each of which computes an affine combination of the outputs of the units in the previous layer, applies a nonlinear operator, and outputs the corresponding value (also known as its activation). A commonly used nonlinear operator is the so-called rectified linear unit (ReLU), whose output is just the maximum of its input value and zero. In this (and other similar cases, such as max pooling, where the max operation involves more than one input value), for fixed parameters one can model the DNN as a 0-1 Mixed Integer Linear Program (0-1 MILP) in which the continuous variables correspond to the output values of the units, and a binary variable is associated with each ReLU to model its yes/no nature. In this paper we discuss the peculiarities of this kind of 0-1 MILP model, and describe an effective bound-tightening technique intended to ease its solution. We also present possible applications of the 0-1 MILP model in feature visualization and in the construction of adversarial examples. Computational results are reported, aimed at investigating (on small DNNs) the performance of a state-of-the-art MILP solver when applied to a known test case, namely handwritten digit recognition.
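To make the MILP encoding concrete, here is a minimal sketch (not reproduced from the paper) of the standard big-M formulation of a single ReLU unit with fixed weights $w$, bias $b$, input vector $x$, and output $y = \max(0, w^\top x + b)$. A binary variable $z$ selects the inactive side of the unit, and a slack variable $s$ absorbs the negative part of the pre-activation:

$$w^\top x + b = y - s, \qquad y \ge 0, \quad s \ge 0, \quad z \in \{0,1\},$$
$$y \le M^{+}(1 - z), \qquad s \le M^{-} z,$$

where $M^{+}$ and $M^{-}$ are any valid upper bounds on $y$ and $s$, respectively. Setting $z = 1$ forces $y = 0$ (inactive unit, with $s$ carrying the negated pre-activation), while $z = 0$ forces $s = 0$, so that $y$ equals the pre-activation. The bound-tightening technique mentioned above serves precisely to make $M^{+}$ and $M^{-}$ as small as possible, which strengthens the continuous relaxation.

The snippet below is a hypothetical Python sketch of this encoding for a single unit, using the open-source PuLP modeling library rather than the CPLEX setup used in the paper; the weights, input bounds, and objective are made-up illustrative values.

import pulp

# Illustrative, made-up data: a single ReLU unit y = max(0, w.x + b)
# with two inputs known to lie in [-1, 1].
w, b = [1.0, -2.0], 0.5
lb, ub = [-1.0, -1.0], [1.0, 1.0]

prob = pulp.LpProblem("relu_unit", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{j}", lowBound=lb[j], upBound=ub[j]) for j in range(2)]
y = pulp.LpVariable("y", lowBound=0)          # ReLU output (positive part)
s = pulp.LpVariable("s", lowBound=0)          # slack (negative part)
z = pulp.LpVariable("z", cat=pulp.LpBinary)   # z = 1 <=> unit inactive (y = 0)

# Big-M constants from simple interval arithmetic on the pre-activation;
# tighter bounds (the point of the bound-tightening step) would shrink them.
pre_ub = sum(max(w[j] * lb[j], w[j] * ub[j]) for j in range(2)) + b
pre_lb = sum(min(w[j] * lb[j], w[j] * ub[j]) for j in range(2)) + b
M_pos, M_neg = max(pre_ub, 0.0), max(-pre_lb, 0.0)

prob += y                                     # e.g. maximize the activation
prob += pulp.lpSum(w[j] * x[j] for j in range(2)) + b == y - s
prob += y <= M_pos * (1 - z)
prob += s <= M_neg * z

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(y))                          # 3.5 for this toy instance

Maximizing a unit's activation in this way is the building block of the feature-visualization application mentioned in the abstract; the adversarial-example application instead optimizes over the input image subject to similar constraints for every unit of the network.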

Keywords

Deep neural networks · Mixed-integer programming · Deep learning · Mathematical optimization · Computational experiments

Acknowledgements

The research of the first author was partially funded by the Vienna Science and Technology Fund (WWTF) through project ICT15-014, and by MiUR, Italy, through project PRIN2015 “Nonlinear and Combinatorial Aspects of Complex Networks”. The research of the second author was funded by the Institute for Data Valorization (IVADO), Montreal. We thank Yoshua Bengio and Andrea Lodi for helpful discussions.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Information Engineering (DEI), University of Padova, Padua, Italy
  2. Montreal Institute for Learning Algorithms (MILA), Montreal, Canada
  3. Institute for Data Valorization (IVADO), Montreal, Canada
