
Efficient differentiable quadratic programming layers: an ADMM approach


Abstract

Recent advances in neural-network architecture allow for the seamless integration of convex optimization problems as differentiable layers in an end-to-end trainable neural network. Integrating medium- and large-scale quadratic programs into a deep neural network architecture, however, is challenging because solving quadratic programs exactly by interior-point methods has worst-case cubic complexity in the number of variables. In this paper, we present an alternative network-layer architecture based on the alternating direction method of multipliers (ADMM) that is capable of scaling to moderately sized problems with 100–1000 decision variables and thousands of training examples. Backward differentiation is performed by implicit differentiation of a customized fixed-point iteration. Simulated results demonstrate the computational advantage of the ADMM layer, which for medium-scale problems is approximately an order of magnitude faster than state-of-the-art layers. Furthermore, our novel backward-pass routine is computationally efficient in comparison to the standard approach based on unrolled differentiation or implicit differentiation of the KKT optimality conditions. We conclude with examples from portfolio optimization in the integrated prediction and optimization paradigm.


Data availability

The U.S. stock data used for computational experiments 3 and 4 was obtained from Quandl (https://data.nasdaq.com). The Fama-French factor data was obtained from the Kenneth R. French data library (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html).


Author information


Corresponding author

Correspondence to Roy H. Kwon.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A Appendix

A.1 Proof of Proposition 1

We define \(\mathbf{v}^k = \mathbf{x}^{k+1} +\varvec{\mu }^k\). We can therefore express Equation (14b) as:

$$\begin{aligned} \mathbf{z}^{k+1} = \Pi ( \mathbf{x}^{k+1} + {\varvec{\mu }}^k ) = \Pi ( \mathbf{v}^k ), \end{aligned}$$
(42)

and Equation (14c) as:

$$\begin{aligned} {\varvec{\mu }}^{k+1} = {\varvec{\mu }}^{k} + \mathbf{x}^{k+1} - \mathbf{z}^{k+1} = \mathbf{v}^k - \Pi ( \mathbf{v}^k ). \end{aligned}$$
(43)

Substituting Equations (42) and (43) into Equation (14a) gives the desired fixed-point iteration:

$$\begin{aligned} \begin{bmatrix} \mathbf{v}^{k+1}\\ \varvec{\eta}^{k+2} \end{bmatrix}&= \begin{bmatrix} \mathbf{x}^{k+2} + \varvec{\mu}^{k+1}\\ \varvec{\eta}^{k+2} \end{bmatrix} \end{aligned}$$
(44)
$$\begin{aligned}&= -\begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{p} - \rho (\mathbf{z}^{k+1} - \varvec{\mu}^{k+1})\\ -\mathbf{b} \end{bmatrix} + \begin{bmatrix} \varvec{\mu}^{k+1}\\ 0 \end{bmatrix} \end{aligned}$$
(45)
$$\begin{aligned}&= -\begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{p} - \rho (2\Pi(\mathbf{v}^{k}) - \mathbf{v}^{k})\\ -\mathbf{b} \end{bmatrix} + \begin{bmatrix} \mathbf{v}^{k}\\ \varvec{\eta}^{k+1} \end{bmatrix} - \begin{bmatrix} \Pi(\mathbf{v}^{k})\\ \varvec{\eta}^{k+1} \end{bmatrix}. \end{aligned}$$
(46)
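For concreteness, the fixed-point iteration of Proposition 1 can be sketched in a few lines of NumPy. The sketch below is illustrative only: it assumes the set handled by \(\Pi\) is a simple box \([\mathbf{l},\mathbf{u}]\) so that the projection is elementwise clipping, uses a fixed iteration count in place of a convergence test, and performs a dense solve with the KKT-like matrix rather than a cached factorization; the function and argument names are ours, not the paper's implementation.

```python
import numpy as np

def admm_qp_forward(Q, p, A, b, lb, ub, rho=1.0, n_iter=500):
    """Fixed-point iteration of Proposition 1 (Eqs. (42)-(46)) for
    min 0.5 x'Qx + p'x  s.t.  Ax = b,  lb <= x <= ub  (box projection assumed)."""
    n, m = Q.shape[0], A.shape[0]
    # KKT-like matrix M; it does not depend on the iterates.
    M = np.block([[Q + rho * np.eye(n), A.T],
                  [A,                   np.zeros((m, m))]])
    v = np.zeros(n)
    eta = np.zeros(m)
    for _ in range(n_iter):
        z = np.clip(v, lb, ub)                          # z = Pi(v), Eq. (42)
        rhs = np.concatenate([p - rho * (2.0 * z - v), -b])
        sol = -np.linalg.solve(M, rhs)                  # next [x; eta], Eq. (46)
        x, eta = sol[:n], sol[n:]
        v = x + (v - z)                                 # v = x + mu, Eq. (43)
    return np.clip(v, lb, ub), eta, v                   # z*, eta*, and the fixed point v*
```

Because \(\mathbf{M}\) does not depend on the iterates, a single factorization can be reused across all iterations, and across any batch of problems that share \(\mathbf{Q}\), \(\mathbf{A}\) and \(\rho\).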

A.2 Proof of Proposition 2

We define \(F :{\mathbb {R}}^{d_{v}} \times {\mathbb {R}}^{d_\eta } \rightarrow {\mathbb {R}}^{d_{v}} \times {\mathbb {R}}^{d_\eta }\) as:

$$\begin{aligned} F(\mathbf{v},\varvec{\eta}) = -\begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{p} - \rho (2\Pi(\mathbf{v}) - \mathbf{v})\\ -\mathbf{b} \end{bmatrix} + \begin{bmatrix} \mathbf{v}\\ \varvec{\eta} \end{bmatrix} - \begin{bmatrix} \Pi(\mathbf{v})\\ \varvec{\eta} \end{bmatrix}, \end{aligned}$$
(47)

and let

$$\begin{aligned} \mathbf{M} = \begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & 0 \end{bmatrix}. \end{aligned}$$
(48)

Therefore we have

$$\begin{aligned} \mathbf{M}F(\mathbf{v},\varvec{\eta }) = - \begin{bmatrix} \mathbf{p}- \rho (2\Pi (\mathbf{v}) - \mathbf{v})\\ -\mathbf{b}\end{bmatrix} + \mathbf{M}\begin{bmatrix} \mathbf{v}\\ \varvec{\eta }\end{bmatrix} - \mathbf{M}\begin{bmatrix} \Pi (\mathbf{v})\\ \varvec{\eta }\end{bmatrix}. \end{aligned}$$
(49)

Taking the partial differentials of Equation (49) with respect to the relevant problem variables therefore gives:

$$\begin{aligned} \mathbf{M}\partial F(\mathbf{v},\varvec{\eta })&= - \begin{bmatrix} \partial \mathbf{p}\\ -\partial \mathbf{b}\end{bmatrix} + \partial \mathbf{M}\begin{bmatrix} \mathbf{v}\\ \varvec{\eta }\end{bmatrix} - \partial \mathbf{M}\begin{bmatrix} \Pi (\mathbf{v})\\ \varvec{\eta }\end{bmatrix} -\partial \mathbf{M}F(\mathbf{v},\varvec{\eta })\nonumber \\&= - \begin{bmatrix} \partial \mathbf{p}\\ -\partial \mathbf{b}\end{bmatrix} -\partial \mathbf{M}\Bigg [-\mathbf{M}^{-1} \begin{bmatrix} \mathbf{p}- \rho (2\Pi (\mathbf{v}) - \mathbf{v})\\ -\mathbf{b}\end{bmatrix} \Bigg ]\nonumber \\&= - \begin{bmatrix} \partial \mathbf{p}\\ -\partial \mathbf{b}\end{bmatrix} -\partial \mathbf{M}\begin{bmatrix} \mathbf{x}^* \\ \varvec{\eta }^* \end{bmatrix}\nonumber \\&= - \begin{bmatrix} \partial \mathbf{p}+ \frac{1}{2} (\partial \mathbf{Q}\mathbf{x}^* + \partial \mathbf{Q}^T\mathbf{x}^*) + \partial \mathbf{A}^T\varvec{\eta }^* \\ -\partial \mathbf{b}+ \partial \mathbf{A}\mathbf{x}^* \end{bmatrix}. \end{aligned}$$
(50)

From Equation (50) we have that the differential \(\partial F(\mathbf{v},\varvec{\eta })\) is given by:

$$\begin{aligned} \partial F(\mathbf{v},\varvec{\eta }) = - \mathbf{M}^{-1} \begin{bmatrix} \partial \mathbf{p}+ \frac{1}{2} (\partial \mathbf{Q}\mathbf{x}^* + \partial \mathbf{Q}^T\mathbf{x}^*) +\partial \mathbf{A}^T\varvec{\eta }^* \\ -\partial \mathbf{b}+ \partial \mathbf{A}\mathbf{x}^* \end{bmatrix}. \end{aligned}$$
(51)

Substituting the gradient action of Equation (51) into Equation (26) and taking the left matrix-vector product of the transposed Jacobian with the previous backward-pass gradient, \(\frac{\partial \ell }{\partial \mathbf{z}^*}\), gives the desired result.

$$\begin{aligned} \begin{bmatrix} \hat{\mathbf{d}}_{\mathbf{x}} \\ \hat{\mathbf{d}}_{\varvec{\eta}} \end{bmatrix} = \begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & 0 \end{bmatrix}^{-1} \Big[\mathbf{I}_{\tilde{\mathbf{v}}} - \nabla_{\tilde{\mathbf{v}}} F(\tilde{\mathbf{v}}(\varvec{\theta}),\varvec{\theta}) \Big]^{-T} \begin{bmatrix} D\Pi(\mathbf{v}) & 0\\ 0 & \mathbf{I}_{\varvec{\eta}} \end{bmatrix} \begin{bmatrix} \big( - \frac{\partial \ell}{\partial \mathbf{z}^*} \big)^T \\ 0 \end{bmatrix}. \end{aligned}$$
(52)

From Equation (24) we have:

$$\begin{aligned} \mathbf{I}_{\tilde{\mathbf{v}}} - \nabla_{\tilde{\mathbf{v}}} F(\tilde{\mathbf{v}}(\varvec{\theta}),\varvec{\theta}) = \begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & \mathbf{0} \end{bmatrix}^{-1} \begin{bmatrix} - \rho (2D\Pi(\mathbf{v}) - \mathbf{I}_{\mathbf{v}}) & 0\\ 0 & 0 \end{bmatrix} + \begin{bmatrix} D\Pi(\mathbf{v}) & 0\\ 0 & \mathbf{I}_{\varvec{\eta}} \end{bmatrix}. \end{aligned}$$
(53)

Simplifying Equation (52) with Equation (53) yields the final expression:

$$\begin{aligned} \begin{bmatrix} \hat{\mathbf{d}}_{\mathbf{x}} \\ \hat{\mathbf{d}}_{\varvec{\eta}} \end{bmatrix}&= \Bigg[\begin{bmatrix} D\Pi(\mathbf{v}) & 0\\ 0 & \mathbf{I}_{\varvec{\eta}} \end{bmatrix} \begin{bmatrix} \mathbf{Q} + \rho \mathbf{I}_{\mathbf{v}} & \mathbf{A}^T \\ \mathbf{A} & 0 \end{bmatrix} + \begin{bmatrix} - \rho (2D\Pi(\mathbf{v}) - \mathbf{I}_{\mathbf{v}}) & 0\\ 0 & 0 \end{bmatrix}\Bigg]^{-1}\nonumber \\&\quad \begin{bmatrix} D\Pi(\mathbf{v}) & 0\\ 0 & \mathbf{I}_{\varvec{\eta}} \end{bmatrix} \begin{bmatrix} \big( - \frac{\partial \ell}{\partial \mathbf{z}^*} \big)^T \\ 0 \end{bmatrix}. \end{aligned}$$
(54)
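Under the same box-projection assumption, where \(D\Pi(\mathbf{v})\) is the 0/1 diagonal matrix marking the unclipped coordinates, Equation (54) reduces to a single linear solve. The NumPy sketch below is again illustrative: the dense solve stands in for a factorized system and the names are ours, not the paper's implementation.

```python
import numpy as np

def admm_qp_backward(Q, A, rho, v, lb, ub, dl_dz):
    """Solve Eq. (54) for the intermediate gradients d_x and d_eta,
    given the incoming gradient dl_dz of the loss w.r.t. z*."""
    n, m = Q.shape[0], A.shape[0]
    d = ((v > lb) & (v < ub)).astype(float)          # diagonal of D_Pi(v)
    DPi = np.diag(d)
    T = np.block([[DPi,              np.zeros((n, m))],
                  [np.zeros((m, n)), np.eye(m)]])
    M = np.block([[Q + rho * np.eye(n), A.T],
                  [A,                   np.zeros((m, m))]])
    C = np.zeros((n + m, n + m))
    C[:n, :n] = -rho * (2.0 * DPi - np.eye(n))       # correction block in Eq. (54)
    rhs = T @ np.concatenate([-dl_dz, np.zeros(m)])
    sol = np.linalg.solve(T @ M + C, rhs)
    return sol[:n], sol[n:]                          # d_x, d_eta
```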

A.3 Proof of Proposition 3

From the KKT system of equations (20) we have:

$$\begin{aligned} \mathbf{G}^T{{\,\mathrm{diag}\,}}(\tilde{\varvec{\lambda }}^*) \hat{ \mathbf{d}}_{\varvec{\lambda }} ={{\,\mathrm{diag}\,}}(\rho \varvec{\mu }^*) \hat{ \mathbf{d}}_{\varvec{\lambda }} =\Big ( - \Big (\frac{\partial \ell }{\partial \mathbf{z}^*}\Big )^T -\mathbf{Q}\hat{ \mathbf{d}}_{\mathbf{x}} - \mathbf{A}^T \hat{ \mathbf{d}}_{\varvec{\eta }} \Big ). \end{aligned}$$
(55)

From Equation (21) it follows that:

$$\begin{aligned} \frac{\partial \ell}{\partial \mathbf{l}} \ne 0 \iff \varvec{\lambda}^*_- > 0 \quad \text{and} \quad \frac{\partial \ell}{\partial \mathbf{u}} \ne 0 \iff \varvec{\lambda}^*_+ > 0, \end{aligned}$$
(56)

and therefore Equation (55) uniquely determines the relevant non-zero gradients. Let \(\tilde{\varvec{\mu}}^*\) be as defined by Equation (29); it then follows that:

$$\begin{aligned} \hat{ \mathbf{d}}_{\varvec{\lambda }} = {{\,\mathrm{diag}\,}}(\rho \tilde{\varvec{\mu }}^*)^{-1} \Big ( - \Big (\frac{\partial \ell }{\partial \mathbf{z}^*}\Big )^T -\mathbf{Q}\hat{ \mathbf{d}}_{\mathbf{x}} - \mathbf{A}^T \hat{ \mathbf{d}}_{\varvec{\eta }} \Big ). \end{aligned}$$
(57)

Substituting \(\hat{ \mathbf{d}}_{\varvec{\lambda }}\) into Equation (21) gives the desired gradients.
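A minimal sketch of Equation (57) is given below. It assumes \(\tilde{\varvec{\mu}}^*\) is available as a vector and simply leaves at zero the components excluded by Equation (56); the helper name and the zero-guarding are ours, not the paper's.

```python
import numpy as np

def d_lambda(Q, A, rho, mu_tilde, dl_dz, d_x, d_eta):
    """Eq. (57): recover d_lambda by an elementwise division on the
    components where mu_tilde is non-zero (cf. Eq. (56))."""
    num = -dl_dz - Q @ d_x - A.T @ d_eta
    out = np.zeros_like(num)
    active = mu_tilde != 0
    out[active] = num[active] / (rho * mu_tilde[active])
    return out
```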

A.4 Data Summary

See Table 3.

Table 3 U.S. stock data, sorted by GICS Sector. Data provided by Quandl

A.5 Experiment 1: relative performance

See Table 4.

Table 4 Computational performance of ADMM KKT, ADMM Unroll, Optnet and SCS relative to ADMM FP for various problem sizes, \(d_z\), and stopping tolerances. Batch size \(=128\)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Butler, A., Kwon, R.H. Efficient differentiable quadratic programming layers: an ADMM approach. Comput Optim Appl 84, 449–476 (2023). https://doi.org/10.1007/s10589-022-00422-7

