Data-driven inverse optimization with imperfect information

Abstract

In data-driven inverse optimization an observer aims to learn the preferences of an agent who solves a parametric optimization problem depending on an exogenous signal. Thus, the observer seeks the agent’s objective function that best explains a historical sequence of signals and corresponding optimal actions. We focus here on situations where the observer has imperfect information, that is, where the agent’s true objective function is not contained in the search space of candidate objectives, where the agent suffers from bounded rationality or implementation errors, or where the observed signal-response pairs are corrupted by measurement noise. We formalize this inverse optimization problem as a distributionally robust program minimizing the worst-case risk that the predicted decision (i.e., the decision implied by a particular candidate objective) differs from the agent’s actual response to a random signal. We show that our framework offers rigorous out-of-sample guarantees for different loss functions used to measure prediction errors and that the emerging inverse optimization problems can be exactly reformulated as (or safely approximated by) tractable convex programs when a new suboptimality loss function is used. We show through extensive numerical tests that the proposed distributionally robust approach to inverse optimization often attains better out-of-sample performance than state-of-the-art approaches.

Notes

  1. Proposition 4.4 readily extends to the case \(n+m = 2p\) at the expense of additional notation by leveraging [21, Theorem 2].

  2. A possible choice is \(\beta _N=\exp (-\sqrt{N})\).

  3. Strictly speaking, if \(\Theta \) is an \(\infty \)-norm ball of the form (16a), then (18) can be viewed as a family of 2n finite conic programs because \(\Theta \) is non-convex but decomposes into 2n convex polytopes.

  4. We did not consider the ERM approach in Sect. 5 because it coincides with the VI approach for linear hypotheses.

References

  1. Ackerberg, D., Benkard, C.L., Berry, S., Pakes, A.: Econometric tools for analyzing market outcomes. In: Heckman, J., Leamer, E. (eds.) Handbook of Econometrics, pp. 4171–4276. Elsevier, Amsterdam (2007)

  2. Ahmed, S., Guan, Y.: The inverse optimal value problem. Math. Program. A 102, 91–110 (2005)

  3. Ahuja, R.K., Orlin, J.B.: A faster algorithm for the inverse spanning tree problem. J. Algorithms 34, 177–193 (2000)

  4. Ahuja, R.K., Orlin, J.B.: Inverse optimization. Oper. Res. 49, 771–783 (2001)

  5. Anderson, B.D., Moore, J.B.: Optimal Control: Linear Quadratic Methods. Courier Corporation, North Chelmsford (2007)

  6. Aswani, A., Shen, Z.-J.M., Siddiq, A.: Inverse optimization with noisy data. Preprint arXiv:1507.03266 (2015)

  7. Bajari, P., Benkard, C.L., Levin, J.: Estimating dynamic models of imperfect competition. Technical Report, National Bureau of Economic Research (2004)

  8. Ben-Ayed, O., Blair, C.E.: Computational difficulties of bilevel linear programming. Oper. Res. 38, 556–560 (1990)

  9. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)

  10. Bertsimas, D., Gupta, V., Paschalidis, I.C.: Inverse optimization: a new perspective on the Black–Litterman model. Oper. Res. 60, 1389–1403 (2012)

  11. Bertsimas, D., Gupta, V., Paschalidis, I.C.: Data-driven estimation in equilibrium using inverse optimization. Math. Program. 153, 595–633 (2015)

  12. Boissard, E.: Simple bounds for convergence of empirical and occupation measures in 1-Wasserstein distance. Electron. J. Probab. 16, 2296–2333 (2011)

  13. Bottasso, C.L., Prilutsky, B.I., Croce, A., Imberti, E., Sartirana, S.: A numerical procedure for inferring from experimental data the optimization cost functions using a multibody model of the neuro-musculoskeletal system. Multibody Syst. Dyn. 16, 123–154 (2006)

  14. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2009)

  15. Braess, D., Nagurney, A., Wakolbinger, T.: On a paradox of traffic planning. Transp. Sci. 39, 446–450 (2005)

  16. Bubeck, S.: Convex optimization: algorithms and complexity. Found. Trends Mach. Learn. 8, 231–357 (2015)

  17. Burton, D., Toint, P.L.: On an instance of the inverse shortest paths problem. Math. Program. 53, 45–61 (1992)

  18. Carr, S., Lovejoy, W.: The inverse newsvendor problem: choosing an optimal demand portfolio for capacitated resources. Manag. Sci. 46, 912–927 (2000)

  19. Chan, T.C., Craig, T., Lee, T., Sharpe, M.B.: Generalized inverse multiobjective optimization with application to cancer therapy. Oper. Res. 62, 680–695 (2014)

  20. Faragó, A., Szentesi, Á., Szviatovszki, B.: Inverse optimization in high-speed networks. Discrete Appl. Math. 129, 83–98 (2003)

  21. Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162, 707–738 (2014)

  22. Friedman, J., Hastie, T., Tibshirani, R.: The Elements of Statistical Learning. Springer, Berlin (2001)

  23. Harker, P., Pang, J.: Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications. Math. Program. 48, 161–220 (1990)

  24. Heuberger, C.: Inverse combinatorial optimization: a survey on problems, methods, and results. J. Comb. Optim. 8, 329–361 (2004)

  25. Hochbaum, D.S.: Efficient algorithms for the inverse spanning-tree problem. Oper. Res. 51, 785–797 (2003)

  26. Iyengar, G., Kang, W.: Inverse conic programming with applications. Oper. Res. Lett. 33, 319–330 (2005)

  27. Keshavarz, A., Wang, Y., Boyd, S.: Imputing a convex objective function. In: IEEE International Symposium on Intelligent Control, pp. 613–619 (2011)

  28. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: IEEE International Symposium on Computer Aided Control Systems Design, pp. 284–289 (2004)

  29. Markowitz, H.M.: Portfolio Selection: Efficient Diversification of Investments. Yale University Press, New Haven (1968)

  30. Mohajerin Esfahani, P., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. (2017). https://doi.org/10.1007/s10107-017-1172-1

  31. Murty, K.G., Kabadi, S.N.: Some NP-complete problems in quadratic and nonlinear programming. Math. Program. 39, 117–129 (1987)

  32. Neumann-Denzau, G., Behrens, J.: Inversion of seismic data using tomographical reconstruction techniques for investigations of laterally inhomogeneous media. Geophys. J. Int. 79, 305–315 (1984)

  33. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)

  34. Saez-Gallego, J., Morales, J.M., Zugno, M., Madsen, H.: A data-driven bidding model for a cluster of price-responsive consumers of electricity. IEEE Trans. Power Syst. 31, 5001–5011 (2016)

  35. Schaefer, A.J.: Inverse integer programming. Optim. Lett. 3, 483–489 (2009)

  36. Shafieezadeh-Abadeh, S., Kuhn, D., Mohajerin Esfahani, P.: Regularization via mass transportation. Preprint arXiv:1710.10016 (2017)

  37. Shafieezadeh-Abadeh, S., Mohajerin Esfahani, P., Kuhn, D.: Distributionally robust logistic regression. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 1576–1584. Curran Associates, Inc. (2015)

  38. Sion, M.: On general minimax theorems. Pac. J. Math. 8, 171–176 (1958)

  39. Terekhov, A.V., Pesin, Y.B., Niu, X., Latash, M.L., Zatsiorsky, V.M.: An analytical approach to the problem of inverse optimization with additive objective functions: an application to human prehension. J. Math. Biol. 61, 423–453 (2010)

  40. Troutt, M.D., Pang, W.-K., Hou, S.-H.: Behavioral estimation of mathematical programming objective function coefficients. Manag. Sci. 52, 422–434 (2006)

  41. Wang, L.: Cutting plane algorithms for the inverse mixed integer linear programming problem. Oper. Res. Lett. 37, 114–116 (2009)

  42. Woodhouse, J.H., Dziewonski, A.M.: Mapping the upper mantle: three-dimensional modeling of earth structure by inversion of seismic waveforms. J. Geophys. Res. 89, 5953–5986 (1984)

  43. Zhang, J., Xu, C.: Inverse optimization for linearly constrained convex separable programming problems. Eur. J. Oper. Res. 200, 671–679 (2010)

Acknowledgements

This work was supported by the Swiss National Science Foundation grant BSCGI0_157733.

Author information

Corresponding author

Correspondence to Daniel Kuhn.

Appendix A. Convex hypotheses

Consider the class \(\mathcal {F}\) of hypotheses of the form \(F_\theta (s,x):=\big \langle \theta , \Psi (x) \big \rangle \), where each component function of the feature map \(\Psi :{\mathbb {R}}^n\rightarrow {\mathbb {R}}^d\) is convex, and where the weight vector \(\theta \) ranges over a convex closed search space \(\Theta \subseteq {\mathbb {R}}^d_+\). Thus, by construction, \(F_\theta (s,x)\) is convex in x for every fixed \(\theta \in \Theta \). In the remainder we will assume without much loss of generality that the transformation from the signal-response pair \((s,x)\) to the signal-feature pair \((s,\Psi (x))\) is Lipschitz continuous with Lipschitz modulus 1, that is, we require

$$\begin{aligned} \Vert (s_1,\Psi (x_1)) - (s_2,\Psi (x_2)) \Vert \le \Vert (s_1,x_1) - (s_2,x_2) \Vert \quad \forall (s_1,x_1), (s_2,x_2)\in {\mathbb {R}}^m\times {\mathbb {R}}^n.\nonumber \\ \end{aligned}$$
(34)
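Since every component of \(\Psi \) is convex and \(\theta \) has nonnegative entries, each hypothesis \(F_\theta \) is a nonnegative combination of convex functions and hence convex in x. A minimal numerical illustration of this fact, using a hypothetical one-dimensional feature map that is not taken from the paper:

```python
import numpy as np

# Hypothetical convex feature map Psi: R -> R^2 (both components convex)
def Psi(x):
    return np.array([x**2, np.abs(x)])

theta = np.array([0.5, 2.0])      # weight vector in Theta, a subset of R_+^d
F = lambda x: theta @ Psi(x)      # hypothesis F_theta(s, x); no s-dependence here

# Midpoint-convexity check on random pairs: F((x1+x2)/2) <= (F(x1)+F(x2))/2
rng = np.random.default_rng(0)
for x1, x2 in rng.normal(size=(1000, 2)):
    assert F((x1 + x2) / 2) <= (F(x1) + F(x2)) / 2 + 1e-12
```

Dropping the nonnegativity of \(\theta \) breaks this argument, which is why \(\Theta \subseteq {\mathbb {R}}^d_+\) is imposed.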

Before devising a safe conic approximation for the inverse optimization problem (13) with convex hypotheses, we recall that the conjugate of a function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) is defined through \(f^*(z)=\sup _{x\in {\mathbb {R}}^n} \big \langle z, x \big \rangle -f(x)\).
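For intuition, the conjugate of \(f(x)=x^2\) is \(f^*(z)=z^2/4\), since the supremum is attained at \(x=z/2\). A brute-force grid check of this closed form (an illustrative sketch, not part of the paper):

```python
import numpy as np

f = lambda x: x**2                 # a convex function on R
xs = np.linspace(-10, 10, 200001)  # fine grid over which to take the sup

def conjugate(z):
    # f*(z) = sup_x <z, x> - f(x), approximated over the grid
    return np.max(z * xs - f(xs))

# Compare with the closed form f*(z) = z^2 / 4
for z in [-3.0, -1.0, 0.0, 2.0, 4.0]:
    assert abs(conjugate(z) - z**2 / 4) < 1e-6
```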

Theorem A.1

(Convex hypotheses and suboptimality loss) Assume that \(\mathcal {F}\) represents the class of convex hypotheses induced by the feature map \(\Psi \) and with a convex closed search space \(\Theta \subseteq {\mathbb {R}}_+^d\) and that Assumption 5.1 holds. If the observer uses the suboptimality loss (4b) and measures risk using the \(\mathrm{CVaR}\) at level \(\alpha \in (0,1]\), then the following convex program provides a safe approximation for the distributionally robust inverse optimization problem (13) over the 1-Wasserstein ball:

$$\begin{aligned} \begin{array}{llll} \text {minimize} &{}\quad \displaystyle \tau +\frac{1}{\alpha }\left( {\varepsilon } \lambda + {1 \over N}\sum \limits _{i = 1}^{N} r_i\right) \\ \text {subject to} &{}\quad { \theta \in \Theta ,\;\; \lambda \in {\mathbb {R}}_+,\;\;\tau , r_i\in \mathbb R,\;\; \phi _{i1},\phi _{i2}\in {\mathcal {C}}^*, \;\;\gamma _{i} \in {\mathcal {K}}^* ,\;\; z_{ij}\in {\mathbb {R}}^n } &{}\quad \forall i \le N,\forall j\le d\\ &{}\quad \displaystyle \sum _{j=1}^d \theta _j \Psi ^*_j(z_{ij}/\theta _j) + \big \langle \theta , \Psi (\widehat{x}_i) \big \rangle + \big \langle \phi _{i1}, C \widehat{s}_i - d \big \rangle - \big \langle \gamma _i, H \widehat{s}_i + h \big \rangle \le r_i + \tau &{}\quad \forall i \le N \\ &{}\quad \displaystyle \sum _{j=1}^d z_{ij} = W^\top \gamma _i &{}\quad \forall i \le N\\ &{}\quad \big \langle C \widehat{s}_i-d, \phi _{i2} \big \rangle \le r_i &{}\quad \forall i \le N \\ &{}\quad \left\| \begin{pmatrix} H^\top \gamma _i - C^\top \phi _{i1} \\ \theta \end{pmatrix} \right\| _* \le \lambda ,\;\; \left\| \begin{pmatrix} C^\top \phi _{i2} \\ 0 \end{pmatrix} \right\| _* \le \lambda&\quad \forall i \le N. \end{array} \end{aligned}$$
(35)
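To make the objective of (35) concrete: loosely speaking, for \(\varepsilon = 0\) it reduces to the empirical \(\mathrm{CVaR}\) of the suboptimality losses, which the Rockafellar–Uryasev formula [33] expresses as \(\min _\tau \, \tau + \frac{1}{\alpha } \, \mathbb {E}[\max \{\ell - \tau , 0\}]\). A sketch with synthetic losses (hypothetical data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
losses = rng.exponential(size=1000)  # synthetic stand-ins for suboptimality losses
alpha = 0.1

# CVaR_alpha(L) = min_tau  tau + (1/alpha) E[max(L - tau, 0)]  (Rockafellar-Uryasev)
# The objective is piecewise linear and convex in tau with kinks at the data
# points, so scanning tau over the observed losses finds the exact minimum.
cvar = min(t + np.mean(np.maximum(losses - t, 0.0)) / alpha for t in losses)

# Sanity check: for alpha*N integer this equals the average of the largest
# alpha*N losses, i.e., the mean of the worst 10% of outcomes.
tail = np.sort(losses)[-int(alpha * len(losses)):]
assert abs(cvar - tail.mean()) < 1e-8
```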

Note that Theorem A.1 remains valid if \((\widehat{s}_i, \widehat{x}_i)\notin \Xi \) for some \(i\le N\).
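The ball \(\mathbb {B}^1_{\varepsilon }(\widehat{\mathbb {P}}_N)\) contains every distribution within 1-Wasserstein distance \(\varepsilon \) of the empirical distribution. In one dimension this distance is computable in closed form via quantile functions; a small SciPy sketch (illustrative only, not part of the paper's method):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# 1-Wasserstein distance between two empirical distributions on R
# (SciPy handles the 1-d case in closed form).
w = wasserstein_distance([0.0, 1.0], [1.0, 2.0])
assert abs(w - 1.0) < 1e-12  # optimal plan shifts each atom by 1

# Perturbing each of N atoms by at most delta moves the empirical
# distribution by at most delta in W1, so the perturbed empirical
# distribution stays inside the ball of radius delta.
rng = np.random.default_rng(0)
data = rng.normal(size=50)
delta = 0.1
perturbed = data + rng.uniform(-delta, delta, size=50)
assert wasserstein_distance(data, perturbed) <= delta + 1e-12
```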

Proof of Theorem A.1

As in the proof of Theorem 5.2 one can show that the objective function of the inverse optimization problem (13) is equivalent to (19). In the remainder, we derive a safe conic approximation for the (intractable) subordinate worst-case expectation problem

$$\begin{aligned} \sup _{\mathbb {Q}\in \mathbb {B}^{1}_{\varepsilon }({\widehat{\mathbb {P}}_N})} \mathbb {E}^\mathbb {Q}\big [\max \{\ell _\theta (s,x) - \tau , 0\} \big ]. \end{aligned}$$
(36)

To this end, note that the suboptimality loss \(\ell _\theta (s,x) = \big \langle \theta , \Psi (x) \big \rangle - \min _{y \in {\mathbb {X}}(s)} \big \langle \theta , \Psi (y) \big \rangle \) depends on x only through \(\Psi (x)\). This motivates us to define a lifted suboptimality loss \(\ell ^\Psi _\theta (s,\psi ) = \big \langle \theta , \psi \big \rangle - \min _{y \in {\mathbb {X}}(s)} \big \langle \theta , \Psi (y) \big \rangle \), where \(\psi \) represents an element of the feature space \({\mathbb {R}}^d\), and the empirical distribution \(\widehat{\mathbb {P}}_N^\Psi = \frac{1}{N}\sum _{i=1}^N \delta _{(\widehat{s}_i, \Psi (\widehat{x}_i))}\) of the signal-feature pairs. Moreover, we denote by \({\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N^\Psi )\) the 1-Wasserstein ball of all distributions on \({\mathbb {S}} \times {\mathbb {R}}^d\) that have a distance of at most \(\varepsilon \) from \(\widehat{\mathbb {P}}_N^\Psi \). In the following we will show that the worst-case expectation

$$\begin{aligned} \sup _{{\mathbb {Q}} \in {\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N^\Psi )} \mathbb {E}^\mathbb {Q}\big [\max \{\ell ^\Psi _\theta (s,\psi ) - \tau , 0\} \big ] \end{aligned}$$
(37)

on the signal-feature space \({\mathbb {S}} \times {\mathbb {R}}^d\) provides a tractable upper bound on the worst-case expectation (36). By Definition 4.3, each distribution \(\mathbb {Q}\in {\mathbb {B}}^1_{\varepsilon } (\widehat{\mathbb {P}}_N)\) corresponds to a transportation plan \(\Pi \), that is, a joint distribution of two signal-response pairs \((s,x)\) and \((s',x')\) under which \((s,x)\) has marginal distribution \(\mathbb {Q}\) and \((s',x')\) has marginal distribution \(\widehat{\mathbb {P}}_N\). By the law of total probability, the transportation plan can be expressed as \(\Pi =\frac{1}{N}\sum _{i=1}^N \delta _{(\widehat{s}_i,\widehat{x}_i)}\otimes \mathbb {Q}_i\), where \(\mathbb {Q}_i\) denotes the conditional distribution of \((s,x)\) given \((s',x')=(\widehat{s}_i,\widehat{x}_i)\), \(i\le N\); see also [30, Theorem 4.2]. Thus, the worst-case expectation (36) satisfies

$$\begin{aligned}&\sup _{\mathbb {Q}\in \mathbb {B}^{1}_{\varepsilon }({\widehat{\mathbb {P}}_N})} \mathbb {E}^\mathbb {Q}\big [\max \{\ell _\theta (s,x) - \tau , 0\} \big ] \\&\quad =\sup \limits _{\mathbb {Q}^i}\frac{1}{N} \sum \limits _{i=1}^N \int _{\Xi } \max \{\ell ^\Psi _\theta (s,\Psi (x)) - \tau , 0\} \, \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}x) \\&\quad \text {s.t.}\;\frac{1}{N} \sum \limits _{i=1}^N \int _{\Xi } \Vert (s,x) - (\widehat{s}_i, \widehat{x}_i ) \Vert \, \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}x) \le \varepsilon \\&\quad \int _{\Xi } \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}x) = 1 \qquad \forall i \le N \\&\quad \le \sup \limits _{\mathbb {Q}^i}\frac{1}{N} \sum \limits _{i=1}^N \int _{\Xi } \max \{\ell ^\Psi _\theta (s,\Psi (x)) - \tau , 0\} \, \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}x) \\&\quad \text {s.t.}\;\frac{1}{N} \sum \limits _{i=1}^N \int _{\Xi } \Vert (s,\Psi (x)) - (\widehat{s}_i, \Psi (\widehat{x}_i) ) \Vert \, \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}x) \le \varepsilon \\&\quad \int _{\Xi } \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}x) = 1 \qquad \forall i \le N \\&\quad \le \sup \limits _{\mathbb {Q}^i}\frac{1}{N} \sum \limits _{i=1}^N \int _{{\mathbb {S}} \times {\mathbb {R}}^d} \max \{\ell ^\Psi _\theta (s,\psi ) - \tau , 0\} \, \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}\psi ) \\&\quad \text {s.t.}\;\frac{1}{N} \sum \limits _{i=1}^N \int _{{\mathbb {S}} \times {\mathbb {R}}^d} \Vert (s,\psi ) - (\widehat{s}_i, \Psi (\widehat{x}_i) ) \Vert \, \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}\psi ) \le \varepsilon \\&\quad \int _{{\mathbb {S}} \times {\mathbb {R}}^d} \mathbb {Q}^i(\mathrm {d}s, \mathrm {d}\psi ) = 1 \qquad \forall i \le N, \end{aligned}$$

where the first inequality follows from (34), while the second inequality follows from relaxing the implicit condition that the signal-feature pair \((s,\psi )\) must be supported on \(\{(s,\Psi (x)):(s,x)\in \Xi \}\subseteq \mathbb {S}\times {\mathbb {R}}^d\). By a similar reasoning, the last expression is readily recognized as the worst-case expectation (37). Thus, (37) indeed provides an upper bound on (36). Duality arguments borrowed from [30, Theorem 4.2] imply that the infinite-dimensional linear program (37) admits a strong dual of the form

$$\begin{aligned} \begin{array}{cll} \inf \limits _{\lambda \ge 0, r_i} &{} \displaystyle \lambda \varepsilon + \frac{1}{N} \sum _{i=1}^N r_i &{} \\ \text {s.t.} &{} \sup \limits _{(s,y) \in \Xi , \psi \in {\mathbb {R}}^d} \big \langle \theta , \psi - \Psi (y) \big \rangle - \tau - \lambda \Vert (s,\psi ) - (\widehat{s}_i, \Psi (\widehat{x}_i)) \Vert \le r_i &{} \forall i \le N \\ &{}\sup \limits _{s \in {\mathbb {S}}, \psi \in {\mathbb {R}}^d} -\lambda \Vert (s,\psi ) - (\widehat{s}_i, \Psi (\widehat{x}_i)) \Vert \le r_i &{} \forall i \le N. \end{array} \end{aligned}$$
(38)

By using the definitions of \(\mathbb S\) and \(\mathbb {X}(s)\) put forth in Assumption 5.1, the i-th member of the first constraint group in (38) is satisfied if and only if the optimal value of the maximization problem

$$\begin{aligned} \begin{array}{cl} \sup \limits _{s,y,\psi } &{} \big \langle \theta , \psi - \Psi (y) \big \rangle - \tau - \lambda \Vert (s,\psi ) - (\widehat{s}_i, \Psi (\widehat{x}_i)) \Vert \\ \text {s.t.}&{} C s \succeq _{\mathcal {C}} d,\; Wy \succeq _{\mathcal {K}} Hs + h \end{array} \end{aligned}$$
(39)

does not exceed \(r_i\). A tedious but routine calculation shows that the dual of (39) can be represented as

$$\begin{aligned} \begin{array}{cll} \inf \limits _{p_i, q_i, \gamma _i, \phi _{i1}, z_{ij}} &{} \displaystyle \sum _{j=1}^d \theta _j \Psi ^*_j(z_{ij}/\theta _j) - \tau + \big \langle \theta , \Psi (\widehat{x}_i) \big \rangle + \big \langle \phi _{i1}, C \widehat{s}_i - d \big \rangle - \big \langle \gamma _i, H \widehat{s}_i + h \big \rangle \\ \text {s.t.} &{} \sum _{j=1}^d z_{ij} = W^\top \gamma _i \\ &{} \Vert (H^\top \gamma _i - C^\top \phi _{i1}, \theta ) \Vert _* \le \lambda \\ &{} \gamma _i \in \mathcal K^*, \phi _{i1} \in \mathcal C^*. \end{array} \end{aligned}$$
(40)

Note that the perspective functions \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) in the objective of (40) emerge from taking the conjugate of \(\theta _j \Psi _j(y)\). Thus, for \(\theta _j=0\) we must interpret \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) as an indicator function in \(z_{ij}\) which vanishes for \(z_{ij}=0\) and equals \(\infty \) otherwise. By weak duality, (40) provides an upper bound on (39). We conclude that the i-th member of the first constraint group in (38) is satisfied whenever the dual problem (40) has a feasible solution whose objective value does not exceed \(r_i\). A similar reasoning shows that the i-th member of the second constraint group in (38) holds if and only if there exists \(\phi _{i2} \in \mathcal {C}^*\) such that

$$\begin{aligned} \big \langle C\widehat{s}_i-d, \phi _{i2} \big \rangle \le r_i \quad \text {and} \quad \left\| \begin{pmatrix}C^\top \phi _{i2} \\ 0\end{pmatrix} \right\| _* \le \lambda . \end{aligned}$$

Thus, the worst-case expectation (36) is bounded above by the optimal value of the finite convex program

$$\begin{aligned} \begin{array}{clll} \inf &{}\quad \displaystyle {\varepsilon } \lambda + {1 \over N}\sum \limits _{i = 1}^{N} r_i \\ \text {s.t.}&{}\quad { \lambda \in {\mathbb {R}}_+,\;r_i\in {\mathbb {R}},\; \phi _{i1},\phi _{i2}\in \mathcal {C}^*, \;\gamma _{i}\in \mathcal {K}^*,\; z_{ij}\in {\mathbb {R}}^n } &{}\quad \forall i \le N,\forall j\le d \\ &{}\quad \sum _{j=1}^d \theta _j \Psi ^*_j(z_{ij}/\theta _j) + \big \langle \theta , \Psi (\widehat{x}_i) \big \rangle + \big \langle \phi _{i1}, C \widehat{s}_i - d \big \rangle - \big \langle \gamma _i, H \widehat{s}_i + h \big \rangle \le r_i + \tau &{}\quad \forall i \le N \\ &{}\quad \sum _{j=1}^d z_{ij} = W^\top \gamma _i &{}\quad \forall i \le N \\ &{}\quad \big \langle C \widehat{s}_i-d, \phi _{i2} \big \rangle \le r_i &{}\quad \forall i \le N \\ &{}\quad \left\| \begin{pmatrix} H^\top \gamma _i - C^\top \phi _{i1} \\ \theta \end{pmatrix} \right\| _* \le \lambda ,\;\; \left\| \begin{pmatrix} C^\top \phi _{i2} \\ 0 \end{pmatrix} \right\| _* \le \lambda&\quad \forall i \le N. \end{array} \end{aligned}$$

The claim then follows by substituting this convex program into (19).
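As a concrete illustration of the perspective terms \(\theta _j \Psi ^*_j(z_{ij}/\theta _j)\) (a hypothetical special case, not one treated explicitly above), consider the quadratic feature \(\Psi _j(y)=\Vert y\Vert _2^2\). The supremum defining the conjugate is attained at \(y=z/2\), so

$$\begin{aligned} \Psi ^*_j(z) = \sup _{y\in {\mathbb {R}}^n} \big \langle z, y \big \rangle - \Vert y\Vert _2^2 = \tfrac{1}{4}\Vert z\Vert _2^2, \qquad \theta _j \Psi ^*_j(z_{ij}/\theta _j) = \frac{\Vert z_{ij}\Vert _2^2}{4\theta _j}, \end{aligned}$$

and the epigraph constraint \(\Vert z_{ij}\Vert _2^2/(4\theta _j)\le t_{ij}\) is second-order cone representable, so for this feature map the safe approximation becomes an explicit conic program. Note also that the limit \(\theta _j\rightarrow 0\) recovers the indicator-function convention adopted in the proof.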

Cite this article

Mohajerin Esfahani, P., Shafieezadeh-Abadeh, S., Hanasusanto, G.A. et al. Data-driven inverse optimization with imperfect information. Math. Program. 167, 191–234 (2018). https://doi.org/10.1007/s10107-017-1216-6

Mathematics Subject Classification

  • 90C15 Stochastic programming
  • 90C25 Convex programming
  • 90C47 Minimax problems