Robust Budget Allocation Via Continuous Submodular Functions

Abstract

The optimal allocation of resources for maximizing influence, spread of information, or coverage has gained attention in recent years, in particular in machine learning and data mining. But in applications, the parameters of the problem are rarely known exactly, and using wrong parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. (in: WWW’12 - Proceedings of the 21st Annual Conference on World Wide Web, ACM, New York, 2012) from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex–concave saddle point problem (or game). We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, here, we establish conditions under which such a problem can be solved to arbitrary precision \(\varepsilon \).

References

  1. Adamczyk, M., Sviridenko, M., Ward, J.: Submodular stochastic probing on matroids. Math. Oper. Res. 41(3), 1022–1038 (2016). https://doi.org/10.1287/moor.2015.0766
  2. Alon, N., Gamzu, I., Tennenholtz, M.: Optimizing budget allocation among channels and influencers. In: WWW, pp. 381–388. ACM, New York, NY (2012). https://doi.org/10.1145/2187836.2187888
  3. Atamtürk, A., Narayanan, V.: Polymatroids and mean-risk minimization in discrete optimization. Oper. Res. Lett. 36(5), 618–622 (2008). https://doi.org/10.1016/j.orl.2008.04.006
  4. Bach, F.: Submodular functions: from discrete to continuous domains. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1248-6
  5. Balkanski, E., Rubinstein, A., Singer, Y.: The power of optimization from samples. In: NIPS, pp. 4017–4025 (2016)
  6. Balkanski, E., Rubinstein, A., Singer, Y.: The limitations of optimization from samples. In: STOC (2017)
  7. Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)
  8. Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems contaminated with uncertain data. Math. Program. 88(3), 411–424 (2000). https://doi.org/10.1007/PL00011380
  9. Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
  10. Bertsimas, D., Sim, M.: Robust discrete optimization and network flows. Math. Program. 98(1), 49–71 (2003)
  11. Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Rev. 53(3), 464–501 (2011). https://doi.org/10.1137/080734510
  12. Best, M.J., Chakravarti, N.: Active set algorithms for isotonic regression. A unifying framework. Math. Program. 47(1–3), 425–439 (1990). https://doi.org/10.1007/BF01580873
  13. Bian, A.A., Mirzasoleiman, B., Buhmann, J.M., Krause, A.: Guaranteed non-convex optimization: submodular maximization over continuous domains. In: AISTATS (2017)
  14. Birkhoff, G.: Rings of sets. Duke Math. J. 3(3), 443–454 (1937)
  15. Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: SODA, pp. 946–957. Philadelphia, PA, USA (2014)
  16. Boyd, S., Kim, S.J., Vandenberghe, L., Hassibi, A.: A tutorial on geometric programming. Optim. Eng. 8(1), 67–127 (2007)
  17. Chakrabarty, D., Lee, Y.T., Sidford, A., Wong, S.C.W.: Subquadratic submodular function minimization. In: STOC (2017)
  18. Chandrasekaran, V., Shah, P.: Relative entropy relaxations for signomial optimization. SIAM J. Optim. 26(2), 1147–1173 (2016). https://doi.org/10.1137/140988978
  19. Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: KDD, pp. 199–208 (2009)
  20. Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: KDD, pp. 1029–1038. New York, NY, USA (2010). https://doi.org/10.1145/1835804.1835934
  21. Chen, W., Lin, T., Tan, Z., Zhao, M., Zhou, X.: Robust influence maximization. In: KDD, pp. 795–804. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2939672.2939745
  22. Chiang, M.: Geometric programming for communication systems. Commun. Inf. Theory 2(1/2), 1–154 (2005). https://doi.org/10.1516/0100000005
  23. Deshpande, A., Hellerstein, L., Kletenik, D.: Approximation algorithms for stochastic submodular set cover with applications to Boolean function evaluation and Min-Knapsack. ACM Trans. Algorithms 12(3), 42:1–42:28 (2016). https://doi.org/10.1145/2876506
  24. Domingos, P., Richardson, M.: Mining the network value of customers. In: KDD, pp. 57–66 (2001)
  25. Du, N., Song, L., Gomez Rodriguez, M., Zha, H.: Scalable influence estimation in continuous-time diffusion networks. In: NIPS, pp. 3147–3155 (2013)
  26. Du, N., Liang, Y., Balcan, M.F., Song, L.: Influence function learning in information diffusion networks. In: ICML, pp. 2016–2024 (2014)
  27. Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)
  28. Ecker, J.: Geometric programming: methods, computations and applications. SIAM Rev. 22(3), 338–362 (1980). https://doi.org/10.1137/1022058
  29. Ene, A., Nguyen, H.L.: A reduction for optimizing lattice submodular functions with diminishing returns. arXiv:1606.08362 (2016)
  30. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Log. Quart. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109
  31. Fujishige, S.: Submodular Functions and Optimization, vol. 58. Elsevier, Amsterdam (2005)
  32. Goel, G., Karande, C., Tripathi, P., Wang, L.: Approximability of combinatorial problems with multi-agent submodular cost functions. In: FOCS, pp. 755–764 (2009)
  33. Goemans, M., Vondrák, J.: Stochastic covering and adaptivity. In: LATIN 2006: Theoretical Informatics, pp. 532–543. Springer, Berlin (2006). https://doi.org/10.1007/11682462_50
  34. Golovin, D., Krause, A.: Adaptive submodularity: theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res. 42, 427–486 (2011)
  35. Gomez Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. In: KDD, pp. 1019–1028. New York, NY, USA (2010). https://doi.org/10.1145/1835804.1835933
  36. Gomez Rodriguez, M., Schölkopf, B.: Influence maximization in continuous time diffusion networks. In: ICML (2012)
  37. Gottschalk, C., Peis, B.: Submodular function maximization on the bounded integer lattice. In: Proceedings of the 13th International Workshop on Approximation and Online Algorithms (WAOA) (2015)
  38. Hassidim, A., Singer, Y.: Submodular optimization under noise. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 65, pp. 1069–1122. PMLR, Amsterdam, Netherlands (2017). http://proceedings.mlr.press/v65/hassidim17a.html
  39. Hatano, D., Fukunaga, T., Maehara, T., Kawarabayashi, K.i.: Lagrangian decomposition algorithm for allocating marketing channels. In: AAAI, pp. 1144–1150 (2015)
  40. He, X., Kempe, D.: Robust influence maximization. In: KDD, pp. 885–894. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2939672.2939760
  41. Iwata, S., Nagano, K.: Submodular function minimization under covering constraints. In: FOCS, pp. 671–680 (2009)
  42. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: ICML, pp. 427–435 (2013)
  43. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: KDD, pp. 137–146. New York, NY, USA (2003). https://doi.org/10.1145/956750.956769
  44. Khachaturov, V.R., Khachaturov, R.V., Khachaturov, R.V.: Supermodular programming on finite lattices. Comput. Math. Math. Phys. 52(6), 855–878 (2012). https://doi.org/10.1134/S0965542512060097
  45. Kim, S., Kojima, M.: Exact solutions of some nonconvex quadratic optimization problems via SDP and SOCP relaxations. Comput. Optim. Appl. 26(2), 143–154 (2003). https://doi.org/10.1023/A:1025794313696
  46. Kolmogorov, V., Shioura, A.: New algorithms for convex cost tension problem with application to computer vision. Discrete Optim. 6, 378–393 (2009)
  47. Krause, A., McMahan, H.B., Guestrin, C., Gupta, A.: Robust submodular observation selection. J. Mach. Learn. Res. 9, 2761–2801 (2008)
  48. Lacoste-Julien, S.: Convergence rate of Frank–Wolfe for non-convex objectives. arXiv:1607.00345 (2016)
  49. Lacoste-Julien, S., Jaggi, M.: On the global linear convergence of Frank–Wolfe optimization variants. In: NIPS, pp. 496–504 (2015)
  50. Lee, Y.T., Sidford, A., Wong, S.C.W.: A faster cutting plane method and its implications for combinatorial and convex optimization. In: FOCS, pp. 1049–1065 (2015)
  51. Lowalekar, M., Varakantham, P., Kumar, A.: Robust influence maximization (extended abstract). In: AAMAS, pp. 1395–1396. Richland, SC (2016)
  52. Maehara, T.: Risk averse submodular utility maximization. Oper. Res. Lett. 43(5), 526–529 (2015). https://doi.org/10.1016/j.orl.2015.08.001
  53. Maehara, T., Yabe, A., Kawarabayashi, K.i.: Budget allocation problem with multiple advertisers: a game theoretic view. In: ICML, pp. 428–437 (2015)
  54. MOSEK ApS: MOSEK MATLAB Toolbox 8.0.0.57 (2015). http://docs.mosek.com/8.0/toolbox/index.html
  55. Murota, K.: Discrete convex analysis. Math. Program. 83, 313–371 (2003)
  56. Murota, K., Shioura, A.: Exact bounds for steepest descent algorithms of \(L\)-convex function minimization. Oper. Res. Lett. 42, 361–366 (2014)
  57. Nagano, K., Kawahara, Y., Aihara, K.: Size-constrained submodular minimization through minimum norm base. In: ICML, pp. 977–984 (2011)
  58. Narasimhan, H., Parkes, D.C., Singer, Y.: Learnability of influence in networks. In: NIPS, pp. 3186–3194 (2015)
  59. Netrapalli, P., Sanghavi, S.: Learning the graph of epidemic cascades. In: SIGMETRICS, pp. 211–222. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2254756.2254783
  60. Nikolova, E.: Approximation algorithms for reliable stochastic combinatorial optimization. In: APPROX, pp. 338–351. Springer, Berlin (2010)
  61. Orlin, J.B., Schulz, A., Udwani, R.: Robust monotone submodular function maximization. In: IPCO (2016)
  62. Pascual, L.D., Ben-Israel, A.: Constrained maximization of posynomials by geometric programming. J. Optim. Theory Appl. 5(2), 73–80 (1970). https://doi.org/10.1007/BF00928296
  63. Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
  64. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
  65. Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Financ. 26(7), 1443–1471 (2002)
  66. Soma, T., Yoshida, Y.: A generalization of submodular cover via the diminishing return property on the integer lattice. In: NIPS, pp. 847–855 (2015)
  67. Soma, T., Kakimura, N., Inaba, K., Kawarabayashi, K.i.: Optimal budget allocation: theoretical guarantee and efficient algorithm. In: ICML, pp. 351–359 (2014)
  68. Svitkina, Z., Fleischer, L.: Submodular approximation: sampling-based algorithms and lower bounds. SIAM J. Comput. 40(6), 1715–1737 (2011)
  69. Topkis, D.M.: Minimizing a submodular function on a lattice. Oper. Res. 26(2), 305–321 (1978)
  70. Wainwright, K., Chiang, A.: Fundamental Methods of Mathematical Economics. McGraw-Hill Education, New York (2004)
  71. Wilder, B.: Risk-sensitive submodular optimization. In: AAAI (2018)
  72. Wolfe, P.: Finding the nearest point in a polytope. Math. Program. 11(1), 128–149 (1976). https://doi.org/10.1007/BF01580381
  73. Yahoo! Webscope dataset ydata-ysm-advertiser-bids-v1_0. http://research.yahoo.com/Academic_Relations
  74. Zhang, P., Chen, W., Sun, X., Wang, Y., Zhang, J.: Minimizing seed set selection with probabilistic coverage guarantee in a social network. In: KDD, pp. 1306–1315. New York, NY, USA (2014). https://doi.org/10.1145/2623330.2623684


Acknowledgements

We thank the anonymous reviewers for their helpful suggestions. We also thank MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing computational resources. This research was conducted with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and also supported by NSF CAREER Award 1553284 and The Defense Advanced Research Projects Agency (Grant Number YFA17 N66001-17-1-4039). The views, opinions, and/or findings contained in this article are those of the author and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense.

Author information

Correspondence to Matthew Staib.

Appendices

Worst-Case Approximation Ratio Versus True Worst-Case

Consider the function \(f(x;\theta )\) defined on \(\{0,1\} \times \{0,1\}\), with values given by:

$$\begin{aligned} f(x;0) = \begin{cases} 1 &{} x=0 \\ 0.6 &{} x=1, \end{cases} \qquad f(x;1) = \begin{cases} 1 &{} x=0 \\ 2 &{} x=1. \end{cases} \end{aligned}$$
(65)

We wish to choose x to maximize \(f(x;\theta )\) robustly with respect to adversarial choices of \(\theta \). If \(\theta \) were fixed, we could directly choose \(x_\theta ^*\) to maximize \(f(x;\theta )\). In particular, \(x^*_0 = 0\) and \(x^*_1 = 1\). Of course, we want to deal with worst-case \(\theta \). One option is to maximize the worst-case approximation ratio:

$$\begin{aligned} \max _x \min _\theta \frac{f(x;\theta )}{f(x^*_\theta ;\theta )}. \end{aligned}$$
(66)

One can verify that the best x according to this criterion is \(x=1\), with worst-case approximation ratio 0.6 and worst-case function value 0.6. In this paper, we optimize the worst-case of the actual function value:

$$\begin{aligned} \max _x \min _\theta f(x;\theta ). \end{aligned}$$
(67)

This criterion will select \(x=0\), which has a worse worst-case approximation ratio of 0.5, but actually guarantees a function value of 1, significantly better than the 0.6 achieved by the other formulation of robustness.
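The following minimal Python snippet evaluates both criteria (66) and (67) on the toy values in (65); the dictionary encoding of \(f\) is simply one convenient representation of the table above.

```python
# Numeric check of the toy example: f is stored as f[theta][x] with the values from (65).
f = {0: {0: 1.0, 1: 0.6},   # f(x; theta = 0)
     1: {0: 1.0, 1: 2.0}}   # f(x; theta = 1)

# Per-theta optimal values f(x*_theta; theta).
opt = {theta: max(f[theta].values()) for theta in (0, 1)}

# Criterion (66): maximize the worst-case approximation ratio.
best_ratio_x = max((0, 1), key=lambda x: min(f[t][x] / opt[t] for t in (0, 1)))
# Criterion (67): maximize the worst-case function value.
best_value_x = max((0, 1), key=lambda x: min(f[t][x] for t in (0, 1)))

print(best_ratio_x, best_value_x)  # -> 1 0, matching the discussion above
```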

DR-Submodularity and \(L^\natural \)-Convexity

A function is \(L^\natural \)-convex if it satisfies a discrete version of midpoint convexity, i.e., for all \(x, y\) it holds that

$$\begin{aligned} f(x) + f(y) \ge f\left( \left\lceil \frac{x+y}{2}\right\rceil \right) + f\left( \left\lfloor \frac{x+y}{2}\right\rfloor \right) , \end{aligned}$$
(68)

where the floor \(\lfloor \cdot \rfloor \) and ceiling \(\lceil \cdot \rceil \) functions are interpreted elementwise.

Remark 1

An \(L^\natural \)-convex function need not be DR-submodular, and vice-versa. Hence algorithms for optimizing one type may not apply for the other.

Proof

Consider \(f_1(x_1,x_2) = -x_1^2 - 2x_1 x_2\) and \(f_2(x_1,x_2) = x_1^2 + x_2^2\), both defined on \(\{0,1,2\} \times \{0,1,2\}\). The function \(f_1\) is DR-submodular but violates discrete midpoint convexity for the pair of points (0, 0) and (2, 2), while \(f_2\) is \(L^\natural \)-convex but does not have diminishing returns in either dimension. \(\square \)

Intuitively speaking, \(L^\natural \)-convex functions look like discretizations of convex functions. The continuous objective function \({\mathcal {I}}(x,y)\) we consider need not be convex, hence its discretization need not be \(L^\natural \)-convex, and we cannot use those tools. However, in some regimes (namely if each \(y(s) \in \{0\} \cup [1,\infty )\)), it happens that \({\mathcal {I}}(x,y)\) is DR-submodular in x.
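The counterexamples in Remark 1 can also be checked numerically. The sketch below verifies that \(f_1\) violates the discrete midpoint inequality (68) at \((0,0)\) and \((2,2)\), and that \(f_2\) lacks diminishing returns in its first coordinate; the helper names are ours.

```python
import math

def f1(x1, x2): return -x1**2 - 2 * x1 * x2   # DR-submodular on {0,1,2}^2
def f2(x1, x2): return x1**2 + x2**2          # L-natural-convex on {0,1,2}^2

# Discrete midpoint convexity (68) fails for f1 at x = (0,0), y = (2,2).
x, y = (0, 0), (2, 2)
lhs = f1(*x) + f1(*y)                                              # = -12
mid_up = tuple(math.ceil((a + b) / 2) for a, b in zip(x, y))       # = (1, 1)
mid_dn = tuple(math.floor((a + b) / 2) for a, b in zip(x, y))      # = (1, 1)
print(lhs >= f1(*mid_up) + f1(*mid_dn))   # False: f1 is not L-natural-convex

# Diminishing returns fails for f2 along the first coordinate.
gain_at_0 = f2(1, 0) - f2(0, 0)   # = 1
gain_at_1 = f2(2, 0) - f2(1, 0)   # = 3, a larger marginal gain
print(gain_at_1 <= gain_at_0)     # False: f2 is not DR-submodular
```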

Constrained Continuous Submodular Function Minimization

Solving the Optimization Problem

Here, we describe how to solve the convex problem (17) to which we reduced the original constrained submodular minimization problem. Bach [4], at the beginning of Section 5.2, states that this surrogate problem can be optimized via the Frank–Wolfe method and its variants. However, [4] only elaborates on the simpler version of Problem (17) without the extra functions \(a_{i x_i}\). Here we detail how Frank–Wolfe algorithms can be used to solve the more general parametric regularized problem; our aim is to spell out the applicability of Frank–Wolfe to this problem clearly, for the benefit of practitioners.

Bach [4] notes that by duality, Problem (17) is equivalent to:

$$\begin{aligned}&\min _{\rho \in \prod _{i=1}^n {\mathbb {R}}_\downarrow ^{k_i - 1}} h_\downarrow (\rho ) - H(0) + \sum _{i=1}^n \sum _{x_i=1}^{k_i-1} a_{i x_i}[\rho _i(x_i)] \\&\quad = \min _{\rho \in \prod _{i=1}^n {\mathbb {R}}_\downarrow ^{k_i - 1}} \max _{w \in B(H)} \langle \rho , w \rangle + \sum _{i=1}^n \sum _{x_i=1}^{k_i-1} a_{i x_i}[\rho _i(x_i)] \\&\quad = \max _{w \in B(H)} \left\{ \min _{\rho \in \prod _{i=1}^n {\mathbb {R}}_\downarrow ^{k_i - 1}} \langle \rho , w \rangle + \sum _{i=1}^n \sum _{x_i=1}^{k_i-1} a_{i x_i}[\rho _i(x_i)] \right\} \\&\quad =: \max _{w \in B(H)} f(w). \end{aligned}$$

Here, the base polytope B(H) happens to be the convex hull of all vectors w which could be output by the greedy algorithm in [4].

It is the dual problem, where we maximize over w, which is amenable to Frank–Wolfe. For Frank–Wolfe methods, we need two oracles: an oracle which, given w, returns \(\nabla f(w)\); and an oracle which, given \(\nabla f(w)\), produces a point s which solves the linear optimization problem \(\max _{s \in B(H)} \langle s, \nabla f(w) \rangle \).

Per [4], an optimizer of the linear problem can be computed directly from the greedy algorithm. For the gradient oracle, recall that we can find a subgradient of \(g(x) = \min _y h(x,y)\) at the point \(x_0\) by finding \(y(x_0)\) which is optimal for the inner problem, and then computing \(\nabla _x h(x,y(x_0))\). Moreover, if such \(y(x_0)\) is the unique optimizer, then the resulting vector is indeed the gradient of g(x) at \(x_0\). Hence, in our case, it suffices to first find \(\rho (w)\) which solves the inner problem, and then \(\nabla f(w)\) is simply \(\rho (w)\) because the inner function is linear in w. Since each function \(a_{i x_i}\) is strictly convex, the minimizer \(\rho (w)\) is unique, confirming that we indeed get a gradient of f, and that f is differentiable.
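A minimal sketch of the resulting Frank–Wolfe loop is given below. The two oracles, here called greedy_lmo and inner_rho, are assumed to be supplied by the user: greedy_lmo performs linear maximization over \(B(H)\) via the greedy algorithm of [4], and inner_rho returns the minimizer \(\rho (w)\) of the inner problem (computed as described next); the open-loop step size follows [27, 42].

```python
import numpy as np

# Frank-Wolfe sketch for max_{w in B(H)} f(w), with the two oracles described
# above passed in as callables (their implementations are assumed, not given here).
def frank_wolfe(w0, greedy_lmo, inner_rho, num_iters=1000):
    w = np.asarray(w0, dtype=float)
    for t in range(num_iters):
        grad = inner_rho(w)               # gradient oracle: grad f(w) = rho(w), see text
        s = greedy_lmo(grad)              # linear maximization oracle over B(H)
        gamma = 2.0 / (t + 2)             # open-loop step size [27, 42]
        w = (1 - gamma) * w + gamma * s   # convex combination stays in B(H)
    return w
```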

Of course, we still need to compute the minimizer \(\rho (w)\). For a given w, we want to solve

$$\begin{aligned} \min _{\rho \in \prod _{i=1}^n {\mathbb {R}}_\downarrow ^{k_i - 1}} \langle \rho , w \rangle + \sum _{i=1}^n \sum _{x_i=1}^{k_i-1} a_{i x_i}[\rho _i(x_i)] \end{aligned}$$

There are no constraints coupling the vectors \(\rho _i\), and the objective is similarly separable, so we can independently solve n problems of the form

$$\begin{aligned} \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} \langle \rho , w \rangle + \sum _{j=1}^{k-1} a_{j}(\rho _j). \end{aligned}$$

Recall that each function \(a_{i y_i}(t)\) takes the form \(\frac{1}{2} t^2 r_{i y_i} \) for some \(r_{i y_i} > 0\). Let \(D = \mathrm {diag}(r)\), the \((k-1)\times (k-1)\) matrix with diagonal entries \(r_j\). Our problem can then be written as

$$\begin{aligned} \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} \langle \rho , w \rangle + \frac{1}{2} \sum _{j=1}^{k-1} r_j \rho _j^2&= \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} \langle \rho , w \rangle + \frac{1}{2} \langle D \rho , \; \rho \rangle \\&= \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} \langle D^{1/2}\rho , \; D^{-1/2} w \rangle + \frac{1}{2} \langle D^{1/2} \rho , \; D^{1/2}\rho \rangle . \end{aligned}$$

Completing the square, the above problem is equivalent to

$$\begin{aligned} \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} ||D^{1/2} \rho + D^{-1/2} w ||_2^2&= \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} \sum _{j=1}^{k-1} \Big (r_j^{1/2} \rho _j + r_j^{-1/2} w_j\Big )^2 \\&= \min _{\rho \in {\mathbb {R}}_\downarrow ^{k - 1}} \sum _{j=1}^{k-1} r_j \Big (\rho _j + r_j^{-1} w_j\Big )^2. \end{aligned}$$

This last expression is precisely the problem known as weighted isotonic regression: we are fitting \(\rho \) to \(-\mathrm {diag}(r^{-1}) w\), with weights r, subject to a monotonicity constraint. Weighted isotonic regression is solved efficiently via the Pool Adjacent Violators algorithm of [12].
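The following is a sketch of the Pool Adjacent Violators algorithm [12] specialized to one coordinate block of this problem, i.e. minimizing \(\sum _j r_j (\rho _j + w_j/r_j)^2\) over nonincreasing \(\rho \); the function and variable names are ours, not the paper's.

```python
import numpy as np

def pav_nonincreasing(w, r):
    w = np.asarray(w, dtype=float)
    r = np.asarray(r, dtype=float)
    # Fit a nonincreasing rho to the targets -w/r with weights r. Internally we
    # fit a nondecreasing sequence to the negated targets and flip the sign back.
    targets = w / r                  # negated targets, i.e. -(-w/r)
    blocks = []                      # each block: [weighted mean, total weight, length]
    for t, wt in zip(targets, r):
        blocks.append([t, wt, 1])
        # Merge adjacent blocks while the nondecreasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fit = np.concatenate([np.full(n, m) for m, _, n in blocks])
    return -fit                      # back to the nonincreasing parameterization
```

Calling such a routine block by block yields the minimizer \(\rho (w)\), and hence the gradient oracle used by the Frank–Wolfe loop sketched above.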

Runtime

Frank–Wolfe returns an \(\varepsilon \)-suboptimal solution in \(O(\varepsilon ^{-1} D^2 L)\) iterations, where D is the diameter of the feasible region, and L is the Lipschitz constant for the gradient of the objective [42]. Our optimization problem is \(\max _{w\in B(H)} f(w)\) as defined in the previous section. Each \(w \in B(H)\) has \(O(n\delta ^{-1})\) coordinates of the form \(H^\delta (x+e_i)-H^\delta (x)\). Since \(H^\delta \) is an expected influence in the range [0, T], we can bound the magnitude of each coordinate of w by T and hence \(D^2\) by \(O(n\delta ^{-1} T^2)\). If \(\alpha \) is the minimum derivative of the functions \(R_i\), then the smallest coefficient of the functions \(a_{ix_i}(t)\) is bounded below by \(\alpha \delta \). Hence the objective is the conjugate of an \(\alpha \delta \)-strongly convex function, and therefore has \(\alpha ^{-1}\delta ^{-1}\)-Lipschitz gradient. Combining these, we arrive at the \(O(\varepsilon ^{-1} n\delta ^{-2} \alpha ^{-1} T^2)\) iteration bound. The most expensive step in each iteration is computing the subgradient, which requires sorting the \(O(n\delta ^{-1})\) elements of \(\rho \) in time \(O(n\delta ^{-1} \log {n\delta ^{-1}} )\). Hence the total runtime of Frank–Wolfe is \(O(\varepsilon ^{-1} n^2\delta ^{-3} \alpha ^{-1} T^2 \log {n\delta ^{-1}})\).

As specified in the main text, relating an approximate solution of (17) to a solution of (14) is nontrivial. Assume \(\rho ^*\) has distinct elements separated by \(\eta \), and choose \(\varepsilon \) to be less than \(\eta ^2 \alpha \delta / 8\). If \(\rho \) is \(\varepsilon \)-suboptimal, then by \(\alpha \delta \)-strong convexity we must have \(||\rho - \rho ^* ||_2 < \eta /2\), and therefore \(||\rho - \rho ^* ||_\infty < \eta /2\). Since the smallest consecutive gap between elements of \(\rho ^*\) is \(\eta \), this implies that \(\rho \) and \(\rho ^*\) have the same ordering, and therefore admit the same solution x after thresholding. Accounting for this choice of \(\varepsilon \), we have an exact solution to (14) in total runtime of \(O(\eta ^{-2} n^2\delta ^{-4} \alpha ^{-2} T^2 \log {n\delta ^{-1}})\).
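For reference, the helper below spells out this arithmetic with all constants suppressed, combining the choice \(\varepsilon < \eta ^2 \alpha \delta / 8\) with the iteration bound of the previous section; it is an order-of-magnitude sketch only, not a certified count.

```python
# Back-of-the-envelope iteration estimate for Frank-Wolfe (constants dropped).
def fw_iteration_estimate(n, delta, alpha, T, eta):
    eps = eta**2 * alpha * delta / 8   # accuracy that guarantees exact rounding
    D_sq = (n / delta) * T**2          # squared diameter of B(H)
    L = 1.0 / (alpha * delta)          # Lipschitz constant of the gradient
    return D_sq * L / eps              # ~ eta^-2 n delta^-3 alpha^-2 T^2 iterations
```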

Cite this article

Staib, M., Jegelka, S. Robust Budget Allocation Via Continuous Submodular Functions. Appl Math Optim 82, 1049–1079 (2020). https://doi.org/10.1007/s00245-019-09567-0

Keywords

  • Submodular optimization
  • Constrained submodular optimization
  • Robust optimization
  • Nonconvex optimization
  • Budget allocation