Abstract
The optimal allocation of resources for maximizing influence, spread of information, or coverage has gained attention in recent years, in particular in machine learning and data mining. In applications, however, the parameters of the problem are rarely known exactly, and using incorrect parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. (in: WWW ’12: Proceedings of the 21st International Conference on World Wide Web, ACM, New York, 2012) from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex–concave saddle point problem (or game). We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, we establish conditions under which such a problem can be solved to arbitrary precision \(\varepsilon \).
References
- 1.
Adamczyk, M., Sviridenko, M., Ward, J.: Submodular stochastic probing on matroids. Math. Oper. Res. 41(3), 1022–1038 (2016). https://doi.org/10.1287/moor.2015.0766
- 2.
Alon, N., Gamzu, I., Tennenholtz, M.: Optimizing Budget Allocation Among Channels and Influencers. WWW, pp. 381–388. ACM, New York, NY (2012). https://doi.org/10.1145/2187836.2187888
- 3.
Atamtürk, A., Narayanan, V.: Polymatroids and mean-risk minimization in discrete optimization. Oper. Res. Lett. 36(5), 618–622 (2008). https://doi.org/10.1016/j.orl.2008.04.006
- 4.
Bach, F.: Submodular functions: from discrete to continuous domains. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1248-6
- 5.
Balkanski, E., Rubinstein, A., Singer, Y.: The power of optimization from samples. In: Proceedings of the NIPS, pp. 4017–4025 (2016)
- 6.
Balkanski, E., Rubinstein, A., Singer, Y.: The limitations of optimization from samples. In: STOC (2017)
- 7.
Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165–218 (2011)
- 8.
Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems contaminated with uncertain data. Math. Program. 88(3), 411–424 (2000). https://doi.org/10.1007/PL00011380
- 9.
Ben-Tal, A., El Ghaoui, L., Nemirovski, A.: Robust Optimization. Princeton University Press, Princeton (2009)
- 10.
Bertsimas, D., Sim, M.: Robust discrete optimization and network flows. Math. Program. 98(1), 49–71 (2003)
- 11.
Bertsimas, D., Brown, D.B., Caramanis, C.: Theory and applications of robust optimization. SIAM Rev. 53(3), 464–501 (2011). https://doi.org/10.1137/080734510
- 12.
Best, M.J., Chakravarti, N.: Active set algorithms for isotonic regression. A unifying framework. Math. Program. 47(1–3), 425–439 (1990). https://doi.org/10.1007/BF01580873
- 13.
Bian, A.A., Mirzasoleiman, B., Buhmann, J.M., Krause, A.: Guaranteed non-convex optimization: submodular maximization over continuous domains. In: AISTATS (2017)
- 14.
Birkhoff, G.: Rings of sets. Duke Math. J. 3(3), 443–454 (1937)
- 15.
Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: SODA, pp. 946–957. Philadelphia, PA, USA (2014)
- 16.
Boyd, S., Kim, S.J., Vandenberghe, L., Hassibi, A.: A tutorial on geometric programming. Optim. Eng. 8(1), 67–127 (2007)
- 17.
Chakrabarty, D., Lee, Y.T., Sidford, A., Wong, S.C.W.: Subquadratic submodular function minimization. In: STOC (2017)
- 18.
Chandrasekaran, V., Shah, P.: Relative entropy relaxations for signomial optimization. SIAM J. Optim. 26(2), 1147–1173 (2016). https://doi.org/10.1137/140988978
- 19.
Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: KDD, pp. 199–208 (2009)
- 20.
Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: KDD, pp. 1029–1038. New York, NY, USA (2010). https://doi.org/10.1145/1835804.1835934
- 21.
Chen, W., Lin, T., Tan, Z., Zhao, M., Zhou, X.: Robust influence maximization. In: KDD, pp. 795–804. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2939672.2939745
- 22.
Chiang, M.: Geometric programming for communication systems. Found. Trends Commun. Inf. Theory 2(1–2), 1–154 (2005). https://doi.org/10.1561/0100000005
- 23.
Deshpande, A., Hellerstein, L., Kletenik, D.: Approximation algorithms for stochastic submodular set cover with applications to Boolean function evaluation and Min-Knapsack. ACM Trans. Algorithms 12(3), 42:1–42:28 (2016). https://doi.org/10.1145/2876506
- 24.
Domingos, P., Richardson, M.: Mining the network value of customers. In: KDD, pp. 57–66 (2001)
- 25.
Du, N., Song, L., Gomez Rodriguez, M., Zha, H.: Scalable influence estimation in continuous-time diffusion networks. In: NIPS, pp. 3147–3155 (2013)
- 26.
Du, N., Liang, Y., Balcan, M.F., Song, L.: Influence function learning in information diffusion networks. In: ICML, pp. 2016–2024 (2014)
- 27.
Dunn, J.C., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62(2), 432–444 (1978)
- 28.
Ecker, J.: Geometric programming: methods, computations and applications. SIAM Rev. 22(3), 338–362 (1980). https://doi.org/10.1137/1022058
- 29.
Ene, A., Nguyen, H.L.: A reduction for optimizing lattice submodular functions with diminishing returns. arXiv:1606.08362 (2016)
- 30.
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Log. Quart. 3(1–2), 95–110 (1956). https://doi.org/10.1002/nav.3800030109
- 31.
Fujishige, S.: Submodular Functions and Optimization, vol. 58. Elsevier, Amsterdam (2005)
- 32.
Goel, G., Karande, C., Tripathi, P., Wang, L.: Approximability of combinatorial problems with multi-agent submodular cost functions. In: FOCS, pp. 755–764 (2009)
- 33.
Goemans, M., Vondrák, J.: Stochastic Covering and Adaptivity. LATIN 2006: Theoretical Informatics, pp. 532–543. Springer, Berlin (2006). https://doi.org/10.1007/11682462_50
- 34.
Golovin, D., Krause, A.: Adaptive submodularity: theory and applications in active learning and stochastic optimization. J. Artif. Intell. 42, 427–486 (2011)
- 35.
Gomez Rodriguez, M., Leskovec, J., Krause, A.: Inferring networks of diffusion and influence. In: KDD, pp. 1019–1028. New York, NY, USA (2010). https://doi.org/10.1145/1835804.1835933
- 36.
Gomez Rodriguez, M., Schölkopf, B.: Influence Maximization in Continuous Time Diffusion Networks. In: ICML (2012)
- 37.
Gottschalk, C., Peis, B.: Submodular function maximization on the bounded integer lattice. In: Proceedings of the 13th International Workshop (WAOA) on Approximation and Online Algorithms (2015)
- 38.
Hassidim, A., Singer, Y.: Submodular Optimization under Noise. In: Kale, S., Shamir, O. (eds.) Proceedings of the 2017 Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 65, pp. 1069–1122. PMLR, Amsterdam, Netherlands (2017). http://proceedings.mlr.press/v65/hassidim17a.html
- 39.
Hatano, D., Fukunaga, T., Maehara, T., Kawarabayashi, K.i.: Lagrangian decomposition algorithm for allocating marketing channels. In: AAAI, pp. 1144–1150 (2015)
- 40.
He, X., Kempe, D.: Robust influence maximization. In: KDD, pp. 885–894. ACM, New York, NY, USA (2016). https://doi.org/10.1145/2939672.2939760
- 41.
Iwata, S., Nagano, K.: Submodular function minimization under covering constraints. In: FOCS, pp. 671–680 (2009)
- 42.
Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: ICML, pp. 427–435 (2013)
- 43.
Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: KDD, pp. 137–146. New York, NY, USA (2003). https://doi.org/10.1145/956750.956769
- 44.
Khachaturov, V.R., Khachaturov, R.V., Khachaturov, R.V.: Supermodular programming on finite lattices. Comput. Math. Math. Phys. 52(6), 855–878 (2012). https://doi.org/10.1134/S0965542512060097
- 45.
Kim, S., Kojima, M.: Exact solutions of some nonconvex quadratic optimization problems via SDP and SOCP relaxations. Comput. Optim. Appl. 26(2), 143–154 (2003). https://doi.org/10.1023/A:1025794313696
- 46.
Kolmogorov, V., Shioura, A.: New algorithms for convex cost tension problem with application to computer vision. Discrete Optim. 6, 378–393 (2009)
- 47.
Krause, A., McMahan, H.B., Guestrin, C., Gupta, A.: Robust submodular observation selection. J. Mach. Learn. Res. 9, 2761–2801 (2008)
- 48.
Lacoste-Julien, S.: Convergence rate of Frank–Wolfe for non-convex objectives. arXiv:1607.00345 (2016)
- 49.
Lacoste-Julien, S., Jaggi, M.: On the global linear convergence of Frank-Wolfe optimization variants. In: NIPS, pp. 496–504 (2015)
- 50.
Lee, Y.T., Sidford, A., Wong, S.C.w.: A faster cutting plane method and its implications for combinatorial and convex optimization. In: FOCS, pp. 1049–1065 (2015)
- 51.
Lowalekar, M., Varakantham, P., Kumar, A.: Robust Influence Maximization: (Extended Abstract). In: AAMAS, pp. 1395–1396. Richland, SC (2016)
- 52.
Maehara, T.: Risk averse submodular utility maximization. Oper. Res. Lett. 43(5), 526–529 (2015). https://doi.org/10.1016/j.orl.2015.08.001
- 53.
Maehara, T., Yabe, A., Kawarabayashi, K.i.: Budget allocation problem with multiple advertisers: a game theoretic view. In: ICML, pp. 428–437 (2015)
- 54.
MOSEK ApS: MOSEK MATLAB Toolbox 8.0.0.57 (2015). http://docs.mosek.com/8.0/toolbox/index.html
- 55.
Murota, K.: Discrete convex analysis. Math. Program. 83, 313–371 (2003)
- 56.
Murota, K., Shioura, A.: Exact bounds for steepest descent algorithms of \(L\)-convex function minimization. Oper. Res. Lett. 42, 361–366 (2014)
- 57.
Nagano, K., Kawahara, Y., Aihara, K.: Size-constrained submodular minimization through minimum norm base. In: ICML, pp. 977–984 (2011)
- 58.
Narasimhan, H., Parkes, D.C., Singer, Y.: Learnability of influence in networks. In: NIPS, pp. 3186–3194 (2015)
- 59.
Netrapalli, P., Sanghavi, S.: Learning the graph of epidemic cascades. In: SIGMETRICS, pp. 211–222. ACM, New York, NY, USA (2012). https://doi.org/10.1145/2254756.2254783
- 60.
Nikolova, E.: Approximation algorithms for reliable stochastic combinatorial optimization. In: APPROX, pp. 338–351. Springer, Berlin (2010)
- 61.
Orlin, J.B., Schulz, A., Udwani, R.: Robust monotone submodular function maximization. In: IPCO (2016)
- 62.
Pascual, L.D., Ben-Israel, A.: Constrained maximization of posynomials by geometric programming. J. Optim. Theory Appl. 5(2), 73–80 (1970). https://doi.org/10.1007/BF00928296
- 63.
Polyak, B.T.: Introduction to Optimization. Optimization Software, New York (1987)
- 64.
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–42 (2000)
- 65.
Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Financ. 26(7), 1443–1471 (2002)
- 66.
Soma, T., Yoshida, Y.: A generalization of submodular cover via the diminishing return property on the integer lattice. In: NIPS, pp. 847–855 (2015)
- 67.
Soma, T., Kakimura, N., Inaba, K., Kawarabayashi, K.i.: Optimal budget allocation: theoretical guarantee and efficient algorithm. In: ICML, pp. 351–359 (2014)
- 68.
Svitkina, Z., Fleischer, L.: Submodular approximation: sampling-based algorithms and lower bounds. SIAM J. Comput. 40(6), 1715–1737 (2011)
- 69.
Topkis, D.M.: Minimizing a submodular function on a lattice. Oper. Res. 26(2), 305–321 (1978)
- 70.
Wainwright, K., Chiang, A.: Fundamental Methods of Mathematical Economics. McGraw-Hill Education, New York (2004)
- 71.
Wilder, B.: Risk-sensitive submodular optimization. In: AAAI Conference on Artificial Intelligence (2018)
- 72.
Wolfe, P.: Finding the nearest point in a polytope. Math. Program. 11(1), 128–149 (1976). https://doi.org/10.1007/BF01580381
- 73.
Yahoo! Webscope dataset ydata-ysm-advertiser-bids-v1_0. http://research.yahoo.com/Academic_Relations
- 74.
Zhang, P., Chen, W., Sun, X., Wang, Y., Zhang, J.: Minimizing seed set selection with probabilistic coverage guarantee in a social network. In: KDD, pp. 1306–1315. New York, NY, USA (2014). https://doi.org/10.1145/2623330.2623684
Acknowledgements
We thank the anonymous reviewers for their helpful suggestions. We also thank MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing computational resources. This research was conducted with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and also supported by NSF CAREER Award 1553284 and The Defense Advanced Research Projects Agency (Grant Number YFA17 N66001-17-1-4039). The views, opinions, and/or findings contained in this article are those of the author and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense.
Appendices
Worst-Case Approximation Ratio Versus True Worst-Case
Consider a function \(f(x;\theta )\) defined on \(\{0,1\} \times \{0,1\}\), i.e. specified by four values, one for each pair \((x,\theta )\).
We wish to choose x to maximize \(f(x;\theta )\) robustly with respect to adversarial choices of \(\theta \). If \(\theta \) were fixed, we could directly choose \(x_\theta ^*\) to maximize \(f(x;\theta )\); in particular, \(x^*_0 = 0\) and \(x^*_1 = 1\). Of course, we want to deal with worst-case \(\theta \). One option is to maximize the worst-case approximation ratio:
\[ \max_x \min_\theta \frac{f(x;\theta )}{f(x^*_\theta ;\theta )}. \]
One can verify that the best x according to this criterion is \(x=1\), with worst-case approximation ratio 0.6 and worst-case function value 0.6. In this paper, we instead optimize the worst case of the actual function value:
\[ \max_x \min_\theta f(x;\theta ). \]
This criterion will select \(x=0\), which has a worse worst-case approximation ratio of 0.5, but actually guarantees a function value of 1, significantly better than the 0.6 achieved by the other formulation of robustness.
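For concreteness, the sketch below evaluates both robustness criteria on one assignment of the four values that is consistent with the numbers quoted above; the specific entries are an assumption for illustration, not necessarily those of the original table.

# One value table consistent with the discussion above (assumed, for illustration).
f = {  # f[(x, theta)]
    (0, 0): 1.0, (1, 0): 0.6,
    (0, 1): 1.0, (1, 1): 2.0,
}
xs, thetas = [0, 1], [0, 1]

# Best response x*_theta for each fixed theta: here x*_0 = 0 and x*_1 = 1.
best = {t: max(xs, key=lambda x: f[(x, t)]) for t in thetas}

for x in xs:
    worst_ratio = min(f[(x, t)] / f[(best[t], t)] for t in thetas)
    worst_value = min(f[(x, t)] for t in thetas)
    print(f"x={x}: worst-case ratio {worst_ratio}, worst-case value {worst_value}")
# x=1: worst-case ratio 0.6, worst-case value 0.6
# x=0: worst-case ratio 0.5, worst-case value 1.0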
DR-Submodularity and \(L^\natural \)-Convexity
A function is \(L^\natural \)-convex if it satisfies a discrete version of midpoint convexity, i.e. for all x, y it holds that
\[ f(x) + f(y) \;\ge\; f\!\left(\left\lceil \tfrac{x+y}{2} \right\rceil \right) + f\!\left(\left\lfloor \tfrac{x+y}{2} \right\rfloor \right), \]
where the floor \(\lfloor \cdot \rfloor \) and ceiling \(\lceil \cdot \rceil \) functions are applied elementwise.
Remark 1
An \(L^\natural \)-convex function need not be DR-submodular, and vice versa. Hence, algorithms for optimizing one class of functions may not apply to the other.
Proof
Consider \(f_1(x_1,x_2) = -x_1^2 - 2x_1 x_2\) and \(f_2(x_1,x_2) = x_1^2 + x_2^2\), both defined on \(\{0,1,2\} \times \{0,1,2\}\). The function \(f_1\) is DR-submodular but violates discrete midpoint convexity for the pair of points (0, 0) and (2, 2), while \(f_2\) is \(L^\natural \)-convex but does not have diminishing returns in either dimension. \(\square \)
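A quick numerical check of these two counterexamples (a minimal sketch; the function definitions are exactly those from the proof above):

import math

f1 = lambda x1, x2: -x1**2 - 2 * x1 * x2   # DR-submodular on {0,1,2}^2
f2 = lambda x1, x2: x1**2 + x2**2          # L-natural-convex on {0,1,2}^2

# Discrete midpoint convexity at x = (0,0), y = (2,2):
#   f(x) + f(y) >= f(ceil((x+y)/2)) + f(floor((x+y)/2))
x, y = (0, 0), (2, 2)
up = tuple(math.ceil((a + b) / 2) for a, b in zip(x, y))
dn = tuple(math.floor((a + b) / 2) for a, b in zip(x, y))
print(f1(*x) + f1(*y) >= f1(*up) + f1(*dn))  # False: -12 < -6, so f1 is not L-natural-convex

# Marginal returns of f2 in the first coordinate increase rather than diminish:
print(f2(2, 0) - f2(1, 0) > f2(1, 0) - f2(0, 0))  # True: 3 > 1, so f2 is not DR-submodular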
Intuitively speaking, \(L^\natural \)-convex functions look like discretizations of convex functions. The continuous objective function \({\mathcal {I}}(x,y)\) we consider need not be convex, hence its discretization need not be \(L^\natural \)-convex, and we cannot use those tools. However, in some regimes (namely, if each \(y(s) \in \{0\} \cup [1,\infty )\)), \({\mathcal {I}}(x,y)\) is DR-submodular in x.
Constrained Continuous Submodular Function Minimization
Solving the Optimization Problem
Here, we describe how to solve the convex problem (17) to which we reduced the original constrained submodular minimization problem. Bach [4], at the beginning of Section 5.2, states that this surrogate problem can be optimized via the Frank–Wolfe method and its variants. However, [4] only elaborates on the simpler version of Problem (17) without the extra functions \(a_{i x_i}\). Here we detail how Frank–Wolfe algorithms can be used to solve the more general parametric regularized problem. Our aim is to spell out clearly how Frank–Wolfe applies to this problem, for the benefit of practitioners.
Bach [4] notes that by duality, Problem (17) is equivalent to a concave maximization problem of the form \(\max _{w \in B(H)} f(w)\), where \(f(w)\) is the minimum over \(\rho \) of a term linear in w plus the strictly convex functions \(a_{i x_i}\). Here, the base polytope B(H) is the convex hull of all vectors w that can be output by the greedy algorithm of [4].
It is the dual problem, where we maximize over w, which is amenable to Frank–Wolfe. For Frank–Wolfe methods, we need two oracles: an oracle which, given w, returns \(\nabla f(w)\); and an oracle which, given \(\nabla f(w)\), produces a point s which solves the linear optimization problem \(\max _{s \in B(H)} \langle s, \nabla f(w) \rangle \).
Per [4], an optimizer of the linear problem can be computed directly from the greedy algorithm. For the gradient oracle, recall that we can find a subgradient of \(g(x) = \min _y h(x,y)\) at the point \(x_0\) by finding \(y(x_0)\) which is optimal for the inner problem, and then computing \(\nabla _x h(x,y(x_0))\). Moreover, if such \(y(x_0)\) is the unique optimizer, then the resulting vector is indeed the gradient of g(x) at \(x_0\). Hence, in our case, it suffices to first find \(\rho (w)\) which solves the inner problem, and then \(\nabla f(w)\) is simply \(\rho (w)\) because the inner function is linear in w. Since each function \(a_{i x_i}\) is strictly convex, the minimizer \(\rho (w)\) is unique, confirming that we indeed get a gradient of f, and that f is differentiable.
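The resulting method is a standard Frank–Wolfe loop. The sketch below assumes the two oracles are given as black boxes: the gradient oracle corresponds to the isotonic-regression computation of \(\rho (w)\) described next, and the linear oracle to the greedy algorithm of [4]. The names and the step-size rule here are illustrative, not the paper's implementation.

import numpy as np

def frank_wolfe(grad_oracle, linear_oracle, w0, num_iters=1000):
    # Maximizes a smooth concave f over the base polytope B(H).
    # grad_oracle(w): returns grad f(w) (here: the minimizer rho(w) of the inner problem).
    # linear_oracle(g): returns argmax_{s in B(H)} <s, g> (here: via the greedy algorithm of [4]).
    w = np.asarray(w0, dtype=float)
    for t in range(num_iters):
        g = grad_oracle(w)
        s = linear_oracle(g)
        gamma = 2.0 / (t + 2.0)            # standard open-loop step size [27, 42]
        w = (1.0 - gamma) * w + gamma * s  # convex combination stays inside B(H)
    return w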
Of course, we still need to compute the minimizer \(\rho (w)\). For a given w, we want to solve
There are no constraints coupling the vectors \(\rho _i\), and the objective is similarly separable, so we can independently solve n problems of the form
Recall that each function \(a_{i y_i}(t)\) takes the form \(\frac{1}{2} t^2 r_{i y_i} \) for some \(r_{i y_i} > 0\). Let \(D = \mathrm{diag}(r)\), the \((k-1)\times (k-1)\) matrix with diagonal entries \(r_j\). Our problem can then be written as
Completing the square, the above problem is equivalent to
This last expression is precisely the problem which is called weighted isotonic regression: we are fitting \(\rho \) to \(\mathrm{diag}(r^{-1})\, w\), with weights r, subject to a monotonicity constraint. Weighted isotonic regression is solved efficiently via the Pool Adjacent Violators algorithm of [12].
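A minimal sketch of the Pool Adjacent Violators routine (written here for a nonincreasing fit; the direction of the monotonicity constraint is an assumption, and the other direction can be handled by reversing the input):

import numpy as np

def weighted_isotonic_regression(z, r):
    # Minimize sum_j r_j * (rho_j - z_j)^2 subject to rho_1 >= rho_2 >= ... >= rho_m.
    # Pool Adjacent Violators: maintain blocks as [weighted mean, total weight, length].
    blocks = []
    for zj, rj in zip(z, r):
        blocks.append([zj, rj, 1])
        # Merge adjacent blocks while the nonincreasing constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / w, w, n1 + n2])
    return np.concatenate([np.full(n, m) for m, w, n in blocks])

# Usage in the reduction above: fit rho to diag(r^{-1}) w with weights r, e.g.
# rho = weighted_isotonic_regression(w / r, r)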
Runtime
Frank–Wolfe returns an \(\varepsilon \)-suboptimal solution in \(O(\varepsilon ^{-1} D^2 L)\) iterations, where D is the diameter of the feasible region, and L is the Lipschitz constant for the gradient of the objective [42]. Our optimization problem is \(\max _{w\in B(H)} f(w)\) as defined in the previous section. Each \(w \in B(H)\) has \(O(n\delta ^{-1})\) coordinates of the form \(H^\delta (x+e_i)-H^\delta (x)\). Since \(H^\delta \) is an expected influence in the range [0, T], we can bound the magnitude of each coordinate of w by T and hence \(D^2\) by \(O(n\delta ^{-1} T^2)\). If \(\alpha \) is the minimum derivative of the functions \(R_i\), then the smallest coefficient of the functions \(a_{ix_i}(t)\) is bounded below by \(\alpha \delta \). Hence the objective is the conjugate of an \(\alpha \delta \)-strongly convex function, and therefore has \(\alpha ^{-1}\delta ^{-1}\)-Lipschitz gradient. Combining these, we arrive at the \(O(\varepsilon ^{-1} n\delta ^{-2} \alpha ^{-1} T^2)\) iteration bound. The most expensive step in each iteration is computing the subgradient, which requires sorting the \(O(n\delta ^{-1})\) elements of \(\rho \) in time \(O(n\delta ^{-1} \log {n\delta ^{-1}} )\). Hence the total runtime of Frank–Wolfe is \(O(\varepsilon ^{-1} n^2\delta ^{-3} \alpha ^{-1} T^2 \log {n\delta ^{-1}})\).
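Explicitly, combining the bounds stated above,
\[
\underbrace{O\bigl(\varepsilon^{-1} D^2 L\bigr)}_{\text{iterations}}
= O\bigl(\varepsilon^{-1}\cdot n\delta^{-1}T^2 \cdot \alpha^{-1}\delta^{-1}\bigr)
= O\bigl(\varepsilon^{-1} n\delta^{-2}\alpha^{-1} T^2\bigr),
\]
\[
\text{total runtime}
= O\bigl(\varepsilon^{-1} n\delta^{-2}\alpha^{-1} T^2\bigr)\cdot O\bigl(n\delta^{-1}\log(n\delta^{-1})\bigr)
= O\bigl(\varepsilon^{-1} n^2\delta^{-3}\alpha^{-1} T^2 \log(n\delta^{-1})\bigr).
\]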
As specified in the main text, relating an approximate solution of (17) to a solution of (14) is nontrivial. Assume \(\rho ^*\) has distinct elements separated by \(\eta \), and choose \(\varepsilon \) to be less than \(\eta ^2 \alpha \delta / 8\). If \(\rho \) is \(\varepsilon \)-suboptimal, then by \(\alpha \delta \)-strong convexity we must have \(||\rho - \rho ^* ||_2 < \eta /2\), and therefore \(||\rho - \rho ^* ||_\infty < \eta /2\). Since the smallest gap between consecutive elements of \(\rho ^*\) is \(\eta \), this implies that \(\rho \) and \(\rho ^*\) have the same ordering, and therefore yield the same solution x after thresholding. Accounting for this choice of \(\varepsilon \), we obtain an exact solution to (14) in total runtime \(O(\eta ^{-2} n^2\delta ^{-4} \alpha ^{-2} T^2 \log (n\delta ^{-1}))\).
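In more detail, writing \(G\) for the \(\alpha \delta \)-strongly convex objective of (17) (a symbol introduced only for this remark), \(\varepsilon \)-suboptimality of \(\rho \) gives
\[
\frac{\alpha \delta }{2}\,\Vert \rho - \rho ^*\Vert _2^2 \;\le\; G(\rho ) - G(\rho ^*) \;\le\; \varepsilon \;<\; \frac{\eta ^2\alpha \delta }{8}
\qquad \Longrightarrow \qquad
\Vert \rho - \rho ^*\Vert _\infty \;\le\; \Vert \rho - \rho ^*\Vert _2 \;<\; \frac{\eta }{2}.
\]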
Cite this article
Staib, M., Jegelka, S. Robust Budget Allocation Via Continuous Submodular Functions. Appl Math Optim 82, 1049–1079 (2020). https://doi.org/10.1007/s00245-019-09567-0
Keywords
- Submodular optimization
- Constrained submodular optimization
- Robust optimization
- Nonconvex optimization
- Budget allocation