A fractional memory-efficient approach for online continuous-time influence maximization

Abstract

Influence maximization (IM) under a continuous-time diffusion model requires finding a set of initial adopters that, when activated, leads to the maximum expected number of activated users within a given time horizon. State-of-the-art approximation algorithms for this intractable problem use reverse reachability influence samples to approximate the diffusion process. Unfortunately, these algorithms must store large collections of such samples, which can become prohibitive depending on the desired solution quality, the properties of the diffusion process and the seed set size. To remedy this, we design an algorithm that processes the influence samples in a streaming manner, avoiding the need to store them. We approach IM using two fractional objectives: a fractional relaxation and a multi-linear extension of the original objective function. We derive a progressively improved upper bound on the optimal solution, which we empirically find to be tighter than the best existing upper bound. This enables instance-dependent solution quality guarantees that are observed to be vastly superior to the theoretical worst case. Leveraging these, we develop an algorithm that delivers solutions with a superior empirical solution quality guarantee at comparable running time and with greatly reduced memory usage compared to the state of the art. We demonstrate the superiority of our approach via extensive experiments on five real datasets of varying sizes, with up to 41M nodes and 1.5B edges.
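The two fractional objectives mentioned above can be illustrated with the standard coverage view of reverse reachable (RR) sampling, where each sample is a set of node ids and a fractional seed vector \(x \in [0,1]^n\) partially "covers" it. The sketch below is an illustration under that assumption, not the authors' algorithm: it evaluates the usual concave relaxation \(L(x)\), the mean over samples of \(\min(1, \sum_{v \in R} x_v)\), and the multilinear extension \(G(x)\), the mean of \(1 - \prod_{v \in R}(1 - x_v)\), in a single pass over a sample iterator, so only running sums and \(x\) stay in memory. The name `stream_objectives` and the per-sample averaging (which would be multiplied by n to estimate expected spread) are our own choices.

```python
import math
from typing import Iterable, Sequence, Tuple


def stream_objectives(
    rr_sets: Iterable[Sequence[int]], x: Sequence[float]
) -> Tuple[float, float]:
    """Single pass over RR samples; only running sums are retained.

    Returns (relaxation, multilinear), each averaged over the samples seen:
      relaxation  L(x) = mean_R min(1, sum_{v in R} x_v)
      multilinear G(x) = mean_R [1 - prod_{v in R} (1 - x_v)]
    """
    relax_total = 0.0
    multi_total = 0.0
    count = 0
    for rr in rr_sets:  # each sample is used once and then discarded
        relax_total += min(1.0, sum(x[v] for v in rr))
        miss = 1.0  # probability that independent rounding selects no node of rr
        for v in rr:
            miss *= 1.0 - x[v]
        multi_total += 1.0 - miss
        count += 1
    if count == 0:
        return 0.0, 0.0
    return relax_total / count, multi_total / count


if __name__ == "__main__":
    x = [0.5, 0.2, 0.0, 1.0]  # toy fractional seed vector over 4 nodes
    samples = iter([[0, 1], [2], [1, 2, 3], [0, 2]])  # toy RR samples, consumed lazily
    relax, multi = stream_objectives(samples, x)
    print(relax, multi)
    # Per-sample coverage inequality: (1 - 1/e) L(x) <= G(x) <= L(x).
    assert (1 - 1 / math.e) * relax <= multi <= relax
```

For integral \(x\) (the indicator of a seed set) both values coincide with the fraction of samples hit by the set; for fractional \(x\), the final assertion checks the coverage inequality of the kind exploited by pipage rounding [1].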

Figs. 1–15

Notes

  1. Recall that \(\mathbf {x}^*\) is the optimal fractional solution to F (see Eq. 2).

  2. Notice that \(1/\lambda = [\sum _{i=1}^n (1/\lambda _i)^\ell ]^{1/\ell }\).

  3. We utilize a version of IMM that corrects the issue raised by [10].

  4. All implementations were compiled with the Intel compiler ICC 18.0.1 at optimization level -O3, and OpenMP is used for parallel execution.

  5. Samples that OPIM uses in its validation phase are not counted.
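Note 2 expresses \(1/\lambda \) as an \(\ell \)-norm of the inverse scales \(1/\lambda _i\). The sketch below computes \(\lambda \) and, under our own assumption (suggested by the scale/shape terminology in the proof of Lemma 8) that the \(\lambda _i\) are scales of independent Weibull variables sharing shape \(\ell \), checks that \(\lambda \) is the scale of their minimum. The function names `combined_scale` and `weibull_survival` are hypothetical.

```python
import math
from typing import Sequence


def combined_scale(scales: Sequence[float], shape: float) -> float:
    """lambda with 1/lambda = [sum_i (1/lambda_i)^shape]^(1/shape), as in Note 2."""
    inv_norm = sum((1.0 / lam) ** shape for lam in scales) ** (1.0 / shape)
    return 1.0 / inv_norm


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """P[T > t] for T ~ Weibull(shape, scale); the Weibull model is an assumption."""
    return math.exp(-((t / scale) ** shape))


if __name__ == "__main__":
    scales, shape = [1.0, 2.0, 4.0], 1.5
    lam = combined_scale(scales, shape)
    for t in (0.1, 0.5, 1.0, 2.0):
        # Survival of the minimum of independent variables is the product of survivals.
        product = math.prod(weibull_survival(t, s, shape) for s in scales)
        assert math.isclose(product, weibull_survival(t, lam, shape), rel_tol=1e-12)
    print(lam)
```

With shape 1 this reduces to the familiar fact that exponential rates add.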

References

  1. Ageev, A., Sviridenko, M.: Pipage rounding: a new method of constructing algorithms with proven performance guarantee. J. Comb. Optim. 8(3), 307–328 (2004)

  2. Arora, A., Galhotra, S., Ranu, S.: Debunking the myths of influence maximization: an in-depth benchmarking study. In: SIGMOD, pp. 651–666 (2017)

  3. Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: massive data summarization on the fly. In: KDD, pp. 671–680 (2014)

  4. Bateni, M., Esfandiari, H., Mirrokni, V.: Almost optimal streaming algorithms for coverage problems. In: SPAA, pp. 13–23 (2017)

  5. Borgs, C., Brautbar, M., Chayes, J., Lucier, B.: Maximizing social influence in nearly optimal time. In: SODA, pp. 946–957 (2014)

  6. Bury, K.V.: Statistical Models in Applied Science. Wiley, London (1975)

  7. Calinescu, G., Chekuri, C., Pál, M., Vondrák, J.: Maximizing a submodular set function subject to a matroid constraint (extended abstract). In: IPCO, pp. 182–196 (2007)

  8. Chakrabarti, A., Wirth, A.: Incidence geometries and the pass complexity of semi-streaming set cover, pp. 1365–1373 (2016)

  9. Chen, L., Hassani, H., Karbasi, A.: Online continuous submodular maximization. In: AISTATS, vol. 84, pp. 1896–1905 (2018)

  10. Chen, W.: An issue in the martingale analysis of the influence maximization algorithm IMM. In: Computational Data and Social Networks, pp. 286–297 (2018)

  11. Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: KDD, pp. 1029–1038 (2010)

  12. Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: KDD, pp. 199–208 (2009)

  13. Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: ICDM, pp. 88–97 (2010)

  14. Cheng, S., Shen, H., Huang, J., Chen, W., Cheng, X.: IMRank: influence maximization via finding self-consistent ranking. In: SIGIR, pp. 475–484 (2014)

  15. Cohen, E., Delling, D., Pajor, T., Werneck, R.F.: Sketch-based influence maximization and computation: scaling up with guarantees. In: CIKM, pp. 629–638 (2014)

  16. Dagum, P., Karp, R., Luby, M., Ross, S.: An optimal algorithm for Monte Carlo estimation. SIAM J. Comput. 29(5), 1484–1496 (2000)

  17. Demaine, E.D., Indyk, P., Mahabadi, S., Vakilian, A.: On streaming and communication complexity of the set cover problem. In: DISC, pp. 484–498 (2014)

  18. Domingos, P., Richardson, M.: Mining the network value of customers. In: KDD, pp. 57–66 (2001)

  19. Du, N., Song, L., Gomez Rodriguez, M., Zha, H.: Scalable influence estimation in continuous-time diffusion networks. In: NeurIPS, pp. 3147–3155. Curran Associates, Inc. (2013)

  20. Du, N., Song, L., Yuan, M., Smola, A.J.: Learning networks of heterogeneous influence. In: NeurIPS, pp. 2780–2788. Curran Associates, Inc. (2012)

  21. Duchi, J.C., Bartlett, P.L., Wainwright, M.J.: Randomized smoothing for stochastic optimization. SIAM J. Optim. 22(2), 674–701 (2012)

  22. Feige, U.: A threshold of ln n for approximating set cover. J. ACM 45(4), 634–652 (1998)

  23. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Res. Logist. Quart. 3, 95–110 (1956)

  24. Galhotra, S., Arora, A., Roy, S.: Holistic influence maximization: combining scalability and efficiency with opinion-aware models. In: SIGMOD, pp. 743–758 (2016)

  25. Gibbs, D.L., Shmulevich, I.: Solving the influence maximization problem reveals regulatory organization of the yeast cell cycle. PLOS Comput. Biol. (2017)

  26. Goemans, M.X., Williamson, D.P.: New 3/4-approximation algorithms for MAX SAT. SIAM J. Discrete Math. 7, 313–321 (1994)

  27. Goldenberg, J., Libai, B., Muller, E.: Using complex systems analysis to advance marketing theory development. Acad. Market. Sci. Rev. (2001)

  28. Goldenberg, J., Libai, B., Muller, E.: Talk of the network: a complex systems look at the underlying process of word-of-mouth. Market. Lett. 12(3), 211–223 (2001)

  29. Gomez-Rodriguez, M., Balduzzi, D., Schölkopf, B.: Uncovering the temporal dynamics of diffusion networks. In: ICML, pp. 561–568 (2011)

  30. Gomez-Rodriguez, M., Leskovec, J., Schölkopf, B.: Modeling information propagation with survival theory. In: ICML, pp. III–666–III–674 (2013)

  31. Gomez Rodriguez, M., Leskovec, J., Schölkopf, B.: Structure and dynamics of information pathways in online media. In: WSDM, pp. 23–32 (2013)

  32. Goyal, A., Lu, W., Lakshmanan, L.V.S.: SimPath: an efficient algorithm for influence maximization under the linear threshold model. In: ICDM, pp. 211–220 (2011)

  33. Granovetter, M.: Threshold models of collective behavior. Am. J. Soc. 83(6), 1420–1443 (1978)

  34. Guo, Q., Wang, S., Wei, Z., Chen, M.: Influence maximization revisited: efficient reverse reachable set generation with bound tightened. In: SIGMOD, pp. 2167–2181 (2020)

  35. Har-Peled, S., Indyk, P., Mahabadi, S., Vakilian, A.: Towards tight bounds for the streaming set cover problem. In: PODS, pp. 371–383 (2016)

  36. Huang, K., Wang, S., Bevilacqua, G.S., Xiao, X., Lakshmanan, L.V.S.: Revisiting the stop-and-stare algorithms for influence maximization. PVLDB 10(9), 913–924 (2017)

  37. Ienco, D., Bonchi, F., Castillo, C.: The meme ranking problem: maximizing microblogging virality. In: ICDMW, pp. 328–335 (2010)

  38. Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: ICML, vol. 28, pp. 427–435 (2013)

  39. Jung, K., Heo, W., Chen, W.: IRIE: scalable and robust influence maximization in social networks. In: ICDM, pp. 918–923 (2012)

  40. Karimi, M., Lucic, M., Hassani, H., Krause, A.: Stochastic submodular maximization: the case of coverage functions. In: NeurIPS, pp. 6853–6863. Curran Associates, Inc. (2017)

  41. Karlin, S.: Mathematical Methods and Theory in Games, Programming, and Economics. Addison-Wesley, Reading (1959)

  42. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: KDD, pp. 137–146 (2003)

  43. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? http://an.kaist.ac.kr/traces/WWW2010.html (2010)

  44. Lee, D., Hosanagar, K., Nair, H.: Advertising content and consumer engagement on social media: evidence from Facebook. Manag. Sci. 64 (2018)

  45. Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., Glance, N.: Cost-effective outbreak detection in networks. In: KDD, pp. 420–429 (2007)

  46. Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)

  47. Li, X., Smith, J.D., Dinh, T.N., Thai, M.T.: Why approximate when you can get the exact? Optimal targeted viral marketing at scale. In: INFOCOM, pp. 1–9 (2017)

  48. Li, Y., Fan, J., Zhang, D., Tan, K.L.: Discovering your selling points: personalized social influential tags exploration. In: SIGMOD, pp. 619–634 (2017)

  49. McDiarmid, C.: Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds.) Probabilistic Methods for Algorithmic Discrete Mathematics. Springer, New York (1998)

  50. Mokhtari, A., Hassani, H., Karbasi, A.: Conditional gradient method for stochastic submodular maximization: closing the gap. In: AISTATS, vol. 84, pp. 1886–1895 (2018)

  51. Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci. 1(2), 117–236 (2005)

  52. Nguyen, H., Nguyen, T., Phan, N.H., Dinh, T.: Importance sketching of influence dynamics in billion-scale networks. In: ICDM, pp. 337–346 (2017)

  53. Nguyen, H.T., Thai, M.T., Dinh, T.N.: Stop-and-stare: optimal sampling algorithms for viral marketing in billion-scale networks. In: SIGMOD, pp. 695–710 (2016)

  54. Ohsaka, N.: The solution distribution of influence maximization: a high-level experimental study on three algorithmic approaches. In: SIGMOD, pp. 2151–2166 (2020)

  55. Ohsaka, N., Akiba, T., Yoshida, Y., Kawarabayashi, K.I.: Fast and accurate influence maximization on large networks with pruned Monte-Carlo simulations. In: AAAI, pp. 138–144 (2014)

  56. Ohsaka, N., Sonobe, T., Fujita, S., Kawarabayashi, K.I.: Coarsening massive influence networks for scalable diffusion analysis. In: SIGMOD, pp. 635–650 (2017)

  57. Popova, D., Ohsaka, N., Kawarabayashi, K.i., Thomo, A.: NoSingles: a space-efficient algorithm for influence maximization. In: SSDBM, pp. 18:1–18:12 (2018)

  58. Richardson, M., Domingos, P.: Mining knowledge-sharing sites for viral marketing. In: KDD, pp. 61–70 (2002)

  59. Saha, B., Getoor, L.: On maximum coverage in the streaming model & application to multi-topic blog-watch. In: SDM, pp. 697–708 (2009)

  60. Shapiro, H.N.: Note on a computation method in the theory of games. Commun. Pure Appl. Math. 11(4), 587–593 (1958)

  61. Shewan, D.: The comprehensive guide to online advertising costs. https://www.wordstream.com/blog/ws/2017/07/05/online-advertising-costs (2020)

  62. Song, X., Chi, Y., Hino, K., Tseng, B.L.: Information flow modeling based on diffusion rate for prediction and ranking. In: WWW, pp. 191–200 (2007)

  63. Tang, J., Tang, X., Xiao, X., Yuan, J.: Online processing algorithms for influence maximization. In: SIGMOD, pp. 991–1005 (2018)

  64. Tang, Y., Shi, Y., Xiao, X.: Influence maximization in near-linear time: a martingale approach. In: SIGMOD, pp. 1539–1554 (2015)

  65. Tang, Y., Xiao, X., Shi, Y.: Influence maximization: near-optimal time complexity meets practical efficiency. In: SIGMOD, pp. 75–86 (2014)

  66. Wang, C., Chen, W., Wang, Y.: Scalable influence maximization for independent cascade model in large-scale social networks. Data Min. Knowl. Disc. 25(3), 545–576 (2012)

  67. Zhang, K., Bhattacharyya, S., Ram, S.: Large-scale network analysis for online social brand advertising. MIS Q. 40, 849–868 (2016)

  68. Zubcsek, P.P., Sarvary, M.: Advertising to a social network. Quant. Market. Econ. 9, 71–107 (2011)

Author information

Corresponding author

Correspondence to Glenn S. Bevilacqua.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Omitted proofs

Proof of Lemma 4

Starting from \(\Pr [X \ge (1 + \epsilon ) \chi ] < \exp \left( - \frac{\epsilon ^2 \chi }{2 (1 + \epsilon /3)} \right) \), where \(\chi {:}{=}\mu t\), a bound violation probability of at most \(\delta \) is needed (i.e., \(\delta = \exp ( - \frac{\epsilon ^2 \chi }{2 (1 + \epsilon /3)} )\)). Rearranging into quadratic form in \(\epsilon \),

$$\begin{aligned} 2 (1 + \epsilon /3) \log (\delta )&= - \epsilon ^2 \chi \\ \chi \epsilon ^2 + 2/3 \log (\delta ) \epsilon + 2 \log (\delta )&= 0 \end{aligned}$$

Taking the positive root of this quadratic in \(\epsilon \) (note that \(\log (\delta ) < 0\) for \(0 < \delta < 1\)), this is ensured if,

$$\begin{aligned} \epsilon&= \left( -2/3\log (\delta ) + \sqrt{4/ 9 \log (\delta )^2 - 8 \chi \log (\delta )}\right) /(2 \chi ) \end{aligned}$$
(30)
$$\begin{aligned}&= \left( -\log (\delta ) + \sqrt{-\log (\delta )(18 \chi - \log (\delta ))}\right) /(3 \chi ) \end{aligned}$$
(31)

Hence,

$$\begin{aligned} \Pr [X \ge \chi + \left( -\log (\delta ) + \sqrt{-\log (\delta )(18 \chi - \log (\delta ))}\right) /3]&\le \delta \end{aligned}$$
(32)
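The closed form (31) can be checked numerically: it should agree with the positive root in (30), and plugging it into the tail bound should give exactly \(\delta \). A minimal sketch, where the function names and the \((\chi , \delta )\) test values are our own choices:

```python
import math


def eps_quadratic(chi: float, delta: float) -> float:
    """Positive root of chi*e^2 + (2/3)log(delta)*e + 2log(delta) = 0, Eq. (30)."""
    ld = math.log(delta)  # negative for 0 < delta < 1
    b, c = (2.0 / 3.0) * ld, 2.0 * ld
    return (-b + math.sqrt(b * b - 4.0 * chi * c)) / (2.0 * chi)


def eps_closed_form(chi: float, delta: float) -> float:
    """Simplified form, Eq. (31)."""
    ld = math.log(delta)
    return (-ld + math.sqrt(-ld * (18.0 * chi - ld))) / (3.0 * chi)


def tail_bound(eps: float, chi: float) -> float:
    """exp(-eps^2 chi / (2 (1 + eps/3))), the bound the proof starts from."""
    return math.exp(-(eps ** 2) * chi / (2.0 * (1.0 + eps / 3.0)))


if __name__ == "__main__":
    for chi, delta in [(10.0, 0.1), (250.0, 0.01), (1e5, 1e-6)]:
        e1, e2 = eps_quadratic(chi, delta), eps_closed_form(chi, delta)
        assert math.isclose(e1, e2, rel_tol=1e-9)  # (30) and (31) agree
        assert math.isclose(tail_bound(e2, chi), delta, rel_tol=1e-9)  # bound equals delta
```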

Rewriting the event inside the probability in (32), we have,

$$\begin{aligned}&X \ge \chi + \left( -\log (\delta ) + \sqrt{-\log (\delta )(18 \chi - \log (\delta ))}\right) /3 \end{aligned}$$
(33)
$$\begin{aligned}&\quad 3 X + \log (\delta ) - 3 \chi \ge \sqrt{-\log (\delta )(18 \chi - \log (\delta ))} \end{aligned}$$
(34)

On this event the left-hand side of (34) is non-negative, so squaring both sides and rearranging gives,

$$\begin{aligned} 0&\le \log (\delta )(18 \chi - \log (\delta )) + (3 X + \log (\delta ) - 3 \chi )^2 \end{aligned}$$
(35)
$$\begin{aligned} 0&\le 3 \chi ^2 + (4 \log (\delta ) - 6 X) \chi + 2 X \log (\delta ) + 3 X^2 \end{aligned}$$
(36)
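For reference, writing \(L {:}{=}\log (\delta )\), the expansion behind the step from (35) to (36) is as follows; dividing by 3 gives (36).

$$\begin{aligned} (3X + L - 3\chi )^2 + L(18\chi - L)&= 9X^2 + L^2 + 9\chi ^2 + 6XL - 18X\chi - 6L\chi + 18L\chi - L^2 \\&= 9\chi ^2 + (12L - 18X)\chi + 6XL + 9X^2 \\&= 3\left[ 3\chi ^2 + (4L - 6X)\chi + 2XL + 3X^2\right] \end{aligned}$$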

Solving (36) for \(\chi \) via the quadratic formula, keeping the branch below the smaller root, and simplifying gives,

$$\begin{aligned} \chi&\le X - 2/3 \log (\delta ) - \sqrt{-2/9 \log (\delta )(9 X - 2 \log (\delta ))} \end{aligned}$$
(37)

As such we have,

$$\begin{aligned}&\textstyle \Pr [\chi \le X - \frac{2}{3} \log (\delta ) - \sqrt{-\frac{2}{9} \log (\delta )(9 X - 2 \log (\delta ))}] \le \delta \end{aligned}$$
(38)
$$\begin{aligned}&\textstyle \Pr [\chi > X + \frac{2}{3} \log (\frac{1}{\delta }) - \sqrt{\frac{2}{9} \log (\frac{1}{\delta })(9 X + 2 \log (\frac{1}{\delta }))}] \ge 1 - \delta \end{aligned}$$
(39)

Let \(\text{ LB }(X,\delta ) {:}{=}X + \frac{2}{3} \log (\frac{1}{\delta }) - \sqrt{\frac{2}{9} \log (\frac{1}{\delta }) (9 X + 2 \log (\frac{1}{\delta }))}\). Then \(\Pr [\mu > \text{ LB }(X,\delta )/t] \ge 1 - \delta \). \(\square \)
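A direct transcription of \(\text{LB}(X,\delta )\) is sketched below, together with our own numerical check that plugging \(\chi = \text{LB}(X,\delta )\) into (31) recovers the threshold \(X\) and the failure probability \(\delta \). The check uses a value of \(X\) that is large relative to \(\log (1/\delta )\); for small \(X\) the expression can be negative, in which case the bound is vacuous since \(\mu \ge 0\). The function name `lb` is hypothetical.

```python
import math


def lb(x: float, delta: float) -> float:
    """LB(X, delta) from the proof of Lemma 4; mu > LB(X, delta) / t w.p. >= 1 - delta."""
    a = math.log(1.0 / delta)
    return x + 2.0 * a / 3.0 - math.sqrt(2.0 / 9.0 * a * (9.0 * x + 2.0 * a))


if __name__ == "__main__":
    x, delta = 1000.0, 0.01
    chi = lb(x, delta)
    a = math.log(1.0 / delta)
    # Eq. (31) with log(delta) = -a, evaluated at chi = LB: (1 + eps) * chi should equal X ...
    eps = (a + math.sqrt(a * (18.0 * chi + a))) / (3.0 * chi)
    assert math.isclose((1.0 + eps) * chi, x, rel_tol=1e-9)
    # ... and the tail bound at that eps should equal delta.
    assert math.isclose(
        math.exp(-(eps ** 2) * chi / (2.0 * (1.0 + eps / 3.0))), delta, rel_tol=1e-9
    )
    print(chi)
```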

Proof of Lemma 5

Starting from \(\Pr [X \le (1 - \epsilon ) \chi ] \le \exp \left( - \frac{1}{2} \epsilon ^2 \chi \right) \), where \(\chi {:}{=}\mu t\), a concentration bound violation probability of at most \(\delta \) is needed (i.e., \(\delta = \exp ( - \frac{1}{2} \epsilon ^2 \chi )\)). This is ensured if \(\epsilon = \sqrt{-2 \log (\delta ) / \chi }\). Hence,

$$\begin{aligned} \Pr [X \le (1 - \sqrt{-2 \log (\delta ) / \chi }) \chi ]&\le \delta \end{aligned}$$
(40)

The event inside the probability in (40) is equivalent to \(0 \le \chi - \sqrt{-2 \log (\delta ) \chi } - X\). Treating this as a quadratic in \(\sqrt{\chi }\) and applying the quadratic formula gives \(\sqrt{\chi } \ge \left( \sqrt{-2 \log (\delta )} + \sqrt{4X - 2 \log (\delta )}\right) /2\). As such we have,

$$\begin{aligned} \Pr [\chi \ge \left( \sqrt{-2 \log (\delta )} + \sqrt{4X - 2 \log (\delta )}\right) ^2/4]&\le \delta \end{aligned}$$
(41)
$$\begin{aligned} \Pr [\chi < \left( \sqrt{-2 \log (\delta )} + \sqrt{4X - 2 \log (\delta )}\right) ^2/4]&\ge 1 - \delta \end{aligned}$$
(42)
$$\begin{aligned} \Pr [\chi < X - \log (\delta ) + \sqrt{-\log (\delta ) (2 X - \log (\delta ))}]&\ge 1 - \delta \end{aligned}$$
(43)
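Writing \(L {:}{=}\log (\delta )\), the simplification from (42) to (43) expands the square:

$$\begin{aligned} \tfrac{1}{4}\left( \sqrt{-2L} + \sqrt{4X - 2L}\right) ^2&= \tfrac{1}{4}\left( -2L + 4X - 2L + 2\sqrt{-2L(4X - 2L)}\right) \\&= X - L + \tfrac{1}{2}\sqrt{-4L(2X - L)} \\&= X - L + \sqrt{-L(2X - L)} \end{aligned}$$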

Let \(\text{ UB }(X,\delta ) {:}{=}X + \log (\frac{1}{\delta }) + \sqrt{\log (\frac{1}{\delta })(2 X + \log (\frac{1}{\delta }))}\) then, \(\Pr [\mu < \text{ UB }(X,\delta )/t] \ge 1 - \delta \) \(\square \)
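Similarly, \(\text{UB}(X,\delta )\) can be transcribed directly. The sketch below (function name `ub` is hypothetical) checks that at \(\chi = \text{UB}(X,\delta )\), with \(\epsilon = \sqrt{2\log (1/\delta )/\chi }\) as in the proof, the threshold \((1-\epsilon )\chi \) lands exactly on \(X\), i.e. UB is the point where the lower-tail bound becomes tight.

```python
import math


def ub(x: float, delta: float) -> float:
    """UB(X, delta) from the proof of Lemma 5; mu < UB(X, delta) / t w.p. >= 1 - delta."""
    a = math.log(1.0 / delta)
    return x + a + math.sqrt(a * (2.0 * x + a))


if __name__ == "__main__":
    x, delta = 500.0, 0.05
    chi = ub(x, delta)
    eps = math.sqrt(2.0 * math.log(1.0 / delta) / chi)  # epsilon from the proof
    # At chi = UB, the lower-tail threshold (1 - eps) * chi equals X ...
    assert math.isclose((1.0 - eps) * chi, x, rel_tol=1e-9)
    # ... and exp(-eps^2 chi / 2) equals delta.
    assert math.isclose(math.exp(-0.5 * eps ** 2 * chi), delta, rel_tol=1e-9)
    print(chi)
```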

Proof of Lemma 8

Without loss of generality, we consider the left child \(L\) of a group whose children are \(L\) and \(R\). Since the applicability condition succeeds for the parent group, we have \(\textstyle T \cdot [\sum _{i \in L \cup R} (1/\lambda _i)^\ell ]^{1/\ell } < 1\). Because the scale parameters are non-negative, dropping the terms for \(i \in R\) cannot increase the sum, so \(\textstyle T \cdot [\sum _{i \in L} (1/\lambda _i)^\ell ]^{1/\ell } < 1\). It remains to show that increasing \(\ell \) can only decrease the left-hand side (the minimum shape of a child group can only be equal to or larger than the minimum shape of the parent group).

Consider \(\ell '\) such that \(\ell \le \ell '\), and let \([\sum _{i \in L} (1/\lambda _i)^\ell ]^{1/\ell } < c\) for some \(c > 0\) (we apply this with \(c = 1/T\)). Then \(\sum _{i \in L} (1/\lambda _i)^\ell < c^\ell \). Since all terms are non-negative, \((1/\lambda _i)^\ell < c^\ell \) holds for every \(i\), and hence \(1/\lambda _i < c\). Multiplying both sides of the sum inequality by \(c^{\ell ' - \ell }\) gives \(\sum _{i \in L} (1/\lambda _i)^\ell c^{\ell ' - \ell } < c^{\ell '}\). Since \(1/\lambda _i < c\) for all \(i\), it follows that \((1/\lambda _i)^{\ell ' - \ell } \le c^{\ell ' - \ell }\), and therefore \((1/\lambda _i)^{\ell '} = (1/\lambda _i)^{\ell } (1/\lambda _i)^{\ell ' - \ell } \le (1/\lambda _i)^\ell c^{\ell ' - \ell }\). Summing over \(i \in L\) gives \(\sum _{i \in L} (1/\lambda _i)^{\ell '} < c^{\ell '}\), which yields \([\sum _{i \in L} (1/\lambda _i)^{\ell '}]^{1/\ell '} < c\). \(\square \)
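The monotonicity argument in Lemma 8 can be spot-checked numerically. In the sketch below, which is ours and not the authors' code, `condition_holds` is a hypothetical name for the applicability test \(T \cdot [\sum _i (1/\lambda _i)^\ell ]^{1/\ell } < 1\), and random inverse scales, splits, shapes and values of \(T\) stand in for real groups. It checks that whenever the test passes for a parent group at shape \(\ell \), it also passes for the left child at any \(\ell ' \ge \ell \).

```python
import random
from typing import Sequence


def lp_norm(values: Sequence[float], ell: float) -> float:
    """[sum_i v_i^ell]^(1/ell) for non-negative v_i and ell > 0."""
    return sum(v ** ell for v in values) ** (1.0 / ell)


def condition_holds(T: float, inv_scales: Sequence[float], ell: float) -> bool:
    """Hypothetical applicability test: T * ||(1/lambda_i)_i||_ell < 1."""
    return T * lp_norm(inv_scales, ell) < 1.0


def check_lemma8(trials: int = 10_000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        inv_scales = [rng.uniform(0.0, 2.0) for _ in range(rng.randint(2, 8))]
        split = rng.randint(1, len(inv_scales) - 1)
        left = inv_scales[:split]  # left child L; the remaining entries play the role of R
        ell = rng.uniform(0.5, 4.0)  # parent's shape
        ell_child = ell + rng.uniform(0.0, 4.0)  # child's shape, never smaller
        T = rng.uniform(0.0, 2.0)
        if condition_holds(T, inv_scales, ell) and not condition_holds(T, left, ell_child):
            return False  # would contradict Lemma 8
    return True


if __name__ == "__main__":
    print(check_lemma8())
```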

About this article

Cite this article

Bevilacqua, G.S., Lakshmanan, L.V.S. A fractional memory-efficient approach for online continuous-time influence maximization. The VLDB Journal (2021). https://doi.org/10.1007/s00778-021-00679-0

Keywords

  • Influence maximization
  • Sampling
  • Streaming
  • Empirical guarantee
  • Memory footprint
  • Parallel algorithm