Abstract
In recent years, study of influence propagation in social networks has gained tremendous attention. In this context, we can identify three orthogonal dimensions—the number of seed nodes activated at the beginning (known as budget), the expected number of activated nodes at the end of the propagation (known as expected spread or coverage), and the time taken for the propagation. We can constrain one or two of these and try to optimize the third. In their seminal paper, Kempe et al. constrained the budget, left time unconstrained, and maximized the coverage: this problem is known as Influence Maximization (or MAXINF for short). In this paper, we study alternative optimization problems which are naturally motivated by resource and time constraints on viral marketing campaigns. In the first problem, termed minimum target set selection (or MINTSS for short), a coverage threshold η is given and the task is to find the minimum size seed set such that by activating it, at least η nodes are eventually activated in the expected sense. This naturally captures the problem of deploying a viral campaign on a budget. In the second problem, termed MINTIME, the goal is to minimize the time in which a predefined coverage is achieved. More precisely, in MINTIME, a coverage threshold η and a budget threshold k are given, and the task is to find a seed set of size at most k such that by activating it, at least η nodes are activated in the expected sense, in the minimum possible time. This problem addresses the issue of timing when deploying viral campaigns. Both these problems are NP-hard, which motivates our interest in their approximation. For MINTSS, we develop a simple greedy algorithm and show that it provides a bicriteria approximation. We also establish a generic hardness result suggesting that improving this bicriteria approximation is likely to be hard. For MINTIME, we show that even bicriteria and tricriteria approximations are hard under several conditions. 
We show, however, that if we allow the budget for number of seeds k to be boosted by a logarithmic factor and allow the coverage to fall short, then the problem can be solved exactly in PTIME, i.e., we can achieve the required coverage within the time achieved by the optimal solution to MINTIME with budget k and coverage threshold η. Finally, we establish the value of the approximation algorithms, by conducting an experimental evaluation, comparing their quality against that achieved by various heuristics.
Notes
We use the terms coverage and expected spread interchangeably throughout the article.
If \(\epsilon = 1\), \(\mathcal{A}\) outputs an empty collection.
Here, \(\text{OPT}_{\mathcal{I}}\) and \(\text{OPT}_{\mathcal{J}}\) represent the size of the optimal solution for instances \(\mathcal{I}\) and \(\mathcal{J}\), respectively.
Instead of 1, we could be left with a constant number of elements. Asymptotically, it does not make a difference.
References
Agarwal N, Liu H, Tang L, Yu P (2011) Modeling blogger influence in a community. Social Netw Anal Min 1–24. doi:10.1007/s13278-011-0039-3
Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone’s an influencer: quantifying influence on twitter. In: Proceedings of the fourth ACM international conference on Web search and data mining, ACM, WSDM ’11, pp 65–74
Bar-Ilan J, Kortsarz G, Peleg D (2001) Generalized submodular cover problems and applications. Theor Comput Sci 250(1–2):179–200
Ben-Zwi O, Hermelin D, Lokshtanov D, Newman I (2009) An exact almost optimal algorithm for target set selection in social networks. In: EC ’09: Proceedings of the tenth ACM conference on electronic commerce, ACM, New York, NY, USA, pp 355–362
Bhagat S, Goyal A, Lakshmanan LVS (2012) Maximizing product adoption in social networks. In: Web search and data mining, WSDM
Bross J, Richly K, Kohnen M, Meinel C (2011) Identifying the top-dogs of the blogosphere. Social Netw Anal Min 1–15. doi:10.1007/s13278-011-0027-7
Cha M, Trez JP, Haddadi H (2011) The spread of media content through blogs. Social Netw Anal Min 1–16. doi:10.1007/s13278-011-0040-x
Chen N (2008) On the approximability of influence in social networks. In: SODA ’08: Proceedings of the nineteenth annual ACM–SIAM symposium on discrete algorithms, pp 1029–1037
Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’09)
Chen W, Wang C, Wang Y (2010a) Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’10)
Chen W, Yuan Y, Zhang L (2010b) Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of the 10th IEEE international conference on data mining (ICDM’2010)
Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’01, pp 57–66
Feige U (1998) A threshold of ln n for approximating set cover. J ACM 45(4):634–652
Fujito T (1999) On approximation of the submodular set cover problem. Oper Res Lett 25(4):169–174
Fujito T (2000) Approximation algorithms for submodular set cover with applications. IEICE Trans Inf Syst 83
Goyal A, Bonchi F, Lakshmanan LVS (2008) Discovering leaders from community actions. In: Proceeding of the 17th ACM conference on information and knowledge management, ACM, New York, NY, USA, CIKM ’08, pp 499–508
Goyal A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on web search and data mining, ACM, New York, NY, USA, WSDM ’10, pp 241–250
Goyal A, Bonchi F, Lakshmanan LVS (2011) A data-based approach to social influence maximization. PVLDB 5(1)
Kempe D, Kleinberg JM, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03)
Kempe D, Kleinberg J, Tardos É (2005) Influential nodes in a diffusion model for social networks. In: ICALP, Springer, Berlin, pp 1127–1138
Khuller S, Moss A, Naor JS (1999) The budgeted maximum coverage problem. Inf Process Lett 70(1):39–45
Kimura M, Saito K (2006) Tractable models for information diffusion in social networks. In: Proceedings of PKDD 2006, Lecture notes in computer science, vol 4213
Leskovec J, Krause A, Guestrin C, Faloutsos C, VanBriesen J, Glance NS (2007) Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’07)
Li Gørtz I, Wirth A (2006) Asymmetry in k-center variants. Theor Comput Sci 361(2):188–199
Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14(1):265–294
Panigrahy R, Vishwanathan S (1998) An O(log* n) approximation algorithm for the asymmetric p-center problem. J Algorithms 27(2):259–268
Richardson M, Domingos P (2002) Mining knowledge-sharing sites for viral marketing. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’02, pp 61–70
Slavík P (1997) Improved performance of the greedy algorithm for partial cover. Inform Process Lett 64(5):251–254
Sviridenko M (2004) A note on maximizing a submodular set function subject to a knapsack constraint. Oper Res Lett 32(1):41–43
Weng J, Lim EP, Jiang J, He Q (2010) Twitterrank: finding topic-sensitive influential twitterers. In: Proceedings of the third ACM international conference on web search and data mining, ACM, New York, NY, USA, WSDM ’10, pp 261–270
Wolsey LA (1982) An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica 2(4):385–393
Appendix
1.1 A Proof of Lemma 2
Suppose there exists an algorithm \(\mathcal{A}\) that selects \(\beta k\) sets that together cover at least \(\gamma \eta\) elements. Apply \(\mathcal{A}\) to an arbitrary instance \(\langle \mathcal{U}, \mathcal{S}, \eta \rangle\) of PSC. The output is a collection of sets \(\mathcal{C}_1\) such that \(|\mathcal{C}_1| \le \beta k\) and \(\left| \bigcup_{S \in \mathcal{C}_1} S \right| \ge \gamma \eta\). Next, discard the sets that have been selected and the elements they cover, and apply \(\mathcal{A}\) again to the remaining universe. Repeat this process until 1 or fewer elements are left uncovered (see footnote 7).
Let \(\eta_i\) denote the number of elements still uncovered after iteration \(i\), with \(\eta_0 = \eta\). In iteration \(i\), the algorithm picks \(\beta k\) sets and covers at least \(\gamma \eta_{i-1}\) elements. Hence, \(\eta_i \le \eta_{i-1} \cdot (1 - \gamma)\). Expanding, \(\eta_i \le \eta \cdot (1 - \gamma)^i\). Suppose that after \(l\) iterations, \(\eta_l = 1\). The total number of sets picked is \(l \beta k\), and \(\eta \cdot (1 - \gamma)^l = 1\) implies \(l = \frac{\ln \eta}{\ln \frac{1}{1-\gamma}}\).
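The recurrence above can be checked numerically. The following is a minimal sketch, not part of the original proof: it iterates the worst-case bound \(\eta_i = \eta_{i-1} \cdot (1 - \gamma)\) and compares the number of rounds with the closed form for \(l\). The function names and the sample values of \(\eta\), \(\gamma\), \(\beta\) and \(k\) are hypothetical, chosen only for illustration, and the sketch assumes \(0 < \gamma < 1\).

```python
import math

def rounds_until_covered(eta, gamma):
    """Iterate the worst-case bound eta_i = eta_{i-1} * (1 - gamma) until eta_i <= 1."""
    remaining = float(eta)
    rounds = 0
    while remaining > 1:
        remaining *= 1 - gamma
        rounds += 1
    return rounds

def closed_form_rounds(eta, gamma):
    """l = ln(eta) / ln(1 / (1 - gamma)), as derived in the proof."""
    return math.log(eta) / math.log(1 / (1 - gamma))

# Hypothetical values, for illustration only.
eta, gamma, beta, k = 1000, 0.5, 1.0, 3
l = rounds_until_covered(eta, gamma)
print("rounds (simulated):", l)
print("rounds (closed form):", closed_form_rounds(eta, gamma))
print("total sets picked, l * beta * k:", l * beta * k)
```

Because the simulated count is an integer, it should match the closed form rounded up, which is the sense in which the proof treats \(l\) asymptotically.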
We now prove the first claim. Let \(\gamma > 1 - 1/e^{\beta}\); then \(\ln \left( \frac{1}{1-\gamma} \right) > \beta\). This yields a PTIME algorithm for PSC which outputs a solution of size \(l \beta k = \beta k \cdot \ln \eta / \ln \frac{1}{1-\gamma} \le c \cdot k \ln \eta\) for some \(c < 1\). This is a \(c \cdot \ln \eta\)-approximation for PSC with \(c < 1\), which is not possible unless \({\rm NP} \subseteq \text{DTIME}(n^{O(\log \log n)})\) (Feige 1998).
To prove the second claim, assume \(\beta \le (1 - \delta) \ln \left( \frac{1}{1 -\gamma} \right). \) This gives a PTIME algorithm for PSC which outputs a solution of size \(l \beta k = \beta k \cdot \ln \eta / \ln \frac{1}{1-\gamma} \le (1 - \delta) k \cdot \ln \eta\) which is not possible unless \({\rm NP} \subseteq \text{DTIME}(n^{O(\log \log n)}). \) \(\quad\square\)
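As a numeric illustration of the first claim (the values \(\beta = 1\) and \(\gamma = 0.7\) are hypothetical, chosen only so that \(\gamma > 1 - 1/e^{\beta}\) holds), the constant in front of \(k \ln \eta\) indeed drops below 1:

```latex
\begin{align*}
\beta &= 1, \qquad \gamma = 0.7 > 1 - 1/e \approx 0.632, \\
\ln \frac{1}{1-\gamma} &= \ln \frac{1}{0.3} \approx 1.204 > \beta, \\
l \, \beta k &= \frac{\beta}{\ln \frac{1}{1-\gamma}} \, k \ln \eta \approx 0.831 \, k \ln \eta .
\end{align*}
```

Here \(c \approx 0.831 < 1\), which is the kind of constant the hardness result of Feige (1998) rules out.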
1.2 B Example illustrating performance of Wolsey’s solution
Wolsey (1982) studied the RSSC problem and showed, among other things, that the greedy algorithm provides a solution that is within a factor of \(1 + \ln \left( \eta/(\eta-f(S_{t-1})) \right)\) of the optimal solution, where \(S_{t-1}\) is the greedy solution just before its final iteration. Unfortunately, this does not yield an approximation algorithm with any guaranteed bound, because the gap \(\eta - f(S_{t-1})\) can be arbitrarily small. The following example shows that the greedy solution with threshold \(\eta\) can be arbitrarily worse than the optimum.
Example
(Illustrated also in Fig. 4). Consider a ground set \(\mathcal{X} = \{w_1, w_2, v_1, v_2, \ldots, v_l\}\) with elements having unit costs. Figure 4 geometrically depicts the definition of a function \(f: 2^{\mathcal{X}} \rightarrow \mathbb{R}\), where for any set \(S \subset \mathcal{X}\), \(f(S)\) is defined to be the area (shown shaded) covered by the elements of \(S\). Specifically, \(f(w_1) = f(w_2) = 1 - 1/2^{l+1}\) and \(f(v_i) = 1/2^{i-1}\), \(1 \le i \le l\). Notice that \(f(\{v_1, \ldots, v_l\}) = \sum_{i=1}^{l} 1/2^{i-1} = 2 - 1/2^{l-1} < 2 - 1/2^{l} = f(\{w_1, w_2\})\). The greedy algorithm will first pick \(v_1\). Suppose it picks \(S = \{v_1, \ldots, v_i\}\) in \(i\) rounds. Then \(f(S \cup \{v_{i+1}\}) - f(S) = 1/2^{i} > 1 - 1/2^{l+1} - 1 + 1/2^{i} = 1 - 1/2^{l+1} - \tfrac{1}{2}(2 - 1/2^{i-1}) = f(S \cup \{w_1\}) - f(S)\). Thus, greedy will never pick \(w_1\) or \(w_2\) before it picks \(v_1, \ldots, v_l\). Suppose \(\eta = 2 - 1/2^{l}\). Clearly, the greedy solution is \(\mathcal{X}\), with \(l + 2\) elements, whereas the optimal solution is \(\{w_1, w_2\}\). Here \(l\) can be arbitrarily large.
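The example can be reproduced with a small script. The sketch below is an illustration under stated assumptions, not the authors' code: it models the area function \(f\) as a weighted coverage function over hypothetical "atoms" (two rows, one per \(w_r\), with each \(v_i\) covering a slice of weight \(1/2^{i}\) in each row), which matches the values of \(f\) given above but is only one of several layouts consistent with them. The names `build_instance`, `f` and `greedy_cover` are introduced here for illustration. Exact fractions are used so that the comparison with \(\eta\) is not affected by rounding.

```python
from fractions import Fraction

def build_instance(l):
    """Return (elements, weights); each element maps to the set of atoms it covers."""
    weights = {}
    for row in (1, 2):
        for j in range(1, l + 1):
            weights[(row, j)] = Fraction(1, 2 ** j)       # slice shared with v_j
        weights[(row, l + 1)] = Fraction(1, 2 ** (l + 1))  # slice covered only by w_row
    elements = {f"w{row}": {(row, j) for j in range(1, l + 2)} for row in (1, 2)}
    for i in range(1, l + 1):
        elements[f"v{i}"] = {(1, i), (2, i)}               # total weight 1 / 2^(i-1)
    return elements, weights

def f(chosen, elements, weights):
    """Area covered by the chosen elements (weighted size of the union of atoms)."""
    covered = set()
    for name in chosen:
        covered |= elements[name]
    return sum((weights[a] for a in covered), Fraction(0))

def greedy_cover(elements, weights, eta):
    """Repeatedly add the element with the largest marginal gain until f >= eta."""
    chosen = []
    while f(chosen, elements, weights) < eta:
        base = f(chosen, elements, weights)
        best = max((n for n in elements if n not in chosen),
                   key=lambda n: f(chosen + [n], elements, weights) - base)
        chosen.append(best)
    return chosen

l = 5
elements, weights = build_instance(l)
eta = 2 - Fraction(1, 2 ** l)
greedy = greedy_cover(elements, weights, eta)
print("greedy picks:", greedy)
print("optimal {w1, w2} reaches eta:", f(["w1", "w2"], elements, weights) == eta)
```

With \(l = 5\), the greedy loop selects \(v_1, \ldots, v_5\) first and then both \(w_1\) and \(w_2\), seven elements in total, while \(\{w_1, w_2\}\) alone already meets the threshold.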
Cite this article
Goyal, A., Bonchi, F., Lakshmanan, L.V.S. et al. On minimizing budget and time in influence propagation over social networks. Soc. Netw. Anal. Min. 3, 179–192 (2013). https://doi.org/10.1007/s13278-012-0062-z
DOI: https://doi.org/10.1007/s13278-012-0062-z