Efficiently estimating node influence through group sampling over large graphs

World Wide Web

Abstract

The sheer volume of graph data makes sampling methods necessary to support graph-based analysis applications. Node influence estimation counts the nodes that are influential with respect to a given node in a large graph, and it has wide applications, including product promotion and information diffusion in social networks. However, existing sampling methods mainly rely on node degree to compute node influence and ignore the important connections formed by the groups in which nodes participate, which makes their influence estimates inaccurate. To this end, this paper proposes a group sampling method, called GVRW, that counts groups along with node degrees to evaluate node influence in large graphs. Specifically, GVRW changes how the random walker traverses a large graph: from one node, the walker moves to a random neighbor node of the groups, which enlarges the sampling space so that nodes and groups are characterized simultaneously. Furthermore, we design a corresponding estimation method that uses the samples to estimate the distributions of groups and node degrees, from which the node influence is computed. Experimental results on real-world graph datasets show that the proposed sampling and estimation methods accurately recover these properties and approximate node influence more closely to the true values than existing methods.
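
The abstract does not give the details of GVRW, so the following is only a minimal sketch of the general idea of a group-aware random walk with a re-weighted estimator, not the authors' algorithm. It assumes an undirected graph and known group memberships; the walker may step along an edge or to a fellow member of a shared group, and each sample is weighted by the inverse of the number of moves available at that node to correct the walk's bias. All names (`build_candidates`, `walk`, `estimate_degree_distribution`) and the toy graph are hypothetical.

```python
import random
from collections import defaultdict


def build_candidates(edges, groups):
    """For each node, list every node reachable in one step (with multiplicity).

    A step may follow an edge or jump to another member of a shared group.
    The relation is symmetric, so a uniform walk over these lists has a
    stationary distribution proportional to the list length.
    """
    cand = defaultdict(list)
    for u, v in edges:
        cand[u].append(v)
        cand[v].append(u)
    for members in groups.values():
        for u in members:
            cand[u].extend(v for v in members if v != u)
    return cand


def walk(cand, start, steps, rng):
    """Run the walk and return the sequence of visited nodes."""
    node = start
    samples = []
    for _ in range(steps):
        node = rng.choice(cand[node])
        samples.append(node)
    return samples


def estimate_degree_distribution(samples, cand, degree):
    """Re-weight each sample by 1 / len(cand[v]) to undo the sampling bias."""
    weighted = defaultdict(float)
    total = 0.0
    for v in samples:
        w = 1.0 / len(cand[v])
        weighted[degree[v]] += w
        total += w
    return {k: s / total for k, s in weighted.items()}


# Toy data (assumed for illustration only).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2)]
groups = {"g0": [0, 1, 2], "g1": [3, 4, 5]}

degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

cand = build_candidates(edges, groups)
samples = walk(cand, start=0, steps=20000, rng=random.Random(0))
estimate = estimate_degree_distribution(samples, cand, degree)
print(sorted(estimate.items()))
```

In this toy graph the true degree distribution is 1/6 for degree 1, 4/6 for degree 2, and 1/6 for degree 3, so the printed estimates can be compared against those values. The same re-weighting could in principle be applied to a group-size statistic instead of degree.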

Data Availability

No datasets were generated or analysed during the current study.

Funding

This work is supported by the NSFC (National Natural Science Foundation of China), grant 62302043.

Author information

Contributions

Lingling Zhang: Conceptualization, Methodology, Software, Writing - original draft. Zhiping Shi: Conceptualization, Writing - original draft. Zhiwei Zhang and Ye Yuan: Supervision, Writing - review & editing. Guoren Wang: Writing - review & editing.

Corresponding author

Correspondence to Lingling Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, L., Shi, Z., Zhang, Z. et al. Efficiently estimating node influence through group sampling over large graphs. World Wide Web 27, 18 (2024). https://doi.org/10.1007/s11280-024-01257-4

  • DOI: https://doi.org/10.1007/s11280-024-01257-4
