Abstract
In large-scale data processing, graph analytics of complex interaction networks are indispensable. As the whole graph processing and analytics can be inefficient and usually impractical, graph sampling by keeping a portion of the original graph becomes a favorable approach. While prior work focused on fixed edge and node selection strategy based on predetermined criteria, without adaptive feedback to adjust the sampling process, this type of sampling algorithms has limited flexibility and estimation accuracy for complex graphs. In this paper, we propose an adaptive graph sampling framework, and design AdapES, an adaptive edge sampling algorithm based on this framework. Compared to non-adaptive sampling methods, our approach can continually monitor the difference between the current sampled subgraph and the original graph, and dynamically adjust the edge sampling probability based on this observed sampling difference. Guided by a preset sampling goal, this algorithm automatically adapts to the fluctuations in the random sampling process with high flexibility. The experimental evaluation in 11 datasets demonstrates that AdapES outperforms other algorithms for preserving various graph properties and statistics.
Similar content being viewed by others
Change history
09 February 2024
A Correction to this paper has been published: https://doi.org/10.1007/s13278-024-01200-5
References
Abu-El-Haija S, Fatemi B, Axiotis K, Bulut N, Gasteiger J, Dillon JV, Perozzi B, Bateni M (2023) Submix: learning to mix graph sampling heuristics. In: The 39th conference on uncertainty in artificial intelligence
Ahmed NK, Neville J, Kompella R (2013) Network sampling: from static to streaming graphs. ACM Trans Knowl Discov Data (TKDD) 8(2):1–56
Ahmed NK, Duffield N, Neville J, Kompella R (2014) Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1446–1455
Alev VL, Lau LC (2020) Improved analysis of higher order random walks and applications. In: Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing, pp 1198–1211
Ben-Eliezer O, Eden T, Oren J, Fotakis D (2022) Sampling multiple nodes in large networks: beyond random walks. In: Proceedings of the fifteenth ACM international conference on web search and data mining, pp 37–47
Bera SK, Seshadhri C (2020) How to count triangles, without seeing the whole graph. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 306–316
Chen X, Tan H, Chen Y, He B, Wong WF, Chen D (2021a) ThunderGP: HLS-based graph processing framework on FPGAs. In: The 2021 ACM/SIGDA international symposium on field-programmable gate arrays, pp 69–80
Chen Y, Huang S, Zhao L, Dissanayake G (2021b) Cramér-rao bounds and optimal design metrics for pose-graph slam. IEEE Trans Robot 37(2):627–641
Choe M, Yoo J, Lee G, Baek W, Kang U, Shin K (2022) Midas: representative sampling from real-world hypergraphs. In: Proceedings of the ACM web conference, pp 1080–1092
Cong W, Forsati R, Kandemir M, Mahdavi M (2020) Minimal variance sampling with provable guarantees for fast training of graph neural networks. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1393–1403
Fan W (2022) Big graphs: challenges and opportunities. Proc VLDB Endow 15(12):3782–3797
Fan W, He T, Lai L, Li X, Li Y, Li Z, Qian Z, Tian C, Wang L, Xu J et al (2021) Graphscope: a unified engine for big graph processing. Proc VLDB Endow 14(12):2879–2892
Gao H, Liu Y, Ji S (2021) Topology-aware graph pooling networks. IEEE Trans Pattern Anal Mach Intell 43(12):4512–4518
Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of osns. In: 2010 Proceedings IEEE Infocom. IEEE, pp 1–9
Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) \(\{\)GraphX\(\}\): graph processing in a distributed dataflow framework. In: 11th USENIX symposium on operating systems design and implementation (OSDI 14), pp 599–613
Gove R (2019) A random sampling O (n) force-calculation algorithm for graph layouts. Computer graphics forum, vol 38. Wiley Online Library, pp 739–751
Hagberg A, Swart P, Chult DS (2008) Exploring network structure, dynamics, and function using NetworkX. Tech. rep., Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Hoang L, Dathathri R, Gill G, Pingali K (2021) Cusp: a customizable streaming edge partitioner for distributed graph analytics. ACM SIGOPS Oper Syst Rev 55(1):47–60
Hong SH, Lu S (2020) Graph sampling methods for big complex networks integrating centrality, k-core, and spectral sparsification. In: Proceedings of the 35th annual ACM symposium on applied computing, pp 1843–1851
Hu Z, Zheng W, Lian X (2023) Triangular stability maximization by influence spread over social networks. Proc VLDB Endow 16(11):2818–2831
Imola J, Murakami T, Chaudhuri K (2022) \(\{\)Communication-Efficient\(\}\) triangle counting under local differential privacy. In: 31st USENIX security symposium (USENIX Security 22), pp 537–554
Jangda A, Polisetty S, Guha A, Serafini M (2021) Accelerating graph sampling for graph machine learning using GPUs. In: Proceedings of the sixteenth European conference on computer systems, pp 311–326
Jin T, Li B, Li Y, Zhou Q, Ma Q, Zhao Y, Chen H, Cheng J (2023) Circinus: fast redundancy-reduced subgraph matching. Proc ACM Manag Data 1(1):1–26
Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 631–636
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data
Li Y, Wu Z, Lin S, Xie H, Lv M, Xu Y, Lui JC (2019) Walking with perception: efficient random walk sampling via common neighbor awareness. In: 2019 IEEE 35th international conference on data engineering (ICDE), IEEE, pp 962–973
Liu P, Benson AR, Charikar M (2019) Sampling methods for counting temporal motifs. In: Proceedings of the twelfth ACM international conference on web search and data mining, pp 294–302
Mariappan M, Che J, Vora K (2021) Dzig: sparsity-aware incremental processing of streaming graphs. In: Proceedings of the sixteenth European conference on computer systems, pp 83–98
Nakajima K, Shudo K (2022) Social graph restoration via random walk sampling. In: 2022 IEEE 38th international conference on data engineering (ICDE). IEEE, pp 1–14
Nguyen D, Lenharth A, Pingali K (2013) A lightweight infrastructure for graph analytics. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 456–471
Pandey P, Wheatman B, Xu H, Buluc A (2021) Terrace: a hierarchical graph container for skewed dynamic graphs. In: Proceedings of the 2021 international conference on management of data, pp 1372–1385
Preti G, De Francisci MG, Riondato M (2023) Maniacs: approximate mining of frequent subgraph patterns through sampling. ACM Trans Intell Syst Technol 14(3):1–29
Rozemberczki B, Kiss O, Sarkar R (2020) Little ball of fur: a python library for graph sampling. In: Proceedings of the 29th ACM international conference on information & knowledge management, pp 3133–3140
Sahu S, Mhedhbi A, Salihoglu S, Lin J, Özsu MT (2020) The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J 29:595–618
Sakr S, Bonifati A, Voigt H, Iosup A, Ammar K, Angles R, Aref W, Arenas M, Besta M, Boncz PA et al (2021) The future is big graphs: a community view on graph processing systems. Commun ACM 64(9):62–71
Shin K, Oh S, Kim J, Hooi B, Faloutsos C (2020) Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans Knowl Discov Data (TKDD) 14(2):1–39
Staudt CL, Sazonovs A, Meyerhenke H (2016) Networkit: a tool suite for large-scale complex network analysis. Netw Sci 4(4):508–530
Stumpf MP, Wiuf C, May RM (2005) Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci 102(12):4221–4224
Swift IP, Ebrahimi S, Nova A, Asudeh A (2022) Maximizing fair content spread via edge suggestion in social networks. Proc VLDB Endow 15(11):2692–2705
Tan Q, Zhang J, Yao J, Liu N, Zhou J, Yang H, Hu X (2021) Sparse-interest network for sequential recommendation. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 598–606
Tětek J, Thorup M (2022) Edge sampling and graph parameter estimation via vertex neighborhood accesses. In: Proceedings of the 54th annual ACM SIGACT symposium on theory of computing, pp 1116–1129
Trolliet T, Cohen N, Giroire F, Hogie L, Pérennes S (2022) Interest clustering coefficient: a new metric for directed networks like twitter. J Complex Netw 10(1):cnab030
Van Koevering K, Benson A, Kleinberg J (2021) Random graphs with prescribed k-core sequences: a new null model for network analysis. In: Proceedings of the web conference, pp 367–378
Wan C, Li Y, Li A, Kim NS, Lin Y (2022) Bns-gcn: efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling. Proc Mach Learn Syst 4:673–693
Yang C, Buluç A, Owens JD (2022) Graphblast: a high-performance linear algebra-based graph framework on the gpu. ACM Trans Math Softw (TOMS) 48(1):1–51
Yang K, Zhang M, Chen K, Ma X, Bai Y, Jiang Y (2019) Knightking: a fast distributed graph random walk engine. In: Proceedings of the 27th ACM symposium on operating systems principles, pp 524–537
You J, Leskovec J, He K, Xie S (2020) Graph structure of neural networks. In: International conference on machine learning. PMLR, pp 10,881–10,891
Zeng H, Zhou H, Srivastava A, Kannan R, Prasanna V (2020) Graphsaint: graph sampling based inductive learning method
Zhang Z, Liu Q, Hu Q, Lee CK (2022) Hierarchical graph transformer with adaptive node sampling. Adv Neural Inf Process Syst 35:21171–21183
Zhao Y, Jiang H, Qin Y, Xie H, Wu Y, Liu S, Zhou Z, Xia J, Zhou F et al (2020) Preserving minority structures in graph sampling. IEEE Trans Vis Comput Graph 27(2):1698–1708
Zheng C, Zong B, Cheng W, Song D, Ni J, Yu W, Chen H, Wang W (2020) Robust graph representation learning via neural sparsification. In: International conference on machine learning. PMLR, pp 11458–11468
Zhu Z, Wu K, Liu Z (2023) Arya: arbitrary graph pattern mining with decomposition-based sampling. In: 20th USENIX symposium on networked systems design and implementation (NSDI 23), pp 1013–1030
Author information
Authors and Affiliations
Contributions
KW contributed to the work of conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft, writing—review & editing, and visualization.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
The original online version of this arcle was revised: to update the missed corrections.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, K. An adaptive graph sampling framework for graph analytics. Soc. Netw. Anal. Min. 14, 4 (2024). https://doi.org/10.1007/s13278-023-01157-x
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13278-023-01157-x