Skip to main content
Log in

Sampling hypergraphs via joint unbiased random walk

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Hypergraphs are instrumental in modeling complex relational systems that encompass a wide spectrum of high-order interactions among components. One prevalent analysis task is the properties estimation of large-scale hypergraphs, which involves selecting a subset of nodes and hyperedges while preserving the characteristics of the entire hypergraph. This paper aims to sample hypergraphs via random walks and is the first to perform unbiased random walks for sampling of nodes and hyperedges simultaneously in large-scale hypergraphs to the best of our knowledge. Initially, we analyze the stationary distributions of nodes and hyperedges for the simple random walk, and show that there is a high bias in both nodes and hyperedges. Subsequently, to eliminate the high bias of the simple random walk, we propose unbiased random walk strategies for nodes and hyperedges, respectively. Finally, a single joint walk schema is developed for sampling nodes and hyperedges simultaneously. To accelerate the convergence process, we employ delayed acceptance and history-aware techniques to assist our algorithm in achieving fast convergence. Extensive experimental results validate our theoretical findings, and the unbiased sampling algorithms for nodes and hyperedges have their complex hypergraph scenarios for which they are applicable. The joint random walk algorithm balanced the sampling applicable to both nodes and hyperedges.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Algorithm 1
Algorithm 2
Algorithm 3
Algorithm 4
Algorithm 5
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Similar content being viewed by others

Data Availability

No datasets were generated or analysed during the current study.

Notes

  1. http://www.cs.cornell.edu/~arb/data

References

  1. Shao, Y., Huang, S., Li, Y., Miao, X., Cui, B., Chen, L.: Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs. VLDB J. 30(5), 769–797 (2021)

    Article  Google Scholar 

  2. Ahmed, N.K., Duffield, N.G., Neville, J., Kompella, R.R.: Graph sample and hold: a framework for big-graph analytics. In: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, USA - August 24 - 27, 2014, pp. 1446–1455 (2014)

  3. Cui, Y., Li, X., Li, J., Wang, H., Chen, X.: A survey of sampling method for social media embeddedness relationship. ACM Comput. Surv. 55(4), 74–17439 (2023)

    Article  Google Scholar 

  4. Krishnamurthy, V., Faloutsos, M., Chrobak, M., Cui, J., Lao, L., Percus, A.G.: Sampling large internet topologies for simulation purposes. Comput. Networks 51(15), 4284–4302 (2007)

    Article  Google Scholar 

  5. Tang, W., Luo, G., Wu, Y., Tian, L., Zheng, X., Cai, Z.: A second-order diffusion model for influence maximization in social networks. IEEE Trans. Comput. Soc. Syst. 6(4), 702–714 (2019)

    Article  Google Scholar 

  6. Zhao, Y., Jiang, H., Chen, Q., Qin, Y., Xie, H., Wu, Y., Liu, S., Zhou, Z., Xia, J., Zhou, F.: Preserving minority structures in graph sampling. IEEE Trans. Vis. Comput. Graph. 27(2), 1698–1708 (2021)

    Article  Google Scholar 

  7. Chen, J., Ma, T., Xiao, C.: Fastgcn: Fast learning with graph convolutional networks via importance sampling. In: 6th International Conference on Learning Representations, ICLR (2018)

  8. Nakajima, K., Shudo, K.: Social graph restoration via random walk sampling. In: 38th IEEE International Conference on Data Engineering, ICDE, pp. 1–14 (2022)

  9. Xu, X., Lee, C., Eun, D.Y.: Challenging the limits: Sampling online social networks with cost constraints. In: IEEE Conference on Computer Communications, INFOCOM, pp. 1–9 (2017)

  10. Musciotto, F., Battiston, F., Mantegna, R.N.: Detecting informative higher-order interactions in statistically validated hypergraphs. Commun. Phys. 4(1), 1–9 (2021)

    Article  Google Scholar 

  11. Veldt, N., Benson, A.R., Kleinberg, J.: Combinatorial characterizations and impossibilities for higher-order homophily. Sci. Adv. 9(1), 3200 (2023)

    Article  Google Scholar 

  12. Ganmor, E., Segev, R., Schneidman, E.: Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proc. Natl. Acad. Sci. 108(23), 9679–9684 (2011)

    Article  Google Scholar 

  13. Lee, G., Ko, J., Shin, K.: Hypergraph motifs: Concepts, algorithms, and discoveries. Proc. VLDB Endow. 13(11), 2256–2269 (2020)

    Article  Google Scholar 

  14. Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: Clustering, classification, and embedding. In: Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, pp. 1601–1608 (2006)

  15. Zeng, Y., Huang, Y., Ren, X.-L., Lu, L.: Identifying vital nodes through augmented random walks on higher-order networks (2023)

  16. Choe, M., Yoo, J., Lee, G., Baek, W., Kang, U., Shin, K.: MiDaS: Representative Sampling from Real-world Hypergraphs. In: Proceedings of the ACM Web Conference 2022. WWW ’22, pp. 1080–1092 (2022)

  17. Chitra, U., Raphael, B.: Random Walks on Hypergraphs with Edge-Dependent Vertex Weights. In: Proceedings of the 36th International Conference on Machine Learning, pp. 1172–1181 (2019)

  18. Hayashi, K., Aksoy, S.G., Park, C.H., Park, H.: Hypergraph random walks, laplacians, and clustering. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. CIKM ’20, pp. 495–504, New York, NY, USA (2020)

  19. Dyer, M., Greenhill, C., Kleer, P., Ross, J., Stougie, L.: Sampling hypergraphs with given degrees. Discrete Math. 344(11), 112566 (2021)

    Article  MathSciNet  Google Scholar 

  20. Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A.: Walking in facebook: A case study of unbiased sampling of osns. In: 29th IEEE International Conference on Computer Communications, INFOCOM, pp. 2498–2506 (2010)

  21. Chitra, U., Raphael, B.J.: Random walks on hypergraphs with edge-dependent vertex weights. In: Proceedings of the 36th International Conference on Machine Learning, ICML, vol. 97, pp. 1172–1181 (2019)

  22. Carletti, T., Battiston, F., Cencetti, G., Fanelli, D.: Random walks on hypergraphs. Phys. Rev. E 101(2), 022308 (2020)

    Article  MathSciNet  Google Scholar 

  23. Aksoy, S.G., Joslyn, C.A., Marrero, C.O., Praggastis, B., Purvine, E.: Hypernetwork science via high-order hypergraph walks. EPJ Data Sci. 9(1), 16 (2020)

    Article  Google Scholar 

  24. Eriksson, A., Edler, D., Rojas, A., Domenico, M., Rosvall, M.: How choosing random-walk model and network representation matters for flow-based community detection in hypergraphs. Commun. Phys. 4(1), 133 (2021)

    Article  Google Scholar 

  25. Carletti, T., Fanelli, D., Lambiotte, R.: Random walks and community detection in hypergraphs. J. Phys. Complex. 2(1), 015011 (2021)

    Article  Google Scholar 

  26. Xie, H., Yi, P., Li, Y., Lui, J.C.S.: Optimizing random walk based statistical estimation over graphs via bootstrapping. IEEE Trans. Knowl. Data Eng. 35(3), 2916–2929 (2023)

    Google Scholar 

  27. Hu, P., Lau, W.C.: A survey and taxonomy of graph sampling. CoRR abs/1308.5865 (2013)

  28. Kirkland, S.: Two-mode networks exhibiting data loss. J. Complex Netw. 6(2), 297–316 (2018)

    Article  MathSciNet  Google Scholar 

  29. Hong, S., Lu, S.: Graph sampling methods for big complex networks integrating centrality, k-core, and spectral sparsification. In: SAC ’20: The 35th ACM/SIGAPP Symposium on Applied Computing, pp. 1843–1851 (2020)

  30. Han, J.-D.J., Dupuy, D., Bertin, N., Cusick, M.E., Vidal, M.: Effect of sampling on topology predictions of protein-protein interaction networks. Nat. Biotechnol 23(7), 839–844 (2005)

    Article  Google Scholar 

  31. Leskovec, J., Kleinberg, J.M., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 177–187 (2005)

  32. Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636 (2006)

  33. Yousuf, M.I., Anwer, I., Anwar, R.: Empirical characterization of graph sampling algorithms. CoRR abs/2102.07980 (2021)

  34. Pearson, K.: The problem of the random walk. Nature 72(1865), 294–294 (1905)

    Article  Google Scholar 

  35. Jin, L., Chen, Y., Hui, P., Ding, C., Wang, T., Vasilakos, A.V., Deng, B., Li, X.: Albatross sampling: robust and effective hybrid vertex sampling for social graphs. In: Proceedings of the 3rd ACM International Workshop on Hot Topics in Planet-scale Measurement, pp. 11–16 (2011)

  36. Gjoka, M., Kurant, M., Butts, C.T., Markopoulou, A.: Walking in facebook: A case study of unbiased sampling of osns. In: INFOCOM 2010. 29th IEEE International Conference on Computer Communications, pp. 2498–2506 (2010)

  37. Ribeiro, B.F., Towsley, D.F.: Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM Internet Measurement Conference, pp. 390–403 (2010)

  38. Zhou, Z., Zhang, N., Das, G.: Leveraging history for faster sampling of online social networks. Proc. VLDB Endow. 8(10), 1034–1045 (2015)

    Article  Google Scholar 

  39. Wang, R., Li, Y., Lin, S., Wu, W., Xie, H., Xu, Y., Lui, J.C.S.: Common neighbors matter: Fast random walk sampling with common neighbor awareness. IEEE Trans. Knowl. Data Eng. 35(5), 4570–4584 (2023)

    Google Scholar 

  40. Li, Y., Wu, Z., Lin, S., Xie, H., Lv, M., Xu, Y., Lui, J.C.S.: Walking with perception: Efficient random walk sampling via common neighbor awareness. In: 35th IEEE International Conference on Data Engineering, ICDE, pp. 962–973 (2019)

  41. Liu, Q., Huang, Y., Metaxas, D.N.: Hypergraph with sampling for image retrieval. Pattern Recognit. 44(10–11), 2255–2262 (2011)

    Article  Google Scholar 

  42. Joos, F., Kim, J., Kühn, D., Osthus, D.: Hypergraph regularity and random sampling. Random Struct. Algorithms 62(4), 956–1015 (2023)

    Article  MathSciNet  Google Scholar 

  43. Xie, X., Shi, S., Wang, H., Li, M.: SAT: sampling acceleration tree for adaptive database repartition. World Wide Web (WWW) 26(5), 3503–3533 (2023)

    Article  Google Scholar 

  44. Cooper, C., Frieze, A., Radzik, T.: The cover times of random walks on random uniform hypergraphs. Theor. Comput. Sci. 509, 51–69 (2013)

    Article  MathSciNet  Google Scholar 

  45. Bermond, J., Heydemann, M., Sotteau, D.: Line graphs of hypergraphs I. Discret. Math. 18(3), 235–241 (1977)

    Article  MathSciNet  Google Scholar 

  46. Lu, L., Peng, X.: High-ordered random walks and generalized laplacians on hypergraphs. In: Algorithms and Models for the Web Graph - 8th International Workshop, WAW. Lecture Notes in Computer Science, vol. 6732, pp. 14–25 (2011)

  47. Joslyn, C., Aksoy, S., Arendt, D., Jenkins, L., Praggastis, B., Purvine, E., Zalewski, M.: High performance hypergraph analytics of domain name system relationships. In: HICSS Symposium on Cybersecurity Big Data Analytics (2019)

  48. Brooks, S., Gelman, A., Jones, G., Meng, X.-L.: Handbook of Markov Chain Monte Carlo, (2011)

  49. Avin, C., Lando, Y., Lotker, Z.: Radio cover time in hyper-graphs. Ad Hoc Networks 12, 278–290 (2014)

    Article  Google Scholar 

  50. Lee, C., Xu, X., Eun, D.Y.: Beyond random walk and metropolis-hastings samplers: why you should not backtrack for unbiased graph sampling. In: ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS, pp. 319–330 (2012)

  51. Hastings, W.K.: Monte carlo sampling methods using markov chains and their applications. Biometrika (1970)

  52. Andrieu, C., Livingstone, S.: Peskun-tierney ordering for markovian monte carlo: Beyond the reversible scenario. Ann. Stat. 49(4), 1958–1981 (2021)

    Article  MathSciNet  Google Scholar 

  53. Diaconis, P., Holmes, S., Neal, R.M.: Analysis of a nonreversible markov chain sampler. Ann. Appl. Probab, 726–752 (2000)

  54. Neal, R.M.: Improving asymptotic variance of mcmc estimators: Non-reversible chains are better. Preprint arXiv:math/0407281 (2004)

  55. Katzir, L., Liberty, E., Somekh, O.: Estimating sizes of social networks via biased sampling. In: Proceedings of the 20th International Conference on World Wide Web, WWW, pp. 597–606 (2011)

Download references

Funding

National Key Research and Development Program of China under Grant 2020YFB1005900, National Natural Science Foundation of China (NSFC) under Grant 62122042, Shandong University multidisciplinary research and innovation team of young scholars under Grant 2020QNQT017. Young Scientists Fund of the Natural Science Foundation of Shandong Province under Grant 61150005202301.

Author information

Authors and Affiliations

Authors

Contributions

Qi Luo and Zhenzhen Xie wrote the manuscript, Dongxiao Yu and Yu Liu collected the data, and Xiuzhen Cheng, Xiahua Jia and Xuemin Lin analyzed the results. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Zhenzhen Xie.

Ethics declarations

Ethical Approval

not applicable.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Luo, Q., Xie, Z., Liu, Y. et al. Sampling hypergraphs via joint unbiased random walk. World Wide Web 27, 15 (2024). https://doi.org/10.1007/s11280-024-01253-8

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11280-024-01253-8

Keywords

Navigation