Skip to main content
Log in

A spanning tree approach to social network sampling with degree constraints

  • Original Article
  • Published:
Social Network Analysis and Mining Aims and scope Submit manuscript

Abstract

Online social networks (OSNs) have become increasingly popular on the web in recent years. There are millions of users on these networks, and they generate a great deal of interaction among them. Network sampling is often used to study and measure OSNs, which produce a small, accessible network with a limited number of edges and nodes. In this paper, two algorithms are proposed to sample OSNs. The first algorithm finds several spanning trees with the highest number of nodes that satisfy a given degree constraint from randomly chosen root nodes; edges and nodes of computed spanning trees are ranked based on how often they appear in them. Finally, a fraction of the highly ranked nodes and edges are considered in the sampled network. Second, a partial spanning tree algorithm is proposed in place of a full spanning tree algorithm to improve the performance of the first algorithm. We conduct several experiments on well-known real networks to determine the performance of the proposed sampling algorithms. As a result of the experiments, the proposed algorithms outperformed some of the baseline and recently presented algorithms in terms of the Kolmogorov-Smirnov (KS) test, Skew Divergence (SD) distance, and Normalized Distance (ND).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Data availability

Data is provided within the supplementary information files.

Notes

  1. A subgraph \(H\) is an induced subgraph of \(G\) if for any two vertices \(u v\) in \(H\), \(u\) and \(v\) are adjacent in \(H\) if and only if they are adjacent in G.

References

  • Ahmed NK, Neville J, Kompella R (2014b) Network sampling: from static to streaming graphs. ACM Trans Knowl Discov Data (TKDD) 8:7

    Google Scholar 

  • Ahmed NK, Berchmans F, Neville J, Kompella R (2010) Time-based sampling of social network activity graphs. In: Proceedings of the Eighth workshop on mining and learning with graphs. ACM, pp 1–9

  • Ahmed NK, Duffield N, Neville J, Kompella R (2014a) Graph sample and hold: A framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 1446–1455

  • Ansari N, Cheng G, Krishnan RN (2004) Efficient and reliable link state information dissemination. IEEE Commun Lett 8:317–319

    Article  Google Scholar 

  • Bearman PS, Moody J, Stovel K (2004) Chains of affection: the structure of adolescent romantic and sexual networks1. Am J Sociol 110:44–91

    Article  Google Scholar 

  • Bellur B, Ogier RG (1999) A reliable, efficient topology broadcast protocol for dynamic networks. In: INFOCOM’99. Eighteenth annual joint conference of the IEEE computer and communications societies. Proceedings. IEEE. IEEE, pp 178–186

  • Blagus N, Šubelj L, Weiss G, Bajec M (2015) Sampling promotes community structure in social and information networks. Physica A 432:206–215

    Article  Google Scholar 

  • Blomsma N, de Rooy B, Gerritse F et al (2022) Minimum spanning tree analysis of brain networks: a systematic review of network size effects, sensitivity for neuropsychiatric pathology, and disorder specificity. Netw Neurosci 6:301–319

    Article  Google Scholar 

  • Ebadi Jokandan SM, Bayat P, Farrokhbakht Foumani M (2021) CS- and GA-based hybrid evolutionary sampling algorithm for large-scale social networks. Soc Netw Anal Min 11:120. https://doi.org/10.1007/s13278-021-00836-x

    Article  Google Scholar 

  • Erdos P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17–61

    MathSciNet  Google Scholar 

  • Gao Q, Ding X, Pan F, Li W (2014) An improved sampling method of complex network. Int J Mod Phys C 25:1440007

    Article  Google Scholar 

  • Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness, 1st edn. W. H Freeman, San Francisco

    Google Scholar 

  • Gile KJ, Handcock MS (2010) Respondent-driven sampling: an assessment of current methodology. Sociol Methodol 40:285–327

    Article  Google Scholar 

  • Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings IEEE INFOCOM 2010. San Diego, CA, pp 1–9

  • Hill RJ (1999) International comparisons using spanning trees. In: International and interarea comparisons of income, Output, and Prices. University of Chicago Press, pp 109–120

  • Jalali ZS, Rezvanian A, Meybodi MR (2016) Social network sampling using spanning trees. Int J Mod Phys C 27:1650052

    Article  MathSciNet  Google Scholar 

  • James F (2006) Statistical methods in experimental physics. World Scientific

    Book  Google Scholar 

  • Jaouadi M, Romdhane LB (2021) A distributed model for sampling large scale social networks. Expert Syst Appl 186:115773

    Article  Google Scholar 

  • Kurant M, Markopoulou A, Thiran P (2011b) Towards unbiased BFS sampling. IEEE J Sel Areas Commun 29:1799–1809

    Article  Google Scholar 

  • Kurant M, Gjoka M, Butts CT, Markopoulou A (2011a) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems. ACM, pp 281–292

  • Kurant M, Gjoka M, Wang Y, et al (2012) Coarse-grained topology estimation via graph sampling. In: Proceedings of the 2012 ACM workshop on Workshop on online social networks. ACM, pp 25–30

  • Lee L (2001) On the effectiveness of the skew divergence for statistical language analysis. In: AISTATS. Citeseer

  • Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data (TKDD) 1:1–41

    Google Scholar 

  • Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6:29–123

    Article  MathSciNet  Google Scholar 

  • Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, Philadelphia, pp 631–636

  • Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 177–187

  • Liu X, Zhang M, Fiumara G, De Meo P (2022) Complex network hierarchical sampling method combining node neighborhood clustering coefficient with random walk. New Gener Comput 40:765–807. https://doi.org/10.1007/s00354-022-00179-x

    Article  Google Scholar 

  • Lovász L (1993) Random walks on graphs: a survey. Comb Paul Erdos Eighty 2:1–46

    Google Scholar 

  • Luo Q, Xie Z, Liu Y et al (2024) Sampling hypergraphs via joint unbiased random walk. World Wide Web 27:15. https://doi.org/10.1007/s11280-024-01253-8

    Article  Google Scholar 

  • Maiya AS, Berger-Wolf TY (2011) Benefits of bias: towards better characterization of network sampling. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 105–113

  • Murai F, Ribeiro B, Towsley D, Wang P (2013) On set size distribution estimation and the characterization of large networks via sampling. IEEE J Sel Areas Commun 31:1017–1025

    Article  Google Scholar 

  • Papagelis M, Das G, Koudas N (2013) Sampling online social networks. IEEE Trans Knowl Data Eng 25:662–676

    Article  Google Scholar 

  • Peng L, Yongli L, Chong W (2014) Towards cost-efficient sampling methods. http://arxiv.org/abs/arXiv:14055756

  • Piña-García CA, Gu D (2013) Spiraling facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution. Soc Netw Anal Min 3:1403–1415

    Article  Google Scholar 

  • Rezvanian A, Meybodi MR (2015) Sampling social networks using shortest paths. Physica A 424:254–268

    Article  Google Scholar 

  • Rezvanian A, Rahmati M, Meybodi MR (2014) Sampling from complex networks using distributed learning automata. Physica A 396:224–234

    Article  Google Scholar 

  • Rezvanian A, Moradabadi B, Ghavipour M, et al (2019) Social network sampling. In: Learning automata approach for social networks. Springer, pp 91–149

  • Ribeiro B, Figueiredo D, de Souza e Silva E, Towsley D (2011) Characterizing continuous-time random walks on dynamic networks. In: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems. ACM, pp 151–152

  • Ribeiro B, Wang P, Murai F, Towsley D (2012) Sampling directed graphs with random walks. In: Proceedings IEEE INFOCOM. Orlando, FL, pp 1692–1700

  • Roohollahi S, Khatibi Bardsiri A, Keynia F (2022) Sampling in weighted social networks using a levy flight-based learning automata. J Supercomput 78:1458–1478. https://doi.org/10.1007/s11227-021-03905-2

    Article  Google Scholar 

  • Santos FC, Pacheco JM (2005) Scale-free networks provide a unifying framework for the emergence of cooperation. Phys Rev Lett 95:098104. https://doi.org/10.1103/PhysRevLett.95.098104

    Article  Google Scholar 

  • Siciliano MD, Yenigun D, Ertan G (2012) Estimating network structure via random sampling: cognitive social structures and the adaptive threshold method. Soc Netw 34:585–600

    Article  Google Scholar 

  • Son S-W, Christensen C, Bizhani G et al (2012) Sampling properties of directed networks. Phys Rev E 86:046104

    Article  Google Scholar 

  • Sundar S, Singh A, Rossi A (2012) New heuristics for two bounded-degree spanning tree problems. Inf Sci 195:226–240. https://doi.org/10.1016/j.ins.2012.01.037

    Article  MathSciNet  Google Scholar 

  • Tewarie P, Van Dellen E, Hillebrand A, Stam CJ (2015) The minimum spanning tree: an unbiased method for brain network analysis. Neuroimage 104:177–188

    Article  Google Scholar 

  • Wang H, Lu J (2013) Detect inflated follower numbers in OSN using star sampling. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining. ACM, pp 127–133

  • Wejnert C, Heckathorn DD (2008) Web-based network sampling: efficiency and efficacy of respondent-driven sampling for online research. Sociological Methods and Research

  • White DR, Newman M (2001) Fast approximation algorithms for finding node-independent paths in networks

  • Woolhouse ME, Dye C, Etard JF et al (1997) Heterogeneities in the transmission of infectious agents: implications for the design of control programs. Proc Natl Acad Sci USA 94:338–342. https://doi.org/10.1073/pnas.94.1.338

    Article  Google Scholar 

  • Yoon S-H, Kim K-N, Hong J et al (2015) A community-based sampling method using DPL for online social networks. Inf Sci 306:53–69

    Article  Google Scholar 

Download references

Funding

No funding.

Author information

Authors and Affiliations

Authors

Contributions

AR and ZSJ proposed the original idea, developed the code, designed, performed the simulation experiments, wrote the main manuscript text, and prepared Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9 and S.M.V. participated in the coordination of the study. All authors planned the work, analyzed the results, and reviewed the manuscript.

Corresponding author

Correspondence to Alireza Rezvanian.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (ZIP 7317 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rezvanian, A., Vahidipour, S.M. & Jalali, Z.S. A spanning tree approach to social network sampling with degree constraints. Soc. Netw. Anal. Min. 14, 101 (2024). https://doi.org/10.1007/s13278-024-01247-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s13278-024-01247-4

Keywords

Navigation