Abstract
This chapter addresses scale-down sampling, that is, sampling a representative subgraph from a social network, and reviews recent social network sampling methods in the literature. We then introduce four sampling algorithms, DLAS, EDLAS, ICLA-NS, and FLAS, which use learning automata to produce representative subgraphs from online social networks. DLAS and EDLAS use distributed learning automata and extended distributed learning automata, respectively, and apply to both deterministic and stochastic networks. ICLA-NS extends the classic node sampling method with a post-processing phase: it uses an irregular cellular learning automaton (ICLA) to guarantee connectivity and the inclusion of high-degree nodes in the initially sampled subgraphs. Most previous studies on network sampling either assume that the network graph is static and fully accessible at every step or, despite considering stream evolution, do not address the problem of sampling a representative subgraph from the original graph. FLAS, a streaming sampling algorithm based on fixed structure learning automata, is therefore introduced for sampling from activity networks in which the stream of edges evolves continuously over time (i.e., the networks are highly dynamic and include a massive volume of edges).
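To make the starting point of ICLA-NS concrete, the classic node sampling step can be pictured roughly as follows. This is a minimal sketch of uniform node sampling with an induced subgraph, not the chapter's implementation; the function name `node_sample`, the edge-list representation, and the `fraction` parameter are assumptions made for illustration.

```python
import random


def node_sample(edges, fraction, seed=None):
    """Uniform node sampling (illustrative sketch, names are hypothetical).

    Picks a random fraction of the nodes and returns them together with
    the edges of the original graph whose endpoints were both picked.
    """
    rng = random.Random(seed)
    # Collect the node set from the edge list; sorting keeps runs reproducible.
    nodes = sorted({u for edge in edges for u in edge})
    # Keep at least one node when the graph is non-empty.
    k = min(len(nodes), max(1, round(fraction * len(nodes))))
    kept = set(rng.sample(nodes, k))
    # Induced subgraph: only edges with both endpoints among the kept nodes.
    induced = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, induced


if __name__ == "__main__":
    ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
    kept, induced = node_sample(ring, 0.5, seed=0)
    print(sorted(kept), induced)
```

Nothing in this step prevents the sampled subgraph from being disconnected or from missing high-degree nodes, which are the two properties the abstract says the ICLA-based post-processing phase is meant to guarantee.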
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rezvanian, A., Moradabadi, B., Ghavipour, M., Daliri Khomami, M.M., Meybodi, M.R. (2019). Social Network Sampling. In: Learning Automata Approach for Social Networks. Studies in Computational Intelligence, vol 820. Springer, Cham. https://doi.org/10.1007/978-3-030-10767-3_4
DOI: https://doi.org/10.1007/978-3-030-10767-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10766-6
Online ISBN: 978-3-030-10767-3
eBook Packages: Intelligent Technologies and Robotics (R0)