Sampling Connected Induced Subgraphs Uniformly at Random
Processing large graphs is a recurrent challenge for modern applications. The ability to generate representative samples of smaller size is useful not only for circumventing scalability issues but also, in its own right, for statistical analysis and other data mining tasks. Such purposes call for adequate sampling techniques. In this paper, we are interested in sampling a connected subgraph from a graph uniformly at random. We require that the sample contain a prescribed number of vertices; the sampled graph is the subgraph induced by those vertices.
We devise, present and discuss several algorithms that leverage three different techniques: Rejection Sampling, Random Walk and Markov Chain Monte Carlo. We empirically evaluate and compare the performance of the algorithms. We show that they are effective and efficient, but that there is a trade-off between effectiveness and efficiency that depends on the density of the graph and on the sample size. We propose one novel algorithm, which we call Neighbour Reservoir Sampling (NRS), that realizes this trade-off very successfully.
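To make the problem statement concrete, the following is a minimal sketch of the simplest of the three techniques, rejection sampling, as it is generally understood; it is not the paper's own implementation, and it is not NRS. It assumes the graph is given as an adjacency dictionary mapping each vertex to a set of neighbours. The names `is_connected_induced`, `rejection_sample` and the `max_tries` parameter are hypothetical and introduced here only for illustration. Because every k-vertex subset is drawn with equal probability and only connected ones are accepted, each connected induced subgraph on k vertices is returned with equal probability.

```python
import random
from collections import deque


def is_connected_induced(adj, nodes):
    """Return True if the subgraph of `adj` induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            # Only follow edges whose endpoints both lie in the sample.
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def rejection_sample(adj, k, rng=random, max_tries=100_000):
    """Draw k vertices uniformly at random until they induce a connected subgraph.

    Returns a frozenset of k vertices, or None if no connected sample was
    found within `max_tries` attempts (an illustrative cut-off, not from the paper).
    """
    vertices = list(adj)
    for _ in range(max_tries):
        candidate = rng.sample(vertices, k)  # uniform over k-subsets
        if is_connected_induced(adj, candidate):
            return frozenset(candidate)
    return None


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3: the connected 2-vertex samples are its edges.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(sorted(rejection_sample(path, 2, random.Random(0))))
```

In this sketch the cost is governed by the acceptance rate: when few k-vertex subsets are connected, most draws are rejected, which is one way the dependence on graph density and sample size mentioned above can show up.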
Keywords: Markov Chain, Random Walk, Original Graph, Average Cluster, Large Graph