
Sampling Connected Induced Subgraphs Uniformly at Random

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7338)

Abstract

A recurrent challenge for modern applications is the processing of large graphs. The ability to generate smaller representative samples is useful not only to circumvent scalability issues but also, in its own right, for statistical analysis and other data mining tasks. Such purposes require adequate sampling techniques. In this paper, we are interested in sampling, uniformly at random, a connected subgraph of a given graph. We require that the sample contain a prescribed number of vertices; the sampled graph is the subgraph induced by these vertices.
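The problem statement can be made concrete with a small Python sketch. All function names are hypothetical, chosen for illustration. The graph is assumed undirected and stored as a dict mapping each vertex to the set of its neighbours. The brute-force enumeration is exponential and only meant to define the target distribution on toy graphs: a uniform sampler must return each connected k-vertex set with equal probability.

```python
from collections import deque
from itertools import combinations


def is_connected_subset(adj, nodes):
    """Return True if the subgraph induced by `nodes` is connected.

    BFS that only follows edges with both endpoints in `nodes`, i.e. exactly
    the edges of the induced subgraph. Assumes `nodes` is non-empty (k >= 1).
    """
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def induced_subgraph(adj, nodes):
    """Adjacency of the subgraph induced by `nodes`: keep only inner edges."""
    nodes = set(nodes)
    return {u: adj[u] & nodes for u in nodes}


def connected_k_subsets(adj, k):
    """Enumerate every k-vertex set whose induced subgraph is connected.

    Exponential in k, so it is usable only on toy graphs. It spells out the
    support of the uniform target distribution.
    """
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if is_connected_subset(adj, c)]
```

Take the path graph 0-1-2-3 (`{0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}`) with k = 3. Only {0, 1, 2} and {1, 2, 3} qualify, so a uniform sampler should return each with probability 1/2.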

We devise, present, and discuss several algorithms that leverage three different techniques: Rejection Sampling, Random Walk, and Markov Chain Monte Carlo. We empirically evaluate and compare the performance of these algorithms. We show that they are effective and efficient, but that there is a trade-off between effectiveness and efficiency that depends on the density of the graph and on the sample size. We propose a novel algorithm, which we call Neighbour Reservoir Sampling (NRS), that very successfully realizes this trade-off.
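The following Python sketch contrasts the three families of techniques named above, using the same adjacency-dict representation. These are generic textbook variants written for illustration, not the algorithms evaluated in the paper. NRS is not shown because the abstract does not describe how it works. All function and parameter names (`max_tries`, `max_steps`, `steps`, `bfs_seed`) are hypothetical. The sketch assumes the graph is undirected, connected, has no isolated vertices, and has at least k vertices.

```python
import random
from collections import deque


def is_connected_subset(adj, nodes):
    """True if the subgraph induced by `nodes` is connected (BFS restricted to `nodes`)."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def rejection_sample(adj, k, rng, max_tries=100_000):
    """Exactly uniform: draw a uniform k-subset and keep it only if it induces a
    connected subgraph. Expected tries = C(n, k) / (number of connected
    k-subsets), which blows up on sparse graphs or for large k."""
    vertices = sorted(adj)
    for _ in range(max_tries):
        candidate = rng.sample(vertices, k)
        if is_connected_subset(adj, candidate):
            return frozenset(candidate)
    raise RuntimeError("no connected k-subset found within max_tries")


def random_walk_sample(adj, k, rng, max_steps=100_000):
    """Cheap but biased: walk from a random vertex, collecting distinct visited
    vertices until there are k. The set is always connected because each new
    vertex is adjacent to the walker's previous position. In general, however,
    it is not uniform over connected k-subsets."""
    current = rng.choice(sorted(adj))
    visited = {current}
    steps = 0
    while len(visited) < k:
        if steps == max_steps:
            raise RuntimeError("walk did not reach k distinct vertices")
        current = rng.choice(sorted(adj[current]))
        visited.add(current)
        steps += 1
    return frozenset(visited)


def bfs_seed(adj, k, start):
    """Initial connected k-set: the first k vertices reached by BFS from `start`
    (each one is adjacent to an earlier one, so the prefix is connected)."""
    order, seen, queue = [start], {start}, deque([start])
    while queue and len(order) < k:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in seen and len(order) < k:
                seen.add(w)
                order.append(w)
                queue.append(w)
    if len(order) < k:
        raise ValueError("the component of `start` has fewer than k vertices")
    return set(order)


def mcmc_sample(adj, k, rng, steps=10_000):
    """Metropolis chain over connected k-subsets. The proposal swaps a uniform
    vertex of the current set S with a uniform vertex outside S. The proposal
    is symmetric and the target is uniform, so the acceptance rule reduces to
    "accept iff the proposed set is connected". The output is only approximately
    uniform, after the chain has mixed, assuming it is irreducible and aperiodic."""
    vertices = sorted(adj)
    state = bfs_seed(adj, k, rng.choice(vertices))
    if k == len(vertices):
        return frozenset(state)  # the only k-subset is the whole vertex set
    for _ in range(steps):
        u = rng.choice(sorted(state))
        v = rng.choice([x for x in vertices if x not in state])  # O(n) per step
        proposal = (state - {u}) | {v}
        if is_connected_subset(adj, proposal):
            state = proposal
    return frozenset(state)
```

In these sketches the trade-off is visible directly:

- Rejection sampling is exact, but its acceptance rate collapses when connected k-sets are rare (sparse graphs, large k).
- The random walk finishes quickly but is biased.
- The MCMC chain targets the uniform distribution, at the cost of a burn-in length (`steps`) that must be chosen or diagnosed.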




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, X., Bressan, S. (2012). Sampling Connected Induced Subgraphs Uniformly at Random. In: Ailamaki, A., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2012. Lecture Notes in Computer Science, vol 7338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31235-9_13


  • DOI: https://doi.org/10.1007/978-3-642-31235-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31234-2

  • Online ISBN: 978-3-642-31235-9

  • eBook Packages: Computer Science, Computer Science (R0)
