Enhancing Stratified Graph Sampling Algorithms Based on Approximate Degree Distribution

  • Conference paper
Artificial Intelligence and Algorithms in Intelligent Systems (CSOC2018 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 764)

Abstract

Graph sampling has recently become a research focus in graph-related fields. Because complex networks are scale-free, most existing graph sampling algorithms tend to favor either high-degree or low-degree nodes. Scale-free means that node degrees follow a power-law distribution, so the degrees of the sampled nodes differ significantly from one another. In this paper, we propose the concept of approximate degree distribution and use it to devise a stratified strategy for complex networks. We also develop two graph sampling algorithms that combine a node selection method with the stratified strategy. The experimental results show that our sampling algorithms preserve several properties of different graphs and are more accurate than other algorithms. Further, we prove that the proposed algorithms are superior to off-the-shelf algorithms in terms of the unbiasedness of the degrees and more efficient than the state-of-the-art FFS and ES-i algorithms.
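The general idea of stratifying nodes by degree before sampling can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract. It assumes a graph stored as an adjacency dictionary, uses power-of-two degree bins as a stand-in for strata derived from an approximate degree distribution, and draws the same fraction of nodes from every stratum. The function names `degree_strata`, `stratified_node_sample`, and `induced_subgraph` are hypothetical.

```python
import random
from collections import defaultdict


def degree_strata(adj):
    """Group nodes by floor(log2(degree)); isolated nodes go to stratum -1.

    Assumption: power-of-two bins are only an illustrative choice of strata.
    """
    strata = defaultdict(list)
    for node, neighbors in adj.items():
        degree = len(neighbors)
        # bit_length() - 1 equals floor(log2(degree)) for degree > 0, and -1 for 0.
        strata[degree.bit_length() - 1].append(node)
    return dict(strata)


def stratified_node_sample(adj, fraction, seed=0):
    """Draw about `fraction` of the nodes from each stratum, at least one per stratum."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    strata = degree_strata(adj)
    sample = []
    for key in sorted(strata):
        nodes = strata[key]
        k = max(1, round(fraction * len(nodes)))
        sample.extend(rng.sample(nodes, k))
    return sample


def induced_subgraph(adj, nodes):
    """Keep only the sampled nodes and the edges between them."""
    keep = set(nodes)
    return {u: [v for v in adj[u] if v in keep] for u in keep}
```

Because every stratum contributes at least one node, rare high-degree hubs are represented in the sample alongside the many low-degree nodes, which plain uniform node sampling on a power-law graph would often miss.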

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61462012, 61562010, and U1531246), the Guizhou University Graduate Innovation Fund (Grant No. 2017078), the Innovation Team of the Data Analysis and Cloud Service of Guizhou Province (Grant No. [2015]53), and the Science and Technology Project of the Department of Science and Technology in Guizhou Province (Grant No. LH [2016]7427).

Author information

Corresponding author

Correspondence to Hui Li.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Zhu, J., Li, H., Chen, M., Dai, Z., Zhu, M. (2019). Enhancing Stratified Graph Sampling Algorithms Based on Approximate Degree Distribution. In: Silhavy, R. (eds) Artificial Intelligence and Algorithms in Intelligent Systems. CSOC2018 2018. Advances in Intelligent Systems and Computing, vol 764. Springer, Cham. https://doi.org/10.1007/978-3-319-91189-2_20
