A Robust Method for Statistical Testing of Empirical Power-Law Distributions

  • Conference paper
Algorithms and Models for the Web Graph (WAW 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12091)

Abstract

The World Wide Web is a complex system naturally represented by a directed network of documents (nodes) connected through hyperlinks (edges). In this work, we focus on one of the most relevant topological properties characterizing the network: being scale-free. A directed network is scale-free if its in-degree and out-degree distributions exhibit approximate, asymptotic power-law behavior. Considered as a whole, the Web presents empirical evidence of this property. When the study of the degree distributions is restricted to specific sub-categories of websites, however, there is no longer strong evidence for it. For this reason, many works have questioned the near-universal ubiquity of the scale-free property. Moreover, existing statistical methods for testing whether an empirical degree distribution follows a power law become unreliable when sample sizes are large and/or the data are noisy.
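To make the notion of testing a degree distribution against a power law concrete, the following minimal sketch fits the exponent by maximum likelihood and measures the Kolmogorov-Smirnov distance between the data and the fitted model, in the spirit of the goodness-of-fit approach of Clauset, Shalizi and Newman. It rests on several assumptions that are not taken from the paper: a continuous power law is used as an approximation to discrete degrees, the lower cutoff `xmin` is fixed rather than estimated, and all function names are hypothetical.

```python
import math
import random


def fit_alpha(values, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent."""
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted CDF."""
    tail = sorted(x for x in values if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        # CDF of a continuous power law with exponent alpha and cutoff xmin.
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic stand-in for a degree sequence (assumed values, not real data).
rng = random.Random(0)
data = sample_power_law(5000, xmin=1.0, alpha=2.5, rng=rng)
alpha_hat = fit_alpha(data, xmin=1.0)
distance = ks_distance(data, xmin=1.0, alpha=alpha_hat)
print(f"estimated exponent: {alpha_hat:.3f}, KS distance: {distance:.4f}")
```

On data drawn from a true power law, the estimated exponent should lie close to the generating value and the distance should be small. A full test would additionally calibrate that distance, for example with a bootstrap.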

In this paper, we propose an extension of a state-of-the-art method that overcomes these problems by applying a Monte Carlo sub-sampling procedure to the graphs. We show in synthetic experiments that even small variations of truly power-law distributed data cause the state-of-the-art method to reject the hypothesis, while the proposed method is sounder and more stable under such variations.
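The general idea of wrapping a goodness-of-fit test in Monte Carlo sub-sampling can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it sub-samples the degree sequence uniformly at random rather than sampling the graph itself, it fixes `xmin`, it uses a continuous power law with a bootstrap p-value, and it aggregates by reporting the fraction of sub-samples whose p-value is at least `level`. The function names, sub-sample size, and threshold are all hypothetical choices.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Continuous maximum-likelihood estimate of the exponent."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Kolmogorov-Smirnov distance between the data and the fitted model."""
    ordered = sorted(tail)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a continuous power law."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def bootstrap_p_value(tail, xmin, rng, n_boot=50):
    """Fraction of synthetic data sets that fit worse than the observed one."""
    alpha = fit_alpha(tail, xmin)
    observed = ks_distance(tail, xmin, alpha)
    exceed = 0
    for _ in range(n_boot):
        synthetic = sample_power_law(len(tail), xmin, alpha, rng)
        # Refit on each synthetic data set before measuring its distance.
        synth_alpha = fit_alpha(synthetic, xmin)
        if ks_distance(synthetic, xmin, synth_alpha) >= observed:
            exceed += 1
    return exceed / n_boot


def subsampled_acceptance(values, xmin, rng, n_subsamples=20, size=200,
                          level=0.1):
    """Run the test on random sub-samples; return the fraction not rejected."""
    tail = [x for x in values if x >= xmin]
    accepted = 0
    for _ in range(n_subsamples):
        subsample = rng.sample(tail, size)  # uniform, without replacement
        if bootstrap_p_value(subsample, xmin, rng) >= level:
            accepted += 1
    return accepted / n_subsamples


# Synthetic power-law data standing in for a large degree sequence.
rng = random.Random(1)
clean = sample_power_law(20000, xmin=1.0, alpha=2.5, rng=rng)
acceptance = subsampled_acceptance(clean, xmin=1.0, rng=rng)
print(f"fraction of sub-samples not rejected: {acceptance:.2f}")
```

Testing many moderate-sized sub-samples instead of the full data set keeps the test from becoming overly sensitive to tiny deviations at very large sample sizes. The price is an extra aggregation step whose rule has to be chosen and justified.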

Lastly, we perform a study on three websites, showing that, depending on their category, the power-law hypothesis is accepted for some and rejected for others. We argue that our method could be used to better characterize topological properties deriving from different generative principles: central or peripheral.


Notes

  1. https://github.com/DaviGarba/netanalytics

  2. https://scrapy.org/


Author information

Corresponding author

Correspondence to Davide Garbarino.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Garbarino, D., Tozzo, V., Vian, A., Barla, A. (2020). A Robust Method for Statistical Testing of Empirical Power-Law Distributions. In: Kamiński, B., Prałat, P., Szufel, P. (eds) Algorithms and Models for the Web Graph. WAW 2020. Lecture Notes in Computer Science, vol 12091. Springer, Cham. https://doi.org/10.1007/978-3-030-48478-1_11

  • DOI: https://doi.org/10.1007/978-3-030-48478-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-48477-4

  • Online ISBN: 978-3-030-48478-1

  • eBook Packages: Computer Science, Computer Science (R0)
