A Comparison of Three Algorithms for Approximating the Distance Distribution in Real-World Graphs

  • Pierluigi Crescenzi
  • Roberto Grossi
  • Leonardo Lanzi
  • Andrea Marino
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6595)


The distance between a pair of vertices in a graph G is the length of a shortest path between them. The distance distribution of G specifies how many vertex pairs are at distance h, for every feasible value of h. We study three fast randomized algorithms for approximating the distance distribution in large graphs. The Eppstein-Wang (EW) algorithm samples a limited (logarithmic) number of breadth-first searches (BFSes). The Size-Estimation Framework (SEF) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (ANF) algorithm by Palmer, Gibbons, and Faloutsos uses the probabilistic counting technique introduced by Flajolet and Martin to estimate the number of distinct elements in a large multiset. We investigate how good the approximation of the distance distribution is when the three algorithms are run in similar settings. The analysis of ANF derives from the results on probabilistic counting, while that of SEF is given by Cohen. As for EW, which was originally designed for another problem, we extend its simple analysis to bound its error with high probability and to show its convergence. We then perform an experimental study on 30 real-world graphs, showing that our implementation of EW combines the accuracy of SEF with the performance of ANF.
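
As an illustration of the sampling idea behind the Eppstein-Wang approach, the following minimal sketch computes the distance distribution of a small unweighted graph exactly (one BFS per vertex) and approximately (BFSes from k random sources, with counts scaled by n/k). It is not the authors' implementation: the function names, the adjacency-dictionary representation, sampling without replacement, the n/k scaling, and the convention of counting ordered vertex pairs are all assumptions made here for the example.

```python
import random
from collections import Counter, deque


def bfs_distance_counts(adj, source):
    """Count how many vertices lie at each distance h >= 1 from `source`."""
    dist = {source: 0}
    queue = deque([source])
    counts = Counter()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                counts[dist[v]] += 1
                queue.append(v)
    return counts


def exact_distance_distribution(adj):
    """Run one BFS per vertex; returns {h: number of ordered pairs at distance h}."""
    total = Counter()
    for s in adj:
        total.update(bfs_distance_counts(adj, s))
    return dict(total)


def sampled_distance_distribution(adj, k, rng=None):
    """Estimate the distribution from k random BFS sources (hypothetical helper).

    The per-source counts are summed and scaled by n / k, which is an
    assumption of this sketch rather than a detail taken from the paper.
    """
    rng = rng or random.Random(0)
    n = len(adj)
    sources = rng.sample(list(adj), k)  # sampling without replacement
    total = Counter()
    for s in sources:
        total.update(bfs_distance_counts(adj, s))
    scale = n / k
    return {h: c * scale for h, c in total.items()}
```

With k equal to the number of vertices the estimate coincides with the exact distribution; smaller values of k trade accuracy for fewer BFSes, which is the regime the abstract refers to when it mentions a logarithmic number of searches.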


Distance Distribution; Large Graph; Stochastic Average; Unweighted Graph; Stochastic Average Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Blondel, V., Guillaume, J.L., Hendrickx, J., Jungers, R.: Distance Distribution in Random Graphs and Applications to Network Exploration. Phys. Rev. E 76 (2007)
  2. Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proc. of the 13th International World Wide Web Conference, pp. 595–601 (2004)
  3. Cohen, E.: Estimating the size of the transitive closure in linear time. In: Annual IEEE Symposium on Foundations of Computer Science, pp. 190–200 (1994)
  4. Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55(3), 441–453 (1997)
  5. Cohen, E., Kaplan, H.: Bottom-k sketches: better and more efficient estimation of aggregates. In: ACM SIGMETRICS, pp. 353–354. ACM, New York (2007)
  6. Cohen, E., Kaplan, H.: Spatially-decaying aggregation over a network. J. Comput. Syst. Sci. 73(3), 265–288 (2007)
  7. Cohen, E., Kaplan, H.: Summarizing data using bottom-k sketches. In: ACM PODC, pp. 225–234 (2007)
  8. Cohen, E., Kaplan, H.: Tighter estimation using bottom k sketches. PVLDB 1(1), 213–224 (2008)
  9. Crescenzi, P., Grossi, R., Imbrenda, C., Lanzi, L., Marino, A.: Finding the Diameter in Real-World Graphs: Experimentally Turning a Lower Bound into an Upper Bound. In: de Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6346, pp. 302–313. Springer, Heidelberg (2010)
  10. Eppstein, D., Wang, J.: Fast approximation of centrality. In: ACM/SIAM SODA, pp. 228–229 (2001)
  11. Flajolet, P., Martin, G.N.: Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)
  12. Latapy, M., Magnien, C.: Measuring Fundamental Properties of Real-World Complex Networks. CoRR abs/cs/0609115 (2006)
  13. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph Evolution: Densification and Shrinking Diameters. ACM Trans. Knowl. Discov. Data 1(1) (2007)
  14. Lipton, R.J., Naughton, J.F.: Query size estimation by adaptive sampling. J. Comput. Syst. Sci. 51(1), 18–25 (1995)
  15. Lynch, N.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
  16. Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear I/O. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 723–735. Springer, Heidelberg (2002)
  17. Palmer, C.R., Gibbons, P.B., Faloutsos, C.: ANF: a Fast and Scalable Tool for Data Mining in Massive Graphs. In: ACM SIGKDD, pp. 81–90 (2002)
  18. Wang, L., Subramanian, S., Latifi, S., Srimani, P.: Distance Distribution of Nodes in Star Graphs. Applied Mathematics Letters 19(8), 780–784 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Pierluigi Crescenzi (1)
  • Roberto Grossi (2)
  • Leonardo Lanzi (1)
  • Andrea Marino (1)
  1. Dipartimento di Sistemi e Informatica, Università di Firenze, Italy
  2. Dipartimento di Informatica, Università di Pisa, Italy
