A Comparison of Three Algorithms for Approximating the Distance Distribution in Real-World Graphs
The distance between a pair of vertices in a graph G is the length of a shortest path between them. The distance distribution of G specifies how many vertex pairs are at distance h, for every feasible value of h. We study three fast randomized algorithms for approximating the distance distribution in large graphs. The Eppstein-Wang (EW) algorithm samples a limited (logarithmic) number of breadth-first searches (BFSes). The Size-Estimation Framework (SEF) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (ANF) algorithm by Palmer, Gibbons, and Faloutsos uses the probabilistic counting technique introduced by Flajolet and Martin to estimate the number of distinct elements in a large multiset. We investigate how well the three algorithms approximate the distance distribution when run in similar settings. The analysis of ANF derives from the results on the probabilistic counting method, while that of SEF is given by Cohen. As for EW (originally designed for another problem), we extend its simple analysis to bound its error with high probability and to show its convergence. We then perform an experimental study on 30 real-world graphs, showing that our implementation of EW combines the accuracy of SEF with the performance of ANF.
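The sampling idea behind the BFS-based approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice to count ordered pairs, sampling without replacement, and the n/k scaling are all assumptions made here for illustration, and the graph is assumed to be an unweighted adjacency-list dictionary.

```python
import random
from collections import Counter, deque


def bfs_distances(adj, source):
    """Return {vertex: hop distance from source} for every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_distance_distribution(adj, k, seed=None):
    """Estimate how many ordered pairs (u, v), u != v, lie at each distance h.

    Hypothetical helper: runs a BFS from k sources sampled uniformly without
    replacement and scales the observed counts by n / k.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    n = len(vertices)
    sources = rng.sample(vertices, k)
    counts = Counter()
    for s in sources:
        for d in bfs_distances(adj, s).values():
            if d > 0:  # skip the source itself
                counts[d] += 1
    return {h: c * n / k for h, c in sorted(counts.items())}


# Path graph 0 - 1 - 2 - 3; with k = n every vertex is a source,
# so the estimate coincides with the exact distribution.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sampled_distance_distribution(path, k=4, seed=0))
# prints {1: 6.0, 2: 4.0, 3: 2.0}
```

With k much smaller than n the result is only an estimate, and its quality depends on k; the abstract states that a logarithmic number of BFSes is used, but the exact sample size and error bound are given in the paper, not in this sketch.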
Keywords: Distance Distribution, Large Graph, Stochastic Average, Unweighted Graph, Stochastic Average Method
- 1. Blondel, V., Guillaume, J.L., Hendrickx, J., Jungers, R.: Distance Distribution in Random Graphs and Applications to Network Exploration. Phys. Rev. E 76 (2007)
- 2. Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proc. of the 13th International World Wide Web Conference, pp. 595–601 (2004)
- 3. Cohen, E.: Estimating the size of the transitive closure in linear time. In: Annual IEEE Symposium on Foundations of Computer Science, pp. 190–200 (1994)
- 5. Cohen, E., Kaplan, H.: Bottom-k sketches: better and more efficient estimation of aggregates. In: ACM SIGMETRICS, pp. 353–354. ACM, New York (2007)
- 7. Cohen, E., Kaplan, H.: Summarizing data using bottom-k sketches. In: ACM PODC, pp. 225–234 (2007)
- 8. Cohen, E., Kaplan, H.: Tighter estimation using bottom k sketches. PVLDB 1(1), 213–224 (2008)
- 10. Eppstein, D., Wang, J.: Fast approximation of centrality. In: ACM/SIAM SODA, pp. 228–229 (2001)
- 12. Latapy, M., Magnien, C.: Measuring Fundamental Properties of Real-World Complex Networks. CoRR abs/cs/0609115 (2006)
- 13. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph Evolution: Densification and Shrinking Diameters. ACM Trans. Knowl. Discov. Data 1(1) (2007)
- 17. Palmer, C.R., Gibbons, P.B., Faloutsos, C.: ANF: a Fast and Scalable Tool for Data Mining in Massive Graphs. In: ACM SIGKDD, pp. 81–90 (2002)