Extended Compact Web Graph Representations

  • Francisco Claude
  • Gonzalo Navarro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6060)

Abstract

Many relevant Web mining tasks translate into classical algorithms on the Web graph. Compact Web graph representations allow running these tasks on larger graphs within main memory. At a minimum, these representations provide fast navigation (to the neighbors of a node), yet more sophisticated operations are desirable for several Web analyses.

We present a compact Web graph representation that, in addition, supports reverse navigation (to the nodes pointing to the given one). The standard approach to achieve this is to represent both the graph and its transpose, which roughly doubles the space requirement. Our structure, instead, represents the adjacency list using a compact sequence representation that allows finding the positions where a given node v is mentioned, and answers reverse navigation using that primitive. This is combined with a previous proposal based on grammar compression of the adjacency list. Combining the two raises interesting algorithmic problems. As a result, we achieve the smallest graph representation reported in the literature that supports direct and reverse navigation, and also obtain other variants that occupy relevant niches in the space/time tradeoff.
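
The reverse-navigation idea can be illustrated with a minimal sketch, assuming a plain, uncompressed layout. The adjacency lists are concatenated into one sequence `T`, `starts` records where each node's list begins, and `occ` is a hypothetical stand-in for the "find where v is mentioned" primitive (select on a sequence). The paper relies on a compact sequence representation and additionally applies grammar compression; this sketch uses ordinary Python lists and omits grammar compression entirely. The class and method names are illustrative only, not the authors' API.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List


class SequenceGraph:
    """Toy graph stored as one concatenated adjacency sequence T.

    Hypothetical illustration: T holds the neighbors of node 0, then of
    node 1, and so on; starts[u] is the offset in T where u's list begins.
    occ[v] simulates select_v(T, k) with an explicit list of positions;
    a compact implementation would answer it from a compressed sequence.
    """

    def __init__(self, adj: List[List[int]]) -> None:
        self.T: List[int] = []
        self.starts: List[int] = []
        for neighbors in adj:
            self.starts.append(len(self.T))
            self.T.extend(neighbors)
        # occ[v][k] is the position of the (k+1)-th mention of v in T.
        self.occ: Dict[int, List[int]] = defaultdict(list)
        for pos, v in enumerate(self.T):
            self.occ[v].append(pos)

    def neighbors(self, u: int) -> List[int]:
        """Direct navigation: return the slice of T holding u's list."""
        end = self.starts[u + 1] if u + 1 < len(self.starts) else len(self.T)
        return self.T[self.starts[u]:end]

    def _owner(self, pos: int) -> int:
        """Map a position of T to the node whose list contains it.

        Finds the last list start <= pos, which is equivalent to a rank
        query over a bitmap marking where each list begins. Empty lists
        share a start offset, and bisect_right skips past them correctly.
        """
        return bisect_right(self.starts, pos) - 1

    def reverse_neighbors(self, v: int) -> List[int]:
        """Reverse navigation: each mention of v in T yields one node
        that points to v."""
        return [self._owner(pos) for pos in self.occ.get(v, [])]


if __name__ == "__main__":
    g = SequenceGraph([[1, 2], [2], [0, 2]])
    print(g.neighbors(0))          # direct: [1, 2]
    print(g.reverse_neighbors(2))  # reverse: [0, 1, 2]
```

The point of the design is that no transposed graph is stored: reverse neighbors are recovered from the same sequence `T` that serves direct navigation, so the extra cost is only whatever the sequence representation needs to locate mentions of a node.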

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Francisco Claude, David R. Cheriton School of Computer Science, University of Waterloo
  • Gonzalo Navarro, Department of Computer Science, University of Chile