A Fast and Compact Web Graph Representation

  • Francisco Claude
  • Gonzalo Navarro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4726)

Abstract

Compressed graph representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. In this paper we show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential of adapting well to secondary memory. In addition, we introduce an approximate Re-Pair version that works efficiently with limited main memory.
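
As a rough illustration of the general idea of applying Re-Pair to adjacency lists, the following Python sketch repeatedly replaces the most frequent pair of adjacent symbols with a new symbol and records the rule. It is a naive toy version under stated assumptions, not the implementation described in the paper: the function names, the use of negative integers as per-node separators, the quadratic recounting of pairs, and the linear scan in `neighbors` are all choices made here for brevity.

```python
from collections import Counter

def encode_graph(adj):
    """Concatenate sorted adjacency lists; -(v + 1) marks the start of node v's list.
    The negative separators are an assumption of this sketch."""
    seq = []
    for v, nbrs in enumerate(adj):
        seq.append(-(v + 1))
        seq.extend(sorted(nbrs))
    return seq

def repair_compress(seq, first_new_symbol):
    """Naive Re-Pair: replace the most frequent adjacent pair until none repeats.
    Pairs that include a separator are never replaced."""
    seq = list(seq)
    rules = {}
    next_sym = first_new_symbol
    while True:
        counts = Counter(
            (a, b) for a, b in zip(seq, seq[1:]) if a >= 0 and b >= 0
        )
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_sym += 1
    return seq, rules

def expand(sym, rules):
    """Recursively expand a symbol back into original node identifiers."""
    if sym in rules:
        left, right = rules[sym]
        return expand(left, rules) + expand(right, rules)
    return [sym]

def neighbors(v, seq, rules):
    """Return the out-neighbors of v by expanding only v's compressed list."""
    start = seq.index(-(v + 1)) + 1  # linear scan; a real structure would store pointers
    result = []
    for sym in seq[start:]:
        if sym < 0:
            break
        result.extend(expand(sym, rules))
    return result

# Toy graph with 5 nodes whose lists share the run 1, 2, 3.
adj = [[1, 2, 3], [1, 2, 3, 4], [2, 3], [0], [1, 2, 3]]
seq = encode_graph(adj)
compressed, rules = repair_compress(seq, first_new_symbol=len(adj))
```

Retrieving a neighbor list only expands the symbols of that one list, which gives some intuition for why navigation over a grammar-compressed sequence can be fast; the toy graph above is hypothetical and says nothing about the space or time figures reported in the paper.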

References

  1. Adler, M., Mitzenmacher, M.: Towards compressing Web graphs. In: Proc. IEEE DCC, pp. 203–212. IEEE Computer Society Press, Los Alamitos (2001)
  2. Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: Proc. ACM STOC, pp. 171–180. ACM Press, New York (2000)
  3. Bharat, K., Broder, A., Henzinger, M., Kumar, P., Venkatasubramanian, S.: The Connectivity Server: Fast access to linkage information on the web. In: Proc. WWW, pp. 469–477 (1998)
  4. Blandford, D.: Compact data structures with fast queries. PhD thesis, School of Computer Science, Carnegie Mellon University. Also as TR CMU-CS-05-196 (2006)
  5. Blandford, D., Blelloch, G., Kash, I.: Compact representations of separable graphs. In: Proc. SODA, pp. 579–588 (2003)
  6. Boldi, P., Vigna, S.: The webgraph framework I: compression techniques. In: Proc. WWW, pp. 595–602 (2004)
  7. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the web. J. Computer Networks 33(1–6), 309–320 (2000). Also in Proc. WWW9
  8. Chakrabarti, D., Papadimitriou, S., Modha, D., Faloutsos, C.: Fully automatic cross-associations. In: Proc. ACM SIGKDD. ACM Press, New York (2004)
  9. Chuang, R., Garg, A., He, X., Kao, M.-Y., Lu, H.-I.: Compact encodings of planar graphs with canonical orderings and multiple parentheses. In: Larsen, K.G., Skyum, S., Winskel, G. (eds.) ICALP 1998. LNCS, vol. 1443, pp. 118–129. Springer, Heidelberg (1998)
  10. Deo, N., Litow, B.: A structural approach to graph compression. In: Brim, L., Gruska, J., Zlatuška, J. (eds.) MFCS 1998. LNCS, vol. 1450, pp. 91–101. Springer, Heidelberg (1998)
  11. González, R., Navarro, G.: Compressed text indexes with fast locate. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 216–227. Springer, Heidelberg (2007)
  12. Gulli, A., Signorini, A.: The indexable web is more than 11.5 billion pages. In: Proc. WWW (2005)
  13. He, X., Kao, M.-Y., Lu, H.-I.: Linear-time succinct encodings of planar graphs via canonical orderings. J. Discrete Mathematics 12(3), 317–325 (1999)
  14. He, X., Kao, M.-Y., Lu, H.-I.: A fast general methodology for information-theoretically optimal encodings of graphs. SIAM J. Comput. 30, 838–846 (2000)
  15. Itai, A., Rodeh, M.: Representation of graphs. Acta Informatica 17, 215–219 (1982)
  16. Jacobson, G.: Space-efficient static trees and graphs. In: Proc. FOCS, pp. 549–554 (1989)
  17. Keeler, K., Westbrook, J.: Short encodings of planar graphs and maps. Discrete Applied Mathematics 58, 239–252 (1995)
  18. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large scale knowledge bases from the Web. In: Proc. VLDB (1999)
  19. Larsson, J., Moffat, A.: Off-line dictionary-based compression. Proc. IEEE 88(11), 1722–1732 (2000)
  20. Lu, H.-I.: Linear-time compression of bounded-genus graphs into information-theoretically optimal number of bits. In: Proc. SODA, pp. 223–224 (2002)
  21. Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) Foundations of Software Technology and Theoretical Computer Science. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
  22. Munro, I., Raman, V.: Succinct representation of balanced parentheses, static trees and planar graphs. In: Proc. FOCS, pp. 118–126 (1997)
  23. Naor, M.: Succinct representation of general unlabeled graphs. Discrete Applied Mathematics 28, 303–307 (1990)
  24. Navarro, G.: Compressing web graphs like texts. Technical Report TR/DCC-2007-2, Dept. of Computer Science, University of Chile (2007)
  25. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), article 2 (2007)
  26. Raghavan, S., Garcia-Molina, H.: Representing Web graphs. In: Proc. ICDE (2003)
  27. Randall, K., Stata, R., Wickremesinghe, R., Wiener, J.: The LINK database: Fast access to graphs of the Web. Technical Report 175, Compaq Systems Research Center, Palo Alto, CA (2001)
  28. Rossignac, J.: Edgebreaker: Connectivity compression for triangle meshes. IEEE Transactions on Visualization 5(1), 47–61 (1999)
  29. Suel, T., Yuan, J.: Compressing the graph structure of the Web. In: Proc. IEEE DCC, pp. 213–222. IEEE Computer Society Press, Los Alamitos (2001)
  30. Turán, G.: Succinct representations of graphs. Discrete Applied Mathematics 8, 289–294 (1984)
  31. Wan, R.: Browsing and Searching Compressed Documents. PhD thesis, Dept. of Computer Science and Software Engineering, University of Melbourne (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Francisco Claude
    • 1
  • Gonzalo Navarro
    • 1
  1. Department of Computer Science, Universidad de Chile
