Permuting Web Graphs

  • Conference paper
Algorithms and Models for the Web-Graph (WAW 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5427)

Abstract

Since the first investigations on web graph compression, it has been clear that the ordering of the nodes of the graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [1], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic (or coordinate-free) orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some old orderings and proposing some new ones. Our experiments are carried out in the WebGraph framework [2], and show that results can differ significantly depending on the compression technique and the structure of the graph. In particular, we show that for the transpose web graph URL ordering is significantly less effective, and that some new orderings combining host information and Gray/lexicographic orderings outperform all previous methods: in some large transposed graphs they yield the remarkable compression rate of 1 bit per link.
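To make the two intrinsic orderings mentioned in the abstract concrete, the following is a minimal Python sketch on a toy graph. It is not the WebGraph implementation and not necessarily the exact procedure used in the paper: the function names, the dense 0/1 row representation, and the choice of column 0 as the most significant position are all assumptions made for illustration. Rows are sorted either lexicographically or by their position in the reflected binary Gray code sequence, which can be obtained by comparing the prefix-XOR of each row.

```python
from itertools import accumulate
from operator import xor


def adjacency_rows(n, successors):
    """Return each node's adjacency-matrix row as a tuple of 0/1 bits.

    `successors` maps a node id to the set of nodes it links to
    (hypothetical toy representation, not a WebGraph data structure).
    """
    return [
        tuple(1 if j in successors.get(i, ()) else 0 for j in range(n))
        for i in range(n)
    ]


def lexicographic_order(rows):
    """Node ids sorted by comparing their rows lexicographically."""
    return sorted(range(len(rows)), key=lambda i: rows[i])


def gray_rank_key(row):
    """Sort key giving the row's position in the reflected Gray code.

    Reading the row as a Gray code word (column 0 first, an assumption),
    the running XOR of its bits is the binary expansion of its rank.
    """
    return tuple(accumulate(row, xor))


def gray_order(rows):
    """Node ids sorted so that their rows follow the Gray code sequence."""
    return sorted(range(len(rows)), key=lambda i: gray_rank_key(rows[i]))


# Toy graph with 4 nodes (made-up data, for illustration only).
toy_successors = {0: {0, 1}, 1: {1}, 2: {0}, 3: set()}
toy_rows = adjacency_rows(4, toy_successors)
lex_permutation = lexicographic_order(toy_rows)
gray_permutation = gray_order(toy_rows)
```

In the Gray ordering, rows that are adjacent in the permutation tend to differ in few positions, which is the property that makes it attractive for compression schemes that exploit similarity between consecutive successor lists. A dense row per node is only workable for toy inputs; a real web graph would need a comparison defined directly on sorted successor lists.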

This work is partially supported by the EC Project DELIS, by MIUR PRIN Project “Automi e linguaggi formali: aspetti matematici e applicativi”, and by MIUR PRIN Project “Web Ram: web retrieval and mining”.

References

  1. Randall, K., Stata, R., Wickremesinghe, R., Wiener, J.L.: The LINK database: Fast access to graphs of the Web. Research Report 175, Compaq Systems Research Center, Palo Alto, CA (2001)

  2. Boldi, P., Vigna, S.: The WebGraph framework I: Compression techniques. In: Proc. of the Thirteenth International World Wide Web Conference, Manhattan, USA, pp. 595–601. ACM Press, New York (2004)

  3. Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., Upfal, E.: The Web as a graph. In: PODS 2000: Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 1–10. ACM Press, New York (2000)

  4. Bharat, K., Broder, A., Henzinger, M., Kumar, P., Venkatasubramanian, S.: The Connectivity Server: fast access to linkage information on the Web. Computer Networks and ISDN Systems 30(1-7), 469–477 (1998)

  5. Blandford, D.K., Blelloch, G.E.: Index compression through document reordering. In: Data Compression Conference, pp. 342–351. IEEE Computer Society, Los Alamitos (2002)

  6. Shieh, W.Y., Chen, T.F., Shann, J.J.J., Chung, C.P.: Inverted file compression through document identifier reassignment. Inf. Process. Manage. 39(1), 117–131 (2003)

  7. Silvestri, F.: Sorting out the document identifier assignment problem. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425, pp. 101–112. Springer, Heidelberg (2007)

  8. Blanco, R., Barreiro, Á.: Document identifier reassignment through dimensionality reduction. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, pp. 375–387. Springer, Heidelberg (2005)

  9. Knuth, D.E.: The Art of Computer Programming, vol. 4, Fascicle 2: Generating All Tuples and Permutations. Addison-Wesley Professional, Reading (2005)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boldi, P., Santini, M., Vigna, S. (2009). Permuting Web Graphs. In: Avrachenkov, K., Donato, D., Litvak, N. (eds) Algorithms and Models for the Web-Graph. WAW 2009. Lecture Notes in Computer Science, vol 5427. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-95995-3_10

  • DOI: https://doi.org/10.1007/978-3-540-95995-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-95994-6

  • Online ISBN: 978-3-540-95995-3

  • eBook Packages: Computer Science, Computer Science (R0)
