Analysis of Link Graph Compression Techniques

  • David Hannah
  • Craig Macdonald
  • Iadh Ounis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

Links between documents have been shown to be useful in various Information Retrieval (IR) tasks; for example, Google has stated for many years that the PageRank authority measure is at the heart of its relevance calculations. Because of the large number of links typically involved, using such link analysis techniques in a search engine requires special tools to store the link matrix of the document collection. This work is concerned with applying compression to the link graph. Using various standard IR test collections, we compare several techniques for compressing link graphs and draw conclusions in terms of speed and space.
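The abstract does not name the compression techniques it compares. Purely as an illustration, the sketch below shows one common approach to storing a compressed link graph: sort each document's out-links, store the differences (gaps) between consecutive document IDs, write each value with an Elias gamma code, and keep a per-document bit offset so that a single adjacency list can be decoded without scanning the whole graph. The function names, the use of a Python string of '0'/'1' characters instead of packed bits, and the choice of gamma codes are all assumptions made for readability; this is not the paper's implementation.

```python
from typing import Dict, List, Tuple


def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1: (len(binary(n)) - 1) zeros, then binary(n)."""
    if n < 1:
        raise ValueError("gamma codes are defined only for n >= 1")
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b


def to_gaps(succ: List[int]) -> List[int]:
    """Sorted, distinct successor IDs -> gaps >= 1 (the first gap is id + 1)."""
    gaps, prev = [], -1
    for s in succ:
        gaps.append(s - prev)
        prev = s
    return gaps


def encode_graph(graph: Dict[int, List[int]], num_nodes: int) -> Tuple[str, List[int]]:
    """Encode every node's out-links; return the bit string and per-node bit offsets.

    Illustrative sketch: bits are kept in a str for readability, whereas a real
    link database would pack them into bytes.
    """
    parts: List[str] = []
    offsets: List[int] = []
    pos = 0
    for node in range(num_nodes):
        succ = sorted(set(graph.get(node, [])))
        # Outdegree is stored as degree + 1 because 0 has no gamma code.
        codes = [gamma_encode(len(succ) + 1)] + [gamma_encode(g) for g in to_gaps(succ)]
        offsets.append(pos)
        for c in codes:
            parts.append(c)
            pos += len(c)
    return "".join(parts), offsets


def successors(bits: str, offsets: List[int], node: int) -> List[int]:
    """Decode one node's out-links, starting at its stored bit offset."""
    pos = offsets[node]

    def read_gamma() -> int:
        nonlocal pos
        zeros = 0
        while bits[pos] == "0":  # count the unary length prefix
            zeros += 1
            pos += 1
        value = int(bits[pos:pos + zeros + 1], 2)
        pos += zeros + 1
        return value

    degree = read_gamma() - 1
    result, prev = [], -1
    for _ in range(degree):
        prev += read_gamma()  # undo the gap encoding
        result.append(prev)
    return result


if __name__ == "__main__":
    graph = {0: [1, 2, 5], 1: [2], 2: [0, 1, 3, 4], 4: [5]}
    bits, offsets = encode_graph(graph, 6)
    print(successors(bits, offsets, 2))  # [0, 1, 3, 4]
    edges = sum(len(v) for v in graph.values())
    print(len(bits), "bits vs", 32 * edges, "bits as 32-bit ints")  # 39 bits vs 288 bits as 32-bit ints
```

Gamma codes are short for small values, so this kind of scheme tends to pay off when linked documents receive nearby IDs and gaps stay small. The per-node offsets trade some extra space for random access, and the bit-by-bit work in `read_gamma` is where decoding time goes, which is the speed side of the trade-off the abstract refers to.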

Keywords

Compression Technique, Test Collection, Encode Technique, Link Graph, Link Database (keywords generated automatically by the publisher, not by the authors)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • David Hannah (1)
  • Craig Macdonald (1)
  • Iadh Ounis (1)
  1. Department of Computing Science, University of Glasgow, UK