Analysis of Link Graph Compression Techniques
Links between documents have been shown to be useful in various Information Retrieval (IR) tasks. For example, Google has long stated that the PageRank authority measure is at the heart of its relevance calculations. Using such link analysis techniques in a search engine requires special tools to store the link matrix of the document collection, because of the large number of links typically involved. This work is concerned with the application of compression to the link graph. We compare several techniques for compressing link graphs and draw conclusions on their speed and space requirements, using various standard IR test collections.
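To make the idea of link graph compression concrete, the following is a minimal sketch of one common approach: store each document's outlinks as a sorted list of target document ids, replace the ids by the gaps between consecutive ids, and write the gaps with a variable-byte code. The abstract does not say which techniques the paper compares, so this should be read as an illustrative assumption rather than as the authors' method. All function names (`vbyte_encode`, `vbyte_decode`, `compress_successors`, `decompress_successors`) are hypothetical.

```python
from typing import Iterable, List


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Variable-byte code: 7 payload bits per byte, low-order bits first.

    The high bit is set only on the last byte of each number.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte, high bit clear
            n >>= 7
        out.append(n | 0x80)      # final byte, high bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode."""
    numbers: List[int] = []
    current, shift = 0, 0
    for byte in data:
        if byte & 0x80:           # last byte of the current number
            current |= (byte & 0x7F) << shift
            numbers.append(current)
            current, shift = 0, 0
        else:
            current |= byte << shift
            shift += 7
    return numbers


def compress_successors(successors: Iterable[int]) -> bytes:
    """Compress the outlink targets of one document (assumed to be integer ids)."""
    ordered = sorted(set(successors))
    if not ordered:
        return b""
    # First id is stored as-is; every later id is stored as a gap to its predecessor.
    gaps = [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]
    return vbyte_encode(gaps)


def decompress_successors(data: bytes) -> List[int]:
    """Rebuild the sorted target ids by summing the decoded gaps."""
    result: List[int] = []
    total = 0
    for gap in vbyte_decode(data):
        total += gap
        result.append(total)
    return result
```

For the outlink list 1000, 1003, 1004, 1100 the gaps are 1000, 3, 1, 96, which this sketch stores in 5 bytes, compared with 16 bytes for four 32-bit integers. The space saving comes at the cost of decoding work on every access, which is the kind of speed and space trade-off the comparison in this work is concerned with.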
Keywords: Compression Technique, Test Collection, Encode Technique, Link Graph, Link Database