Abstract
Several methods have been proposed for compressing the linkage data of a Web graph. Among them, the method proposed by Boldi and Vigna is known as the most efficient one. In this paper, we propose a new method to compress a Web graph. Our method is more efficient than theirs with respect to the size of the compressed data. For example, our method needs only 1.99 bits per link to compress a Web graph containing 3,216,152 links connecting 325,557 pages, while the method of Boldi and Vigna needs 2.84 bits per link to compress the same Web graph.
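The "bits per link" figure is the total size of the compressed adjacency data divided by the number of links. The sketch below makes the metric concrete. It is a hypothetical illustration, not the method of this paper and not the Boldi–Vigna codec. It uses a simple textbook baseline: each node stores its outdegree, then its sorted successor list as gaps, and every integer is written with an Elias gamma code. The function names (`elias_gamma`, `encode_adjacency`, `bits_per_link`), the zig-zag mapping of the first successor, and the toy graph are all assumptions made for illustration. The final lines turn the abstract's reported rates into approximate total sizes for the 3,216,152-link graph.

```python
# Hypothetical sketch: how a "bits per link" figure is measured.
# The gap + Elias gamma scheme is a simple baseline, NOT this paper's method
# and NOT the Boldi-Vigna (WebGraph) codec.

def elias_gamma(n: int) -> str:
    """Return the Elias gamma codeword of a positive integer n as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes positive integers only")
    binary = bin(n)[2:]                      # e.g. 5 -> "101"
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then binary


def encode_adjacency(node: int, successors: list[int]) -> str:
    """Encode one adjacency list (assumed to contain no duplicate successors)."""
    bits = [elias_gamma(len(successors) + 1)]  # outdegree, +1 so 0 is codable
    prev = None
    for s in sorted(successors):
        if prev is None:
            # First successor is stored relative to the source node; the
            # signed difference is zig-zag folded: 0,-1,1,-2,... -> 0,1,2,3,...
            d = s - node
            folded = 2 * d if d >= 0 else -2 * d - 1
            bits.append(elias_gamma(folded + 1))
        else:
            bits.append(elias_gamma(s - prev))  # gap >= 1 for distinct sorted ids
        prev = s
    return "".join(bits)


def bits_per_link(graph: dict[int, list[int]]) -> float:
    """Total encoded bits divided by the total number of links."""
    total_bits = sum(len(encode_adjacency(u, vs)) for u, vs in graph.items())
    total_links = sum(len(vs) for vs in graph.values())
    if total_links == 0:
        raise ValueError("graph has no links")
    return total_bits / total_links


if __name__ == "__main__":
    toy = {0: [1, 2, 3], 1: [0, 2], 2: []}   # hypothetical 3-node graph
    print(bits_per_link(toy))                # 20 bits / 5 links = 4.0

    links = 3_216_152                        # link count reported in the abstract
    for label, bpl in [("proposed", 1.99), ("Boldi-Vigna", 2.84)]:
        # bits -> bytes -> KiB, rounded to the nearest KiB
        print(label, round(bpl * links / 8 / 1024), "KiB")
```

At the reported rates, the proposed method needs roughly 781 KiB for this graph versus roughly 1115 KiB for the Boldi–Vigna method, about 30% less (1 − 1.99/2.84 ≈ 0.30). A naive baseline like the one above typically spends far more bits per link on real crawls. Practical codecs such as Boldi and Vigna's exploit further regularities of Web graphs, for example by copying from similar adjacency lists.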
References
Asano, Y., Ito, T., Imai, H., Toyoda, M., Kitsuregawa, M.: Compact Encoding of the Web Graph Exploiting Various Power Laws: Statistical Reason Behind Link Database. In: Dong, G., Tang, C.-j., Wang, W. (eds.) WAIM 2003. LNCS, vol. 2762, pp. 37–46. Springer, Heidelberg (2003)
Asano, Y., Nishizeki, T., Toyoda, M., Kitsuregawa, M.: Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework. IEICE Trans. Inf. Syst. E89-D (10), 2606–2615 (2006)
Asano, Y., Tezuka, Y., Nishizeki, T.: Improvements of HITS Algorithms for Spam Links. In: Dong, G., Lin, X., Wang, W., Yang, Y., Yu, J.X. (eds.) APWeb/WAIM 2007. LNCS, vol. 4505, pp. 479–490. Springer, Heidelberg (2007)
Bharat, K., Broder, A., Henzinger, M., Kumar, P., Venkatasubramanian, S.: The Connectivity Server: Fast Access to Linkage Information on the Web. In: Proc. of the 7th WWW, pp. 469–477 (1998)
Blandford, D.K., Blelloch, G.E., Kash, I.A.: Compact Representation of Separable Graphs. In: Proc. of the 14th SODA, pp. 679–688 (2003)
Brin, S., Page, L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Proc. of the 7th WWW, pp. 14–18 (1998)
Boldi, P., Vigna, S.: The Web Graph Framework I: Compression Techniques. In: Proc. of the 13th WWW, pp. 595–601 (2004)
Boldi, P., Vigna, S.: Codes for the World Wide Web. Internet Mathematics 2(4), 405–427 (2005)
Claude, F., Navarro, G.: A Fast and Compact Web Graph Representation. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 118–129. Springer, Heidelberg (2007)
Cormen, T.H., Leiserson, C.E., Rivest, R., Stein, C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge (2001)
Elias, P.: Universal Codeword Sets and Representations of the Integers. IEEE Transactions on Information Theory 21, 194–203 (1975)
Flake, G.W., Lawrence, S., Giles, C.L.: Efficient Identification of Web Communities. In: Proc. of the 6th KDD, pp. 150–160 (2000)
Guillaume, J.L., Latapy, M., Viennot, L.: Efficient and Simple Encodings for the Web Graph. In: Meng, X., Su, J., Wang, Y. (eds.) WAIM 2002. LNCS, vol. 2419, pp. 328–337. Springer, Heidelberg (2002)
Kleinberg, J.: Authoritative Sources in a Hyperlinked Environment. In: Proc. of the 9th SODA, pp. 668–677 (1998)
Kou, W.: Digital Image Compression: Algorithms and Standards. Springer, Heidelberg (1995)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Trawling the Web for Emerging Cyber-Communities. Computer Networks 31(11-16), 1481–1493 (1999)
Larsson, N.J., Moffat, A.: Off-Line Dictionary-Based Compression. Proc. IEEE 88(11), 1722–1732 (2000)
Levenstein, V.E.: On the Redundancy and Delay of Separable Codes for the Natural Numbers. Problems of Cybernetics 20, 173–179 (1968)
Randall, K., Stata, R., Wickremesinghe, R., Wiener, J.L.: The Link Database: Fast Access to Graphs of the Web. Research Report 175, Compaq Systems Research Center, Palo Alto, CA (2001)
Suel, T., Yuan, J.: Compressing the Graph Structure of the Web. In: Proc. of the Data Compression Conference, pp. 213–222 (2001)
WebGraph Homepage, http://webgraph.dsi.unimi.it/
Wickremesinghe, R., Stata, R., Wiener, J.: Link Compression in the Connectivity Server. Technical Report, Compaq Systems Research Center, Palo Alto, CA (2000)
Zhang, Y., Yu, J.X., Hou, J.: Web Communities: Analysis and Construction. Springer, Berlin (2006)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Asano, Y., Miyawaki, Y., Nishizeki, T. (2008). Efficient Compression of Web Graphs. In: Hu, X., Wang, J. (eds) Computing and Combinatorics. COCOON 2008. Lecture Notes in Computer Science, vol 5092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69733-6_1
DOI: https://doi.org/10.1007/978-3-540-69733-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69732-9
Online ISBN: 978-3-540-69733-6
eBook Packages: Computer Science, Computer Science (R0)