Graph Compression for Adjacency-Matrix Multiplication

Computing the product of the (binary) adjacency matrix of a large graph with a real-valued vector is an important operation that lies at the heart of various graph analysis tasks, such as computing PageRank. In this paper, we show that some well-known webgraph and social graph compression formats are computation-friendly, in the sense that they allow boosting the computation. We focus on the compressed representations of (a) Boldi and Vigna and (b) Hernández and Navarro, and show that the product computation can be conducted in time proportional to the compressed graph size. Our experimental results show speedups of at least 2 on graphs that were compressed at least 5 times with respect to the original.


Introduction
Let $A \in \{0, 1\}^{n \times n}$ be an $n \times n$ binary matrix and $\vec{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$ be a vector. Matrix-vector multiplication, either $\vec{x} \cdot A$ or $A \cdot \vec{x}^{\top}$, is not only a fundamental operation in mathematics but also a key operation in various graph-analysis tasks, where $A$ is the adjacency matrix of the graph. A well-known example, which we use as a motivation, is the computation of PageRank on large Web graphs. PageRank is a particular case of many network centrality measures that can be approximated through the power method [1, Chapter 11.1]. Most real networks, and in particular webgraphs and social graphs, have very sparse adjacency matrices [2]. While it is straightforward to compute a matrix-vector product in time proportional to the number of nonzero entries of $A$, the most successful Web and social graph compression methods exploit other properties that allow them to compress the graphs well beyond what is possible by their mere sparsity. It is therefore natural to ask whether those more powerful compression formats allow us, as sparsity does, to compute the product in time proportional to the size of the compressed representation. This is an instance of computation-friendly compression, which applies compression formats that not only reduce the size of the representation of objects, but also speed up computations on them by directly operating on the compressed representations. Elgohary et al. [3] addressed this problem for matrix-vector multiplication on structured matrices commonly found in machine learning. However, Abboud et al. [4] have proven that with sophisticated compression techniques it is difficult or even impossible to compute basic linear-algebra operations like matrix-vector multiplication in subquadratic time. Additionally, Chakraborty et al. [5] showed that any data structure storing $r$ bits with $n < r < n^2$ must have a query time $t$ satisfying $tr \in \Omega(n^3 \operatorname{polylog}(n))$.
Other examples of computation-friendly compression are pattern matching in compressed strings [6], computation of edit distance between compressible strings [7], speedups for multiplying sequences of matrices and the Viterbi algorithm [8], representing bipartite graphs [9], and building small and shallow circuits [10], among other tasks [11].

Our Contribution
In this article, we exploit compressed representations of webgraphs and social networks and show that matrix-vector products can be carried out much faster than just operating on all the nonzero entries of the matrix. Although our approach can be extended to other compressed representations of graphs and binary matrices, we mostly consider two representations: one proposed by Boldi and Vigna [12], and the other proposed by Hernández and Navarro [13]. For the former, the key observation for us is that adjacency lists, i.e., rows in $A$, are compressed differentially with respect to other similar lists, and thus one can reuse and "correct" the result of the multiplication of a previous similar row with $\vec{x}^{\top}$. The latter representation works by extracting regular substructures in the matrix, on which the matrix multiplication becomes particularly simple. A preliminary version of this article appeared at the Data Compression Conference 2018 [14]. The conference version did not consider Hernández and Navarro's representation; in this article, we additionally experiment with slicing the matrix vertically into sub-matrices.

Structure of this Article
We describe previous work in the next section (Section "Previous Work"). We then review PageRank (Section "PageRank"), and describe the compression format of Boldi and Vigna together with how we exploit it to speed up matrix multiplication (Section "Computation on the WebGraph Format"). Section "Vertically Slicing into Sub-Matrices" presents a vertical split of the matrix used for PageRank that aims to improve the compression. Section "Experimental Evaluation" contains experimental results for this compression format. Subsequently, in Section "Computation with Bicliques", we show how to use the compression format of Hernández and Navarro [13]. We conclude this article with directions for future work. Compared to the conference version of this paper [14], we added Section "Vertically Slicing into Sub-Matrices", a more thorough evaluation including variable window sizes, and the second compression format of Hernández and Navarro in Section "Computation with Bicliques".

Previous Work
Matrix multiplication is a fundamental problem in computer science; see, e.g., a recent survey of results [15]. Computation-friendly matrix compression has been already considered by others, even if indirectly. Karande et al. [16] addressed it by exploiting a structural compression scheme, namely by introducing virtual nodes. Although their results were similar to the ones presented in this paper, their approach was more complex and it could not be used directly, requiring the correction of computation results. On the other hand, contrary to their belief, we show in this paper that representational compression schemes do not always require the same amount of computation, providing a much simpler approach that can be used directly without requiring corrections.
Another interesting approach was proposed by Nishino et al. [17]. Although they did not exploit compression in the same way we do, they observed that intermediate computational results for the matrix multiplication of equivalent partial rows of a matrix are the same. Their approach is to use an adjacency forest representing the rows of the matrix; this forest achieves compression by compacting common suffixes of the rows. We should note that the authors consider general real matrices, and not only Boolean matrices as we do. Nevertheless, they presented results for computing the PageRank over adjacency matrices as we do, achieving similar results. Their approach implied preprocessing the graph, however, while we start from an already compressed graph. An interesting question is how their approach could be exploited on top of $k^2$-trees [18].
The question addressed here can also be of interest for the problem of Online Matrix-Vector (OMV) multiplication. Given a stream of binary vectors, $\vec{x}_1, \vec{x}_2, \vec{x}_3, \ldots$, the results of matrix-vector multiplications $\vec{x}_i \cdot A$ can be computed faster than computing them independently, with most approaches making use of previous computations $\vec{x}_j \cdot A$, for $j < i$, to speed up the computation of each new product $\vec{x}_i \cdot A$ [19,20]. Nevertheless, none of those approaches preprocess the matrix $A$ to exploit its redundancies. Hence, by exploiting a suitable compressed representation of $A$ as we do here, an improvement for OMV can be easily obtained, with computational time depending on the length of the compressed representation of $A$ instead.

PageRank
Given $G = (V, E)$, a directed graph with $n = |V|$ vertices and $m = |E|$ edges, let $A$ be its adjacency matrix; $A_{uv} = 1$ if $(u, v) \in E$, and $A_{uv} = 0$ otherwise. We assume that for each vertex $u$, there is at least one vertex $v$ with $A_{uv} = 1$.

SN Computer Science
The normalized adjacency matrix of $G$ is the matrix $M = D^{-1} \cdot A$, where $D$ is an $n \times n$ diagonal matrix with $D_{uu}$ the degree $d_u$ of $u \in V$, i.e., $D_{uu} = d_u = \sum_v A_{uv} \ge 1$. Note that $M$ is the standard random-walk matrix, where a random walker at vertex $u$ jumps to a neighbor $v$ of $u$ with probability $1/d_u$. Moreover, the $k$-th power of $M$, $M^k$, is the random-walk matrix after $k$ steps, i.e., $M^k_{uv}$ is the probability of the random walker being at vertex $v$ after $k$ jumps, having started at vertex $u$. PageRank is a typical random walk on $G$ with transition matrix $M$. Given a constant $0 < \alpha < 1$ and a probability vector $\vec{p}_0$, the PageRank vector $\vec{p}$ is given by the following recurrence [21]:

$$\vec{p} = \alpha \vec{p}_0 + (1 - \alpha)\, \vec{p} \cdot M. \qquad (1)$$

The parameter $\alpha$ is called the teleport probability or jumping factor, and $\vec{p}_0$ is the starting vector. In the original PageRank [22], the starting vector $\vec{p}_0$ is the uniform distribution over the vertices of $G$, i.e., $\vec{p}_0 = \vec{1}/n$. When $\vec{p}_0$ is not the uniform distribution, $\vec{p}$ is called a personalized PageRank. Intuitively, $\vec{p}$ gives, for each page, the probability that a lazy Web visitor is at that page, assuming that they surf the Web by either randomly starting at a new page or jumping through a link from the current page. The parameter $\alpha$ ensures that such a surfer does not get stuck at a dead end. PageRank can be approximated iteratively through the power iteration method by iterating, for $t \ge 1$:

$$\vec{p}_t = \alpha \vec{p}_0 + (1 - \alpha)\, \vec{p}_{t-1} \cdot M.$$

We show how to speed up these matrix-vector multiplications when the adjacency matrix is compressible.
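To make the power iteration concrete, the following is a minimal sketch on plain adjacency lists, not the compressed implementation evaluated later. The function name `pagerank`, the default teleport probability of 0.15, and the 0-based vertex numbering are assumptions made only for this illustration.

```python
def pagerank(adj, alpha=0.15, iterations=10):
    """Power iteration p_t = alpha * p_0 + (1 - alpha) * p_{t-1} * M.

    adj[u] lists the out-neighbours of vertex u (0-based); every list is
    assumed to be non-empty, matching the assumption d_u >= 1 in the text.
    """
    n = len(adj)
    p0 = [1.0 / n] * n              # uniform starting vector
    p = p0[:]
    for _ in range(iterations):
        nxt = [alpha * p0[v] for v in range(n)]
        for u, neighbours in enumerate(adj):
            # M_uv = 1 / d_u for every out-neighbour v of u
            share = (1.0 - alpha) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        p = nxt
    return p


# A directed 3-cycle: the uniform distribution is a fixed point.
ranks = pagerank([[1], [2], [0]])
```

Each iteration touches every edge once, which is the cost that the compressed representations discussed next aim to reduce.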

Computation on the WebGraph Format
Our main idea is to exploit the copy-property of adjacency lists observed in some graphs, such as Web graphs [12]. The adjacency lists of neighbor vertices tend to be very similar and, hence, the rows in the adjacency matrix are also very similar. Moreover, these networks reveal also strong clustering effects, with local groups of vertices being strongly connected and/or sharing many neighbors. The copy-property effect can then be further amplified through clustering and suitable vertex reordering, an important step for achieving better graph compression ratios [23]. Most compressed representations for sparse graphs rely on these properties [18,24,25]. In this paper, we consider the WebGraph framework, a suite of codes, algorithms and tools that aims at making it easy to manipulate large Web graphs [12]. Among several compression techniques used in WebGraph, our approach makes use of list referencing.
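Let $A$ be an $n \times n$ binary sparse matrix with rows $\vec{v}_1, \ldots, \vec{v}_n$, let $\vec{x} \in \mathbb{R}^n$, and let $\vec{r}$ be a referencing vector that assigns to each row $\vec{v}_i$ a reference row $\vec{v}_{r_i}$ with $r_i < i$. The reference $r_i$ is found in the WebGraph framework within a given window size $W$, i.e., $r_i \in \{\max(1, i - W), \ldots, i\}$, and it is optimized to reduce the length of the representation of $\vec{v}_i$. The row $\vec{v}_i$ is then represented by adding missing entries and marking spurious ones, with respect to $\vec{v}_{r_i}$, and encoded using several techniques, such as differential compression and codes for natural numbers [12,26].

Let $A'$ be the matrix whose $i$-th row is the difference $\vec{v}'_i = \vec{v}_i - \vec{v}_{r_i}$, so that its entries are in $\{-1, 0, 1\}$; a row that uses no reference is stored as is. To compute $\vec{y}^{\top} = A \cdot \vec{x}^{\top}$, observe that $y_i = \vec{v}_i \cdot \vec{x}^{\top} = \vec{v}'_i \cdot \vec{x}^{\top} + w_i$ with $w_i = y_{r_i}$ (and $w_i = 0$ for a row without a reference). The vector $\vec{w}$ can be incrementally computed because $r_i < i$ and $w_i = y_{r_i}$, ensuring that $w_i$ is already computed when required to compute $y_i$. Given inputs $A'$, $\vec{r}$ and $\vec{x}$, the algorithm to compute $\vec{y}$ is as follows: for $i = 1, \ldots, n$ in increasing order, set $y_i \leftarrow \vec{v}'_i \cdot \vec{x}^{\top} + y_{r_i}$, scanning only the nonzero entries of $\vec{v}'_i$. Note that the number of operations to obtain $\vec{y}^{\top} = A \cdot \vec{x}^{\top}$ is proportional to the number of nonzeros in $A'$, that is, to the compressed representation size. Depending on the properties of $A$ discussed before, this number may be much smaller than the number of nonzeros in $A$. We present in Section "Experimental Evaluation" experimental results for Web graphs, where we indeed obtain considerable speedups in the computation of PageRank.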
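The incremental computation $y_i = \vec{v}'_i \cdot \vec{x}^{\top} + y_{r_i}$ can be sketched as follows. This is only an illustration under assumed conventions: rows are numbered from 0, each difference row is a list of `(column, sign)` pairs with sign $+1$ or $-1$, and a row without a reference has `None` in `refs`. The function name `multiply_with_references` is hypothetical.

```python
def multiply_with_references(diff_rows, refs, x):
    """Compute y = A x from the difference rows of A' and the references r.

    diff_rows[i] holds the nonzeros of v_i - v_{r_i} as (column, sign) pairs;
    refs[i] is an index smaller than i, or None if row i has no reference.
    """
    y = [0.0] * len(diff_rows)
    for i, (row, r) in enumerate(zip(diff_rows, refs)):
        acc = y[r] if r is not None else 0.0   # w_i = y_{r_i}, already computed
        for j, sign in row:
            acc += sign * x[j]                 # correction by the difference row
        y[i] = acc
    return y


# A has rows {0,1,3}, {0,1,2,3}, {1,2,3}; rows 1 and 2 reference rows 0 and 1.
diff_rows = [[(0, 1), (1, 1), (3, 1)], [(2, 1)], [(0, -1)]]
refs = [None, 0, 1]
y = multiply_with_references(diff_rows, refs, [1.0, 2.0, 3.0, 4.0])
```

In this small example the loop performs five additions instead of the ten needed on the plain adjacency lists, which mirrors the claim that the work is proportional to the number of nonzeros in $A'$.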

Vertically Slicing into Sub-Matrices
The quality of our approach hinges on the quality of $\vec{r}$. The wider the matrix is, the more difficult it can become to find a previous row adequately matching the entries. It therefore could make sense to vertically split $A$ into $c$ submatrices $A_1, \ldots, A_c$, each with $n$ rows and $\Theta(n/c)$ columns, in the hope that applying the techniques of Section "Computation on the WebGraph Format" to every submatrix gives better chances of finding good references, and therefore yields a representation that is more compact than the one obtained by applying the technique to the entire matrix $A$. This somewhat reflects real-world examples where a matrix does not usually contain complete row repetitions, but rather clustered repetitions of a certain length, which are hopefully of length $\Omega(n/c)$. To see that we can still compute the matrix-vector multiplication with the submatrices efficiently, take a vector $\vec{x} \in \mathbb{R}^n$ and split it into $c$ parts $\vec{x} = [\vec{x}^{(1)}, \ldots, \vec{x}^{(c)}]$ matching the column ranges of the submatrices. We can compute the product $\vec{y}^{(i)\top} = A_i \cdot \vec{x}^{(i)\top}$ for each $i \in [1..c]$, and then obtain $\vec{y} = \sum_{i=1}^{c} \vec{y}^{(i)}$ by summing up all computed products. Setting $c = 1$ disables this technique, i.e., we just use the technique of Section "Computation on the WebGraph Format". Setting $c = n$ gives us the school-book matrix-vector multiplication algorithm.
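The following sketch illustrates why slicing does not change the result: the products of the column slices with the matching parts of the vector simply add up. For brevity the slices are kept as plain adjacency lists here, whereas in the setting described above each slice would be compressed separately. The helper names `slice_columns` and `multiply_sliced`, the parameter name `c`, and the equal-width split are assumptions of this illustration.

```python
def slice_columns(adj, n, c):
    """Split n columns into ranges of width ceil(n / c); return (slices, width).

    slices[k][i] lists the columns of row i that fall into range k,
    renumbered relative to the start of that range.
    """
    width = -(-n // c)                         # ceiling division
    slices = [[[] for _ in adj] for _ in range(c)]
    for i, row in enumerate(adj):
        for j in row:
            slices[j // width][i].append(j % width)
    return slices, width


def multiply_sliced(slices, width, x):
    """Sum the partial products A_k * x^(k) over all column slices."""
    y = [0.0] * len(slices[0])
    for k, sub in enumerate(slices):
        x_part = x[k * width:(k + 1) * width]  # part of x for slice k
        for i, row in enumerate(sub):
            y[i] += sum(x_part[j] for j in row)
    return y


adj = [[0, 1, 3], [0, 1, 2, 3], [1, 2, 3]]
slices, width = slice_columns(adj, 4, 2)
y = multiply_sliced(slices, width, [1.0, 2.0, 3.0, 4.0])
```

With one slice the loop degenerates to the ordinary product, and with as many slices as columns every slice is a single column, in line with the two extreme settings mentioned in the text.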

Experimental Evaluation
We computed the number of nonzero entries $m'$ in $A'$ for the adjacency matrix $A$ of several graphs available at http://law.di.unimi.it/datasets.php [12,23,27]. We show in Table 1 some characteristics of the used graphs, including the number of vertices $n$ and the number of edges $m$, for each graph.
We categorize our studied graphs into the following three classes:

page graph: each node represents a single web page, and each arc is a link between two pages;
host graph: each node is a (sub)domain, i.e., a host, and an arc between two hosts exists if one host has a page having a link to a page of the other host;
social network: each node represents an entity such as a person or user, and an arc represents a social relation.
We call page graphs and host graphs together web graphs (not to be confused with the WebGraph framework storing graphs in its WebGraph representation). As a preprocessing step for our experiments, we extracted $A'$ and $\vec{r}$ from the WebGraph representation of $A$, and compressed $A'$ as a WebGraph, before actually starting the computation of the PageRank algorithm. For each window size $W$, we first recompressed a graph with the selected $W$ and compressed the references with Elias-$\gamma$ via the parameter -c REFERENCES_GAMMA. We did so because the WebGraph representation achieves high compression but is limited to storing adjacency matrices, which we needed for PageRank. Hence, this approach allowed us to do all of the computation in compressed space, which would not be possible for commodity computers to run in RAM if we had extracted $A$ in its plain form. As an optimization, whenever the difference $\vec{v}_i - \vec{v}_{r_i}$ had more nonzero entries than $\vec{v}_i$ itself, we kept $\vec{v}_i$ as the row in $A'$. By doing so, we obtained fewer nonzero entries.
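To illustrate what extracting difference rows and references amounts to, here is a small sketch that picks, for each row, the reference within a window that minimizes the number of nonzeros of the difference, and falls back to the plain row when no reference helps. This exhaustive search is an assumption made for illustration only; it is not the reference-selection heuristic of the WebGraph framework, and the name `build_references` is hypothetical.

```python
def build_references(adj, window):
    """Return (diff_rows, refs) for 0-based adjacency lists.

    diff_rows[i] lists (column, sign) pairs of v_i - v_{r_i};
    refs[i] is the chosen reference, or None if the row is kept as is.
    """
    sets = [set(row) for row in adj]
    diff_rows, refs = [], []
    for i, cur in enumerate(sets):
        best_ref, best_cost = None, len(cur)   # cost of storing the row itself
        for r in range(max(0, i - window), i):
            cost = len(cur ^ sets[r])          # nonzeros of v_i - v_r
            if cost < best_cost:
                best_ref, best_cost = r, cost
        base = sets[best_ref] if best_ref is not None else set()
        row = [(j, 1) for j in sorted(cur - base)]     # missing entries
        row += [(j, -1) for j in sorted(base - cur)]   # spurious entries
        diff_rows.append(row)
        refs.append(best_ref)
    return diff_rows, refs


adj = [[0, 1, 3], [0, 1, 2, 3], [1, 2, 3]]
diff_rows, refs = build_references(adj, window=1)
nonzeros = sum(len(row) for row in diff_rows)  # plays the role of m'
```

On this toy input the ten original nonzeros shrink to five, since each of the last two rows differs from its predecessor in a single position.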
We implemented PageRank using the algorithm described in Section "PageRank" computing matrix-vector products. Since Eq. (1) uses left products and our representation is row-oriented, we use the transposed adjacency matrix and right products. The implementation is in Java and based on the WebGraph representation, where $A'$ is represented as two graphs: a positive one for edges with weight 1, and a negative one for edges with weight $-1$. All tests were conducted on a machine running Linux, with an Intel i3-9100 CPU (4 cores, cache 256 KB/6144 KB) and with 128 GB of RAM. Java code was compiled and executed with OpenJDK 11.0.9.1 and the parameters -Xmx100G -Xss100M to access 100 GB of RAM and to keep an execution stack of size at most 100 MB. We ran each PageRank computation for ten iterations, starting with the initial vector $\vec{p}_0$ representing the uniform distribution.
We report the benchmark results of the PageRank evaluation in the last columns of Table 2 for different window sizes. As expected, our approach works well for web graphs, with the number of nonzeros in $A'$ being less than 20% of that in $A$ for page graphs and less than 30% for host graphs. Note that web graphs are known to exhibit the copy-property among adjacency lists. Other networks we tested, instead, seem not to exhibit this property to the same degree, and, therefore, our approach is not beneficial. This was expected, as social networks are not as compressible as Web graphs [28]. There may exist, however, other representations for these networks that may benefit from other compression approaches (see Section "Computation with Bicliques"). In general, large reductions in the numbers of nonzero entries tend to give bigger speedups, although the relationship may be complicated by algorithmic details, system characteristics and the interaction between the two. Let us now consider the graphs eu-2015-host-hc and it-2004-hc. Observed speedups are lower than we would expect given that $A'$ has roughly 3 times fewer nonzeros than $A$ for eu-2015-host-hc, and roughly 4 times fewer for it-2004-hc. After profiling, we could observe that, although $A'$ had far fewer nonzero entries than $A$, the nonzero entries in $A'$ are more dispersed than those in $A$, with $A$ benefiting from contiguous memory accesses. The speedups are nevertheless significant, especially when we are dealing with larger graphs like uk-2014-hc.
In a subsequent experiment, we vertically split the matrix $A^{\top}$ into $c$ submatrices as described in Section "Vertically Slicing into Sub-Matrices", and evaluated the compression gain in light of the compression technique of the WebGraph framework, which can find more suitable references as the matrices become slimmer. We present our evaluation in Table 3, where we can observe a slight reduction in the number of nonzero entries when comparing the sums of the submatrices with the original matrix $A^{\top}$. We can observe that scaling up $c$ reduces the sum of all nonzero entries in the submatrices, while scaling up $W$ can have a beneficial effect on large graphs. This effect is minor compared to some of the reductions in Table 2, however, so we did not evaluate whether it further sped up matrix-vector multiplications. Moreover, the WebGraph framework seems not to take particular advantage of this specially structured set of submatrices, since the sum of the file sizes of the submatrices grows considerably with the number of submatrices $c$.
The main bottleneck of the whole computation was the recompression of the graphs or the compression of the submatrices. A larger graph instance can take several hours of pre-computation, or even longer on large graphs such as eu-2015-hc or gsh-2015-hc, where we omitted some parameter choices in Tables 2 and 3 because they would take too long to compute.

Computation with Bicliques
Another suitable format is the biclique extraction method of Hernández and Navarro [13]. They decompose the edges of $G$ into a list of bicliques and a residual set of edges. A biclique is a pair of sets of nodes of the form $(S_r, T_r)$, where every node of $S_r$ has an edge to every node of $T_r$. We represent the $|S_r| \cdot |T_r|$ edges of each biclique $(S_r, T_r)$ by listing both sets, which gives us $|S_r| + |T_r|$ integers, i.e., the identifiers of the nodes. These can be compressed by differential encoding with a universal coder. It has been shown that this format is competitive in compressing both webgraphs and social graphs.
Let $A'$ denote the adjacency matrix representing the residual set of edges. To compute $A \cdot \vec{x}^{\top}$, we compute for each biclique $(S_r, T_r)$ the value $c_r = \sum_{j \in T_r} x_j$. We then allocate the vector $\vec{y}$ whose entries are initially set to zero. Subsequently, for each biclique $(S_r, T_r)$ and each node identifier $i \in S_r$, we add $c_r$ to $y_i$. Finally, for each residual edge $A'_{ij} = 1$, we add $x_j$ to $y_i$. By doing so, the resulting vector $\vec{y}^{\top}$ is equal to the product $A \cdot \vec{x}^{\top}$, and we obtain $\vec{y}$ in time proportional to the size of the compressed matrix. We carried out a proof-of-concept implementation of this idea by building on top of the current biclique extraction software [13]. This software has some limitations that carry over to our implementation. The most crucial one is that the total number of edges is expected to fit into 32 bits, which limits complete graphs to small sizes of $n \le 2^{16}$. This was also the knock-out criterion for several aforementioned graph instances. We present the evaluation of the graphs that could be successfully processed by the software in Table 4. There, instances with names having a -t suffix represent the transposed matrix $A^{\top}$.
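The biclique-based product described above can be sketched in a few lines. The data layout (bicliques as pairs of node lists, residual edges as `(i, j)` pairs, 0-based identifiers) and the function name `multiply_with_bicliques` are assumptions of this illustration and do not reflect the on-disk format of the biclique extraction software.

```python
def multiply_with_bicliques(n, bicliques, residual, x):
    """Compute y = A x, where A is given by bicliques plus residual edges.

    bicliques: list of (sources, targets) pairs, each a list of node ids;
    residual:  list of edges (i, j) not covered by any biclique.
    """
    y = [0.0] * n
    for sources, targets in bicliques:
        c_r = sum(x[j] for j in targets)   # computed once per biclique
        for i in sources:
            y[i] += c_r                    # shared by every source node
    for i, j in residual:
        y[i] += x[j]                       # ordinary sparse update
    return y


# One biclique ({0,1} -> {2,3}) and three residual edges.
bicliques = [([0, 1], [2, 3])]
residual = [(2, 0), (3, 1), (0, 1)]
y = multiply_with_bicliques(4, bicliques, residual, [1.0, 2.0, 3.0, 4.0])
```

A biclique thus costs $|S_r| + |T_r|$ additions rather than $|S_r| \cdot |T_r|$, which is where the saving over the plain adjacency matrix comes from.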
Another limitation we managed to overcome is that the biclique extraction software implicitly assumes that all nodes have self-loops. To compute the later PageRank evaluation based on the compressed representation with bicliques correctly, we perform two post-processing steps: First, we filter out the self-loops of those nodes that are simultaneously in both $S_r$ and $T_r$ for some $r$, but originally did not have a self-loop. Secondly, we add a self-loop to each node that originally had a self-loop, but is not in both $S_r$ and $T_r$ simultaneously for any $r$. We denote the numbers of these erroneously added and erroneously removed edges by $\Delta^{+}$ and $\Delta^{-}$, respectively.
In the table, we additionally measure the sizes of the extracted bicliques with $b_S = \sum_r |S_r|$ and $b_T = \sum_r |T_r|$. We can see that the biclique sizes have a super-linear impact on the number of remaining edges $m'$. This is good for the PageRank computation since the relatively small overhead of the additional computation for the bicliques is outweighed by the savings from computing with the adjacency matrix $A'$, which has far fewer entries than the original matrix $A$. Especially for large graphs with many bicliques such as it-2004-hc, we have a speedup of 2.36. However, if the ratio $m/m'$ between original edges and remaining edges is roughly 1, our proposed technique is slightly slower. For these experiments, we used the same machine as in Section "Experimental Evaluation", but devised an implementation in Rust, which is available at https://github.com/koeppl/matrixbiclique.

Conclusion
We have shown that the adjacency matrix compression scheme of Boldi and Vigna [12] as well as the biclique extraction of Hernández and Navarro [13] are suitable representations for computing matrix-vector products in time proportional to the compressed matrix sizes. We therefore can conclude that these compression formats not only save space but also speed up an operation that is crucial for graph analysis tasks. We plan to consider other formats where it is less clear how to translate the reduction in space into a reduction in computation time [18,24,25], and study which other relevant matrix operations can be boosted by which compression formats.
We also plan to investigate whether there is a better way to speed up matrix-matrix multiplications via compression than by treating them as repeated matrix-vector multiplications. For example, suppose we have compressed two matrices $A$ (with rows $\vec{v}_i$) and $B$ (with columns $\vec{w}_j$) according to the WebGraph framework, with referencing vectors $\vec{r}$ and $\vec{c}$, and now we want to compute $C = AB$. Calculation shows

$$C_{i,j} = \vec{v}_i \cdot \vec{w}_j = (\vec{v}_i - \vec{v}_{r_i}) \cdot (\vec{w}_j - \vec{w}_{c_j}) + C_{i,c_j} + C_{r_i,j} - C_{r_i,c_j}.$$

In standard matrix-matrix multiplication, when we want to compute $C_{i,j}$, we have already computed $C_{i,c_j}$, $C_{r_i,j}$ and $C_{r_i,c_j}$, so we need only compute the product $(\vec{v}_i - \vec{v}_{r_i}) \cdot (\vec{w}_j - \vec{w}_{c_j})$ of two vectors, each of which, as the difference between a vector and its reference, is likely to be sparse.
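As a sanity check of this idea, the sketch below fills the product matrix entry by entry using the identity $C_{i,j} = (\vec{v}_i - \vec{v}_{r_i}) \cdot (\vec{w}_j - \vec{w}_{c_j}) + C_{i,c_j} + C_{r_i,j} - C_{r_i,c_j}$, always scanning the difference vector with the smaller support. The dictionary-based layout, the use of `None` for a missing reference (treated as the zero vector), and the name `multiply_matrices` are assumptions of this illustration, not an evaluated implementation.

```python
def multiply_matrices(row_diffs, row_refs, col_diffs, col_refs):
    """Compute C = A B from difference rows of A and difference columns of B.

    row_diffs[i] maps column -> +1/-1 for v_i - v_{r_i};
    col_diffs[j] maps row    -> +1/-1 for w_j - w_{c_j};
    references point to smaller indices, or are None (zero vector).
    """
    n, m = len(row_diffs), len(col_diffs)
    C = [[0] * m for _ in range(n)]

    def entry(i, j):
        # A missing reference contributes nothing.
        return 0 if i is None or j is None else C[i][j]

    for i in range(n):
        for j in range(m):
            a, b = row_diffs[i], col_diffs[j]
            small, large = (a, b) if len(a) <= len(b) else (b, a)
            dot = sum(s * large[k] for k, s in small.items() if k in large)
            ri, cj = row_refs[i], col_refs[j]
            C[i][j] = dot + entry(i, cj) + entry(ri, j) - entry(ri, cj)
    return C


# A = [[1,1,0],[1,1,1],[0,1,1]], B = [[1,1,0],[0,1,1],[1,1,1]]
row_diffs = [{0: 1, 1: 1}, {2: 1}, {0: -1}]
col_diffs = [{0: 1, 2: 1}, {1: 1}, {0: -1}]
C = multiply_matrices(row_diffs, [None, 0, 1], col_diffs, [None, 0, 1])
```

Filling the matrix row by row and left to right guarantees that the three previously computed entries are available when they are needed.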
If we compute that product by always scanning the nonzero entries of $\vec{v}_i - \vec{v}_{r_i}$ and checking the corresponding entries of $\vec{w}_j - \vec{w}_{c_j}$ (or vice versa) and performing a multiplication whenever one of those corresponding entries is also nonzero, then we are essentially computing $C$ by repeated matrix-vector multiplications. If we always choose whichever of $\vec{v}_i - \vec{v}_{r_i}$ and $\vec{w}_j - \vec{w}_{c_j}$ has the smaller support, then we may obtain an additional speedup. Finally, it may be even faster to compute adaptively the intersection of the sets of positions of nonzero entries (see, e.g., [29]).

Table 4 Evaluation of PageRank with either the plain matrix or with the adjacency matrix of the remaining nodes after biclique extraction. $m'$ is the number of nonzero entries in $A'$. $b_S$ and $b_T$ denote the total sizes of the left hands and the right hands of all bicliques. $\Delta^{+}$ and $\Delta^{-}$ denote the number of spurious self-loops that have been added by bicliques or have been erroneously omitted in the set of remaining edges, respectively. $t$ and $t'$ denote the time for computing PageRank on the original adjacency matrix and on the adjacency matrix of the remaining nodes with the bicliques, respectively. Finally, $t/t'$ is the speedup (or slowdown if $< 1$) observed in the computation of PageRank.