Improved Distance Queries and Cycle Counting by Frobenius Normal Form

Consider an unweighted, directed graph G with the diameter D. In this paper, we introduce the framework for counting cycles and walks of given length in matrix multiplication time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\widetilde {O}(n^{\omega })$\end{document}O~(nω). The framework is based on the fast decomposition into Frobenius normal form and the Hankel matrix-vector multiplication. It allows us to solve the All-Nodes Shortest Cycles, All-Pairs All Walks problems efficiently and also give some improvement upon distance queries in unweighted graphs.


Introduction
The All-Pairs Shortest Paths (APSP) problem asks to find distances between all pairs of vertices in a graph.For a directed graphs with weights in R, there is a classical O(n 3 ) time algorithm Floyd and Warshall [12,28].Currently best upper bound for this problem is due to Williams [29] O( n 3   2 Ω(log n) 0.5 ) algorithm.It is asymptotically faster than O(n 3 / log c n) for any c > 0 (see survey [6] for earlier algorithms).Showing any algorithm that would work in O(n 3− ) time for some > 0 is a major open problem [29].
If we consider unweighted, directed graphs there are subcubic algorithms that exploit fast matrix multiplication.For the undirected graph Seidel [22] presented the optimal O(n ω ) time algorithm, where ω < 2.373 is the matrix multiplication exponent [17].For the directed case Zwick [33] presented an O(n 2.575 ) time algorithm that is based on the fast rectangular 2 Related Work
They showed an algorithm that needs O(M n ω ) preprocessing time.After preprocessing each distance δ(u, v) in the graph can be computed exactly in O(n) query time.In the special case M = 1 they showed O(n ω ) algorithm that solves Single Source Shortest Paths (SSSP).We will match their bounds (up to the polylogarithmic factors) using Frobenius normal form.Next we will extend their algorithm so it will return more information about a graph in the same query/preprocessing time.

Counting Cycles
For a given graph G determining whether G contains a simple cycle of length exactly k is NP-hard (in particular determining whether a graph contains a Hamiltonian cycle is NP-complete).However, if we fix k to be a constant this problem can be solved in polynomial time.
Alon et al. [4] introduced a color coding technique.For a fixed k if a graph G(V, E) contains a simple cycle of size exactly k then such cycle can be found in O(|V | ω ) time.Unfortunately, their algorithm depends exponentially 2 O(k) on the length of the cycle and in consequence is inapplicable for large k.In the next years, Alon et al. [5] showed (using a different technique) that for k ≤ 7 it is possible to count the number of cycles of length exactly k in a graph in O(|V | ω ) time.In [31] it is shown that for any even k, cycles of length k can be found in O(|V | 2 ) time in undirected graphs (if they contain such a cycle).Alon et al. [5] showed more methods that depend solely on a number of edges in a graph.For example for odd k they showed O(E 2− 2 k+1 ) algorithm for finding a cycles of length k.However, for dense graphs these results are worse than Alon et al. [4].
It appears that to break the exponential dependence on the length of the cycle we can do one of the following: Consider non-simple cycles (the vertices can reoccur) of length exactly k, Determine cycles of length at most k.
To detect whether a non-simple cycle of length exactly k exists one can use the folklore algorithm.It starts by taking the adjacency matrix A of a graph G. Subsequently, in O(n ω )

56:3
time compute A k by repeated squaring.If Tr A k > 0 then a non-simple k length cycle exists 1 .
Yuster [30] considered the following problem: for every vertex in a graph find a shortest cycle that contains it.He called this problem All-Nodes Shortest Cycle (ANSC).He showed a randomized algorithm that solves ANSC for undirected graphs with weights {1, . . ., M } in O( √ M n (ω+3)/2 ) time.He noted that for simple digraphs (directed graphs with no antiparallel edges) it is reduced to All-Pairs Shortest Paths problem.The fastest known APSP algorithm for unweighted, directed graphs runs in O(n 2.575 ) due to [33].Here, we will show how to solve ANSC in O(n ω ) for general, unweighted, directed graphs.Unfortunately, our techniques will allow us only to find the length of such a cycle (not determining it).But we can return the set of points, that lie on some cycle of a given length.Independently to our work Agarwal and Ramachandran [3] proved that ANSC can be solved in O(n ω ) for unweighted, undirected graphs using a completely different technique.

Preliminaries
Let T (n) be the minimal number of algebraic operations needed to compute the product of n × n matrix by an n × n matrix.We say that ω is the exponent of square matrix multiplication.For now the best known upper bound on ω is due to Le Gall [17]: Theorem 1 (Le Gall [17]).For every > 0, T (n) < O(n ω+ ) where ω < 2.37287.
In this paper, we will omit in definition and will assume (like in many other papers) that O(n ω ) operations are needed to multiply two matrices.The best lower bound for the exponent of matrix multiplication is ω ≥ 2. For convenience in this paper we will assume that ω > 2. The Õ notation hides polylogarithmic factors in the complexity.We will use it to emphasize that all our algorithms need the polylogarithmic number of calls to the fast matrix multiplication algorithm.
The comprehensive description of Frobenius normal form will be presented in Section 4. The properties of Frobenius normal form used in this paper are well known in literature [24,23,10].There are also probabilistic algorithms that compute this form in expected O(n ω ) with high probability over small fields [11].In this paper, all algorithms are deterministic if we set the upper bound on the number of distinct walks W .Then, due to the time of a single field operation we need additional O(log W ) factor in the complexity.However, if we are only interested if cycle/walk of a given length exists in a graph, we can set sufficiently small field Z p (p has O(log n) bits).This way when algorithm returns nonzero we are sure that some walk exists there.If algorithm returns zero, then with high probability there is no such walk.
In this paper, randomization occurs only because for some graphs number of walks can be exponential (e.g., W = O(2 n )) and the output requires to return them.
For matrices A ∈ R n×k and B ∈ R n×l the A ⊕ B ∈ R n× (k+l) denotes the concatenation of their columns.C a,b ∈ R n×(b−a) denotes a matrix constructed by concatenating columns c a , c a+1 , . . ., c b of the matrix C ∈ R n×m .

Consequences of Frobenius Normal Form
Let K be a commutative field.For any matrix A ∈ K n×n there exists an invertible U over K such that: and F is the Frobenius-canonical-form 2 of A. The diagonal block C i is called the companion matrix: Each companion matrix corresponds to the monic polynomial and is called the minimal polynomial of A. Each minimal polynomial has a property that C i (A) = 0. To guarantee that matrix has only one decomposition into Frobenius normal form we require that every polynomial must divide the next one, i.e., C i (x)|C i+1 (x).The final list of polynomials is called the invariant factors of matrix A [24].

Cyclic Subspaces
Frobenius decomposition can be used to get the desired power of a matrix (analogously to the diagonal decomposition): Moreover, we will use the property that the power of block diagonal matrix F is block diagonal: . 2 Sometimes called the rational-canonical form.
Now, we need a property of companion matrices that will enable us to power them efficiently.

Matching Distance Queries on Directed Unweighted Graphs
In this section, we will present a simple algorithm that matches the best known upper bounds of Yuster and Zwick [32] distance queries in directed unweighted graphs.

Answering Distance Queries by Using Frobenius Normal Form
We take the adjacency matrix A of a graph G (i.e., n×n matrix with a u,v = 1 when (u, v) ∈ G and 0 otherwise).The k-th power of the adjacency matrix of the graph G holds the number of walks, i.e., an a u,v element of A k is the count of distinct walks from u to v of length k in the graph.The shortest path between vertices u, v is the smallest k such that A k has nonzero element a u,v .For a brief moment, we will forget about graph theory interpretation and focus only on finding such k.We decompose matrix A into the Frobenius normal form.Storjohann [24] showed an algorithm that returns U and F deterministically in O(n ω ) time (note that matrix inverse can also be computed in O(n ω ) time).

Single Invariant Factor
To better explain the idea, for a start we will consider a simple case when a number of invariant factors of A is exactly 1.In that situation, the matrix F is a companion matrix C ∈ K n×n .

56:6
Improved Distance Queries and Cycle Counting by Frobenius Normal Form If the matrix U F has columns v 1 , . . ., v n and the matrix U F n+1 has columns v n+1 , . . ., v 2n , then the columns v k , . . ., v k+n−1 construct U F k (see Figure 2).It is because the matrix F has the cyclic property.
This step takes just 2 matrix multiplications, because we need to multiply U times F and F n+1 .The preprocessing phase takes only O(n ω ) time.Now, if a user wants to know the number of distinct walks from vertices u to v of length exactly k he takes: The u-th row of matrix This will give us the u, v element of matrix U F k U −1 = A k .To get the length of the shortest path (i.e., the minimal k that w u,v > 0), we will modify our matrices slightly to get the number of walks of length ≤ k.At the end, we will fit in Õ(n) query time (by using binary search) and O(n ω ) preprocessing time.
Basically, for a given k we need to get the u, v element of matrix

Now, to get
We can transform matrices U and F to matrix M in O(n 2 ) time during preprocessing.During query, we need to compute 2 dot products (u-th row of M k,k+n−1 times v-th column of U −1 and u-th row of M 1,n times v-th column of U −1 ) and subtract them.
We have an algorithm that for a given vertices u, v ∈ G and integer k ∈ {1, . . ., n} can answer: how many walks from u to v of length less or equal k are in the graph G in Õ(n) query and O(n ω ) preprocessing.
Because the result of the query is increasing in k we can use binary search.We can determine the first k for which the query will answer nonzero value in O(log n) tries.Hence, in Õ(n) we can find the length of the shortest path.This generalized query can easily return the number of walks of length exactly k, i.e., q(u, v, k) − q(u, v, k − 1).
For now, we only matched the result of Yuster and Zwick [32] for unweighted graphs with a single, invariant factor.In the next section, we will show how to generalize our technique for graphs with any number of invariant factors.Then, we will extend the Yuster and Zwick [32] algorithm.Namely, we will show that in O(n ω ) preprocessing time and Õ(n) query time we can get D numbers (where D is the graph diameter): tells how many walks of length exactly i are from vertex u to v.
With such an algorithm we can easily implement Yuster and Zwick [32] distance queries by linearly scanning (see Section 6).Thus, for the multiple invariant factors we will skip the description of algorithm that returns the number of walks of length smaller than k (the technique is the same).

Multiple Invariant Factors
Now, we will consider a case when k ≥ 1, i.e., matrix F has multiple invariant factors.First of all, we need to note that this generalization is not perfect and will allow only the walks of length up to D (the longest distance in a graph, i.e., diameter).
In the real world applications of our framework (detecting cycles, determining distance between nodes, etc.) we do not need walks longer than the longest possible distance in a graph.It is natural that the diameter is considered to be a bound of an output in graph problems [8, 9, 1, 2].

Relation of the Graph Diameter and Frobenius Normal Form
We begin with relating the graph diameter to the Frobenius normal form.It turns out that the graph diameter is bounded by the degree of a smallest invariant factor.Theorem 5 ([7]).Given a directed, unweighted graph G with a diameter D. Let µ denote the degree of the smallest invariant factor (i.e., the dimension of the smallest block in the Frobenius matrix F ) of an adjacency matrix of the graph G. Then D ≤ µ.
The bounds of this inequality are tight.There are graphs with diameter D = µ and graphs with µ = n and arbitrary small diameter [7].Our algorithms are able to return walks up to the length µ.We use the bound on D solely because it is easier to interpret diameter than the smallest degree of the invariant factor.

Generalization to Multiple Invariant Factors
Let k denote the number of blocks in the Frobenius matrix F and µ be the number of columns of the smallest block.To multiply matrix U by F we can start by multiplying strips of matrix U by appropriate blocks of F and concatenate them later (see Figure 3).
We start by splitting the matrix U into k strips with rows corresponding to the appropriate blocks of F (strip U i has as many columns as block F i ).Then we multiply U F and have k strips: with at least µ columns).Next, we multiply U F µ and we keep k strips: Our goal is to get a data structure such that if we need U F k , we can quickly choose appropriate columns and append them.
The matrix U i F i has l i columns: v 1 , . . ., v li .Because F i is a companion matrix, the U i F µ i has the cyclic property (Definition 3).And the matrix U i F µ i has columns: v µ , . . ., v µ+li .There are some duplicate columns in U i F i and U i F µ i , when µ < l i .Hence, we only need to keep columns v 1 , . . ., v µ+li for this strip.We do this for all strips (see Figure 4).
We are left with a matrix that has at most 2n columns (because l 1 +µ+l 2 +µ+. . .l k +µ = kµ + n i=1 l i = n + kµ ≤ 2n).To generate it we need to power F to µ and do multiplications U • F and U • F µ .This can be computed in O(n ω ) time.

56:8 Improved Distance Queries and Cycle Counting by Frobenius Normal Form
Combining strips into a single matrix.

Queries with Multiple Invariant Factors
When a query for the number of walks of length k from node u to v comes, we do: For each strip i take the u-th row of U i F i ⊕ U i F µ i concatenate them (see Figure 5) into vector ū, 2. Take v-th column of U −1 matrix and denote it v,

Return the dot product ū • v.
Because l 1 + l 2 + . . .+ l k = n the vector ū ∈ K n .Query needs O(n) time.Finally, this dot product is a u,v element of the matrix U F k U −1 , for a fixed k ≤ µ.Analogously to Section 5.2 one can extend this result to return the number of walks of length less or equal k.This matches (up to the polylogarithmic factor) the result of Yuster and Zwick [32].However in the next section we will show a more general result.

6
Almost Optimal Query In the previous section, we showed how to preprocess a directed, unweighted graph in O(n ω ) time in such a way that in O(n) query we can return a number of distinct walks of a length k from vertex u to v.However, in linear time O(n) we return only a single number.Our goal is to get far richer information in Õ(n) query time.
Theorem 6.Let G = (V, E) be a directed, unweighted graph with n vertices and a diameter D. There exists an algorithm that after some preprocessing can answer queries for any given u, v ∈ V : P. Sankowski and K. Węgrzycki 56:9 Query returns {w i | 1 ≤ i ≤ D}, where w i is the number of distinct walks from u to v of length exactly i, Preprocessing takes O(n ω ) and query takes Õ(n) field operations.

The algorithm is deterministic (but there are O(log W ) in derandomized version, see Section 3 for explication).
This algorithm is a significant improvement over Yuster and Zwick [32]: One can use Theorem 6 to find the distance between u, v by linearly scanning the array and returning the first k such that w k > 0, Theorem 6 can count cycles.In contrast the Yuster and Zwick [32] cannot, because the distance from u to u is always 0 (we will see that in the next section), Theorem 6 is almost optimal (when D = O(n) its output and time are O(n)).
On the other hand, Theorem 6 is only a functional improvement and it does not break the O(n ω ) of the Single Source Shortest Paths (SSSP) for dense, directed graphs.

Hankel Matrix
Now, we will focus on the proof of Theorem 6.First, we need to introduce the Hankel matrix and its properties.
Hankel matrix is defined by its first row and last column (2n − 1 numbers define n × n Hankel matrix).The numbers from the previous row are left-shifted by one and the new number is added at the end.Hankel matrices have some similarities with Topelitz and Circulant matrices.
The basic property we need is that the product of Hankel matrix and vector can be computed in O(n log n) time (see [16,25]) even though explicitly writing the Hankel matrix as n × n matrix takes O(n 2 ) time.The algorithm takes 2n − 1 parameters that define the Hankel matrix and n parameters that define the vector.The technique is based on the Fast Fourier Transformation [16,25].

Using Hankel Matrices to Return Number of Walks
The algorithm from Section 5.3.3concatenates the strips U i F i and builds a single vector.Subsequently, that vector is multiplied by a column of matrix U −1 .But we can also do it in a different order: first we multiply the strip by a section of matrix U −1 and sum the S TA C S 2 0 1 7

56:10
Improved Distance Queries and Cycle Counting by Frobenius Normal Form results at the end.Thus, we perform k (number of strips) dot products of smaller vectors (see Figure 6).
Consider the query for a number of walks of length exactly k.The strips in the matrix U −1 do not depend on k (vector (u 0 , . . ., u l )).However, the vector taken from U i F i (vectors (x i , . . ., x i+l )) will be left shifted if we want to compute the next one.
As you can see, the subsequent rows can be written as the Hankel matrix (we need to add some zeros and discard certain results to get square matrix, but it will not influence asymptotic complexity).By using the fast Hankel matrix-vector multiplication we can compute µ values for every strip i in time O(l i log l i ) (l i was defined as the length of i-th strip).At the end, we need to sum all results into a single array.The total complexity is O( We need to address some issues regarding field operations.As mentioned in the Section 3, the Õ notation hides the polylogarithmic factors.Here, we silently assume that the number of walks is bounded by W .It means that the complexity is multiplied by O(log W ) factor because of the cost of arithmetic operations.If the number of walks is exponential in n then the cost increases by a linear factor.However, at the beginning we could randomly select the prime number p with O(log n) bits and do arithmetic operations in the field Z p .In some applications we can use our algorithm to answer whether with high probability there exists a walk of a given length.The problem of large number of distinct walks is more of a theoretical issue than a practical one.

All Pairs All Walks (APAW)
Now we will show the application of Theorem 6.We begin with almost optimal algorithm to compute the number of all walks between all pairs of vertices.We are not aware of any other research concerning this problem.Definition 7 (All Pairs All Walks problem).Given a directed, unweighted graph G with a diameter D. The task is to return an array A, such that for every pair of vertices u, v ∈ G and every k ∈ {1, . . ., D} an element A[u, v, k] is the number of distinct walks from u to v of length k.
The only solution to this problem we are aware of needs O(Dn ω ) time.The naive approach: take the adjacency matrix A of graph G and save it in A[u, v, 1].Then, square it to get A 2 and save it in A[u, v, 2].Continue until you fill out complete table.In the worst case this algorithm needs D = O(n) matrix multiplications, thus its complexity is O(Dn ω ).At the first glance it is surprising that we can improve it to O(n 3 ) especially when ω > 2.
The Õ(n 3 ) algorithm works as follows.First, preprocess the algorithm from Theorem 6 which takes O(n ω ) time.Then, for every pair of vertices u, v ask a query.A single query takes Õ(n) time.Then, save it in the table A[u, v] (query gives D numbers w 1 , . . ., w D , such that w i is the number of walks of length i and save it A[u, v, i] := w i ).
Because there are O(n 2 ) pairs and for each pair we need Õ(n) time, the complexity of our solution is Õ(n 3 ).The algorithm is almost optimal because the output in the worst case may be O(n 3 ).

7
Counting and Determining the Lengths of Cycles Proof.We will use Theorem 6.We start by preprocessing the graph G in time O(n ω ).Theorem 6 allows us to ask for a number of walks from u to v and receive D numbers: w k u,v .So, we ask for the number of walks from vertex u to the same vertex u.This is exactly the number of non-simple cycles of a given length that contain vertex u.
Because we need to ask only n queries (it is the number of vertices in a graph) and each query takes Õ(n) time we have O(n ω + n 2 ) = O(n ω ) algorithm.

Solving ANSC Faster
To solve ANSC problem in O(n ω ) time and beat Yuster [30] O(n (ω+3)/2 ) algorithm we do the linear search on the output.For every vertex we search for the first nonzero element linearly.This with high probability is the length of the shortest cycle that contains it.Because the output contains O(n 2 ) numbers the complexity is equal to the preprocessing time O(n ω ).
Similarly, we can scan the output to compute the set S(c) that contains all vertices that lie on some cycle of length ≤ c.Then, by linear scan we can return the sets S( 1

Conclusion and Open Problems
We introduced the framework based on Frobenius normal form and used it to solve problems on directed, unweighted graphs in matrix multiplication time.The main open question is to use this framework to prove that APSP on such graphs can be solved in O(n ω ) or at least O(n 2.5 ).The promising way is to use the algorithms that determine operators of matrices of polynomials (e.g., determinant, solving linear system [19,15]).Additionally, algorithms for a black-box polynomial degree determination seem to be a promising path.Another interesting problem is to use this framework to obtain additive approximation for APSP.Currently, the best additive approximation of APSP is due to [21].However, none of known additive approximation of APSP works in O(n ω ) time.
Application in dynamic algorithm also seems to be a promising approach.Frandsen and Sankowski [13] showed an algorithm, that dynamically preserves Frobenius normal form in O(kn 2 ) time.Our algorithms use fast Hankel matrix-vector multiplication.The algorithm behind fast Hankel matrix-vector multiplication is based on Discrete Fourier Transform (DFT).Reif and Tate [20] presented an O( √ n) time per request algorithm for DFT.Can we use those approaches to obtain a faster dynamic algorithm?
Finally, it remains open how to apply the Frobenius normal form in the weighted directed graphs with small, integer weights {−M, . . ., M }.Cygan et al. [9] took degree M polynomials and used [19] to get radius and diameter in O(M n ω ) time.We suspect that similar technique can be applied to Frobenius normal form framework.

Figure 3
Figure 3 Multiplication of the block matrix.Example for 3 invariant factors.

Figure 5
Figure 5 Schema of obtaining vector ū for the cycles of length i.

Figure 6
Figure 6 Multiplication of strips by U −1 matrix.As you can see, matrix U −1 can be split into sections, that multiply only UiFi strips.

k
i=1 l i = n the algorithm needs O(n log n) field operations.It proves Theorem 6.

S TA C S 2 0 1 7 56: 12 Improved
Distance Queries and Cycle Counting by Frobenius Normal Form also considered following problem: given a graph and an integer t.Let S(k) denote the set of all vertices lying in a cycle of length ≤ k.Determine S(t).He considered directed graphs with weights in {−M, . . ., M } and showed O(M n ω t) algorithm Recently, Cygan et al. [9] improved his algorithm.They showed that for a fixed t ∈ [0, nM ] the set S(t) can be computed in O(M n ω ) randomized time.
We will use Theorem 6 to solve All-Nodes Shortest Cycles (ANSC) problem efficiently.
Theorem 8.There exists an algorithm that for a given unweighted digraph G with a diameter D:For every vertex u returns D numbers:c 1 u , c 2 u , . ..cD u , The c ku is the number of non-simple cycles of length exactly k, that contain vertex u, Algorithm works in O(n ω log W ) time (where W is the maximum c k u ).