Using basis dependence distance vectors in the modified Floyd–Warshall algorithm
Abstract
In this paper, we present a modified Floyd–Warshall algorithm in which the most time-consuming part, calculating the transitive closure describing self-dependences for each loop statement, is computed by applying basis dependence distance vectors derived from all vectors describing self-dependences. We demonstrate that the presented approach reduces the transitive closure calculation time for parameterized graphs representing all dependences in the loop in comparison with that yielded by techniques implemented in the Omega and ISL libraries. This widens the applicability of techniques that are based on the transitive closure of dependence graphs and are aimed at building optimizing compilers. Experimental results for the NAS Parallel Benchmarks are discussed.
Keywords
Basis dependence vectors, Transitive closure, Floyd–Warshall algorithm, Arbitrarily nested loop, Parallelizing compiler
1 Introduction
Solving many problems relies on calculating the transitive closure of a graph Diestel (2010). In this paper, we deal with parameterized graphs whose number of vertices is given by an expression that includes structure parameters. Such graphs can be represented by parameterized relations whose tuples represent vertices, while constraints define edges Kelly et al. (1996). The transitive closure calculated for such relations can be used in optimizing compilers: to remove redundant synchronization Kelly et al. (1996), to test the legality of iteration reordering transformations Kelly et al. (1996), to apply iteration space slicing Beletska et al. (2011), and to form schedules for statement instances of program loops Bielecki et al. (2012); Hollermann et al. (1997); Deng et al. (1998). In general, calculating the transitive closure of parameterized graphs is time-consuming Kelly et al. (1996); Beletska et al. (2009); Verdoolaege et al. (2011). Sometimes the time of transitive closure calculation prevents applying techniques for extracting coarse- and fine-grained parallelism because this time is not acceptable in practice (several hours and even several days Beletska et al. (2011); Bielecki et al. (2012)). This is why improving transitive closure calculation algorithms with the aim of reducing their time complexity remains a relevant task.

The main contributions of this paper are the following:

a demonstration of how to calculate basis dependence distance vectors for parameterized program loops;

a way of calculating, by means of basis dependence distance vectors, the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement;

a suggestion to use the transitive closure calculated in this way in a modified Floyd–Warshall algorithm, with the aim of reducing the calculation time of the transitive closure of a dependence graph representing all the dependences in a given program loop;

the development of open source software that implements the presented solutions and produces the transitive closure of a dependence graph describing all the dependences of the input program loop by means of the modified Floyd–Warshall algorithm;

an evaluation of the effectiveness and efficiency of the presented algorithms and a comparison of them with those yielded by related work.
The rest of the paper is organized as follows. Section 2 introduces background. Section 3 presents an approach to calculate the transitive closure of a parameterized dependence graph. Section 4 describes related work. Section 5 presents results of an experimental study. Section 6 draws conclusions and briefly outlines future research.
2 Background
In this section, we briefly introduce basic definitions which are used throughout this paper.
The approach presented in this paper uses the following concepts of linear algebra: vector, vector space, field, integral linear combination, and linear independence. Details can be found in Schrijver (1999).
Definition 1
(Integer Lattice) Let {\(a_{1},a_{2},...,a_{m}\)} be a set of linearly independent integer vectors. The set \(\Lambda =\{\lambda _{1}a_{1}+\lambda _{2}a_{2}+...+\lambda _{m}a_{m}\mid \lambda _{1},...,\lambda _{m}\in \mathbb {Z}\}\) is called an integer lattice generated by the basis {\(a_{1},a_{2},...,a_{m}\)}.
Definition 2
(Basis) A basis \(B\) of an integer lattice \(\Lambda \) over field \(\mathbb {Z}\) is a linearly independent subset of \(\Lambda \) that generates \(\Lambda \). Every finite-dimensional vector space \(\Lambda \) has a basis Shoup (2005).
Definition 3
In this paper, we use the following notions concerning program loops: iteration vector, loop domain (index set), parameterized loops, and perfectly-nested loops; details can be found in Griebl (2004).
Definition 4
(Structure Parameters) Structure parameters are integer symbolic constants, generally defining array size, iteration bounds, etc. Structure parameters may be defined once in the prologue of the program, and may not be modified elsewhere.
Definition 5
(Iteration Vector) For a given statement \(S\) in a loop, the iteration vector \(\overrightarrow{v}=(i_{1},...,i_{n})^{T}\) is the vector of the surrounding loop counters.
Definition 6
(Iteration Space) The iteration space of a given statement \(S\) in a given loop nest is a set of values taken by its iteration vector when executing the loop nest.
Definition 7
(Dependence) Two statement instances \(S_{1}(\overrightarrow{I})\) and \(S_{2}(\overrightarrow{J})\), where \(\overrightarrow{I}\) and \(\overrightarrow{J}\) are the iteration vectors, are dependent if both access the same memory location and if at least one access is a write.
Definition 8
(Dependence Distance Set, Dependence Distance Vector) We define a dependence distance set \(\Delta _{S,T}\) as a set of differences between all such vectors of the same size that stand for a pair of dependent instances of statement \(T\) and \(S\). We call each element of set \(\Delta _{S,T}\) a (dependence) distance vector and denote it as \(\delta _{S,T}\).
Definition 9
(Uniform Dependence, Non-Uniform Dependence) If each coordinate of vector \(\delta _{S,T}\) (see Definition 8) is constant, then we call the corresponding dependence uniform; otherwise it is non-uniform Griebl (2004).
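As a small illustration of Definitions 8 and 9, the following Python sketch enumerates the dependence distance set of a toy loop nest by brute force. The loop nest, the array name `a` and the function name `distance_set` are hypothetical and are not taken from the paper; the sketch only shows that the distance vectors of this particular loop have constant coordinates, so its dependences are uniform.

```python
from itertools import product


def distance_set(n):
    """Brute-force the dependence distance set of a toy loop nest with bound n.

    Hypothetical loop (an assumption for illustration only):
        for i in 1..n: for j in 1..n: a[i][j] = a[i-1][j] + a[i][j-1]
    """
    # Each array cell (i, j) is written exactly once, by iteration (i, j).
    writes = {(i, j): (i, j) for i, j in product(range(1, n + 1), repeat=2)}
    distances = set()
    for i, j in product(range(1, n + 1), repeat=2):
        for cell in ((i - 1, j), (i, j - 1)):  # cells read by iteration (i, j)
            src = writes.get(cell)
            if src is not None:
                # distance vector = reading iteration minus writing iteration
                distances.add((i - src[0], j - src[1]))
    return distances


# The set does not change with the loop bound: the dependences are uniform.
small = distance_set(4)
large = distance_set(6)
```

For this toy loop the distance set stays the same when the bound grows, which is what Definition 9 calls a uniform dependence.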
Definition 10
(Reduced Dependence Graph) A Reduced Dependence Graph (\(RDG\)) is the graph where a vertex stands for every statement \(S\) and an edge connects statements \(S\) and \(T\) whose instances are dependent. The number of edges between vertices \(S\) and \(T\) is equal to the number of dependence relations \(R_{S,T}\).
Definition 11
(Uniform Loop, Quasi-Uniform Loop) We say that a parameterized loop is uniform if it induces dependences represented by a finite number of uniform dependence distance vectors Griebl (2004). A parameterized loop is quasi-uniform if all its dependence distance vectors can be represented by an integral linear combination of a finite number of linearly independent vectors with constant coordinates.
Definition 12
(Dependence Relation) A dependence relation is a tuple relation of the form \(\left\{ [input\_list] \rightarrow [output\_list]: constraints\right\} \), where \(input\_list\) and \(output\_list\) are the lists of variables used to describe input and output tuples and constraints is a Presburger formula describing the constraints imposed upon \(input\_list\) and \(output\_list\).
Definition 13
Definition 14
Definition 15
Definition 16
 if \(R^{+}\) is the exact transitive closure, then: $$\begin{aligned} R^{+} = R \cup \left( R\circ R^{+}\right) , \end{aligned}$$
 if \(R^{+}\) is an over-approximation, then: $$\begin{aligned} R^{+} \nsubseteq \ R \cup \left( R\circ R^{+}\right) . \end{aligned}$$
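The two conditions above can be tried out on a small finite relation. The Python sketch below is only an illustration under the assumption that a relation is stored as a finite set of pairs; the helper names `compose` and `is_exact` are hypothetical, and the parameterized relations handled in the paper are of course not finite sets.

```python
def compose(outer, inner):
    """Return outer ∘ inner for finite relations: apply inner first, then outer."""
    return {(x, z) for (x, y) in inner for (y2, z) in outer if y == y2}


def is_exact(r, r_plus):
    """Check the condition R+ = R ∪ (R ∘ R+) for a candidate closure r_plus."""
    return r_plus == r | compose(r, r_plus)


# A chain 1 -> 2 -> 3 and its exact transitive closure.
r = {(1, 2), (2, 3)}
exact = {(1, 2), (2, 3), (1, 3)}

# An over-approximation: it contains the false path 3 -> 1.
over = exact | {(3, 1)}

exact_ok = is_exact(r, exact)
over_ok = is_exact(r, over)
over_is_subset = over <= r | compose(r, over)
```

Here `exact` satisfies the equality, while `over` is not a subset of R ∪ (R ∘ R+), matching the second condition.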
3 Approach to computing transitive closure
The goal of the algorithm presented below is to calculate the transitive closure of a dependence relation describing all the dependences in an arbitrarily nested loop.
3.1 Floyd–Warshall algorithm
The most time-consuming part of Algorithm 1 is the expression \(D_{kj}\circ R_{kk}^{*}\circ D_{ik}\), where '\(\circ \)' denotes the composition operator applied to a pair of relations and \(D_{ik}\) describes all the dependences between instances of statement \(s_{i}\) and statement \(s_{k}\). The expression means the following: if there is a dependence from iteration \(i_{1}\) of statement \(s_{i}\) to iteration \(i_{2}\) of statement \(s_{k}\), a chain of self-dependences from iteration \(i_{2}\) to iteration \(i_{3}\) described by \(R_{kk}^{*}\), and finally a dependence from iteration \(i_{3}\) of statement \(s_{k}\) to iteration \(i_{4}\) of statement \(s_{j}\) (where \(D_{kj}\) describes all dependences between instances of statement \(s_{k}\) and statement \(s_{j}\)), then there is a transitive dependence from iteration \(i_{1}\) to iteration \(i_{4}\). The objective of this technique is to update, in each iteration of the \(k\)-loop, all the dependences that pass through statements \(1,2,\ldots,n\). In Sect. 3.3, we suggest calculating \(R_{kk}^{*}\) using a finite integral linear combination of basis dependence distance vectors Bielecki et al. (2013).
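The update described by the expression \(D_{kj}\circ R_{kk}^{*}\circ D_{ik}\) can be sketched on finite relations. The Python code below is not the paper's Algorithm 1; it is a minimal sketch that assumes each relation is a finite set of (source, target) pairs and that \(R_{kk}^{*}\) is obtained by naive iteration rather than by the method of Sect. 3.3. All function and variable names are hypothetical.

```python
def compose(outer, inner):
    """Return outer ∘ inner: apply inner first, then outer."""
    return {(x, z) for (x, y) in inner for (y2, z) in outer if y == y2}


def reflexive_transitive_closure(rel, universe):
    """Naive R* = R+ ∪ I on a finite set of statement instances."""
    closure = set(rel) | {(v, v) for v in universe}
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger


def modified_floyd_warshall(dep, instances):
    """dep[(i, j)]: dependences from instances of statement i to statement j."""
    stmts = sorted(instances)
    d = {(i, j): set(dep.get((i, j), ())) for i in stmts for j in stmts}
    for k in stmts:
        # chain of self-dependences of statement k
        r_kk_star = reflexive_transitive_closure(d[(k, k)], instances[k])
        for i in stmts:
            for j in stmts:
                # D_ij := D_ij ∪ (D_kj ∘ R_kk* ∘ D_ik)
                d[(i, j)] |= compose(d[(k, j)], compose(r_kk_star, d[(i, k)]))
    return d


# Two statements with two instances each (toy data, not from the paper).
instances = {1: {"a1", "a2"}, 2: {"b1", "b2"}}
dep = {(1, 1): {("a1", "a2")}, (1, 2): {("a2", "b1")}, (2, 2): {("b1", "b2")}}
closure = modified_floyd_warshall(dep, instances)
```

In this toy input, the path a1 → a2 → b1 → b2 is discovered in two steps: the iteration for k = 1 adds (a1, b1), and the iteration for k = 2 extends both sources to b2.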
3.2 Replacing the parameterized vector with an integral linear combination of constant vectors
To find constant vectors whose integral linear combination represents the parameterized vector, we can apply the following theorem.
Theorem 1
Proof
3.3 Algorithm for computing transitive closure
The idea of the algorithm presented in this section is the following. Given a set \(\Delta _{S,T}\) of \(m\) dependence distance vectors in the \(n\)-dimensional integer space derived from a union of dependence relations \(R_{kk}\), \(k=1,2,\ldots, q\), where \(q\) is the number of loop statements and \(R_{kk}\) describes a chain of self-dependences of statement \(s_{k}\) in the loop, we first replace all parameterized vectors with constant vectors using Theorem 1 from Sect. 3.2.
As a result, we get \(k\), \(k\ge m\), dependence distance vectors with constant coordinates. This allows us to get rid of parameterized vectors and to form an integer matrix \(A\), \(A \in \mathbb {Z}^{n \times k}\), by inserting the dependence distance vectors with constant coordinates into the columns of \(A\), which generate integer lattice \(\Lambda \).
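The replacement step can be sketched in Python using the four vectors that appear in the illustrating example of Sect. 3.5. The sketch assumes, as in that example, that a parameterized coordinate is a single symbolic parameter, encoded here as a string; this encoding and the function names `split_parameterized` and `constant_vectors` are hypothetical, and the code mirrors the example rather than the general statement of Theorem 1.

```python
def split_parameterized(vector):
    """Split a vector into a constant vector plus one unit vector per
    parameterized coordinate (coordinates are ints or parameter names)."""
    n = len(vector)
    constant = tuple(c if isinstance(c, int) else 0 for c in vector)
    units = [tuple(1 if q == p else 0 for q in range(n))
             for p, c in enumerate(vector) if not isinstance(c, int)]
    return constant, units


def constant_vectors(vectors):
    """Collect the distinct constant vectors in order of first appearance."""
    seen = []
    for v in vectors:
        constant, units = split_parameterized(v)
        for w in [constant, *units]:
            if w not in seen:
                seen.append(w)
    return seen


# Parameterized distance vectors taken from the example in Sect. 3.5.
example = [(2, "N2", "N3"), (0, 0, "N3"), (0, "N2", "N3"), (4, "N2", "N3")]
delta = constant_vectors(example)
```

With this input, `delta` holds the five constant vectors that the example lists as the resulting set \(\Delta _{2,2}\), and they become the columns of matrix \(A\).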
To decrease the complexity of further computations, redundant dependence distance vectors are eliminated from matrix \(A\) by finding a subset of \(l\), \(l \le k\), linearly independent columns of \(A\). This subset of dependence distance vectors forms the basis \(B \in \mathbb {Z}^{n \times l}\) of \(A\) and generates the same integer lattice \(\Lambda \) as \(A\) does Schrijver (1999). Every element of integer lattice \(\Lambda \) can be expressed uniquely as a finite integral linear combination of the basis dependence distance vectors belonging to \(B\).
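Selecting linearly independent columns can be sketched with Gaussian elimination in exact rational arithmetic. The code below is an assumption-laden illustration, not the paper's implementation: it works over the rationals with Python's `fractions` module, keeps each column that is not a linear combination of the columns kept before it, and checks linear independence only. The function name `independent_columns` is hypothetical.

```python
from fractions import Fraction


def independent_columns(columns):
    """Return the columns that are linearly independent of the earlier ones."""
    basis = []     # original columns that were kept
    reduced = []   # (pivot position, reduced copy) for every kept column
    for col in columns:
        v = [Fraction(c) for c in col]
        # eliminate the pivot coordinates of all previously kept columns
        for pivot, r in reduced:
            if v[pivot] != 0:
                factor = v[pivot] / r[pivot]
                v = [a - factor * b for a, b in zip(v, r)]
        pivot = next((p for p, a in enumerate(v) if a != 0), None)
        if pivot is not None:  # a non-zero remainder means a new direction
            reduced.append((pivot, v))
            basis.append(tuple(col))
    return basis


# Columns of matrix A from the example in Sect. 3.5.
a_columns = [(2, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0), (4, 0, 0)]
b_columns = independent_columns(a_columns)
```

For the columns of the example in Sect. 3.5, the zero vector and \((4,0,0)^{T}\) are discarded, leaving the three columns of matrix \(B\).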
After \(B\) is completed, we can work out relations \(R_{kk}^{*}\), \(k=1,2,\ldots,q\), representing the exact transitive closure of \(R_{kk}\) or its over-approximation. For each vertex \(x\) in the data dependence graph (where \(x\) is the source of a dependence, \(x \in dom\;R_{kk}\)), we can identify all vertices \(y\) (the destinations of dependences, \(y \in ran\;R_{kk}\)) that are connected with \(x\) by a path of length equal to or greater than \(1\), where \(y\) is calculated as \(x\) plus an integral linear combination of the basis dependence distance vectors \(B\), i.e. \(y = x + B \times \lambda \), \(\lambda \in \mathbb {Z}^{l}\). The part \(B \times \lambda \) of the formula represents all possible paths in the dependence graph, represented by relation \(R_{kk}\), connecting \(x\) and \(y\). Moreover, we have to preserve the lexicographic order for \(y\) and \(x\), i.e. \(y - x\succ 0\). Below, we present the algorithm in a formal way.
 Input:

Dependence distance set \(\Delta ^{n \times m}= \{\delta _{1},\delta _{2}, \ldots , \delta _{m}\}\), where \(m\) is the number of \(n\)-dimensional dependence distance vectors.
 Output:

Exact transitive closure of relation \(R_{kk}\) or its over-approximation.
 1.
Replace each parameterized dependence distance vector in \(\Delta ^{n \times m}\) with an integral linear combination of vectors with constant coordinates. For this purpose apply Theorem 1 presented in Sect. 3.2.
 2.
Using all constant dependence vectors, form matrix \(A \in \mathbb {Z}^{n \times k}\), \(k \ge m\).
 3.
Extract a finite subset of \(l\), \(l\le k\), linearly independent columns from matrix \(A \in \mathbb {Z}^{n \times k}\) over field \(\mathbb {Z}^{n}\) and form matrix \(B^{n \times l}\), representing the basis of the dependence distance vector set, where the linearly independent vectors are the columns of matrix \(B^{n \times l}\). For this purpose, apply the Gaussian elimination algorithm Shoup (2005); Rotman (2003).
 4.
Calculate relation \(R_{kk}^{*}\) representing the exact transitive closure of a dependence relation, describing all the dependences in the input loop, or its over-approximation, as follows:
$$\begin{aligned} R_{kk}^{*} = \left\{ \begin{array}{l} \left[ x \right] \rightarrow \left[ y \right] \mid \exists \lambda \ s.t.\;y = x + B^{n \times l} \times \lambda \;\wedge \; y - x\succ 0, \;\lambda \in \mathbb {Z}^{l} \;\wedge \\ \qquad \wedge \; y \in ran\; R_{kk}\;\wedge \; x \in dom\; R_{kk} \end{array} \right\} \cup I, \end{aligned}$$(7)
where \(R^{*}_{kk}\) describes a chain of self-dependences of loop statement \(s_{k}\), \(B^{n \times l} \times \lambda \) represents an integral linear combination of the basis dependence distance vectors \(\delta _{i}\) (the columns of \(B^{n \times l}\), \(1 \le i \le l\)), \(y - x\succ 0\) imposes the lexicographically forward constraints on the tuples of \(R_{kk}^{*}\), and \(I\) is the identity relation.

Relation \(R^{*}_{kk}\) formed according to formula (7):

reproduces all dependence distance vectors exposed for the loop,

describes all existing (true) paths between any pair of \(x\) and \(y\) as an integral linear combination of all dependence distance vectors exposed for the loop,

can describe non-existing (false) paths in the dependence graph represented by relation \(R^{*}_{kk}\).
Summing up, we conclude that relation \(R_{kk}^{*}\) describes all existing paths in the dependence graph represented by relation \(R_{kk}\) and can also describe non-existing paths, i.e., \((R^{*}_{kk})_{exact} \subseteq R^{*}_{kk}\); when relation \(R^{*}_{kk}\) does not represent any non-existing path, \(R^{*}_{kk} = (R^{*}_{kk})_{exact}\).
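The possibility of false paths can be observed on a small bounded example. The Python sketch below evaluates formula (7) by enumeration and compares the result with a naively computed exact closure. It rests on several assumptions that are not part of the paper: the iteration space is a fixed 3 × 3 grid, the identity relation is restricted to that grid, the coefficients \(\lambda \) are enumerated within a bound `lam_bound`, and the dependence relation and all names are hypothetical.

```python
from itertools import product


def lex_positive(v):
    """True if v is lexicographically greater than the zero vector."""
    for c in v:
        if c != 0:
            return c > 0
    return False


def closure_from_basis(rel, basis, space, lam_bound):
    """Enumerate formula (7) on a finite space with bounded lambda."""
    dom = {x for x, _ in rel}
    ran = {y for _, y in rel}
    result = {(v, v) for v in space}  # identity relation I, limited to space
    coeffs = range(-lam_bound, lam_bound + 1)
    for x in dom:
        for lam in product(coeffs, repeat=len(basis)):
            step = tuple(sum(c * b[d] for c, b in zip(lam, basis))
                         for d in range(len(x)))
            y = tuple(a + s for a, s in zip(x, step))
            if lex_positive(step) and y in ran:
                result.add((x, y))
    return result


def exact_closure(rel, space):
    """Naive reflexive transitive closure used as the reference result."""
    closure = set(rel) | {(v, v) for v in space}
    changed = True
    while changed:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        changed = not new <= closure
        closure |= new
    return closure


# Toy self-dependences with distance vectors (0, 1) and (1, 0).
space = set(product(range(1, 4), repeat=2))
rel = ({((i, j), (i, j + 1)) for (i, j) in space if j < 3}
       | {((i, j), (i + 1, j)) for (i, j) in space if i < 3})
approx = closure_from_basis(rel, [(1, 0), (0, 1)], space, lam_bound=2)
exact = exact_closure(rel, space)
false_path = ((1, 3), (2, 1))
```

In this toy case the pair `false_path` satisfies every constraint of formula (7), since its distance \((1,-2)\) is lexicographically positive, yet no chain of dependences connects the two iterations, so the enumerated relation is a strict over-approximation of the exact closure.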
3.4 Time complexity
 1.
As we have proved in Sect. 3.2, the task of replacing parameterized vectors with a linear combination of vectors with constant coordinates can be done in \(O\left( d^{2} \right) \) operations.
 2.
The task of forming a dependence matrix using all \(k\) constant dependence vectors in \(\mathbb {Z}^{n}\) requires \(O\left( kn \right) \) operations (memory accesses).
 3.
The task of identifying a set of linearly independent columns of matrix \(A\), \(A\in \mathbb {Z}^{n\times k}\) with constant coordinates to find the basis can be done in polynomial time by the Gaussian elimination algorithm. According to Cohen and Megiddo (1991), this computation can be done in \(O\left( ldk \right) \) arithmetic operations.
3.5 Illustrating example
Algorithm 1 calls Algorithm 2 to calculate relation \(R_{kk}^{*}\) for each iteration \(k\) of the outermost loop.
For \(k=1\), the dependence distance set \(\Delta _{1,1}=\emptyset \) because \(R_{11}=\emptyset \), so we get \(R^{*}_{11}=\emptyset \;\cup \;I=\{[k,i]\rightarrow [k,i]\}\).
 1. Replace all parameterized dependence distance vectors. The first parameterized vector \(\left( \begin{array}{c} 2 \\ N2 \\ N3 \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c} 2 \\ 0 \\ 0 \end{array} \right) \) and the unit normal vectors \(\left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) ,\left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) \) as follows:
$$\begin{aligned} \left( \begin{array}{c} 2 \\ N2 \\ N3 \end{array} \right) = \left( \begin{array}{c} 2 \\ 0 \\ 0 \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) . \end{aligned}$$
The second parameterized vector \(\left( \begin{array}{c} 0 \\ 0 \\ N3 \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right) \) and the unit normal vector \(\left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) \) as below:
$$\begin{aligned} \left( \begin{array}{c} 0 \\ 0 \\ N3 \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) . \end{aligned}$$
The third parameterized vector \(\left( \begin{array}{c} 0 \\ N2 \\ N3 \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right) \) and the unit normal vectors \(\left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) ,\left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) \) as follows:
$$\begin{aligned} \left( \begin{array}{c} 0 \\ N2 \\ N3 \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) . \end{aligned}$$
The last parameterized vector \(\left( \begin{array}{c} 4 \\ N2 \\ N3 \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right) \) and the unit normal vectors \(\left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) ,\left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) \) as below:
$$\begin{aligned} \left( \begin{array}{c} 4 \\ N2 \\ N3 \end{array} \right) = \left( \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) . \end{aligned}$$
The resulting dependence distance set \(\Delta _{2,2}\) contains the vectors with constant coordinates only:
$$\begin{aligned} \Delta _{2,2}=\left\{ \left( \begin{array}{c} 2 \\ 0 \\ 0 \end{array} \right) , \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array} \right) , \left( \begin{array}{c} 0 \\ 0 \\ 1 \end{array} \right) , \left( \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right) , \left( \begin{array}{c} 4 \\ 0 \\ 0 \end{array} \right) \right\} . \end{aligned}$$
 2. Form a dependence matrix. The matrix \(A\), where all the constant dependence vectors from set \(\Delta _{2,2}\) are placed in columns, is as follows:$$\begin{aligned} A=\left[ \begin{array}{ccccc} 2 & 0 & 0 & 0 & 4\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ \end{array} \right] . \end{aligned}$$
 3. Find the basis of the dependence distance set. A set of linearly independent columns of matrix \(A \in \mathbb {Z}^{n \times k}\) over field \(\mathbb {Z}^{n}\) that can generate every vector in \(A\) forms the following matrix \(B\):$$\begin{aligned} B=\left[ \begin{array}{ccc} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} \right] . \end{aligned}$$
 4. Calculate the exact transitive closure of a dependence relation describing all the dependences in an input loop or its over-approximation, \(R^{*}_{22}\). Form relation \(R^{*}_{22}\) as follows:
$$\begin{aligned} R^{*}_{22} :={}& \left\{ \begin{array}{l} \left[ i,j,k \right] \rightarrow \left[ i',j',k' \right] \mid \exists \lambda \ s.t.\ \left( \begin{array}{c} i' \\ j' \\ k' \end{array} \right) = \left( \begin{array}{c} i \\ j \\ k \end{array} \right) + \left[ \begin{array}{ccc} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] \times \lambda \ \wedge \\ \qquad \wedge \; \left( \begin{array}{c} i' \\ j' \\ k' \end{array} \right) - \left( \begin{array}{c} i \\ j \\ k \end{array} \right) \succeq 0, \;\lambda \in \mathbb {Z}^{3}\ \wedge \\ \qquad \wedge \; \left( \begin{array}{c} i' \\ j' \\ k' \end{array} \right) \in ran\; R_{22} \, \wedge \left( \begin{array}{c} i \\ j \\ k \end{array} \right) \in dom\; R_{22} \end{array} \right\} \\ ={}& \left\{ [k,i,j]\rightarrow [k',i',j'] : \exists \,\alpha : 2\alpha =k + k' \wedge 1 \le k \le k'-2 \wedge 1 \le i \le N2 \wedge 1 \le j,\,j' \le N3 \wedge 1 \le i' \le N2 \wedge k'\le N1 \right\} \cup \\ & \left\{ [k,i,j]\rightarrow [k,i,j'] : 1 \le j < j' \le N3 \wedge 1 \le k \le N1 \wedge 1 \le i \le N2 \right\} \cup \\ & \left\{ [k,i,j]\rightarrow [k,i',j'] : 1 \le i < i'\le N2 \wedge 1 \le k \le N1 \wedge 1 \le j \le N3 \wedge 1 \le j' \le N3 \right\} \cup \\ & \left\{ [k,i,j]\rightarrow [k,i,j] \right\} \end{aligned}$$
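The two forms of \(R^{*}_{22}\) in step 4 can be compared by enumeration for small parameter values. The Python sketch below is a sanity check under assumptions that are not stated in the paper: the domain and range of \(R_{22}\) are both taken to be the full box \(1 \le k \le N1\), \(1 \le i \le N2\), \(1 \le j \le N3\), the identity part is restricted to that box, and the tuple order \([k,i,j]\) is used in both forms. The function names are hypothetical.

```python
from itertools import product


def box(n1, n2, n3):
    """All tuples [k, i, j] in the assumed iteration box."""
    return list(product(range(1, n1 + 1), range(1, n2 + 1), range(1, n3 + 1)))


def lattice_form(n1, n2, n3):
    """y = x + B * lambda with B = diag(2, 1, 1) and y - x >= 0 lexicographically."""
    out = set()
    for x, y in product(box(n1, n2, n3), repeat=2):
        d = tuple(b - a for a, b in zip(x, y))
        # an integer lambda exists iff the first coordinate of d is even;
        # tuple comparison in Python is lexicographic
        if d[0] % 2 == 0 and d >= (0, 0, 0):
            out.add((x, y))
    return out


def union_form(n1, n2, n3):
    """The union of the four constraint sets given for R*_22."""
    out = set()
    for (k, i, j), (k2, i2, j2) in product(box(n1, n2, n3), repeat=2):
        first = (k + k2) % 2 == 0 and k <= k2 - 2
        second = k2 == k and i2 == i and j < j2
        third = k2 == k and i < i2
        fourth = (k2, i2, j2) == (k, i, j)
        if first or second or third or fourth:
            out.add(((k, i, j), (k2, i2, j2)))
    return out


same = lattice_form(4, 2, 3) == union_form(4, 2, 3)
```

Under these assumptions the two descriptions select the same pairs, since the four sets of the union correspond to the four ways in which the distance vector can be lexicographically non-negative.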
4 Related work
Numerous algorithms for calculating the transitive closure of affine integer tuple relations have been proposed Kelly et al. (1996); Beletska et al. (2009); Verdoolaege et al. (2011); Ancourt et al. (2010); Boigelot (1998); Bozga et al. (2009); Eve and Kurki-Suonio (1977). However, most of them focus on relations whose domain and range are non-parametric polyhedra Ancourt et al. (2010); Bozga et al. (2009); Eve and Kurki-Suonio (1977). A limitation of many known algorithms is that they require the arity of the input and output tuples of a relation (the number of tuple elements) to be the same Beletska et al. (2009). This is why we limit related work to techniques that deal with parameterized relations whose tuple arities are different in general and that can describe dependences available in program loops.
On a different line of work, Bozga et al. (2009) have studied the computation of transitive closure for the analysis of counter automata (register machines) and have implemented their method in the tool called FLATA Bozga et al. (2009). In this context, relation \(R(x,x')\) is a relation that can be written as a finite conjunction of terms of the form \(\pm x_{i}\pm x_{j}\leqslant a_{i,j}\), \(\pm x'_{i}\pm x_{j}\leqslant b_{i,j}\), \(\pm x_{i}\pm x'_{j}\leqslant c_{i,j}\), \(\pm x'_{i}\pm x'_{j}\leqslant d_{i,j}\), \(\pm 2x_{i}\leqslant e_{i,j}\) or \(2x'_{i}\leqslant f_{i,j}\), where \(x\) and \(x'\) describe counter values at the current step and at the next step, respectively, \(a_{i,j},b_{i,j},c_{i,j},d_{i,j},e_{i,j},f_{i,j}\in \mathbb {Z}\) are integer constants and \(1\leqslant i,j \leqslant n\), \(i\ne j\). As we can see, this class of relations does not involve parameters, existentially quantified variables or unions, i.e., it cannot represent dependences in program loops. This is why we do not compare this technique with ours.
To the best of our knowledge, techniques for computing the transitive closure of parameterized affine integer tuple relations with different input and output tuple arities have been investigated in only a few papers Kelly et al. (1996); Verdoolaege et al. (2011); Verdoolaege (2012). Kelly et al. (1996) proposed a modified Floyd–Warshall algorithm, but they have not implemented it in the Omega library (http://www.cs.umd.edu/projects/omega/). Fourteen years later, Verdoolaege improved the Floyd–Warshall algorithm and implemented his version in the ISL library (http://www.kotnet.org/skimo/isl/), but that algorithm and implementation are not the same as ours.
While the algorithms implemented by Verdoolaege et al. (2011); Verdoolaege (2012) in the ISL library (http://www.kotnet.org/skimo/isl/) are designed to compute over-approximations, Kelly et al. (1996) propose in the Omega library (http://www.cs.umd.edu/projects/omega/) a heuristic algorithm that computes an under-approximation and does not guarantee calculating the exact transitive closure.
5 Experimental results
The goals of the experiments were to evaluate the effectiveness and time complexity of the proposed approach for calculating relation \(R^{*}_{kk}\) and using it in the modified Floyd–Warshall algorithm for loops provided by the well-known NAS Parallel Benchmark (NPB) Suite from NASA (http://www.nas.nasa.gov), and to compare the obtained results with the effectiveness and time complexity of techniques implemented in the ISL (http://www.kotnet.org/skimo/isl/) and Omega (http://www.cs.umd.edu/projects/omega/) tools. We have implemented the presented algorithms as an ANSI C++ software module. The source code of the module was compiled using the gcc compiler v4.3.0 and can be downloaded from: http://www.sfs.zut.edu.pl/files/mfwomega.tar.gz. Experiments were conducted using an Intel Core2Duo T7300@2.00GHz machine with the Fedora Linux v12 32-bit operating system.
Table 1 The results of the experiments on the proposed approach to computing transitive closure
Source loop name  Number of relations  Proposed algorithm: ex  Proposed algorithm: \(t\,[s]\)  ISL\(^1\): ex  ISL\(^1\): \(\Delta t\,[s]\)  Omega\(^2\): ex  Omega\(^2\): \(\Delta t\,[s]\)
Perfectly-nested loops  
BT_error.f2p_5  31  1  0.2451  1  1.1081  1  2.4072 
BT_initialize.f2p_8  3  1  0.0006  1  0.0017  1  0.0040 
BT_initialize.f2p_9  1  1  0.0006  1  0.0009  1  0.0015 
BT_rhs.f2p_1  46  1  2.1681  1  3.5402  1  4.1763 
BT_rhs.f2p_5  128  1  1.3442  1  16.1829  1  4.1989 
CG_cg.f2p_3  1  1  0.0007  1  0.0018  1  0.0014 
CG_cg.f2p_4  10  1  0.0035  1  0.0061  1  0.0225 
CG_cg.f2p_6  1  1  0.0004  1  0.0009  1  0.0019 
CG_cg.f2p_8  1  1  0.0006  1  0.0013  1  0.0013 
FT_auxfnct.f2p_1  1  1  0.0006  1  0.0006  1  0.0029 
FT_auxfnct.f2p_2  1  1  0.0004  1  0.0005  1  0.0020 
LU_HP_l2norm.f2p_2  9  1  0.0451  1  0.0829  1  1.1249 
LU_HP_jacld.f2p_1  2634  1  74.0127  1  75.1268  1  120.5502 
LU_HP_jacu.f2p_1  2634  1  74.0840  1  75.0540  1  108.8616 
LU_HP_pintgr.f2p_11  4  1  0.0111  1  0.0170  1  0.0516 
LU_HP_pintgr.f2p_2  109  1  0.1409  1  0.4151  1  0.3499 
LU_HP_pintgr.f2p_3  6  1  0.0103  1  0.0167  1  0.0669 
LU_HP_pintgr.f2p_7  6  1  0.0107  1  0.0195  1  0.0504 
LU_jacld.f2p_1  2594  1  397.1930  1  538.9371  1  446.8944 
LU_jacu.f2p_1  2594  1  397.0249  1  543.8912  1  421.9345 
LU_l2norm.f2p_2  9  1  0.0445  1  0.0560  1  0.4220 
LU_pintgr.f2p_11  6  1  0.0117  1  0.0165  1  0.0487 
LU_pintgr.f2p_2  109  1  0.1405  1  0.4151  1  0.3238 
LU_pintgr.f2p_3  6  1  0.0114  1  0.0233  1  0.0667 
LU_pintgr.f2p_7  6  1  0.0118  1  0.0166  1  0.0602 
MG_mg.f2p_1  1  1  0.0008  1  0.0058  1  0.0019 
MG_mg.f2p_11  1  1  0.0006  1  0.0009  1  0.0027 
MG_mg.f2p_12  1  1  0.0005  1  0.0006  1  0.0014 
MG_mg.f2p_13  1  1  0.0001  1  0.0001  1  0.0019 
MG_mg.f2p_4  1  1  0.0001  1  0.0001  1  0.0009 
SP_error.f2p_5  31  1  0.1996  1  0.9823  1  2.4040 
SP_initialize.f2p_8  3  1  0.0001  1  0.0002  1  0.0004 
SP_ninvr.f2p_1  103  1  3.5246  1  26.9442  1  8.5589 
SP_pinvr.f2p_1  103  1  3.5322  1  27.0151  1  8.6393 
SP_rhs.f2p_1  64  1  4.8383  1  51.3303  1  8.2041 
SP_rhs.f2p_5  127  1  1.3437  1  16.7552  1  4.2837 
SP_txinvr.f2p_1  271  1  64.3626  1  328.6192  1  75.9909 
SP_tzetar.f2p_1  288  1  51.7971  1  269.7633  1  63.2336 
UA_adapt.f2p_2  8  1  0.0516  1  0.0760  1  0.2282 
UA_diffuse.f2p_1  5  1  0.0787  1  0.1692  1  94.0265 
UA_diffuse.f2p_2  3  1  0.0010  1  0.0021  1  0.0029 
UA_mason.f2p_18  1  1  0.0011  1  0.0013  1  0.0057 
UA_precond.f2p_5  30  0  0.5973  0  3.0276  0  9.8670 
UA_setup.f2p_1  1  1  0.0002  1  0.0003  1  0.0008 
UA_setup.f2p_16  3  1  0.0014  1  0.0029  1  0.0111 
UA_setup.f2p_6  4  1  0.0039  1  0.0052  1  0.1116 
UA_transfer.f2p_1  1  1  0.0003  1  0.0006  1  0.0009 
UA_transfer.f2p_10  1  1  0.0012  1  0.0015  1  0.0019 
UA_transfer.f2p_13  1  1  0.0049  1  0.0013  1  0.0008 
UA_transfer.f2p_15  1  1  0.0006  1  0.0016  1  0.0009 
UA_transfer.f2p_18  1  1  0.0004  1  0.0014  1  0.0009 
UA_transfer.f2p_2  1  1  0.0032  1  0.0065  1  0.0009 
UA_transfer.f2p_3  1  1  0.0012  1  0.0012  1  0.0013 
UA_transfer.f2p_5  1  1  0.0031  1  0.0046  1  0.0065 
UA_transfer.f2p_6  1  1  0.0032  1  0.0037  1  0.0057 
UA_transfer.f2p_7  1  1  0.0024  1  0.0029  1  0.0081 
UA_transfer.f2p_8  1  1  0.0017  1  0.0035  1  0.0058 
UA_transfer.f2p_9  1  1  0.0040  1  0.0015  1  0.0099 
Imperfectly-nested loops  
BT_error.f2p_2  107  1  2.9712  1  8.8938  1  9.1521 
BT_error.f2p_3  6  1  0.0038  1  0.0069  1  0.0071 
BT_error.f2p_6.t  6  1  0.0036  1  0.0082  1  0.0069 
BT_exact_rhs.f2p_2  1553  0  32.2311  0  61.3898  0  73.3091 
BT_exact_rhs.f2p_3  1553  0  31.9385  0  68.5866  0  74.3960 
BT_exact_rhs.f2p_4  1553  0  32.1856  0  61.1335  0  73.9969 
BT_initialize.f2p_2  42  1  0.2836  1  0.3948  1  3.0076 
BT_initialize.f2p_3  42  1  0.2888  1  0.3964  1  2.9983 
BT_initialize.f2p_4  42  1  0.2882  1  0.3934  1  3.0522 
BT_initialize.f2p_5  42  1  0.3139  1  0.4283  1  3.0035 
BT_initialize.f2p_6  42  1  0.3181  1  0.3944  1  3.0082 
BT_initialize.f2p_7  42  1  0.2986  1  0.3966  1  3.0189 
BT_rhs.f2p_3  702  0  26.1909  0  268.7525  0  38.5441 
BT_rhs.f2p_4  510  0  16.4426  0  236.8264  0  26.1494 
CG_cg.f2p_7  2  1  0.0017  1  0.0022  1  0.0317 
LU_blts.f2p_1  4885  1  3632.8071  1  4267.0205  1  5078.6317 
LU_buts.f2p_1  5640  1  4010.8654  1  5673.0981  1  5612.8839 
LU_erhs.f2p_2  66  1  0.1669  1  0.5987  1  5.9636 
LU_erhs.f2p_3  640  0  72.3339  0  164.4464  0  107.5848 
LU_erhs.f2p_4  640  0  74.6972  0  192.2292  0  104.2774 
LU_erhs.f2p_5  640  0  32.5497  0  237.4519  0  58.9116 
LU_HP_blts.f2p_1  3232  0  216.5695  0  216.8695  0  218.7895 
LU_HP_buts.f2p_1  3593  0  250.4280  0  447.2031  0  267.1930 
LU_HP_erhs.f2p_2  66  1  0.1640  1  0.8398  1  6.4393 
LU_HP_erhs.f2p_3  640  0  72.5601  0  263.6080  0  115.7859 
LU_HP_erhs.f2p_4  640  0  74.5099  0  262.0617  0  116.0602 
LU_HP_erhs.f2p_5  640  0  32.9287  0  236.9649  0  57.8919 
LU_HP_rhs.f2p_1  17  1  0.2142  1  1.5149  1  1.2455 
LU_HP_rhs.f2p_2  640  0  72.5522  0  387.5539  0  115.3880 
LU_HP_rhs.f2p_3  640  0  74.3030  0  262.0265  0  115.4032 
LU_HP_rhs.f2p_4  640  0  32.4699  0  237.7602  0  57.5956 
LU_rhs.f2p_1  17  1  0.2175  1  1.5029  1  1.2170 
LU_rhs.f2p_2  640  0  71.9027  0  279.4004  0  115.5498 
LU_rhs.f2p_3  640  0  73.6644  0  277.5648  0  114.4854 
LU_rhs.f2p_4  1412  0  199.7893  0  968.4744  0  354.9285 
MG_mg.f2p_10  18  1  0.0041  1  0.0043  1  0.0047 
MG_mg.f2p_3  3  1  0.0001  1  0.0002  1  0.0001 
MG_mg.f2p_5  24  0  0.6285  0  0.7923  0  2.1224 
MG_mg.f2p_6  29  0  0.9173  0  0.9739  0  2.1649 
MG_mg.f2p_7  510  1  2.0639  1  17.9808  1  5.5680 
MG_mg.f2p_8  55  0  2.2999  0  2.3069  0  7.1395 
MG_mg.f2p_9  18  1  0.0036  1  0.0043  1  0.0047 
SP_error.f2p_2  107  1  2.4962  1  9.2583  1  9.1577 
SP_error.f2p_3  6  1  0.0039  1  0.0043  1  0.0081 
SP_error.f2p_6  6  1  0.0013  1  0.0014  1  0.0073 
SP_exact_rhs.f2p_2  1553  0  32.0930  0  97.8048  0  81.8412 
SP_exact_rhs.f2p_3  1553  0  32.1471  0  106.6423  0  81.0354 
SP_exact_rhs.f2p_4  1553  0  32.2977  0  102.4652  0  81.5785 
SP_initialize.f2p_2  24  1  0.2242  1  0.5455  1  3.0368 
SP_initialize.f2p_3  24  1  0.2234  1  0.3989  1  3.1722 
SP_initialize.f2p_4  24  1  0.2214  1  0.3971  1  3.0657 
SP_initialize.f2p_5  24  1  0.2239  1  0.3943  1  3.0336 
SP_initialize.f2p_6  24  1  0.2216  1  0.4103  1  3.0342 
SP_initialize.f2p_7  24  1  0.2227  1  0.3936  1  3.0376 
SP_rhs.f2p_3  699  1  10.7808  1  231.7330  1  20.0821 
SP_rhs.f2p_4  507  1  14.1710  1  156.5537  1  23.3080 
UA_adapt.f2p_1  10  1  0.0469  1  0.0640  1  0.0930 
UA_adapt.f2p_10  14  1  0.0136  1  0.0164  1  0.0264 
UA_adapt.f2p_11  11  1  0.0134  1  0.0162  1  0.0306 
UA_adapt.f2p_9  14  1  0.0058  1  0.0163  1  0.0256 
UA_diffuse.f2p_3  1  1  0.0004  1  0.0005  1  0.0009 
UA_diffuse.f2p_4  1  1  0.0015  1  0.0042  1  0.0147 
UA_diffuse.f2p_5  1  1  0.0013  1  0.0039  1  0.0018 
UA_precond.f2p_3  1  1  0.0009  1  0.0016  1  0.0017 
UA_precond.f2p_4  1  1  0.0003  1  0.0003  1  0.0005 
UA_setup.f2p_14  31  1  0.2973  1  0.9474  1  0.3562 
UA_setup.f2p_15  15  1  0.2610  1  0.3649  1  0.2836 
UA_transfer.f2p_11  6  1  0.0048  1  0.0049  1  0.0260 
UA_transfer.f2p_12  7  1  0.0146  1  0.0044  1  0.0249 
UA_transfer.f2p_14  8  1  0.0240  1  0.0373  1  0.0780 
UA_transfer.f2p_16  4  1  0.0018  1  0.0033  1  0.0150 
UA_transfer.f2p_17  17  1  0.0238  1  0.0354  1  0.0867 
UA_transfer.f2p_19  4  1  0.0030  1  0.0033  1  0.0122 
UA_transfer.f2p_4  3  1  0.0047  1  0.0018  1  0.0132 
UA_utils.f2p_12  20  1  0.1816  1  0.1635  1  0.7387 
Analyzing the results presented in Table 1, we can draw the following conclusions. All techniques under experiment are able to calculate transitive closure for all NPB loops exposing dependences. The exactness of the presented approach is the same as that of the techniques implemented in Omega and ISL, i.e., all techniques under experiment produce exact transitive closure for the same loops. Calculating relation \(R^{*}_{kk}\) by means of Algorithm 2 is less time-consuming than with the techniques implemented in Omega and ISL, which reduces the time of calculating the transitive closure of a relation describing all the dependences in the loop by means of the Floyd–Warshall algorithm. For all loops, we obtained the shortest time of producing transitive closure.
The explanation is that each relation \(R_{kk}^{*}\) that we compose in Algorithm 1 (line 10) is the union of two relations, \(R_{kk}^{+}\cup I\). If there are \(m\) disjuncts in the input relation, \(R_{kk}\), then a direct application of the composition operation, as in formula (11), may result in a relation with \(2^{m}\) disjuncts, which is computationally expensive. In general, applying formula (7) yields a number of disjuncts that is much smaller than \(2^m\). This permits us to conclude that the presented approach is faster than other well-known approaches.
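The growth in the number of disjuncts can be illustrated with a small counting sketch. It assumes, purely for illustration, that the composition has the shape of a chain \((R_1\cup I)\circ\dots\circ(R_m\cup I)\); the exact shape of formula (11) is not reproduced here, and the names `expand_chain`, `R1`, `R2`, ... are hypothetical. Distributing composition over union picks either \(R_i\) or \(I\) from each factor, which gives one term per subset of the factors.

```python
from itertools import product


def expand_chain(m):
    """Symbolically distribute (R1 u I) o (R2 u I) o ... o (Rm u I).

    Assumption: the chain shape is chosen for illustration only.
    Each resulting term keeps a subset of the R_i; a factor where I
    was chosen disappears because I is the identity of composition.
    """
    terms = []
    for choice in product((True, False), repeat=m):
        kept = [f"R{i + 1}" for i, keep in enumerate(choice) if keep]
        terms.append(" o ".join(kept) if kept else "I")
    return terms


if __name__ == "__main__":
    print(expand_chain(2))
    for m in (1, 2, 3, 10):
        print(m, len(expand_chain(m)))
```

The sketch only counts symbolic terms; in practice some of them may be empty or may merge after simplification, so \(2^m\) should be read as a worst-case bound.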
6 Conclusion
In this paper, we presented a modified Floyd–Warshall algorithm, where the most time-consuming part (calculating the transitive closure describing self-dependences in the program loop) is carried out by means of basis dependence distance vectors. We demonstrated how to calculate basis dependence distance vectors for parameterized program loops and how to apply them to calculate the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement.
This solution reduces the time of the transitive closure calculation for parameterized graphs representing dependences in program loops. The reduction is due to using a finite integral linear combination of basis dependence distance vectors to calculate the \(R_{kk}^{*}\) term in a modified Floyd–Warshall algorithm. It was confirmed by means of numerous experiments with NPB benchmarks. The presented approach can be used for resolving many problems in optimizing compilers: redundant synchronization removal (Kelly et al. 1996), testing the legality of iteration reordering transformations (Kelly et al. 1996), iteration space slicing (Beletska et al. 2011), and forming schedules for statement instances of program loops (Bielecki et al. 2012). In our future work, we plan to study the application of the presented approach for extracting both coarse- and fine-grained parallelism for different popular benchmarks.
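As a toy illustration of describing a self-dependence closure through linear combinations of basis vectors, the following sketch compares an explicit graph search with a closed-form description on a small two-dimensional iteration space. Everything in it is an assumption made for the example: the distance vectors `DISTANCES`, the basis `BASIS`, the square domain, the restriction to nonnegative integer coefficients, and the function names. It does not reproduce Algorithm 2 or formula (7).

```python
from itertools import product

# Hypothetical self-dependence distance vectors and a basis for them:
# (1, 1) = (1, 0) + (0, 1), so two basis vectors suffice.
DISTANCES = [(1, 0), (0, 1), (1, 1)]
BASIS = [(1, 0), (0, 1)]


def closure_by_search(n, vectors):
    """Positive transitive closure on {0..n-1}^2 by explicit search."""
    points = list(product(range(n), repeat=2))
    inside = set(points)
    closure = set()
    for start in points:
        stack, seen = [start], set()
        while stack:
            x = stack.pop()
            for d in vectors:
                y = (x[0] + d[0], x[1] + d[1])
                if y in inside and y not in seen:
                    seen.add(y)
                    stack.append(y)
        closure.update((start, y) for y in seen)
    return closure


def closure_by_basis(n, basis):
    """Pairs (x, y) with y - x = a*b1 + b*b2, a, b >= 0, not both zero."""
    (p, q), (r, s) = basis
    points = list(product(range(n), repeat=2))
    inside = set(points)
    closure = set()
    for x in points:
        for a, b in product(range(n), repeat=2):
            if (a, b) == (0, 0):
                continue
            y = (x[0] + a * p + b * r, x[1] + a * q + b * s)
            if y in inside:
                closure.add((x, y))
    return closure


if __name__ == "__main__":
    same = closure_by_search(4, DISTANCES) == closure_by_basis(4, BASIS)
    print("descriptions coincide on the 4x4 domain:", same)
```

In this toy case the two descriptions coincide because every basis vector is itself a dependence distance and the domain is a box; for other domains or vector sets a combination-based description may over-approximate the closure, so its exactness has to be checked separately.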
References
 Ancourt C, Coelho F, Irigoin F (2010) A modular static analysis approach to affine loop invariants detection. Electron Notes Theor Comput Sci 267:3–16
 Beletska A, Barthou D, Bielecki W, Cohen A (2009) Computing the transitive closure of a union of affine integer tuple relations. In: COCOA09. Springer, Berlin, pp 98–109
 Beletska A, Bielecki W, Cohen A, Palkowski M, Siedlecki K (2011) Coarse-grained loop parallelization: iteration space slicing vs affine transformations. Parallel Comput 37(8):479–497
 Bielecki W, Kraska K, Klimek T (2013) Transitive closure of a union of dependence relations for parameterized perfectly-nested loops. In: Malyshkin V et al. (eds) PaCT2013, LNCS, vol 7979. Springer, Heidelberg, pp 37–50
 Bielecki W, Palkowski M, Klimek T (2012) Free scheduling for statement instances of parameterized arbitrarily nested affine loops. Parallel Comput 38:518–532. http://dx.doi.org/10.1016/j.parco.2012.06.001
 Boigelot B (1998) Symbolic methods for exploring infinite state spaces. Ph.D. thesis, Université de Liège
 Bozga M, Girlea C, Iosif R (2009) Iterating octagons. ETAPS 2009, TACAS’09. Springer, New York, pp 337–351
 Cohen E, Megiddo N (1991) Recognizing properties of periodic graphs, DIMACS series in discrete mathematics and theoretical computer science, vol 4. American Mathematical Society, pp 135–146
 Deng X, Dymond P (1998) On multiprocessor system scheduling. J Comb Optim 1(4):377–392
 Diestel R (2010) Graph theory, 4th edn. Springer, New York
 Eve J, Kurki-Suonio R (1977) On computing the transitive closure of a relation. Acta Informatica 8(4):303–314
 Feautrier P (2012) Approximating the transitive closure of a BooleanAffine relation, IMPACT 2012 second international workshop on polyhedral compilation techniques, Paris, France, http://impact.gforge.inria.fr/impact2012/
 Griebl M (2004) Automatic parallelization of loop programs for distributed memory architectures. Fakultät für Mathematik und Informatik Universität Passau, Habilitation
 Hollermann L, Tsansheng H, Lopez D, Vertanen K (1997) Scheduling problems in a practical allocation model. J Comb Optim 1(2):129–149
 Integer Set Library, http://www.kotnet.org/skimo/isl/
 Kelly W, Maslov V, Pugh W, Rosser E, Shpeisman T, Wonnacott D (1996) New User Interface for Petit and Other Extensions. User Guide
 Kelly W, Pugh W, Rosser E, Shpeisman T (1996) Transitive closure of infinite graphs and its applications. LCPC’95, Columbus, Ohio, vol 1033. Springer, New York, pp 126–140
 NASA Advanced Supercomputing Division, http://www.nas.nasa.gov
 Omega Library, http://www.cs.umd.edu/projects/omega/
 Presburger M (1927) Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In: Comptes Rendus du Premier Congrès des Mathématiciens des Pays Slaves, 395, Warsaw, pp 92–101
 Rotman J (2003) Advanced modern algebra, 2nd edn. Prentice Hall, Upper Saddle River
 Schrijver A (1999) Theory of linear and integer programming. Series in Discrete Mathematics
 Shoup V (2005) A computational introduction to number theory. Cambridge University Press, New York
 Skiena S (2008) The algorithm design manual, 2nd edn. Springer, Berlin
 Verdoolaege S (2012) Integer set library—manual, Tech. rep. 2012, Version: isl0.11 www.kotnet.org/skimo/isl/manual
 Verdoolaege S, Cohen A, Beletska A (2011) Transitive closures of affine integer tuple relations and their overapproximations. SAS, pp 216–232
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.