Journal of Combinatorial Optimization, Volume 30, Issue 2, pp 253–275

Using basis dependence distance vectors in the modified Floyd–Warshall algorithm

  • Włodzimierz Bielecki
  • Krzysztof Kraska
  • Tomasz Klimek
Open Access
Article

Abstract

In this paper, we present a modified Floyd–Warshall algorithm, where the most time-consuming part—calculating transitive closure describing self-dependences for each loop statement—is computed applying basis dependence distance vectors derived from all vectors describing self-dependences. We demonstrate that the presented approach reduces the transitive closure calculation time for parameterized graphs representing all dependences in the loop in comparison with that yielded by means of techniques implemented in the Omega and ISL libraries. This increases the applicability scope of techniques based on transitive closure of dependence graphs and being aimed at building optimizing compilers. Experimental results for NASA Parallel Benchmarks are discussed.

Keywords

Basis dependence vectors · Transitive closure · Floyd–Warshall algorithm · Arbitrarily nested loop · Parallelizing compiler

1 Introduction

Many problems are solved by calculating the transitive closure of a graph Diestel (2010). In this paper, we deal with parameterized graphs, whose number of vertices is given by an expression that includes structure parameters. Such graphs can be represented by parameterized relations whose tuples represent vertices, while constraints define edges Kelly et al. (1996). The transitive closure calculated for such relations can be used in optimizing compilers to remove redundant synchronization Kelly et al. (1996), test the legality of iteration reordering transformations Kelly et al. (1996), apply iteration space slicing Beletska et al. (2011), and form schedules for statement instances of program loops Bielecki et al. (2012); Hollermann et al. (1997); Deng et al. (1998). In general, calculating the transitive closure of parameterized graphs is time-consuming Kelly et al. (1996); Beletska et al. (2009); Verdoolaege et al. (2011). Sometimes the calculation time prevents applying techniques for extracting coarse- and fine-grained parallelism because it is not acceptable in practice (several hours or even several days Beletska et al. (2011); Bielecki et al. (2012)). This is why improving transitive closure calculation algorithms to reduce their time complexity remains a relevant task.

The contributions of this paper over previous work are as follows:
  • demonstration of how to calculate basis dependence distance vectors for parameterized program loops;

  • proposition of a way of calculating the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement by means of basis dependence distance vectors;

  • suggestion to apply the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement by means of basis dependence distance vectors in a modified Floyd–Warshall algorithm with the aim of reducing the calculation time of the transitive closure of a dependence graph representing all the dependences in a given program loop;

  • development of open source software that implements the presented solutions and produces the transitive closure of a dependence graph describing all the dependences for the input program loop by means of the modified Floyd–Warshall algorithm;

  • an evaluation of the effectiveness and efficiency of the presented algorithms and a comparison of them with those yielded by related work.

In this paper, we demonstrate how to reduce the time of calculating the transitive closure describing self-dependences. For this purpose, we propose to find basis dependence distance vectors from all distance vectors describing self-dependences and then demonstrate how these vectors can be used for calculating transitive closure. To extract such distance vectors, we use dependence relations returned by a dependence analyzer. Such relations (describing self-dependences) are characterized by the same arity (the number of tuple elements) of input and output tuples. Finally, we present experimental results showing how the time of transitive closure calculation is reduced for the NAS benchmarks (http://www.nas.nasa.gov).

The rest of the paper is organized as follows. Section 2 introduces background. Section 3 presents an approach to calculate the transitive closure of a parameterized dependence graph. Section 4 describes related work. Section 5 presents results of an experimental study. Section 6 draws conclusions and briefly outlines future research.

2 Background

In this section, we briefly introduce basic definitions which are used throughout this paper.

The following concepts of linear algebra are used in the approach presented in this paper: vector, vector space, field, integral linear combination, and linear independence. Details can be found in Schrijver (1999).

Definition 1

(Integer Lattice) Let {\(a_{1},a_{2},...,a_{m}\)} be a set of linearly independent integer vectors. The set \(\Lambda =\{\lambda _{1}a_{1}+\lambda _{2}a_{2}+...+\lambda _{m}a_{m}\mid \lambda _{1},...,\lambda _{m}\in \mathbb {Z}\}\) is called an integer lattice generated by the basis {\(a_{1},a_{2},...,a_{m}\)}.

Definition 2

(Basis) A basis \(B\) of an integer lattice \(\Lambda \) over the ring of integers \(\mathbb {Z}\) is a linearly independent subset of \(\Lambda \) that generates \(\Lambda \). Every such lattice \(\Lambda \) has a basis Shoup (2005).
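
Definitions 1 and 2 can be illustrated with a small enumeration. The following is a minimal sketch, not part of the original method: the function name `lattice_points`, the example basis, and the bounded coefficient range are assumptions chosen for illustration, since a true lattice is infinite.

```python
from itertools import product

def lattice_points(basis, coeff_range):
    """Enumerate integral linear combinations of the basis vectors,
    with every coefficient lambda_i taken from coeff_range."""
    dim = len(basis[0])
    points = set()
    for lambdas in product(coeff_range, repeat=len(basis)):
        point = tuple(
            sum(l * a[d] for l, a in zip(lambdas, basis)) for d in range(dim)
        )
        points.add(point)
    return points

# Hypothetical basis: two linearly independent integer vectors in Z^2.
basis = [(2, 0), (0, 1)]
pts = lattice_points(basis, range(-1, 2))  # coefficients -1, 0, 1
```

With this basis, `(2, 1)` belongs to the enumerated part of the lattice, while `(1, 0)` cannot be produced by any integer coefficients.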

Definition 3

(Presburger Arithmetic, Presburger Formula) We define Presburger arithmetic to be the first-order theory over atomic formulas of the form:
$$\begin{aligned} \sum _{i=1}^{n}a_{i}x_{i} \sim c, \end{aligned}$$
(1)
where \(a_{i}\) and \(c\) are integer constants, \(x_{i}\) are variables ranging over integers, and \(\sim \) is an operator from \(\left\{ =,\ne , <,\le ,>,\ge \ \right\} \). A formula \(f\) is an atomic formula (1) or it is constructed from formulas \(f_{1}\) and \(f_{2}\) recursively as follows Presburger (1927):
$$\begin{aligned} f {::} = \lnot f_{1}|f_{1}\wedge f_{2}|f_{1}\vee f_{2}. \end{aligned}$$

In this paper, we use the following notions concerning program loops: iteration vector, loop domain (index set), parameterized loops, and perfectly nested loops. Details can be found in Griebl (2004).

Definition 4

(Structure Parameters) Structure parameters are integer symbolic constants, generally defining array size, iteration bounds, etc. Structure parameters may be defined once in the prologue of the program, and may not be modified elsewhere.

Definition 5

(Iteration Vector) For a given statement \(S\) in a loop, the iteration vector \(\overrightarrow{v}=(i_{1},...,i_{n})^{T}\) is the vector of the surrounding loop counters.

Definition 6

(Iteration Space) The iteration space of a given statement \(S\) in a given loop nest is a set of values taken by its iteration vector when executing the loop nest.

Definition 7

(Dependence) Two statement instances \(S_{1}(\overrightarrow{I})\) and \(S_{2}(\overrightarrow{J})\), where \(\overrightarrow{I}\) and \(\overrightarrow{J}\) are the iteration vectors, are dependent if both access the same memory location and if at least one access is a write.

Definition 8

(Dependence Distance Set, Dependence Distance Vector) We define a dependence distance set \(\Delta _{S,T}\) as a set of differences between all such vectors of the same size that stand for a pair of dependent instances of statement \(T\) and \(S\). We call each element of set \(\Delta _{S,T}\) a (dependence) distance vector and denote it as \(\delta _{S,T}\).

Definition 9

(Uniform Dependence, Non-Uniform Dependence) If each coordinate of vector \(\delta _{S,T}\) (see Definition 8) is constant, then we call the corresponding dependence uniform; otherwise it is non-uniform Griebl (2004).

Definition 10

(Reduced Dependence Graph) A Reduced Dependence Graph (\(RDG\)) is the graph where a vertex stands for every statement \(S\) and an edge connects statements \(S\) and \(T\) whose instances are dependent. The number of edges between vertices \(S\) and \(T\) is equal to the number of dependence relations \(R_{S,T}\).

Definition 11

(Uniform Loop, Quasi-Uniform Loop) We say that a parameterized loop is uniform if it induces dependences represented by the finite number of uniform dependence distance vectors Griebl (2004). A parameterized loop is quasi-uniform if all its dependence distance vectors can be represented by an integral linear combination of the finite number of linearly independent vectors with constant coordinates.

Let us consider the parameterized dependence distance vector \((N,2)\). It can be represented as \((0,2)+a\times (1,0)\), where \(a\ge 1, a\in \mathbb {Z}\) (see Fig. 1).
Fig. 1 The parameterized vector \((N,2)\) represented as the integral linear combination of the two linearly independent vectors with constant coordinates

Definition 12

(Dependence Relation) A dependence relation is a tuple relation of the form \(\left\{ [input\_list] \rightarrow [output\_list]: constraints\right\} \), where \(input\_list\) and \(output\_list\) are the lists of variables used to describe input and output tuples and constraints is a Presburger formula describing the constraints imposed upon \(input\_list\) and \(output\_list\).

The general form of a dependence relation is as follows Kelly et al. (1996):
$$\begin{aligned} R = \left\{ [s_{1}, \ldots , s_{k}] \rightarrow [t_{1}, \ldots , t_{k}]: \bigvee ^{n}_{i=1} \exists \alpha _{i1}, \ldots \ ,\alpha _{im_{i}}\; \text {s.t.}\; F_{i}\right\} , \end{aligned}$$
where \(F_{i}, i = 1,2,\ldots ,n\) are represented by Presburger formulas, i.e., they are conjunctions of affine equalities and inequalities on the input variables \(s_{1},\ldots , s_{k}\), the output variables \(t_{1},\ldots , t_{k}\), the existentially quantified variables \(\alpha _{i1}, \ldots \ ,\alpha _{im_{i}}\), and symbolic constants.

Definition 13

(Domain of a Relation) Let \(R\in Z^{n}\rightarrow Z^{m}\) be a relation. The domain of \(R\), \(dom\;R\), is represented with the following set
$$\begin{aligned} dom\;R\,:=\,s\rightarrow \left\{ x_{1}\in Z^{n}\,|\,\exists \,x_{2}\in Z^{m}\,:\,x_{2}=R(x_{1})\right\} . \end{aligned}$$

Definition 14

(Range of a Relation) Let \(R\in Z^{n}\rightarrow Z^{m}\) be a relation. The range of \(R\), \(ran\;R\), is calculated as follows
$$\begin{aligned} ran\;R\,:=\,s\rightarrow \left\{ x_{2}\in Z^{m}\,|\,\exists \,x_{1}\in Z^{n}\,:\,x_{1}=R^{-1}(x_{2})\right\} . \end{aligned}$$

Definition 15

(Positive Transitive Closure) Let \(R\) be an affine integer tuple relation, then the positive transitive closure \(R^{+}\) of \(R\) is the union of all positive powers of \(R\),
$$\begin{aligned} R^{+}=\bigcup _{k\geqslant 1}^{} R^{k},\quad \mathrm {with}\quad R^{k}= \left\{ \begin{array}{ll} R &{} if \;k=1 \\ R\circ R^{k-1} &{} if \;k\geqslant 2.\\ \end{array}\right. \end{aligned}$$
(2)
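
For a finite relation, Definition 15 can be evaluated directly by accumulating powers until nothing new appears. The sketch below assumes relations stored as Python sets of pairs; the names `compose` and `positive_closure` are illustrative, and the approach does not carry over to parameterized relations, which is the case the paper addresses.

```python
def compose(r2, r1):
    """Return r2 ∘ r1: pairs (x, z) such that (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def positive_closure(r):
    """Union of all positive powers R^k, k >= 1, of a finite relation."""
    closure = set(r)
    power = set(r)                    # R^1
    while True:
        power = compose(r, power)     # R^k = R ∘ R^(k-1)
        if power <= closure:          # no new pairs: all further powers are covered
            return closure
        closure |= power

chain = {(1, 2), (2, 3), (3, 4)}
chain_plus = positive_closure(chain)
# chain_plus additionally contains (1, 3), (2, 4) and (1, 4)
```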

Definition 16

(Transitive Closure) Transitive closure, \(R^{*}\), is defined as follows Kelly et al. (1996):
$$\begin{aligned} R^{*}=R^{+}\cup I, \end{aligned}$$
where \(I\) is the identity relation. \(R^{*}\) describes the same connections in a dependence graph (represented by \(R\)) that \(R^{+}\) does plus connections of each vertex with itself.
To check whether output returned by an algorithm represents exact transitive closure, we can use the well-known fact Kelly et al. (1996) that for an acyclic relation \(R\) (for such a relation \(R \cap I = \varnothing \), where \(I\) is the identity relation) the following is true:
  • if \(R^{+}\) is exact transitive closure, then:
    $$\begin{aligned} R^{+} = R \cup \left( R\circ R^{+}\right) , \end{aligned}$$
  • if \(R^{+}\) is an over–approximation, then:
    $$\begin{aligned} R^{+} \nsubseteq \ R \cup \left( R\circ R^{+}\right) . \end{aligned}$$
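
The exactness test above can be tried on a small finite, acyclic relation. This is a minimal sketch under the assumption that relations are Python sets of pairs; `is_exact_closure` and the sample data are hypothetical names and values introduced only for illustration.

```python
def compose(r2, r1):
    """Return r2 ∘ r1: pairs (x, z) such that (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def is_exact_closure(r, candidate):
    """Check the identity R+ = R ∪ (R ∘ R+) for a candidate closure of an acyclic R."""
    return candidate == r | compose(r, candidate)

r = {(1, 2), (2, 3)}
exact = {(1, 2), (2, 3), (1, 3)}   # the true positive closure of r
over = exact | {(0, 3)}            # an over-approximation with one extra pair

exact_ok = is_exact_closure(r, exact)   # True
over_ok = is_exact_closure(r, over)     # False: (0, 3) is not reproduced
```
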
In the next section, we analyse the time complexity of the proposed approach in a machine-independent way to assess the performance of the algorithms. For this purpose, the RAM (Random Access Machine) model of computation is used. Under the RAM model, we measure time complexity by an upper bound, \(O\), on the number of steps that an algorithm takes for a given problem. Details on the model and the time complexity analysis can be found in Skiena (2008).

3 Approach to computing transitive closure

The goal of the algorithm presented below is to calculate the transitive closure of a dependence relation describing all the dependences in an arbitrarily nested loop.

3.1 Floyd–Warshall algorithm

To compute the transitive closure of a dependence relation representing all the dependences exposed for an arbitrarily nested loop, we use a modified form of the Floyd–Warshall (F-W) algorithm (see Algorithm 1). The idea of the F-W algorithm is the following. Let \(^{\underrightarrow{*}} \) denote a direct or transitive path between a pair of vertices in a dependence graph, whose intermediate vertices come from a specific set \(S\). If the graph contains paths \(v^{\underrightarrow{*}}w\) and \(w^{\underrightarrow{*}}u\), then the graph also contains a path \(v^{\underrightarrow{*}}u\) such that its intermediate vertices come from the set \(S\cup \left\{ w \right\} \). The F-W algorithm iterates from 1 to \(n\), where \(n\) is the total number of statements in the loop, and in the k-th iteration it takes into account the paths whose intermediate vertices come from the set \(\left\{ v_{1}, ..., v_{k-1} \right\} \).

The most time-consuming part in Algorithm 1 is the expression \(D_{kj}\circ R_{kk}^{*}\circ D_{ik}\), where '\(\circ \)' denotes the composition operator applied to a pair of relations and \(D_{ik}\) describes all the dependences between instances of statement \(s_{i}\) and statement \(s_{k}\). This means that if there is a dependence from iteration \(i_{1}\) of statement \(s_{i}\) to iteration \(i_{2}\) of statement \(s_{k}\), a chain of self-dependences from iteration \(i_{2}\) to iteration \(i_{3}\) described by \(R_{kk}^{*}\), and finally a dependence from iteration \(i_{3}\) of statement \(s_{k}\) to iteration \(i_{4}\) of statement \(s_{j}\) (where \(D_{kj}\) describes all dependences between instances of statement \(s_{k}\) and statement \(s_{j}\)), then there is a transitive dependence from iteration \(i_{1}\) to iteration \(i_{4}\). The objective of this technique is to update all the dependences through statements 1,2,...,n in an iteration of each k-loop. In Sect. 3.3, we suggest calculating \(R_{kk}^{*}\) using a finite integral linear combination of basis dependence distance vectors Bielecki et al. (2013).
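
The update \(D_{ij} \cup D_{kj}\circ R_{kk}^{*}\circ D_{ik}\) can be sketched on small finite relations. Algorithm 1 itself is not reproduced in this text, so the code below is only an illustration under stated assumptions: relations are sets of pairs of iteration numbers, `dep` is a hypothetical dictionary keyed by statement pairs, the identity part of \(R_{kk}^{*}\) is handled by composing \(D_{kj}\circ D_{ik}\) directly, and \(R_{kk}^{+}\) is obtained by plain iteration rather than by basis dependence distance vectors.

```python
def compose(r2, r1):
    """Return r2 ∘ r1: pairs (x, z) such that (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def positive_closure(r):
    """Union of all positive powers of a finite relation."""
    closure, power = set(r), set(r)
    while True:
        power = compose(r, power)
        if power <= closure:
            return closure
        closure |= power

def modified_floyd_warshall(dep, n):
    """dep[(i, j)] holds dependences from instances of statement i to statement j."""
    d = {(i, j): set(dep.get((i, j), ())) for i in range(n) for j in range(n)}
    for k in range(n):
        rkk_plus = positive_closure(d[(k, k)])
        old = {key: set(val) for key, val in d.items()}  # relations before step k
        for i in range(n):
            for j in range(n):
                # identity part of R*_kk: go through a single instance of statement k
                direct = compose(old[(k, j)], old[(i, k)])
                # R+_kk part: follow a chain of self-dependences of statement k
                via_self = compose(old[(k, j)], compose(rkk_plus, old[(i, k)]))
                d[(i, j)] |= direct | via_self
    return d

# Toy dependences: S0(1)->S1(1), S1(1)->S1(2), S1(2)->S1(3), S1(3)->S0(5).
dep = {(0, 1): {(1, 1)}, (1, 1): {(1, 2), (2, 3)}, (1, 0): {(3, 5)}}
closure = modified_floyd_warshall(dep, 2)
# closure[(0, 0)] now holds the transitive dependence S0(1) -> S0(5)
```

Taking a snapshot of the relations at the start of each step `k` keeps the invariant simple: after step `k`, every entry covers the paths whose intermediate vertices are instances of statements up to `k`.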

3.2 Replacing the parameterized vector with an integral linear combination of constant vectors

To find constant vectors whose integral linear combination represents the parameterized vector, we can apply the following theorem.

Theorem 1

Let \(v_{p}\) be a vector in \(\mathbb {Z}^{d}\) whose coordinates \(p_{i}\) in positions \(i\) are parameterized. We may replace vector \(v_{p}\) with an integral linear combination of a constant vector \(v_{c}\), \(v_{c}\in \mathbb {Z}^{d}\), and unit normal vectors \(e_{i}\), \(e_{i}\in \mathbb {Z}^{d}\), where \(p_{i}\) are integer parametric coefficients, as follows:
$$\begin{aligned} v_{p}=v_{c}+\sum _{i}p_{i}\times e_{i}. \end{aligned}$$
(3)

Proof

Without loss of generality, we may assume that the first \(n\) positions of \(v_{p}\) have constant coordinates and the last \(q\) positions have parameterized ones. Then, we can write:
$$\begin{aligned} \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) = \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ 0 \\ \vdots \\ 0 \\ \end{array} \right) + \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) , \end{aligned}$$
(4)
where, here and below, \(d-n=q\). The second vector can be written as the integral linear combination of unit normal vectors \(e_{k}\) with parameterized coefficients \(p_{n+1},\ldots , p_{d}\) in the last \(q\) positions:
$$\begin{aligned} \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) \!=\! \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ p_{n+1} \\ \vdots \\ 0 \\ \end{array} \right) \!+\!\ldots \!+\!\ \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 0 \\ \vdots \\ p_{d} \\ \end{array} \right) \!=\! p_{n+1} \times \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 1 \\ \vdots \\ 0 \\ \end{array} \right) \!+\!\ldots \!+\!\ p_{d} \times \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 0 \\ \vdots \\ 1 \\ \end{array} \right) \! . \end{aligned}$$
(5)
Substituting (5) into (4), we obtain:
$$\begin{aligned} \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) = \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ 0 \\ \vdots \\ 0 \\ \end{array} \right) + p_{n+1} \times e_{n+1} +\ldots +\ p_{d} \times e_{d} . \end{aligned}$$
(6)
It is obvious that if \(v_{c}=\varvec{0}\), then \(v_{c}\) can be omitted without affecting the result. \(\square \)
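
The decomposition of Theorem 1 is mechanical and can be sketched in a few lines. In this illustration, parameterized coordinates are represented as strings and constant coordinates as integers; that encoding and the function name `decompose` are assumptions of the sketch, not something prescribed by the paper.

```python
def decompose(vector):
    """Split a vector into a constant vector v_c and a list of
    (parameter, unit vector) terms, following v_p = v_c + sum_i p_i * e_i."""
    dim = len(vector)
    constant = [c if isinstance(c, int) else 0 for c in vector]
    terms = []
    for pos, c in enumerate(vector):
        if not isinstance(c, int):
            unit = [1 if d == pos else 0 for d in range(dim)]
            terms.append((c, unit))
    return constant, terms

# The parameterized vector (N, 2) from Sect. 2 and a three-dimensional one.
v_c, terms = decompose(("N", 2))
w_c, w_terms = decompose((2, "N2", "N3"))
# v_c is [0, 2] and terms is [("N", [1, 0])]
```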

3.3 Algorithm for computing transitive closure

The idea of the algorithm presented in this section is the following. Given a set \(\Delta _{S,T}\) of \(m\) dependence distance vectors in the \(n\)-dimensional integer space derived from a union of dependence relations \(R_{kk}\), \(k=1,2,..., q\), where \(q\) is the number of loop statements (it describes a chain of self-dependences of statement \(s_{k}\) in the loop), we first replace all parameterized vectors with constant vectors using Theorem 1 from Sect. 3.2.

As a result, we get \(k\), \(k\ge m\), dependence distance vectors with constant coordinates. This allows us to get rid of parameterized vectors and to form an integer matrix \(A\), \(A \in \mathbb {Z}^{n \times k}\), by inserting the dependence distance vectors with constant coordinates into the columns of \(A\); these columns generate integer lattice \(\Lambda \).

To decrease the complexity of further computations, redundant dependence distance vectors are eliminated from matrix \(A\) by finding a subset of \(l\), \(l \le k\), linearly independent columns of \(A\). This subset of dependence distance vectors forms the basis \(B \in \mathbb {Z}^{n \times l}\) of \(A\) and generates the same integer lattice \(\Lambda \) as \(A\) does Schrijver (1999). Every element of integer lattice \(\Lambda \) can be expressed uniquely as a finite integral linear combination of the basis dependence distance vectors belonging to \(B\).
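
Selecting linearly independent columns can be sketched with Gaussian elimination. The code below works over the rationals with exact `Fraction` arithmetic and keeps the first column that introduces each new pivot; both choices, as well as the name `independent_columns`, are assumptions of this sketch, since the text only names the elimination method.

```python
from fractions import Fraction

def independent_columns(matrix):
    """Return the indices of a maximal set of linearly independent columns."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column c depends on the columns already selected
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return pivots

# Columns: (2,0,0), (0,1,0), (0,0,1), (0,0,0), (4,0,0).
a = [[2, 0, 0, 0, 4],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0]]
basis_columns = independent_columns(a)  # [0, 1, 2]
```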

After \(B\) is completed, we can work out relations \(R_{kk}^{*}\), \(k=1,2,..,q\), representing the exact transitive closure of \(R_{kk}\) or its over-approximation. For each vertex \(x\) in the data dependence graph (where \(x\) is the source of a dependence, \(x \in dom\;R_{kk}\)), we can identify all vertices \(y\) (the destinations of dependences, \(y \in ran\;R_{kk}\)) that are connected with \(x\) by a path of length at least \(1\), where \(y\) is calculated as \(x\) plus an integral linear combination of the basis dependence distance vectors \(B\), i.e. \(y = x + B \times \lambda \), \(\lambda \in \mathbb {Z}^{l}\). The part \(B \times \lambda \) of the formula represents all possible paths in the dependence graph, represented by relation \(R_{kk}\), connecting \(x\) and \(y\). Moreover, we have to preserve the lexicographic order for \(y\) and \(x\), i.e. \(y-x\succ 0\). Below, we present the algorithm in a formal way.

Algorithm 2. Calculating the transitive closure of a dependence relation \(R_{kk}\) describing a chain of self-dependences of a loop statement.
Input:

Dependence distance set \(\Delta ^{n \times m}= \{\delta _{1},\delta _{2}, \ldots , \delta _{m}\}\), where \(m\) is the number of \(n\)-dimensional dependence distance vectors.

Output:

Exact transitive closure of relation \(R_{kk}\) or its over-approximation.

Method:
  1. Replace each parameterized dependence distance vector in \(\Delta ^{n \times m}\) with an integral linear combination of vectors with constant coordinates. For this purpose apply Theorem 1 presented in Sect. 3.2.

  2. Using all constant dependence vectors, form matrix \(A \in \mathbb {Z}^{n \times k}\), \(k \ge m\).

  3. Extract a finite subset of \(l\), \(l\le k\), linearly independent columns from matrix \(A \in \mathbb {Z}^{n \times k}\) and form matrix \(B^{n \times l}\), representing the basis of the dependence distance vectors set, where linearly independent vectors are represented with columns of matrix \(B^{n \times l}\). For this purpose apply the Gaussian elimination algorithm Shoup (2005); Rotman (2003).

  4. Calculate relation \(R_{kk}^{*}\) representing the exact transitive closure of a dependence relation, describing all the dependences in the input loop, or its over-approximation, as follows:
    $$\begin{aligned} R_{kk}^{*} = \left\{ \begin{array}{l} \left[ x \right] \rightarrow \left[ y \right] \mid \exists \lambda \ s.t.\;y \!=\! x \!+\! B^{n \times l} \times \lambda \;\wedge \; y-x\succ 0, \;\lambda \in \mathbb {Z}^{l} \;\wedge \\ \qquad \qquad \qquad \quad \quad \;\, \wedge \; y \in ran\;\; R_{kk}\;\wedge \; x \in dom\;\; R_{kk} \end{array} \right\} \cup I, \nonumber \\ \end{aligned}$$
    (7)
    where \(R^{*}_{kk}\) describes a chain of self dependences of loop statement \(s_{k}\), \(B^{n \times l} \times \lambda \) represents an integral linear combination of the basis dependence distance vectors \(\delta _{i}\) (the columns of \(B^{n \times l}\), \(1 \le i \le l)\), \(y-x\succ 0\) imposes the lexicographically forward constraints on the tuples of \(R_{kk}^{*}\), and \(I\) is the identity relation.
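
Formula (7) can be tried on a bounded, non-parameterized instance. The following is a minimal sketch under several assumptions: `dom` and `ran` are finite sets of tuples, the coefficients \(\lambda \) are enumerated only within `[-max_coeff, max_coeff]`, and the identity relation is restricted to the vertices in `dom` and `ran`. The names `lex_positive` and `closure_from_basis` are hypothetical.

```python
from itertools import product

def lex_positive(v):
    """True if the first non-zero coordinate of v is positive (v ≻ 0)."""
    for c in v:
        if c != 0:
            return c > 0
    return False

def closure_from_basis(basis, dom, ran, max_coeff):
    """Pairs (x, y) with y = x + B*lambda, y - x ≻ 0, x in dom, y in ran,
    united with the identity pairs on dom ∪ ran."""
    rel = {(v, v) for v in dom | ran}
    coeffs = range(-max_coeff, max_coeff + 1)
    for x in dom:
        for lam in product(coeffs, repeat=len(basis)):
            step = tuple(
                sum(l * b[d] for l, b in zip(lam, basis)) for d in range(len(x))
            )
            y = tuple(a + s for a, s in zip(x, step))
            if lex_positive(step) and y in ran:
                rel.add((x, y))
    return rel

# Toy self-dependence i -> i + 2 for 1 <= i <= 4, so the only basis vector is (2).
dom = {(i,) for i in range(1, 5)}
ran = {(i,) for i in range(3, 7)}
rel = closure_from_basis([(2,)], dom, ran, max_coeff=3)
# rel contains ((1,), (5,)) but not ((1,), (4,))
```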
     
The resulting relation \(R_{kk}^{*}\) represents the exact transitive closure of relation \(R_{kk}\) or its over-approximation Feautrier (2012). To prove this, let us note that relation \(R_{kk}^{+}\) represents all possible paths between vertices \(x\) (standing for dependence sources, \(x \in dom\;\, R_{kk}\)) and vertices \(y\) (standing for dependence destinations, \(y \in ran\;\, R_{kk}\)) in the dependence graph, represented with relation \(R_{kk}\). Indeed, an integral linear combination of the basis dependence distance vectors \(B^{n \times l} \times \lambda \):
  • reproduces all dependence distance vectors exposed for the loop,

  • describes all existing (true) paths between any pair of \(x\) and \(y\) as an integral linear combination of all dependence distance vectors exposed for the loop,

  • can describe non-existent (false) paths in the dependence graph represented by relation \(R^{*}_{kk}\).

There are two cases in which false (non-existent) paths may occur. The first one arises due to the fact that in an integral linear combination of linearly independent columns of matrix \(A\) some coefficients can be negative. Let us consider the matrix \(A\),
$$\begin{aligned} A=\left[ \begin{array}{ccccc} 3 &{} 3 &{} 2 &{} \fbox {5} &{} \fbox {4}\\ 2 &{} 0 &{} -2 &{} \fbox {0} &{} \fbox {2}\\ \end{array} \right] . \end{aligned}$$
The linearly independent columns are represented with the first three columns only because the remaining ones can be represented as the integral linear combinations below (see Fig. 2a).
$$\begin{aligned} \left( \begin{array}{c} 5 \\ 0 \\ \end{array} \right) = \;1\times \left( \begin{array}{c} 3 \\ 2 \\ \end{array} \right) + 0\times \left( \begin{array}{c} 3 \\ 0 \\ \end{array} \right) + 1\times \left( \begin{array}{c} 2 \\ -2 \\ \end{array} \right) ,\\ \left( \begin{array}{c} 4 \\ 2 \\ \end{array} \right) = \;0\times \left( \begin{array}{c} 3 \\ 2 \\ \end{array} \right) + 2\times \left( \begin{array}{c} 3 \\ 0 \\ \end{array} \right) \fbox {-1}\times \left( \begin{array}{c} 2 \\ -2 \\ \end{array} \right) . \end{aligned}$$
Fig. 2 (a) Finding the basis \(B\) for matrix \(A\); (b) False paths due to linear combinations of basis \(B\)

The basis, \(B\), of matrix \(A\) is the following:
$$\begin{aligned} B=\left\{ \left( \begin{array}{c} 3 \\ 2 \\ \end{array} \right) \left( \begin{array}{c} 3 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 2 \\ -2 \\ \end{array} \right) \right\} . \end{aligned}$$
The linear combinations of vectors belonging to basis \(B\) with negative coefficients lead to false paths, for example, as presented below (see Fig. 2b).

The second case takes place when, on a path between \(x\) and \(y\) described by \(R_{kk}^{*}\), there exists a vertex \(w\) such that \(w \in ran\;\, R_{kk} \,\wedge \, w \notin dom\;\, R_{kk}\). An example is presented in Fig. 3, where \(x_{2} \in ran\;\, R_{kk} \,\wedge \, x_{2} \notin dom\;\, R_{kk}\). Relation \(R_{kk}^{*}\), built according to (7), describes the false path between \(x_{1}\) and \(x_{4}\) depicted by the dotted line.
Fig. 3 False path in the dependence graph

Summing up, we conclude that relation \(R_{kk}^{*}\) describes all existing paths in the dependence graph represented by relation \(R_{kk}\) and can describe non-existent paths, i.e., \((R^{*}_{kk})_{exact} \subseteq R^{*}_{kk}\); when relation \(R^{*}_{kk}\) describes no non-existent paths, \(R^{*}_{kk} = (R^{*}_{kk})_{exact}\).
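
The second source of false paths can be reproduced on a tiny one-dimensional example. This is only a sketch with made-up data: the relation `{1 -> 2, 3 -> 4}` has the single distance vector 1, vertex 2 is in the range but not in the domain, and the positive part of formula (7) is evaluated by brute force. All names are illustrative.

```python
def compose(r2, r1):
    """Return r2 ∘ r1: pairs (x, z) such that (x, y) is in r1 and (y, z) is in r2."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def positive_closure(r):
    """Exact positive transitive closure of a finite relation."""
    closure, power = set(r), set(r)
    while True:
        power = compose(r, power)
        if power <= closure:
            return closure
        closure |= power

relation = {(1, 2), (3, 4)}
dom = {x for x, _ in relation}
ran = {y for _, y in relation}

# Basis B = [1]: y = x + lambda with y - x lexicographically positive.
approx = {(x, y) for x in dom for y in ran if y - x >= 1}
exact = positive_closure(relation)
false_paths = approx - exact   # {(1, 4)}: there is no edge leaving vertex 2
```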

3.4 Time complexity

The first three steps of Algorithm 2 can be accomplished in polynomial time.
  1. As we have proved in Sect. 3.2, the task of replacing parameterized vectors with a linear combination of vectors with constant coordinates can be done in \(O\left( d^{2} \right) \) operations.

  2. The task of forming a dependence matrix using all \(k\) constant dependence vectors in \(\mathbb {Z}^{n}\) requires \(O\left( kn \right) \) operations (memory accesses).

  3. The task of identifying a set of linearly independent columns of matrix \(A\), \(A\in \mathbb {Z}^{n\times k}\) with constant coordinates to find the basis can be done in polynomial time by the Gaussian elimination algorithm. According to Cohen and Megiddo (1991), this computation can be done in \(O\left( ldk \right) \) arithmetic operations.

To calculate relation \(R^{*}_{kk}\) in step 4 of Algorithm 2, we use the Presburger arithmetic. In general, calculations based on the Presburger arithmetic are not characterized by polynomial time complexity Kelly et al. (1996).

3.5 Illustrating example

Let us illustrate Algorithms 1 and 2 by means of the following example.
The following relations describe dependences in the loop above.
They form the reduced dependence graph presented in Fig. 4.
Fig. 4 Reduced dependence graph for the illustrative example

Algorithm 1 calls Algorithm 2 to calculate relation \(R_{kk}^{*}\) for each iteration \(k\) of the outermost loop.

For \(k=1\), the dependence distance set \(\Delta _{1,1}=\emptyset \) because \(R_{11}=\emptyset \), so we get \(R^{*}_{11}=\emptyset \;\cup \;I=\{[k,i]\rightarrow [k,i]\}\).

For \(k=2\), we get the following dependence relations:
and yield the following dependence distance set \(\Delta _{2,2}\)
$$\begin{aligned} \Delta _{2,2}=\left\{ \left( \begin{array}{c} 2 \\ N2 \\ N3 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ N3 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ N2 \\ N3 \\ \end{array} \right) \left( \begin{array}{c} 4 \\ N2 \\ N3 \\ \end{array} \right) \right\} . \end{aligned}$$
Applying Algorithm 2, we obtain the following results.
  1. Replace all parameterized dependence distance vectors. The first parameterized vector \(\left( \begin{array}{c} 2 \\ N2 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c} 2\\ 0\\ 0\\ \end{array}\right) \) and the unit normal vectors \(\left( \begin{array}{c} 0\\ 1\\ 0\\ \end{array}\right) ,\left( \begin{array}{c} 0\\ 0\\ 1\\ \end{array}\right) \) as follows:
    $$\begin{aligned} \left( \begin{array}{c} 2 \\ N2 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 2 \\ 0 \\ 0 \\ \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$
    The second parameterized vector \(\left( \begin{array}{c} 0 \\ 0 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c}0\\ 0\\ 0\\ \end{array}\right) \) and the unit normal vector \(\left( \begin{array}{c}0\\ 0\\ 1\\ \end{array}\right) \) as below:
    $$\begin{aligned} \left( \begin{array}{c} 0 \\ 0 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$
    The third parameterized vector \(\left( \begin{array}{c} 0 \\ N2 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c}0 \\ 0\\ 0\\ \end{array}\right) \) and the unit normal vectors \(\left( \begin{array}{c}0 \\ 1\\ 0\\ \end{array}\right) ,\left( \begin{array}{c} 0\\ 0\\ 1\\ \end{array}\right) \) as follows:
    $$\begin{aligned} \left( \begin{array}{c} 0 \\ N2 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$
    The last parameterized vector \(\left( \begin{array}{c} 4 \\ N2 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c}4\\ 0\\ 0\\ \end{array}\right) \) and the unit normal vectors \(\left( \begin{array}{c}0\\ 1\\ 0\\ \end{array}\right) ,\left( \begin{array}{c} 0\\ 0\\ 1\\ \end{array}\right) \) as below:
    $$\begin{aligned} \left( \begin{array}{c} 4 \\ N2 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 4 \\ 0 \\ 0 \\ \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$
    The resulting dependence distance set \(\Delta _{2,2}\) contains the vectors with constant coordinates only:
    $$\begin{aligned} \Delta _{2,2}=\left\{ \left( \begin{array}{c} 2 \\ 0 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 4 \\ 0 \\ 0 \\ \end{array} \right) \right\} . \end{aligned}$$
     
  2. Form a dependence matrix. The matrix \(A\), where all the constant dependence vectors from set \(\Delta _{2,2}\) are placed in columns, is as follows:
    $$\begin{aligned} A=\left[ \begin{array}{ccccc} 2 &{} 0 &{} 0 &{} 0 &{} 4\\ 0 &{} 1 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 1 &{} 0 &{} 0\\ \end{array} \right] . \end{aligned}$$
     
  3. Find the basis of the dependence distance set. A set of linearly independent columns of matrix \(A \in \mathbb {Z}^{n \times k}\) that can generate every vector in \(A\) forms the following matrix \(B\):
    $$\begin{aligned} B=\left[ \begin{array}{ccc} 2 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ \end{array} \right] . \end{aligned}$$
     
  4. Calculate the exact transitive closure of a dependence relation describing all the dependences in an input loop or its over-approximation, \(R^{*}_{22}\). Form relation \(R^{*}_{22}\) as follows:
    $$\begin{aligned} \begin{array}{ll} R^{*}_{22} :=&{} \left\{ \begin{array}{l} \left[ i,j,k \right] \rightarrow \left[ i',j',k' \right] \mid \exists \lambda \, s.t. \left( \begin{array}{c} i' \\ j' \\ k' \\ \end{array} \right) = \left( \begin{array}{c} i\\ j\\ k\\ \end{array} \right) + \left[ \begin{array}{ccc} 2 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ \end{array}\right] \times \lambda \ \wedge \\ \qquad \qquad \qquad \qquad \quad \; \wedge \; \left( \begin{array}{c} i' \\ j' \\ k' \\ \end{array} \right) - \left( \begin{array}{c} i\\ j\\ k\\ \end{array} \right) \succeq 0, \;\lambda \in \mathbb {Z}^{3}\ \wedge \\ \qquad \qquad \qquad \qquad \quad \;\ \wedge \; \left( \begin{array}{c} i' \\ j' \\ k' \\ \end{array} \right) \in ran\; R_{22} \, \wedge \left( \begin{array}{c} i\\ j\\ k\\ \end{array} \right) \in dom\; R_{22} \end{array} \right\} = \end{array}\\ \begin{array}{ll} \quad \;\;\;\ &{} \begin{array}{l} \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k',i',j'] : &{} \exists \,\alpha : 2\alpha =k + k'\; \wedge 1 \le k \le k'-2\; \wedge 1 \le i \le N2 \\ &{} 1 \le j,\,j' \le N3 \wedge 1 \le i' \le N2\;\wedge k'\le N1 \end{array} \right\} \cup \\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \\ \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k,i,j'] :&{} 1 \le j < j' \le N3 \wedge 1 \le k \le N1 \wedge 1 \le i \le N2 \end{array} \right\} \cup \\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \\ \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k,i',j'] :&{} 1 \le i < i'\le N2\;\wedge 1 \le k \le N1\wedge 1 \le j \le N3\;\wedge \\ &{} 1 \le j' \le N3\quad \\ \end{array} \right\} \cup \\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \\ \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k,i,j] \end{array} \right\} \end{array} \end{array} \end{aligned}$$
     
Relation \(R^{*}_{22}\) represents exact transitive closure since \(R^{+}_{22} = R_{22} \cup (R_{22} \circ R^{+}_{22})\), i.e., \(R^{*}_{22}=(R^{*}_{22})_{exact}\).
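The exactness test quoted above can be tried on a small finite instance. The sketch below is an illustration only: it assumes a relation stored as an explicit set of pairs, uses the toy relation \(\{[x]\rightarrow [x+1]\}\) on a bounded range instead of \(R_{22}\), and the names `compose`, `positive_closure`, `satisfies_closure_identity` and the bound `N` are hypothetical.

```python
def compose(outer, inner):
    """Return outer ∘ inner: apply inner first, then outer."""
    successors = {}
    for a, b in outer:
        successors.setdefault(a, set()).add(b)
    return {(x, z) for x, y in inner for z in successors.get(y, ())}


def positive_closure(relation):
    """Brute-force R+ by iterating X := X ∪ (R ∘ X) to a fixed point."""
    closure = set(relation)
    while True:
        bigger = closure | compose(relation, closure)
        if bigger == closure:
            return closure
        closure = bigger


def satisfies_closure_identity(candidate, relation):
    """Check the identity candidate == R ∪ (R ∘ candidate)."""
    return candidate == set(relation) | compose(relation, candidate)


N = 5  # hypothetical bound standing in for a structure parameter
R = {(x, x + 1) for x in range(N)}
# Candidate closed form for R+: every pair x -> y with x < y inside the range.
candidate = {(x, y) for x in range(N + 1) for y in range(x + 1, N + 1)}
print(satisfies_closure_identity(candidate, R))
```

On parameterized relations the same identity has to be checked symbolically, for example with the operations of a Presburger library; the enumeration above only works because the sets are finite.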
The final result is the following

4 Related work

Numerous algorithms for calculating the transitive closure of affine integer tuple relations have been proposed Kelly et al. (1996); Beletska et al. (2009); Verdoolaege et al. (2011); Ancourt et al. (2010); Boigelot (1998); Bozga et al. (2009); Eve and Kurki-Suonio (1977). However, most of them focus on relations whose domain and range are non-parametric polyhedra Ancourt et al. (2010); Bozga et al. (2009); Eve and Kurki-Suonio (1977). A limitation of many known algorithms is that they require the input and output tuples of a relation to have the same arity (the number of tuple elements) Beletska et al. (2009). This is why we limit related work to techniques that deal with parameterized relations whose tuple arities differ in general and that can describe dependences in program loops.

On a different line of work, Bozga et al. Bozga et al. (2009) have studied the computation of transitive closure for the analysis of counter automata (register machines) and have implemented their method in a tool called FLATA Bozga et al. (2009). In this context, relation \(R(x,x')\) is a relation that can be written as a finite conjunction of terms of the form \(\pm x_{i}\pm x_{j}\leqslant a_{i,j}\), \(\pm x'_{i}\pm x_{j}\leqslant b_{i,j}\), \(\pm x_{i}\pm x'_{j}\leqslant c_{i,j}\), \(\pm x'_{i}\pm x'_{j}\leqslant d_{i,j}\), \(\pm 2x_{i}\leqslant e_{i,j}\) or \(2x'_{i}\leqslant f_{i,j}\), where \(x\) and \(x'\) describe counter values at the current step and at the next step, respectively, \(a_{i,j},b_{i,j},c_{i,j},d_{i,j},e_{i,j},f_{i,j}\in \mathbb {Z}\) are integer constants and \(1\leqslant i,j \leqslant n\), \(i\ne j\). As we can see, this class of relations does not involve parameters, existentially quantified variables or unions, i.e., it cannot represent dependences in program loops. This is why we do not compare this technique with ours.

To the best of our knowledge, techniques for computing the transitive closure of parameterized affine integer tuple relations with different input and output tuple arities have been investigated in only a few papers Kelly et al. (1996); Verdoolaege et al. (2011); Verdoolaege (2012). Kelly et al. Kelly et al. (1996) proposed a modified Floyd–Warshall algorithm, but they did not implement it in the Omega library (http://www.cs.umd.edu/projects/omega/). Fourteen years later, Verdoolaege improved the algorithm and implemented his version of it in the ISL library (http://www.kotnet.org/skimo/isl/), but that algorithm and its implementation are not the same as ours.

Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012) treats each input relation \(R_{i\leqslant m}\) as a vertex \(V_{i\leqslant m}\) of the directed graph \(G\), where \(m\) is the total number of input relations. There exists a directed edge \(E_{ij}\) from vertex \(V_{i\leqslant m}\) to vertex \(V_{j\leqslant m}\) (\(R_{j}\) can immediately follow \(R_{i}\)) if the range of \(R_{i}\) intersects the domain of \(R_{j}\), i.e., if
$$\begin{aligned} R_{j}\circ R_{i}\ne \emptyset . \end{aligned}$$
(8)
In order to calculate the transitive closure of a dependence relation \(R\), Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012) constructs \(m^{2}\) relations
$$\begin{aligned} R_{ij}=\bigcup _{i,j\, s.t.\,R_{j}\circ R_{i}\ne \emptyset \ }^{m}R_{j}\circ R_{i}. \end{aligned}$$
(9)
Then he applies Algorithm 1 and returns the union of all output \(R_{ij}\) as the transitive closure. In our implementation, we use information gathered with the Petit dependence analyzer Kelly et al. (1996) to insert the relation describing dependences between instances of statements \(i\) and \(j\) as element \(D_{ij}\) of array \(D\). Then we call Algorithm 1 to get the transitive closure. The information provided by Petit permits us to reduce the time complexity of the Floyd–Warshall algorithm implementation, because the connection check between each pair of input dependence relations (see formula 8) is skipped.
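To make the role of array \(D\) concrete, the following is a minimal sketch of a Floyd–Warshall style closure over relations stored as explicit finite sets of pairs. It is not a reproduction of Algorithm 1, which works on parameterized relations; the function names and the dictionary layout `d[(i, j)]` are assumptions made for the illustration, and \(R^{+}_{kk}\) is obtained here by brute force rather than by any of the techniques compared in this paper.

```python
def compose(outer, inner):
    """Return outer ∘ inner: apply inner first, then outer."""
    successors = {}
    for a, b in outer:
        successors.setdefault(a, set()).add(b)
    return {(x, z) for x, y in inner for z in successors.get(y, ())}


def positive_closure(relation):
    """Brute-force R+ by iterating to a fixed point (finite relations only)."""
    closure = set(relation)
    while True:
        bigger = closure | compose(relation, closure)
        if bigger == closure:
            return closure
        closure = bigger


def floyd_warshall_relations(d, n):
    """d[(i, j)] holds pairs (src, dst): instance src of statement i
    reaches instance dst of statement j. Returns the closed array."""
    d = {(i, j): set(d.get((i, j), ())) for i in range(n) for j in range(n)}
    for k in range(n):
        kk_plus = positive_closure(d[k, k])
        # Snapshot column k and row k; entries are reassigned below, not mutated.
        into_k = {i: d[i, k] for i in range(n)}
        out_of_k = {j: d[k, j] for j in range(n)}
        for i in range(n):
            for j in range(n):
                # R_kj ∘ (R_kk+ ∪ I) ∘ R_ik, with the identity part written out.
                direct = compose(out_of_k[j], into_k[i])
                looped = compose(out_of_k[j], compose(kk_plus, into_k[i]))
                d[i, j] = d[i, j] | direct | looped
    return d


# Hypothetical two-statement example with integer iteration points.
D = {
    (0, 0): {(x, x + 1) for x in range(4)},
    (0, 1): {(x, x) for x in range(5)},
    (1, 0): {(x, x + 1) for x in range(4)},
    (1, 1): {(x, x + 2) for x in range(3)},
}
closed = floyd_warshall_relations(D, 2)
```

The union of all entries of `closed`, with each point tagged by its statement number, is the transitive closure of the whole dependence graph for this finite example.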
Because of the differences between our implementation of the Floyd–Warshall algorithm and that of Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012), in this paper we investigate only how different ways of calculating the relation \(R^{*}_{kk}\) affect the time complexity of the Floyd–Warshall algorithm. For this purpose, we have chosen the algorithms for calculating \(R^{*}_{kk}\) that are implemented in the ISL (http://www.kotnet.org/skimo/isl/) and Omega (http://www.cs.umd.edu/projects/omega/) libraries. Those algorithms are based on computing parametric powers \(R^{k}\) and then projecting out the parameter \(k\) by making it existentially quantified. As a trivial example, consider the relation \(R=\{[x]\rightarrow [x+1]\}\). The \(k\)th power of \(R\) for arbitrary \(k\) is \(R^{k}=\{[x]\rightarrow [x+k] \,|\, k\geqslant 1\}\) and the transitive closure is then \(R^{+}=\{[x]\rightarrow [y]\,|\, \exists k\in \mathbb {Z}_{\geqslant 1}:\,y=x+k\}=\{[x]\rightarrow [y]\,|\,y\geqslant x+1\}\). Both algorithms consider the difference set \(\Delta \,R\) of the relation, but in the ISL library (http://www.kotnet.org/skimo/isl/), if all differences \(\Delta _{i}\) are singleton sets, i.e., \(\Delta _{i}=\{\delta _{i}\}\) with \(\delta _{i}\in \mathbb {Z}^{d}\), then \(R^{+}\) is calculated as follows:
$$\begin{aligned} R^{+}=\{x\rightarrow y \mid \exists k_{i}\in \mathbb {Z}_{\geqslant 0} : y=x+\sum _{i}k_{i}\delta _{i}\,\wedge \sum _{i}k_{i}=k>0\}, \end{aligned}$$
(10)
which is essentially the same as that of Beletska et al. Beletska et al. (2009). If some of the \(\Delta _{i}\)s are parametric, then each offset \(\Delta _{i}\) is extended with an extra coordinate \(\Delta _{i}^{'}=\Delta _{i}\times \{1\}\), that is the constant equal to one. Paths constructed by summing such extended offsets are of length \(k\) encoded as the difference of their final coordinates, so \(R^{+}\) can then be decomposed into relations \(R_{i}^{+}\), one for each \(\Delta _{i}\) as follows
$$\begin{aligned} R^{+}=((R_{m}^{+}\cup I)\circ \cdots \circ (R_{2}^{+}\cup I)\circ (R_{1}^{+}\cup I))\cap \{x'\rightarrow y'\mid \exists k_{>0}: y_{d+1}-x_{d+1}=k\}, \end{aligned}$$
(11)
with
$$\begin{aligned} R_{i}^{+}=s \mapsto \{x'\rightarrow y' \mid \exists k\in \mathbb {Z}_{\geqslant 1},\delta \in k\Delta _{i}^{'}(s): y'=x'+\delta \}. \end{aligned}$$
(12)
Each non-parametric constraint \(A_{1}x+c_{1}\geqslant 0\) of \(\Delta _{i}^{'}(s)\) from (12) is transformed into the form \(A_{1}x+kc_{1}\geqslant 0\), and the remaining constraints are left unchanged. For more details see Verdoolaege et al. (2011); Verdoolaege (2012).
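For the singleton-difference case of formula (10), a small enumeration makes the construction tangible. The sketch below assumes a finite box as the common domain and range and two constant offsets; all names (`closure_by_offsets`, `points`, `offsets`, `max_count`) are hypothetical. Whether the formula is exact depends on the domain in general; on the box used here the enumerated set coincides with a brute-force closure.

```python
from itertools import product


def compose(outer, inner):
    """Return outer ∘ inner: apply inner first, then outer."""
    successors = {}
    for a, b in outer:
        successors.setdefault(a, set()).add(b)
    return {(x, z) for x, y in inner for z in successors.get(y, ())}


def positive_closure(relation):
    """Brute-force R+ by iterating to a fixed point (finite relations only)."""
    closure = set(relation)
    while True:
        bigger = closure | compose(relation, closure)
        if bigger == closure:
            return closure
        closure = bigger


def shift(point, delta, times=1):
    """Return point + times * delta, component-wise."""
    return tuple(c + times * d for c, d in zip(point, delta))


def closure_by_offsets(points, offsets, max_count):
    """Enumerate x -> y with y = x + sum_i k_i * delta_i, k_i >= 0 and
    sum_i k_i > 0, keeping only pairs whose end points lie in `points`."""
    result = set()
    for x in points:
        for counts in product(range(max_count + 1), repeat=len(offsets)):
            if sum(counts) == 0:
                continue  # the identity pair is excluded from R+
            y = x
            for k, delta in zip(counts, offsets):
                y = shift(y, delta, k)
            if y in points:
                result.add((x, y))
    return result


points = set(product(range(4), range(4)))  # a 4 x 4 iteration box
offsets = [(1, 0), (0, 1)]                 # constant distance vectors
relation = {(x, shift(x, d)) for x in points for d in offsets
            if shift(x, d) in points}
by_formula = closure_by_offsets(points, offsets, max_count=3)
```

Here `max_count=3` is enough because no coordinate can grow by more than 3 inside the box; in the symbolic setting the counters \(k_{i}\) are existentially quantified instead of enumerated.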

While the algorithms implemented by Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012) in the ISL library (http://www.kotnet.org/skimo/isl/) are designed to compute over-approximations, Kelly et al. Kelly et al. (1996) propose, in the Omega library (http://www.cs.umd.edu/projects/omega/), a heuristic algorithm that computes an under-approximation and does not guarantee an exact transitive closure.

5 Experimental results

The goals of the experiments were to evaluate the effectiveness and time complexity of the proposed approach for calculating relation \(R^{*}_{kk}\) and using it in the modified Floyd–Warshall algorithm for loops provided by the well-known NAS Parallel Benchmark (NPB) Suite from NASA (http://www.nas.nasa.gov), and to compare the obtained results with the effectiveness and time complexity of the techniques implemented in the ISL (http://www.kotnet.org/skimo/isl/) and Omega (http://www.cs.umd.edu/projects/omega/) tools. We have implemented the presented algorithms as an ANSI-C++ software module. The source code of the module was compiled using the gcc compiler v4.3.0 and can be downloaded from: http://www.sfs.zut.edu.pl/files/mfw-omega.tar.gz. Experiments were conducted using an Intel Core2Duo T7300@2.00GHz machine with the Fedora Linux v12 32bit operating system.

The implementation calculates transitive closure according to Algorithm 1 and allows one of three options to be chosen for producing the relation \(R^{*}_{kk}\): (i) Algorithm 2, (ii) Omega, and (iii) ISL. In our experiments, we examined only those loops provided by NPB that expose dependences. There are 75 imperfectly nested loops and 58 perfectly nested loops in NPB that expose dependences. The results of our experiments are collected in Table 1, where time is presented in seconds. The columns “Proposed algorithm”, “ISL”, and “Omega” present the time of calculating transitive closure by means of the Floyd–Warshall algorithm, where relations \(R^{*}_{kk}\) were calculated by applying Algorithm 2, the ISL tool, and the Omega tool, respectively.
Table 1 The results of the experiments on the proposed approach to computing transitive closure

| Source loop name | Number of relations | Proposed algorithm: ex | Proposed algorithm: \(t\,[s]\) | ISL\(^\mathrm{a}\): ex | ISL\(^\mathrm{a}\): \(\Delta t\,[s]\) | Omega\(^\mathrm{b}\): ex | Omega\(^\mathrm{b}\): \(\Delta t\,[s]\) |
|---|---|---|---|---|---|---|---|
| Perfectly-nested loops | | | | | | | |
| BT_error.f2p_5 | 31 | 1 | 0.2451 | 1 | 1.1081 | 1 | 2.4072 |
| BT_initialize.f2p_8 | 3 | 1 | 0.0006 | 1 | 0.0017 | 1 | 0.0040 |
| BT_initialize.f2p_9 | 1 | 1 | 0.0006 | 1 | 0.0009 | 1 | 0.0015 |
| BT_rhs.f2p_1 | 46 | 1 | 2.1681 | 1 | 3.5402 | 1 | 4.1763 |
| BT_rhs.f2p_5 | 128 | 1 | 1.3442 | 1 | 16.1829 | 1 | 4.1989 |
| CG_cg.f2p_3 | 1 | 1 | 0.0007 | 1 | 0.0018 | 1 | 0.0014 |
| CG_cg.f2p_4 | 10 | 1 | 0.0035 | 1 | 0.0061 | 1 | 0.0225 |
| CG_cg.f2p_6 | 1 | 1 | 0.0004 | 1 | 0.0009 | 1 | 0.0019 |
| CG_cg.f2p_8 | 1 | 1 | 0.0006 | 1 | 0.0013 | 1 | 0.0013 |
| FT_auxfnct.f2p_1 | 1 | 1 | 0.0006 | 1 | 0.0006 | 1 | 0.0029 |
| FT_auxfnct.f2p_2 | 1 | 1 | 0.0004 | 1 | 0.0005 | 1 | 0.0020 |
| LU_HP_l2norm.f2p_2 | 9 | 1 | 0.0451 | 1 | 0.0829 | 1 | 1.1249 |
| LU_HP_jacld.f2p_1 | 2634 | 1 | 74.0127 | 1 | 75.1268 | 1 | 120.5502 |
| LU_HP_jacu.f2p_1 | 2634 | 1 | 74.0840 | 1 | 75.0540 | 1 | 108.8616 |
| LU_HP_pintgr.f2p_11 | 4 | 1 | 0.0111 | 1 | 0.0170 | 1 | 0.0516 |
| LU_HP_pintgr.f2p_2 | 109 | 1 | 0.1409 | 1 | 0.4151 | 1 | 0.3499 |
| LU_HP_pintgr.f2p_3 | 6 | 1 | 0.0103 | 1 | 0.0167 | 1 | 0.0669 |
| LU_HP_pintgr.f2p_7 | 6 | 1 | 0.0107 | 1 | 0.0195 | 1 | 0.0504 |
| LU_jacld.f2p_1 | 2594 | 1 | 397.1930 | 1 | 538.9371 | 1 | 446.8944 |
| LU_jacu.f2p_1 | 2594 | 1 | 397.0249 | 1 | 543.8912 | 1 | 421.9345 |
| LU_l2norm.f2p_2 | 9 | 1 | 0.0445 | 1 | 0.0560 | 1 | 0.4220 |
| LU_pintgr.f2p_11 | 6 | 1 | 0.0117 | 1 | 0.0165 | 1 | 0.0487 |
| LU_pintgr.f2p_2 | 109 | 1 | 0.1405 | 1 | 0.4151 | 1 | 0.3238 |
| LU_pintgr.f2p_3 | 6 | 1 | 0.0114 | 1 | 0.0233 | 1 | 0.0667 |
| LU_pintgr.f2p_7 | 6 | 1 | 0.0118 | 1 | 0.0166 | 1 | 0.0602 |
| MG_mg.f2p_1 | 1 | 1 | 0.0008 | 1 | 0.0058 | 1 | 0.0019 |
| MG_mg.f2p_11 | 1 | 1 | 0.0006 | 1 | 0.0009 | 1 | 0.0027 |
| MG_mg.f2p_12 | 1 | 1 | 0.0005 | 1 | 0.0006 | 1 | 0.0014 |
| MG_mg.f2p_13 | 1 | 1 | 0.0001 | 1 | 0.0001 | 1 | 0.0019 |
| MG_mg.f2p_4 | 1 | 1 | 0.0001 | 1 | 0.0001 | 1 | 0.0009 |
| SP_error.f2p_5 | 31 | 1 | 0.1996 | 1 | 0.9823 | 1 | 2.4040 |
| SP_initialize.f2p_8 | 3 | 1 | 0.0001 | 1 | 0.0002 | 1 | 0.0004 |
| SP_ninvr.f2p_1 | 103 | 1 | 3.5246 | 1 | 26.9442 | 1 | 8.5589 |
| SP_pinvr.f2p_1 | 103 | 1 | 3.5322 | 1 | 27.0151 | 1 | 8.6393 |
| SP_rhs.f2p_1 | 64 | 1 | 4.8383 | 1 | 51.3303 | 1 | 8.2041 |
| SP_rhs.f2p_5 | 127 | 1 | 1.3437 | 1 | 16.7552 | 1 | 4.2837 |
| SP_txinvr.f2p_1 | 271 | 1 | 64.3626 | 1 | 328.6192 | 1 | 75.9909 |
| SP_tzetar.f2p_1 | 288 | 1 | 51.7971 | 1 | 269.7633 | 1 | 63.2336 |
| UA_adapt.f2p_2 | 8 | 1 | 0.0516 | 1 | 0.0760 | 1 | 0.2282 |
| UA_diffuse.f2p_1 | 5 | 1 | 0.0787 | 1 | 0.1692 | 1 | 94.0265 |
| UA_diffuse.f2p_2 | 3 | 1 | 0.0010 | 1 | 0.0021 | 1 | 0.0029 |
| UA_mason.f2p_18 | 1 | 1 | 0.0011 | 1 | 0.0013 | 1 | 0.0057 |
| UA_precond.f2p_5 | 30 | 0 | 0.5973 | 0 | 3.0276 | 0 | 9.8670 |
| UA_setup.f2p_1 | 1 | 1 | 0.0002 | 1 | 0.0003 | 1 | 0.0008 |
| UA_setup.f2p_16 | 3 | 1 | 0.0014 | 1 | 0.0029 | 1 | 0.0111 |
| UA_setup.f2p_6 | 4 | 1 | 0.0039 | 1 | 0.0052 | 1 | 0.1116 |
| UA_transfer.f2p_1 | 1 | 1 | 0.0003 | 1 | 0.0006 | 1 | 0.0009 |
| UA_transfer.f2p_10 | 1 | 1 | 0.0012 | 1 | 0.0015 | 1 | 0.0019 |
| UA_transfer.f2p_13 | 1 | 1 | 0.0049 | 1 | 0.0013 | 1 | 0.0008 |
| UA_transfer.f2p_15 | 1 | 1 | 0.0006 | 1 | 0.0016 | 1 | 0.0009 |
| UA_transfer.f2p_18 | 1 | 1 | 0.0004 | 1 | 0.0014 | 1 | 0.0009 |
| UA_transfer.f2p_2 | 1 | 1 | 0.0032 | 1 | 0.0065 | 1 | 0.0009 |
| UA_transfer.f2p_3 | 1 | 1 | 0.0012 | 1 | 0.0012 | 1 | 0.0013 |
| UA_transfer.f2p_5 | 1 | 1 | 0.0031 | 1 | 0.0046 | 1 | 0.0065 |
| UA_transfer.f2p_6 | 1 | 1 | 0.0032 | 1 | 0.0037 | 1 | 0.0057 |
| UA_transfer.f2p_7 | 1 | 1 | 0.0024 | 1 | 0.0029 | 1 | 0.0081 |
| UA_transfer.f2p_8 | 1 | 1 | 0.0017 | 1 | 0.0035 | 1 | 0.0058 |
| UA_transfer.f2p_9 | 1 | 1 | 0.0040 | 1 | 0.0015 | 1 | 0.0099 |
| Imperfectly-nested loops | | | | | | | |
| BT_error.f2p_2 | 107 | 1 | 2.9712 | 1 | 8.8938 | 1 | 9.1521 |
| BT_error.f2p_3 | 6 | 1 | 0.0038 | 1 | 0.0069 | 1 | 0.0071 |
| BT_error.f2p_6.t | 6 | 1 | 0.0036 | 1 | 0.0082 | 1 | 0.0069 |
| BT_exact_rhs.f2p_2 | 1553 | 0 | 32.2311 | 0 | 61.3898 | 0 | 73.3091 |
| BT_exact_rhs.f2p_3 | 1553 | 0 | 31.9385 | 0 | 68.5866 | 0 | 74.3960 |
| BT_exact_rhs.f2p_4 | 1553 | 0 | 32.1856 | 0 | 61.1335 | 0 | 73.9969 |
| BT_initialize.f2p_2 | 42 | 1 | 0.2836 | 1 | 0.3948 | 1 | 3.0076 |
| BT_initialize.f2p_3 | 42 | 1 | 0.2888 | 1 | 0.3964 | 1 | 2.9983 |
| BT_initialize.f2p_4 | 42 | 1 | 0.2882 | 1 | 0.3934 | 1 | 3.0522 |
| BT_initialize.f2p_5 | 42 | 1 | 0.3139 | 1 | 0.4283 | 1 | 3.0035 |
| BT_initialize.f2p_6 | 42 | 1 | 0.3181 | 1 | 0.3944 | 1 | 3.0082 |
| BT_initialize.f2p_7 | 42 | 1 | 0.2986 | 1 | 0.3966 | 1 | 3.0189 |
| BT_rhs.f2p_3 | 702 | 0 | 26.1909 | 0 | 268.7525 | 0 | 38.5441 |
| BT_rhs.f2p_4 | 510 | 0 | 16.4426 | 0 | 236.8264 | 0 | 26.1494 |
| CG_cg.f2p_7 | 2 | 1 | 0.0017 | 1 | 0.0022 | 1 | 0.0317 |
| LU_blts.f2p_1 | 4885 | 1 | 3632.8071 | 1 | 4267.0205 | 1 | 5078.6317 |
| LU_buts.f2p_1 | 5640 | 1 | 4010.8654 | 1 | 5673.0981 | 1 | 5612.8839 |
| LU_erhs.f2p_2 | 66 | 1 | 0.1669 | 1 | 0.5987 | 1 | 5.9636 |
| LU_erhs.f2p_3 | 640 | 0 | 72.3339 | 0 | 164.4464 | 0 | 107.5848 |
| LU_erhs.f2p_4 | 640 | 0 | 74.6972 | 0 | 192.2292 | 0 | 104.2774 |
| LU_erhs.f2p_5 | 640 | 0 | 32.5497 | 0 | 237.4519 | 0 | 58.9116 |
| LU_HP_blts.f2p_1 | 3232 | 0 | 216.5695 | 0 | 216.8695 | 0 | 218.7895 |
| LU_HP_buts.f2p_1 | 3593 | 0 | 250.4280 | 0 | 447.2031 | 0 | 267.1930 |
| LU_HP_erhs.f2p_2 | 66 | 1 | 0.1640 | 1 | 0.8398 | 1 | 6.4393 |
| LU_HP_erhs.f2p_3 | 640 | 0 | 72.5601 | 0 | 263.6080 | 0 | 115.7859 |
| LU_HP_erhs.f2p_4 | 640 | 0 | 74.5099 | 0 | 262.0617 | 0 | 116.0602 |
| LU_HP_erhs.f2p_5 | 640 | 0 | 32.9287 | 0 | 236.9649 | 0 | 57.8919 |
| LU_HP_rhs.f2p_1 | 17 | 1 | 0.2142 | 1 | 1.5149 | 1 | 1.2455 |
| LU_HP_rhs.f2p_2 | 640 | 0 | 72.5522 | 0 | 387.5539 | 0 | 115.3880 |
| LU_HP_rhs.f2p_3 | 640 | 0 | 74.3030 | 0 | 262.0265 | 0 | 115.4032 |
| LU_HP_rhs.f2p_4 | 640 | 0 | 32.4699 | 0 | 237.7602 | 0 | 57.5956 |
| LU_rhs.f2p_1 | 17 | 1 | 0.2175 | 1 | 1.5029 | 1 | 1.2170 |
| LU_rhs.f2p_2 | 640 | 0 | 71.9027 | 0 | 279.4004 | 0 | 115.5498 |
| LU_rhs.f2p_3 | 640 | 0 | 73.6644 | 0 | 277.5648 | 0 | 114.4854 |
| LU_rhs.f2p_4 | 1412 | 0 | 199.7893 | 0 | 968.4744 | 0 | 354.9285 |
| MG_mg.f2p_10 | 18 | 1 | 0.0041 | 1 | 0.0043 | 1 | 0.0047 |
| MG_mg.f2p_3 | 3 | 1 | 0.0001 | 1 | 0.0002 | 1 | 0.0001 |
| MG_mg.f2p_5 | 24 | 0 | 0.6285 | 0 | 0.7923 | 0 | 2.1224 |
| MG_mg.f2p_6 | 29 | 0 | 0.9173 | 0 | 0.9739 | 0 | 2.1649 |
| MG_mg.f2p_7 | 510 | 1 | 2.0639 | 1 | 17.9808 | 1 | 5.5680 |
| MG_mg.f2p_8 | 55 | 0 | 2.2999 | 0 | 2.3069 | 0 | 7.1395 |
| MG_mg.f2p_9 | 18 | 1 | 0.0036 | 1 | 0.0043 | 1 | 0.0047 |
| SP_error.f2p_2 | 107 | 1 | 2.4962 | 1 | 9.2583 | 1 | 9.1577 |
| SP_error.f2p_3 | 6 | 1 | 0.0039 | 1 | 0.0043 | 1 | 0.0081 |
| SP_error.f2p_6 | 6 | 1 | 0.0013 | 1 | 0.0014 | 1 | 0.0073 |
| SP_exact_rhs.f2p_2 | 1553 | 0 | 32.0930 | 0 | 97.8048 | 0 | 81.8412 |
| SP_exact_rhs.f2p_3 | 1553 | 0 | 32.1471 | 0 | 106.6423 | 0 | 81.0354 |
| SP_exact_rhs.f2p_4 | 1553 | 0 | 32.2977 | 0 | 102.4652 | 0 | 81.5785 |
| SP_initialize.f2p_2 | 24 | 1 | 0.2242 | 1 | 0.5455 | 1 | 3.0368 |
| SP_initialize.f2p_3 | 24 | 1 | 0.2234 | 1 | 0.3989 | 1 | 3.1722 |
| SP_initialize.f2p_4 | 24 | 1 | 0.2214 | 1 | 0.3971 | 1 | 3.0657 |
| SP_initialize.f2p_5 | 24 | 1 | 0.2239 | 1 | 0.3943 | 1 | 3.0336 |
| SP_initialize.f2p_6 | 24 | 1 | 0.2216 | 1 | 0.4103 | 1 | 3.0342 |
| SP_initialize.f2p_7 | 24 | 1 | 0.2227 | 1 | 0.3936 | 1 | 3.0376 |
| SP_rhs.f2p_3 | 699 | 1 | 10.7808 | 1 | 231.7330 | 1 | 20.0821 |
| SP_rhs.f2p_4 | 507 | 1 | 14.1710 | 1 | 156.5537 | 1 | 23.3080 |
| UA_adapt.f2p_1 | 10 | 1 | 0.0469 | 1 | 0.0640 | 1 | 0.0930 |
| UA_adapt.f2p_10 | 14 | 1 | 0.0136 | 1 | 0.0164 | 1 | 0.0264 |
| UA_adapt.f2p_11 | 11 | 1 | 0.0134 | 1 | 0.0162 | 1 | 0.0306 |
| UA_adapt.f2p_9 | 14 | 1 | 0.0058 | 1 | 0.0163 | 1 | 0.0256 |
| UA_diffuse.f2p_3 | 1 | 1 | 0.0004 | 1 | 0.0005 | 1 | 0.0009 |
| UA_diffuse.f2p_4 | 1 | 1 | 0.0015 | 1 | 0.0042 | 1 | 0.0147 |
| UA_diffuse.f2p_5 | 1 | 1 | 0.0013 | 1 | 0.0039 | 1 | 0.0018 |
| UA_precond.f2p_3 | 1 | 1 | 0.0009 | 1 | 0.0016 | 1 | 0.0017 |
| UA_precond.f2p_4 | 1 | 1 | 0.0003 | 1 | 0.0003 | 1 | 0.0005 |
| UA_setup.f2p_14 | 31 | 1 | 0.2973 | 1 | 0.9474 | 1 | 0.3562 |
| UA_setup.f2p_15 | 15 | 1 | 0.2610 | 1 | 0.3649 | 1 | 0.2836 |
| UA_transfer.f2p_11 | 6 | 1 | 0.0048 | 1 | 0.0049 | 1 | 0.0260 |
| UA_transfer.f2p_12 | 7 | 1 | 0.0146 | 1 | 0.0044 | 1 | 0.0249 |
| UA_transfer.f2p_14 | 8 | 1 | 0.0240 | 1 | 0.0373 | 1 | 0.0780 |
| UA_transfer.f2p_16 | 4 | 1 | 0.0018 | 1 | 0.0033 | 1 | 0.0150 |
| UA_transfer.f2p_17 | 17 | 1 | 0.0238 | 1 | 0.0354 | 1 | 0.0867 |
| UA_transfer.f2p_19 | 4 | 1 | 0.0030 | 1 | 0.0033 | 1 | 0.0122 |
| UA_transfer.f2p_4 | 3 | 1 | 0.0047 | 1 | 0.0018 | 1 | 0.0132 |
| UA_utils.f2p_12 | 20 | 1 | 0.1816 | 1 | 0.1635 | 1 | 0.7387 |

(ex: 1—exact result, 0—over-approximation; \(\Delta t\): difference between the transitive closure calculation time of a known correspondent technique and that of the presented approach)

\(^\mathrm{a}\) Integer Set Library—a library for manipulating sets and relations of integer points bounded by affine constraints (available at http://repo.or.cz/w/isl.git)

\(^\mathrm{b}\) Omega Project—frameworks and algorithms for the analysis and transformation of scientific programs (available at http://www.cs.umd.edu/projects/omega/)

Analyzing the results presented in Table 1, we can draw the following conclusions. All techniques under experiment are able to calculate transitive closure for all NPB loops exposing dependences. The exactness of the presented approach is the same as that of the techniques implemented in Omega and ISL, i.e., all techniques under experiment produce exact transitive closure for the same loops. Calculating relation \(R^{*}_{kk}\) by means of Algorithm 2 is less time-consuming than with the techniques implemented in Omega and ISL, which reduces the time of calculating the transitive closure of a relation describing all the dependences in the loop by means of the Floyd–Warshall algorithm. For all loops, we obtained the shortest time of producing transitive closure.

The explanation is that each relation \(R_{kk}^{*}\) that we compose in Algorithm 1 (line 10) consists of two relations, \(R_{kk}^{+}\cup I\). If there are \(m\) disjuncts in the input relation \(R_{kk}\), then the direct application of the composition operation, as in formula (11), may result in a relation with \(2^{m}\) disjuncts, which is computationally expensive. In general, applying formula (7) results in a number of disjuncts that is much smaller than \(2^m\). This permits us to conclude that the presented approach is faster than other well-known approaches.
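The \(2^{m}\) bound follows from distributing composition over union in a product of \(m\) factors of the form \(R_{i}^{+}\cup I\). The sketch below only counts the resulting terms symbolically; it manipulates names, not relations, and the function name `expand_composition` is hypothetical. A real library may merge or drop some disjuncts, so the count is an upper bound on this syntactic expansion.

```python
from itertools import product


def expand_composition(m):
    """Distribute (R_m+ ∪ I) ∘ ... ∘ (R_1+ ∪ I) over union.

    Each disjunct is returned as a tuple of the factor names that remain
    after dropping I, which is neutral for composition."""
    factors = [(f"R{i}+", "I") for i in range(m, 0, -1)]
    disjuncts = []
    for choice in product(*factors):
        disjuncts.append(tuple(name for name in choice if name != "I"))
    return disjuncts


for m in (1, 2, 3, 10):
    print(m, len(expand_composition(m)))
```

Each disjunct corresponds to one subset of the \(m\) factors, and the empty tuple stands for the pure identity term, which the intersection in formula (11) removes afterwards.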

6 Conclusion

In this paper, we presented a modified Floyd–Warshall algorithm, where the most time-consuming part (transitive closure describing self-dependences in the program loop) is calculated by means of basis dependence distance vectors. We demonstrated how to calculate basis dependence distance vectors for parameterized program loops and how to apply them to calculate the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement.

This solution reduces the time of the transitive closure calculation for parameterized graphs representing dependences in program loops. The reduction is due to using a finite integral linear combination of basis dependence distance vectors to calculate the \(R_{kk}^{*}\) term in a modified Floyd–Warshall algorithm. It was confirmed by means of numerous experiments with NPB benchmarks. The presented approach can be used for resolving many problems of optimizing compilers: redundant synchronization removal (Kelly et al. 1996), testing the legality of iteration reordering transformations (Kelly et al. 1996), iteration space slicing (Beletska et al. 2011), forming schedules for statement instances of program loops (Bielecki et al. 2012). In our future work, we plan to study the application of the presented approach for extracting both coarse- and fine-grained parallelism for different popular benchmarks.

References

  1. Ancourt C, Coelho F, Irigoin F (2010) A modular static analysis approach to affine loop invariants detection. Electron Notes Theor Comput Sci 267:3–16
  2. Beletska A, Barthou D, Bielecki W, Cohen A (2009) Computing the transitive closure of a union of affine integer tuple relations. In: COCOA09. Springer, Berlin, pp 98–109
  3. Beletska A, Bielecki W, Cohen A, Palkowski M, Siedlecki K (2011) Coarse-grained loop parallelization: iteration space slicing vs affine transformations. Parallel Comput 37(8):479–497
  4. Bielecki W, Kraska K, Klimek T (2013) Transitive closure of a union of dependence relations for parameterized perfectly-nested loops. In: Malyshkin V et al. (eds) PaCT-2013, LNCS, vol 7979. Springer, Heidelberg, pp 37–50
  5. Bielecki W, Palkowski M, Klimek T (2012) Free scheduling for statement instances of parameterized arbitrarily nested affine loops. Parallel Comput 38:518–532. http://dx.doi.org/10.1016/j.parco.2012.06.001
  6. Boigelot B (1998) Symbolic methods for exploring infinite state spaces. Ph.D. thesis, Université de Liège
  7. Bozga M, Girlea C, Iosif R (2009) Iterating octagons. ETAPS 2009, TACAS’09. Springer, New York, pp 337–351
  8. Cohen E, Megiddo N (1991) Recognizing properties of periodic graphs, DIMACS series in discrete mathematics and theoretical computer science, vol 4. American Mathematical Society, pp 135–146
  9. Deng X, Dymond P (1998) On multiprocessor system scheduling. J Comb Optim 1(4):377–392
  10. Diestel R (2010) Graph theory, 4th edn. Springer, New York
  11. Eve J, Kurki-Suonio R (1977) On computing the transitive closure of a relation. Acta Informatica 8(4):303–314
  12. Feautrier P (2012) Approximating the transitive closure of a Boolean-Affine relation, IMPACT 2012 second international workshop on polyhedral compilation techniques, Paris, France, http://impact.gforge.inria.fr/impact2012/
  13. Griebl M (2004) Automatic parallelization of loop programs for distributed memory architectures. Fakultät für Mathematik und Informatik Universität Passau, Habilitation
  14. Hollermann L, Tsan-sheng H, Lopez D, Vertanen K (1997) Scheduling problems in a practical allocation model. J Comb Optim 1(2):129–149
  15. Integer Set Library, http://www.kotnet.org/skimo/isl/
  16. Kelly W, Maslov V, Pugh W, Rosser E, Shpeisman T, Wonnacott D (1996) New User Interface for Petit and Other Extensions. User Guide
  17. Kelly W, Pugh W, Rosser E, Shpeisman T (1996) Transitive closure of infinite graphs and its applications. LCPC’95, Columbus, Ohio, vol 1033. Springer, New York, pp 126–140
  18. NASA Advanced Supercomputing Division, http://www.nas.nasa.gov
  19. Presburger M (1927) Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In: Comptes Rendus du Premier Congrès des Mathématiciens des Pays Slaves, 395, Warsaw, pp 92–101
  20. Rotman J (2003) Advanced modern algebra, 2nd edn. Prentice Hall, Upper Saddle River
  21. Schrijver A (1999) Theory of linear and integer programming. Series in Discrete Mathematics
  22. Shoup V (2005) A computational introduction to number theory. Cambridge University Press, New York
  23. Skiena S (2008) The algorithm design manual, 2nd edn. Springer, Berlin
  24. Verdoolaege S (2012) Integer set library—manual, Tech. rep. 2012, Version: isl-0.11 www.kotnet.org/skimo/isl/manual
  25. Verdoolaege S, Cohen A, Beletska A (2011) Transitive closures of affine integer tuple relations and their overapproximations. SAS, pp 216–232

Copyright information

© The Author(s) 2014

Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Authors and Affiliations

  • Włodzimierz Bielecki (1)
  • Krzysztof Kraska (1)
  • Tomasz Klimek (1)
  1. Faculty of Computer Science and Information Technology, West Pomeranian University of Technology, Szczecin, Poland
