1 Introduction

The solution of many problems relies on calculating the transitive closure of graphs Diestel (2010). In this paper, we deal with parameterized graphs whose number of vertices is given by an expression involving structure parameters. Such graphs can be represented by parameterized relations whose tuples represent vertices, while constraints define edges Kelly et al. (1996). The transitive closure of such relations has several applications in optimizing compilers: removing redundant synchronization Kelly et al. (1996), testing the legality of iteration reordering transformations Kelly et al. (1996), applying iteration space slicing Beletska et al. (2011), and forming schedules for statement instances of program loops Bielecki et al. (2012); Hollermann et al. (1997); Deng et al. (1998). In general, calculating the transitive closure of parameterized graphs is time-consuming Kelly et al. (1996); Beletska et al. (2009); Verdoolaege et al. (2011). Sometimes this time prevents applying techniques for extracting coarse- and fine-grained parallelism because it is not acceptable in practice (several hours or even several days Beletska et al. (2011); Bielecki et al. (2012)). This is why improving transitive closure algorithms so as to reduce their time complexity is an important task.

The contributions of this paper over previous work are as follows:

  • demonstration of how to calculate basis dependence distance vectors for parameterized program loops;

  • proposition of a way of calculating the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement by means of basis dependence distance vectors;

  • a proposal to use the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement, calculated by means of basis dependence distance vectors, within a modified Floyd–Warshall algorithm, with the aim of reducing the time of calculating the transitive closure of a dependence graph representing all the dependences in a given program loop;

  • development of open source software implementing the presented solutions and producing the transitive closure of a dependence graph describing all the dependences in the input program loop by means of the modified Floyd–Warshall algorithm;

  • an evaluation of the effectiveness and efficiency of the presented algorithms and a comparison of the results with those yielded by related work.

In this paper, we demonstrate how to reduce the time of calculating the transitive closure describing self-dependences. For this purpose, we propose to extract basis dependence distance vectors from all distance vectors describing self-dependences and then demonstrate how these vectors can be used for calculating transitive closure. To extract such distance vectors, we use dependence relations returned by a dependence analyzer. Such relations (describing self-dependences) are characterized by the same arity (the number of tuple elements) of input and output tuples. Finally, we present experimental results showing how the time of transitive closure calculation is reduced for the NAS benchmarks (http://www.nas.nasa.gov).

The rest of the paper is organized as follows. Section 2 introduces background. Section 3 presents an approach to calculate the transitive closure of a parameterized dependence graph. Section 4 describes related work. Section 5 presents results of an experimental study. Section 6 draws conclusions and briefly outlines future research.

2 Background

In this section, we briefly introduce basic definitions which are used throughout this paper.

The following concepts of linear algebra are used in the approach presented in this paper: vector, vector space, field, integral linear combination, and linear independence. Details can be found in the book by Schrijver (1999).

Definition 1

(Integer Lattice) Let {\(a_{1},a_{2},...,a_{m}\)} be a set of linearly independent integer vectors. The set \(\Lambda =\{\lambda _{1}a_{1}+\lambda _{2}a_{2}+...+\lambda _{m}a_{m}\mid \lambda _{1},...,\lambda _{m}\in \mathbb {Z}\}\) is called an integer lattice generated by the basis {\(a_{1},a_{2},...,a_{m}\)}.

Definition 2

(Basis) A basis \(B\) of an integer lattice \(\Lambda \) over \(\mathbb {Z}\) is a linearly independent subset of \(\Lambda \) that generates \(\Lambda \). Every integer lattice, like every finite-dimensional vector space, has a basis Shoup (2005).
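To make the lattice notions concrete, membership of an integer point in a two-dimensional lattice can be checked by solving \(point = \lambda _{1}a_{1}+\lambda _{2}a_{2}\) for \(\lambda _{1},\lambda _{2}\) and testing integrality. The sketch below is illustrative only (the function name and the example basis are our own) and assumes the two basis vectors are linearly independent:

```python
from fractions import Fraction

def in_lattice(point, b1, b2):
    # Solve point = l1*b1 + l2*b2 by Cramer's rule; the point belongs to
    # the lattice generated by {b1, b2} iff both coefficients are integers.
    det = b1[0] * b2[1] - b1[1] * b2[0]  # nonzero for independent b1, b2
    l1 = Fraction(point[0] * b2[1] - point[1] * b2[0], det)
    l2 = Fraction(b1[0] * point[1] - b1[1] * point[0], det)
    return l1.denominator == 1 and l2.denominator == 1

# (4,5) = 2*(2,1) + 1*(0,3) lies in the lattice; (3,0) does not
assert in_lattice((4, 5), (2, 1), (0, 3))
assert not in_lattice((3, 0), (2, 1), (0, 3))
```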

Definition 3

(Presburger Arithmetic, Presburger Formula) We define Presburger arithmetic to be the first-order theory over atomic formulas of the form:

$$\begin{aligned} \sum _{i=1}^{n}a_{i}x_{i} \sim c, \end{aligned}$$
(1)

where \(a_{i}\) and \(c\) are integer constants, \(x_{i}\) are variables ranging over integers, and \(\sim \) is an operator from \(\left\{ =,\ne , <,\le ,>,\ge \ \right\} \). A formula \(f\) is an atomic formula (1) or it is constructed from formulas \(f_{1}\) and \(f_{2}\) recursively as follows Presburger (1927):

$$\begin{aligned} f ::= \lnot f_{1}\mid f_{1}\wedge f_{2}\mid f_{1}\vee f_{2}. \end{aligned}$$

In this paper, we use the following notions concerning program loops: iteration vector, loop domain (index set), parameterized loops, and perfectly nested loops; details can be found in Griebl (2004).

Definition 4

(Structure Parameters) Structure parameters are integer symbolic constants, generally defining array size, iteration bounds, etc. Structure parameters may be defined once in the prologue of the program, and may not be modified elsewhere.

Definition 5

(Iteration Vector) For a given statement \(S\) in a loop, the iteration vector \(\overrightarrow{v}=(i_{1},...,i_{n})^{T}\) is the vector of the surrounding loop counters.

Definition 6

(Iteration Space) The iteration space of a given statement \(S\) in a given loop nest is a set of values taken by its iteration vector when executing the loop nest.

Definition 7

(Dependence) Two statement instances \(S_{1}(\overrightarrow{I})\) and \(S_{2}(\overrightarrow{J})\), where \(\overrightarrow{I}\) and \(\overrightarrow{J}\) are the iteration vectors, are dependent if both access the same memory location and if at least one access is a write.

Definition 8

(Dependence Distance Set, Dependence Distance Vector) We define the dependence distance set \(\Delta _{S,T}\) as the set of differences between the iteration vectors (of the same size) of all pairs of dependent instances of statements \(S\) and \(T\). We call each element of set \(\Delta _{S,T}\) a (dependence) distance vector and denote it \(\delta _{S,T}\).

Definition 9

(Uniform Dependence, Non-Uniform Dependence) If each coordinate of vector \(\delta _{S,T}\) (see Definition 8) is constant, then we call the corresponding dependence uniform; otherwise it is non-uniform Griebl (2004).

Definition 10

(Reduced Dependence Graph) A Reduced Dependence Graph (\(RDG\)) is the graph in which a vertex stands for each statement \(S\) and an edge connects statements \(S\) and \(T\) whose instances are dependent. The number of edges between vertices \(S\) and \(T\) is equal to the number of dependence relations \(R_{S,T}\).

Definition 11

(Uniform Loop, Quasi-Uniform Loop) We say that a parameterized loop is uniform if it induces dependences represented by a finite number of uniform dependence distance vectors Griebl (2004). A parameterized loop is quasi-uniform if all its dependence distance vectors can be represented by integral linear combinations of a finite number of linearly independent vectors with constant coordinates.

Let us consider the parameterized dependence distance vector \((N,2)\). It can be represented as \((0,2)+a\times (1,0)\), where \(a = N\), \(a\ge 1\), \(a\in \mathbb {Z}\) (see Fig. 1).
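This replacement is easy to verify numerically; the tiny sketch below (our own helper, with \(a\) instantiated to \(N\)) confirms that the combination reproduces \((N,2)\) for several parameter values:

```python
def reconstruct(N):
    # (N, 2) = (0, 2) + a * (1, 0) with a = N, a >= 1
    a = N
    return (0 + a * 1, 2 + a * 0)

for N in (1, 5, 100):
    assert reconstruct(N) == (N, 2)
```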

Fig. 1
figure 1

The parameterized vector \((N,2)\) represented as the integral linear combination of the two linearly independent vectors with constant coordinates

Definition 12

(Dependence Relation) A dependence relation is a tuple relation of the form \(\left\{ [input\_list] \rightarrow [output\_list]: constraints\right\} \), where \(input\_list\) and \(output\_list\) are the lists of variables used to describe input and output tuples and constraints is a Presburger formula describing the constraints imposed upon \(input\_list\) and \(output\_list\).

The general form of a dependence relation is as follows Kelly et al. (1996):

$$\begin{aligned} R = \left\{ [s_{1}, \ldots , s_{k}] \rightarrow [t_{1}, \ldots , t_{k}]: \bigvee ^{n}_{i=1} \exists \alpha _{i1}, \ldots ,\alpha _{im_{i}}\; \text {s.t.}\; F_{i}\right\} , \end{aligned}$$

where \(F_{i}, i = 1,2,\ldots ,n\) are represented by Presburger formulas, i.e., they are conjunctions of affine equalities and inequalities on the input variables \(s_{1},\ldots , s_{k}\), the output variables \(t_{1},\ldots , t_{k}\), the existentially quantified variables \(\alpha _{i1}, \ldots \ ,\alpha _{im_{i}}\), and symbolic constants.

Definition 13

(Domain of a Relation) Let \(R \subseteq \mathbb {Z}^{n}\times \mathbb {Z}^{m}\) be a relation. The domain of \(R\), \(dom\;R\), is the following set

$$\begin{aligned} dom\;R\,:=\,\left\{ x_{1}\in \mathbb {Z}^{n}\,|\,\exists \,x_{2}\in \mathbb {Z}^{m}\,:\,(x_{1},x_{2})\in R\right\} . \end{aligned}$$

Definition 14

(Range of a Relation) Let \(R \subseteq \mathbb {Z}^{n}\times \mathbb {Z}^{m}\) be a relation. The range of \(R\), \(ran\;R\), is the following set

$$\begin{aligned} ran\;R\,:=\,\left\{ x_{2}\in \mathbb {Z}^{m}\,|\,\exists \,x_{1}\in \mathbb {Z}^{n}\,:\,(x_{1},x_{2})\in R\right\} . \end{aligned}$$
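For finite relations, both definitions reduce to projections onto the first and second components. A minimal sketch, modeling a relation as a Python set of pairs (our own encoding, not the paper's representation):

```python
def dom(R):
    # domain: all first components of the pairs in R
    return {x1 for (x1, x2) in R}

def ran(R):
    # range: all second components of the pairs in R
    return {x2 for (x1, x2) in R}

R = {(1, 2), (2, 3), (2, 4)}
assert dom(R) == {1, 2}
assert ran(R) == {2, 3, 4}
```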

Definition 15

(Positive Transitive Closure) Let \(R\) be an affine integer tuple relation, then the positive transitive closure \(R^{+}\) of \(R\) is the union of all positive powers of \(R\),

$$\begin{aligned} R^{+}=\bigcup _{k\geqslant 1}^{} R^{k},\quad \mathrm {with}\quad R^{k}= \left\{ \begin{array}{ll} R &{} if \;k=1 \\ R\circ R^{k-1} &{} if \;k\geqslant 2.\\ \end{array}\right. \end{aligned}$$
(2)

Definition 16

(Transitive Closure) Transitive closure, \(R^{*}\), is defined as follows Kelly et al. (1996):

$$\begin{aligned} R^{*}=R^{+}\cup I, \end{aligned}$$

where \(I\) is the identity relation. \(R^{*}\) describes the same connections in a dependence graph (represented by \(R\)) that \(R^{+}\) does plus connections of each vertex with itself.

To check whether the output returned by an algorithm represents exact transitive closure, we can use the well-known fact Kelly et al. (1996) that for an acyclic relation \(R\) (for such a relation, \(R \cap I = \varnothing \), where \(I\) is the identity relation) the following holds:

  • if \(R^{+}\) is exact transitive closure, then:

    $$\begin{aligned} R^{+} = R \cup \left( R\circ R^{+}\right) , \end{aligned}$$
  • if \(R^{+}\) is an over-approximation, then:

    $$\begin{aligned} R^{+} \nsubseteq \ R \cup \left( R\circ R^{+}\right) . \end{aligned}$$
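These definitions and the exactness test can be exercised on a small finite relation. In the sketch below (a toy with relations as sets of pairs; all names are our own), \(R^{+}\) is computed by fixed-point iteration and the identity \(R^{+} = R \cup (R\circ R^{+})\) is then checked for an acyclic \(R\):

```python
def compose(R2, R1):
    # relation composition R2 ∘ R1: apply R1 first, then R2
    return {(x, z) for (x, y) in R1 for (y2, z) in R2 if y == y2}

def positive_closure(R):
    # R+ as the least fixed point of C = R ∪ (R ∘ C)
    closure = set(R)
    while True:
        grown = closure | compose(R, closure)
        if grown == closure:
            return closure
        closure = grown

R = {(1, 2), (2, 3)}                     # acyclic: R ∩ I = ∅
Rplus = positive_closure(R)
assert Rplus == {(1, 2), (2, 3), (1, 3)}
assert Rplus == R | compose(R, Rplus)    # exactness criterion holds
```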

In the next section, we analyse the time complexity of the proposed approach in a machine-independent way to assess the performance of the algorithms. For this purpose, the RAM (Random Access Machine) model of computation is used. Under the RAM model, we measure time complexity by establishing an upper bound, \(O\), on the number of steps that an algorithm takes for a given problem. Details on the model and on time complexity analysis can be found in Skiena (2008).

3 Approach to computing transitive closure

The goal of the algorithm presented below is to calculate the transitive closure of a dependence relation describing all the dependences in an arbitrarily nested loop.

3.1 Floyd–Warshall algorithm

To compute the transitive closure of a dependence relation representing all the dependences exposed for an arbitrarily nested loop, we use a modified form of the Floyd–Warshall (F-W) algorithm (see Algorithm 1). The idea of the F-W algorithm is the following. Let \(^{\underrightarrow{*}} \) denote a direct or transitive path between a pair of vertices in a dependence graph whose intermediate vertices come from a specific set \(S\). If the graph contains paths \(v^{\underrightarrow{*}}w\) and \(w^{\underrightarrow{*}}u\), then it also contains a path \(v^{\underrightarrow{*}}u\) whose intermediate vertices come from the set \(S\cup \left\{ w \right\} \). The F-W algorithm iterates from 1 to \(n\), where \(n\) is the total number of statements in the loop, and in the \(k\)-th iteration it takes into account the paths whose intermediate vertices come from the set \(\left\{ v_{1}, ..., v_{k-1} \right\} \).

figure a

The most time-consuming part of Algorithm 1 is the expression \(D_{kj}\circ R_{kk}^{*}\circ D_{ik}\), where ‘\(\circ \)’ denotes the composition operator applied to a pair of relations and \(D_{ik}\) describes all the dependences between instances of statement \(s_{i}\) and statement \(s_{k}\). This means that if there is a dependence from iteration \(i_{1}\) of statement \(s_{i}\) to iteration \(i_{2}\) of statement \(s_{k}\), a chain of self-dependences, \(R_{kk}^{*}\), from iteration \(i_{2}\) to iteration \(i_{3}\), and finally a dependence from iteration \(i_{3}\) of statement \(s_{k}\) to iteration \(i_{4}\) of statement \(s_{j}\) (where \(D_{kj}\) describes all dependences between instances of statement \(s_{k}\) and statement \(s_{j}\)), then there is a transitive dependence from iteration \(i_{1}\) to iteration \(i_{4}\). The objective of this technique is to update all the dependences through statements \(1,2,...,n\) in each iteration of the \(k\)-loop. In Sect. 3.3, we suggest calculating \(R_{kk}^{*}\) using a finite integral linear combination of basis dependence distance vectors Bielecki et al. (2013).
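Since Algorithm 1 itself appears here only as a figure, the sketch below is our reconstruction of the scheme described in the text: relations are finite sets of iteration pairs, `star` stands in for \(R_{kk}^{*}\), and each \(k\)-sweep augments \(D_{ij}\) with \(D_{kj}\circ R_{kk}^{*}\circ D_{ik}\). The names and data layout are our own assumptions, not the paper's implementation.

```python
def compose(R2, R1):
    # relation composition R2 ∘ R1: apply R1 first, then R2
    return {(x, z) for (x, y) in R1 for (y2, z) in R2 if y == y2}

def star(R, universe):
    # reflexive-transitive closure of a finite relation over `universe`
    closure = set(R) | {(v, v) for v in universe}
    while True:
        grown = closure | compose(closure, closure)
        if grown == closure:
            return closure
        closure = grown

def modified_floyd_warshall(D, n, universe):
    # D[i][j]: dependence relation from statement i to statement j (1-based),
    # as a dict of dicts of iteration-pair sets. In the k-th sweep, every
    # D[i][j] grows by D_kj ∘ R_kk* ∘ D_ik, i.e. by transitive dependences
    # routed through the self-dependence chain of statement k.
    for k in range(1, n + 1):
        Rkk_star = star(D[k][k], universe)
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                D[i][j] |= compose(D[k][j], compose(Rkk_star, D[i][k]))
    return D

# two statements; statement 2 has a self-dependence 2 -> 3
D = {1: {1: set(), 2: {(1, 2)}},
     2: {1: set(), 2: {(2, 3)}}}
out = modified_floyd_warshall(D, 2, {1, 2, 3})
assert out[1][2] == {(1, 2), (1, 3)}  # 1 -> 2 -> 3 via statement 2's chain
```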

3.2 Replacing the parameterized vector with an integral linear combination of constant vectors

To find constant vectors whose integral linear combination represents the parameterized vector, we can apply the following theorem.

Theorem 1

Let \(v_{p}\) be a vector in \(\mathbb {Z}^{d}\) whose coordinates at positions \(i\) are the parameterized values \(p_{i}\). We may replace vector \(v_{p}\) with the integral linear combination of a constant vector \(v_{c}\), \(v_{c}\in \mathbb {Z}^{d}\), and unit normal vectors \(e_{i}\), \(e_{i}\in \mathbb {Z}^{d}\), where the \(p_{i}\) are integer parametric coefficients, as follows:

$$\begin{aligned} v_{p}=v_{c}+\sum _{i}p_{i}\times e_{i}. \end{aligned}$$
(3)

Proof

Without loss of generality, we may assume that the first \(n\) positions of \(v_{p}\) have constant coordinates and the last \(q\) positions have parameterized ones. Then, we can write:

$$\begin{aligned} \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) = \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ 0 \\ \vdots \\ 0 \\ \end{array} \right) + \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) , \end{aligned}$$
(4)

where \(d-n=q\) here and in what follows; the second vector can be written as the integral linear combination of the unit normal vectors \(e_{k}\) with parameterized coefficients \(p_{n+1},\ldots , p_{d}\) in the last \(q\) positions:

$$\begin{aligned} \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) \!=\! \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ p_{n+1} \\ \vdots \\ 0 \\ \end{array} \right) \!+\!\ldots \!+\!\ \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 0 \\ \vdots \\ p_{d} \\ \end{array} \right) \!=\! p_{n+1} \times \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 1 \\ \vdots \\ 0 \\ \end{array} \right) \!+\!\ldots \!+\!\ p_{d} \times \left( \begin{array}{c} 0 \\ \vdots \\ 0 \\ 0 \\ \vdots \\ 1 \\ \end{array} \right) \! . \end{aligned}$$
(5)

Substituting (5) into (4), we obtain:

$$\begin{aligned} \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ p_{n+1} \\ \vdots \\ p_{d} \\ \end{array} \right) = \left( \begin{array}{c} c_{1} \\ \vdots \\ c_{n} \\ 0 \\ \vdots \\ 0 \\ \end{array} \right) + p_{n+1} \times e_{n+1} +\ldots +\ p_{d} \times e_{d} . \end{aligned}$$
(6)

Note that if \(v_{c}=\varvec{0}\), then \(v_{c}\) can be omitted without affecting the result. \(\square \)
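Theorem 1 translates directly into a short routine. In the sketch below (our own encoding; parameterized coordinates are modeled as strings naming the parameter), the function returns the constant vector \(v_{c}\) together with one \((p_{i}, e_{i})\) term per parameterized position:

```python
def split_parameterized(v):
    # v: list whose entries are ints (constants) or strings (parameter
    # names); returns (v_c, terms) with v = v_c + sum of p_i * e_i
    d = len(v)
    v_c = [x if isinstance(x, int) else 0 for x in v]
    terms = []
    for i, x in enumerate(v):
        if not isinstance(x, int):
            e_i = [0] * d
            e_i[i] = 1                  # unit normal vector for position i
            terms.append((x, e_i))
    return v_c, terms

# the vector (2, N2, N3) from the example in Sect. 3.5
v_c, terms = split_parameterized([2, "N2", "N3"])
assert v_c == [2, 0, 0]
assert terms == [("N2", [0, 1, 0]), ("N3", [0, 0, 1])]
```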

3.3 Algorithm for computing transitive closure

The idea of the algorithm presented in this section is the following. Given a set \(\Delta _{S,T}\) of \(m\) dependence distance vectors in the \(n\)-dimensional integer space derived from a union of dependence relations \(R_{kk}\), \(k=1,2,..., q\), where \(q\) is the number of loop statements (each \(R_{kk}\) describes a chain of self-dependences of statement \(s_{k}\) in the loop), we first replace all parameterized vectors with constant vectors using Theorem 1 from Sect. 3.2.

As a result, we get \(k\), \(k\ge m\), dependence distance vectors with constant coordinates. This allows us to get rid of parameterized vectors and to form an integer matrix \(A\), \(A \in \mathbb {Z}^{n \times k}\), by inserting the dependence distance vectors with constant coordinates into the columns of \(A\); these columns generate integer lattice \(\Lambda \).

To decrease the complexity of further computations, redundant dependence distance vectors are eliminated from matrix \(A\) by finding a subset of \(l\), \(l \le k\), linearly independent columns of \(A\). This subset of dependence distance vectors forms the basis \(B \in \mathbb {Z}^{n \times l}\) of \(A\) and generates the same integer lattice \(\Lambda \) as \(A\) does Schrijver (1999). Every element of integer lattice \(\Lambda \) can be expressed uniquely as a finite integral linear combination of the basis dependence distance vectors belonging to \(B\).

After \(B\) is computed, we can work out relations \(R_{kk}^{*}\), \(k=1,2,..,q\), representing the exact transitive closure of \(R_{kk}\) or its over-approximation. For each vertex \(x\) in the data dependence graph (where \(x\) is the source of a dependence, \(x \in dom\;R_{kk}\)), we can identify all vertices \(y\) (the destinations of dependences, \(y \in ran\;R_{kk}\)) that are connected with \(x\) by a path of length at least \(1\), where \(y\) is calculated as \(x\) plus an integral linear combination of the basis dependence distance vectors of \(B\), i.e. \(y = x + B \times \lambda \), \(\lambda \in \mathbb {Z}^{l}\). The term \(B \times \lambda \) represents all possible paths in the dependence graph, represented by relation \(R_{kk}\), connecting \(x\) and \(y\). Moreover, we have to preserve the lexicographic order of \(y\) and \(x\), i.e. \(y-x\succ 0\). Below, we present the algorithm formally.

Algorithm 2. Calculating the transitive closure of a dependence relation \(R_{kk}\) describing a chain of self-dependences of a loop statement.

Input:

Dependence distance set \(\Delta ^{n \times m}= \{\delta _{1},\delta _{2}, \ldots , \delta _{m}\}\), where \(m\) is the number of \(n\)-dimensional dependence distance vectors.

Output:

Exact transitive closure of relation \(R_{kk}\) or its over-approximation.

Method:

  1.

    Replace each parameterized dependence distance vector in \(\Delta ^{n \times m}\) with an integral linear combination of vectors with constant coordinates. For this purpose apply Theorem 1 presented in Sect. 3.2.

  2.

    Using all constant dependence vectors, form matrix \(A \in \mathbb {Z}^{n \times k}\), \(k \ge m\).

  3.

    Extract a subset of \(l\), \(l\le k\), linearly independent columns from matrix \(A \in \mathbb {Z}^{n \times k}\) (linear independence is checked over the rationals) and form matrix \(B^{n \times l}\), representing the basis of the dependence distance vector set; the linearly independent vectors are the columns of matrix \(B^{n \times l}\). For this purpose, apply the Gaussian elimination algorithm Shoup (2005); Rotman (2003).

  4.

    Calculate relation \(R_{kk}^{*}\) representing the exact transitive closure of a dependence relation, describing all the dependences in the input loop, or its over-approximation, as follows:

    $$\begin{aligned} R_{kk}^{*} = \left\{ \begin{array}{l} \left[ x \right] \rightarrow \left[ y \right] \mid \exists \lambda \ s.t.\;y \!=\! x \!+\! B^{n \times l} \times \lambda \;\wedge \; y-x\succ 0, \;\lambda \in \mathbb {Z}^{l} \;\wedge \\ \qquad \qquad \qquad \quad \quad \;\, \wedge \; y \in ran\;\; R_{kk}\;\wedge \; x \in dom\;\; R_{kk} \end{array} \right\} \cup I, \nonumber \\ \end{aligned}$$
    (7)

    where : \(R^{*}_{kk}\) describes a chain of self dependences of loop statement \(s_{k}\), \(B^{n \times l} \times \lambda \) represents an integral linear combination of the basis dependence distance vectors \(\delta _{i}\) (the columns of \(B^{n \times l}\), \(1 \le i \le l)\), \(y-x\succ 0\) imposes the lexicographically forward constraints on the tuples of \(R_{kk}^{*}\), \(I\) is the identity relation.
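Step 3 of the method can be sketched as Gaussian elimination over the rationals that records the pivot columns; those columns form \(B\). The code below is a minimal illustration with our own names, exercised on the matrix \(A\) that appears in the example of Sect. 3.5:

```python
from fractions import Fraction

def basis_columns(A):
    # Return indices of a maximal set of linearly independent columns of
    # integer matrix A (given as a list of rows), via Gaussian elimination
    # over the rationals.
    rows, cols = len(A), len(A[0])
    M = [[Fraction(A[r][c]) for c in range(cols)] for r in range(rows)]
    chosen, pivot_row = [], 0
    for c in range(cols):
        # find a pivot for column c at or below pivot_row
        pr = next((r for r in range(pivot_row, rows) if M[r][c] != 0), None)
        if pr is None:
            continue  # column c depends on the columns chosen so far
        M[pivot_row], M[pr] = M[pr], M[pivot_row]
        for r in range(rows):
            if r != pivot_row and M[r][c] != 0:
                f = M[r][c] / M[pivot_row][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[pivot_row])]
        chosen.append(c)
        pivot_row += 1
    return chosen

# matrix A formed from the constant vectors of the example in Sect. 3.5
A = [[2, 0, 0, 0, 4],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0]]
assert basis_columns(A) == [0, 1, 2]  # the first three columns form B
```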

The resulting relation \(R_{kk}^{*}\) represents the exact transitive closure of relation \(R_{kk}\) or its over-approximation Feautrier (2012). To see this, note that relation \(R_{kk}^{+}\) represents all possible paths between vertices \(x\) (standing for dependence sources, \(x \in dom\;\, R_{kk}\)) and vertices \(y\) (standing for dependence destinations, \(y \in ran\;\, R_{kk}\)) in the dependence graph represented by relation \(R_{kk}\). Indeed, an integral linear combination of the basis dependence distance vectors, \(B^{n \times l} \times \lambda \):

  • reproduces all dependence distance vectors exposed for the loop,

  • describes all existing (true) paths between any pair of \(x\) and \(y\) as an integral linear combination of all dependence distance vectors exposed for the loop,

  • can describe non-existent (false) paths, which are then included in relation \(R^{*}_{kk}\).

There are two cases in which false (non-existent) paths may occur. The first one arises due to the fact that in an integral linear combination of the linearly independent columns of matrix \(A\), some coefficients can be negative. Let us consider the matrix \(A\),

$$\begin{aligned} A=\left[ \begin{array}{ccccc} 3 &{} 3 &{} 2 &{} \fbox {5} &{} \fbox {4}\\ 2 &{} 0 &{} -2 &{} \fbox {0} &{} \fbox {2}\\ \end{array} \right] . \end{aligned}$$

The linearly independent columns are the first three only, because the remaining ones can be represented as the integral linear combinations below (see Fig. 2a).

$$\begin{aligned} \left( \begin{array}{c} 5 \\ 0 \\ \end{array} \right) = \;1\times \left( \begin{array}{c} 3 \\ 2 \\ \end{array} \right) + 0\times \left( \begin{array}{c} 3 \\ 0 \\ \end{array} \right) + 1\times \left( \begin{array}{c} 2 \\ -2 \\ \end{array} \right) ,\\ \left( \begin{array}{c} 4 \\ 2 \\ \end{array} \right) = \;0\times \left( \begin{array}{c} 3 \\ 2 \\ \end{array} \right) + 2\times \left( \begin{array}{c} 3 \\ 0 \\ \end{array} \right) + \fbox {-1}\times \left( \begin{array}{c} 2 \\ -2 \\ \end{array} \right) . \end{aligned}$$
Fig. 2
figure 2

a Finding the basis \(B\) for matrix \(A\), b False paths due to linear combinations of basis \(B\)

The basis, \(B\), of matrix \(A\) is the following:

$$\begin{aligned} B=\left\{ \left( \begin{array}{c} 3 \\ 2 \\ \end{array} \right) \left( \begin{array}{c} 3 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 2 \\ -2 \\ \end{array} \right) \right\} . \end{aligned}$$

Linear combinations of vectors belonging to basis \(B\) with negative coefficients lead to false paths, as presented, for example, in Fig. 2b.

The second case takes place when, on a path between \(x\) and \(y\) described by \(R_{kk}^{*}\), there exists a vertex \(w\) such that \(w \in ran\;\, R_{kk} \,\wedge \, w \notin dom\;\, R_{kk}\). An example is presented in Fig. 3, where \(x_{2} \in ran\;\, R_{kk} \,\wedge \, x_{2} \notin dom\;\, R_{kk}\). Relation \(R_{kk}^{*}\), built according to (7), describes the false path between \(x_{1}\) and \(x_{4}\) depicted by the dotted line.

Fig. 3
figure 3

False path in the dependence graph

Summing up, relation \(R_{kk}^{*}\) describes all existing paths in the dependence graph represented by relation \(R_{kk}\) and may describe non-existent paths, i.e., \((R^{*}_{kk})_{exact} \subseteq R^{*}_{kk}\); when relation \(R^{*}_{kk}\) does not represent non-existent paths, \(R^{*}_{kk} = (R^{*}_{kk})_{exact}\).
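The second source of over-approximation, together with the subset property, can be reproduced on a one-dimensional toy relation (our own example, not from the paper): \(R=\{0\rightarrow 1,\ 2\rightarrow 3\}\) has the single distance vector \(1\), but vertex \(1\) is a destination only, so a formula-based closure in the spirit of (7) admits the false path \(0\rightarrow 3\):

```python
def compose(R2, R1):
    # relation composition R2 ∘ R1: apply R1 first, then R2
    return {(x, z) for (x, y) in R1 for (y2, z) in R2 if y == y2}

def exact_plus(R):
    # exact R+ by fixed-point iteration
    c = set(R)
    while True:
        n = c | compose(R, c)
        if n == c:
            return c
        c = n

R = {(0, 1), (2, 3)}
dom_R = {x for x, _ in R}
ran_R = {y for _, y in R}
# formula specialized to basis B = (1): y = x + lam, lam >= 1,
# x in dom R, y in ran R
formula_plus = {(x, y) for x in dom_R for y in ran_R if y - x >= 1}
assert exact_plus(R) == {(0, 1), (2, 3)}
assert formula_plus == {(0, 1), (0, 3), (2, 3)}  # (0, 3) is a false path
assert exact_plus(R) <= formula_plus             # exact closure is contained
```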

3.4 Time complexity

The first three steps of Algorithm 2 can be accomplished in polynomial time.

  1.

    As we have proved in Sect. 3.2, the task of replacing parameterized vectors with a linear combination of vectors with constant coordinates can be done in \(O\left( d^{2} \right) \) operations.

  2.

    The task of forming a dependence matrix using all \(k\) constant dependence vectors in \(\mathbb {Z}^{n}\) requires \(O\left( kn \right) \) operations (memory accesses).

  3.

    The task of identifying a set of linearly independent columns of matrix \(A\), \(A\in \mathbb {Z}^{n\times k}\), with constant coordinates to find the basis can be done in polynomial time by the Gaussian elimination algorithm. According to Cohen and Megiddo (1991), this computation can be done in \(O\left( ldk \right) \) arithmetic operations.

To calculate relation \(R^{*}_{kk}\) in step 4 of Algorithm 2, we use Presburger arithmetic. In general, calculations based on Presburger arithmetic are not characterized by polynomial time complexity Kelly et al. (1996).

3.5 Illustrating example

Let us illustrate Algorithms 1 and 2 by means of the following example.

figure b

The following relations describe dependences in the loop above.

figure c

They form the reduced dependence graph presented in Fig. 4.

Fig. 4
figure 4

Reduced dependence graph for the illustrative example

Algorithm 1 calls Algorithm 2 to calculate relation \(R_{kk}^{*}\) for each iteration \(k\) of the outermost loop.

For \(k=1\), the dependence distance set \(\Delta _{1,1}=\emptyset \) because \(R_{11}=\emptyset \), so we get \(R^{*}_{11}=\emptyset \;\cup \;I=\{[k,i]\rightarrow [k,i]\}\).

For \(k=2\), we get the following dependence relations:

figure d

which yield the following dependence distance set \(\Delta _{2,2}\):

$$\begin{aligned} \Delta _{2,2}=\left\{ \left( \begin{array}{c} 2 \\ N2 \\ N3 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ N3 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ N2 \\ N3 \\ \end{array} \right) \left( \begin{array}{c} 4 \\ N2 \\ N3 \\ \end{array} \right) \right\} . \end{aligned}$$

Applying Algorithm 2, we obtain the following results.

  1.

    Replace all parameterized dependence distance vectors. The first parameterized vector \(\left( \begin{array}{c} 2 \\ N2 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c} 2\\ 0\\ 0\\ \end{array}\right) \) and the unit normal vectors \(\left( \begin{array}{c} 0\\ 1\\ 0\\ \end{array}\right) ,\left( \begin{array}{c} 0\\ 0\\ 1\\ \end{array}\right) \) as follows:

    $$\begin{aligned} \left( \begin{array}{c} 2 \\ N2 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 2 \\ 0 \\ 0 \\ \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$

    The second parameterized vector \(\left( \begin{array}{c} 0 \\ 0 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c}0\\ 0\\ 0\\ \end{array}\right) \) and the unit normal vector \(\left( \begin{array}{c}0\\ 0\\ 1\\ \end{array}\right) \) as below:

    $$\begin{aligned} \left( \begin{array}{c} 0 \\ 0 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$

    The third parameterized vector \(\left( \begin{array}{c} 0 \\ N2 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c}0 \\ 0\\ 0\\ \end{array}\right) \) and the unit normal vectors \(\left( \begin{array}{c}0 \\ 1\\ 0\\ \end{array}\right) ,\left( \begin{array}{c} 0\\ 0\\ 1\\ \end{array}\right) \) as follows:

    $$\begin{aligned} \left( \begin{array}{c} 0 \\ N2 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$

    The last parameterized vector \(\left( \begin{array}{c} 4 \\ N2 \\ N3 \\ \end{array} \right) \) is replaced with the linear combination of the vector \(\left( \begin{array}{c}4\\ 0\\ 0\\ \end{array}\right) \) and the unit normal vectors \(\left( \begin{array}{c}0\\ 1\\ 0\\ \end{array}\right) ,\left( \begin{array}{c} 0\\ 0\\ 1\\ \end{array}\right) \) as below:

    $$\begin{aligned} \left( \begin{array}{c} 4 \\ N2 \\ N3 \\ \end{array} \right) = \left( \begin{array}{c} 4 \\ 0 \\ 0 \\ \end{array} \right) + N2 \times \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) + N3 \times \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) . \end{aligned}$$

    The resulting dependence distance set \(\Delta _{2,2}\) contains the vectors with constant coordinates only:

    $$\begin{aligned} \Delta _{2,2}=\left\{ \left( \begin{array}{c} 2 \\ 0 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 1 \\ \end{array} \right) \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ \end{array} \right) \left( \begin{array}{c} 4 \\ 0 \\ 0 \\ \end{array} \right) \right\} . \end{aligned}$$
  2.

    Form a dependence matrix. The matrix \(A\), where all the constant dependence vectors from set \(\Delta _{2,2}\) are placed in columns, is as follows:

    $$\begin{aligned} A=\left[ \begin{array}{ccccc} 2 &{} 0 &{} 0 &{} 0 &{} 4\\ 0 &{} 1 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 1 &{} 0 &{} 0\\ \end{array} \right] . \end{aligned}$$
  3.

    Find the basis of the dependence distance set. The set of linearly independent columns of matrix \(A \in \mathbb {Z}^{n \times k}\) (checked over the rationals) that generates every vector in \(A\) yields the following matrix \(B\):

    $$\begin{aligned} B=\left[ \begin{array}{ccc} 2 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ \end{array} \right] . \end{aligned}$$
  4.

    Calculate the exact transitive closure of a dependence relation describing all the dependences in an input loop or its over-approximation, \(R^{*}_{22}\). Form relation \(R^{*}_{22}\) as follows:

    $$\begin{aligned} \begin{array}{ll} R^{*}_{22} :=&{} \left\{ \begin{array}{l} \left[ k,i,j \right] \rightarrow \left[ k',i',j' \right] \mid \exists \lambda \, s.t. \left( \begin{array}{c} k' \\ i' \\ j' \\ \end{array} \right) = \left( \begin{array}{c} k\\ i\\ j\\ \end{array} \right) + \left[ \begin{array}{ccc} 2 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ \end{array}\right] \times \lambda \ \wedge \\ \qquad \qquad \qquad \qquad \quad \; \wedge \; \left( \begin{array}{c} k' \\ i' \\ j' \\ \end{array} \right) - \left( \begin{array}{c} k\\ i\\ j\\ \end{array} \right) \succeq 0, \;\lambda \in \mathbb {Z}^{3}\ \wedge \\ \qquad \qquad \qquad \qquad \quad \;\ \wedge \; \left( \begin{array}{c} k' \\ i' \\ j' \\ \end{array} \right) \in ran\; R_{22} \, \wedge \left( \begin{array}{c} k\\ i\\ j\\ \end{array} \right) \in dom\; R_{22} \end{array} \right\} = \end{array}\\ \begin{array}{ll} \quad \;\;\;\ &{} \begin{array}{l} \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k',i',j'] : &{} \exists \,\alpha : 2\alpha =k + k'\; \wedge 1 \le k \le k'-2\; \wedge 1 \le i \le N2 \\ &{} 1 \le j,\,j' \le N3 \wedge 1 \le i' \le N2\;\wedge k'\le N1 \end{array} \right\} \cup \\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \\ \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k,i,j'] :&{} 1 \le j < j' \le N3 \wedge 1 \le k \le N1 \wedge 1 \le i \le N2 \end{array} \right\} \cup \\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \\ \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k,i',j'] :&{} 1 \le i < i'\le N2\;\wedge 1 \le k \le N1\wedge 1 \le j \le N3\;\wedge \\ &{} 1 \le j' \le N3\quad \\ \end{array} \right\} \cup \\ \qquad \qquad \qquad \qquad \qquad \qquad \qquad \\ \left\{ \begin{array}{ll} [k,i,j]\rightarrow [k,i,j] \end{array} \right\} \end{array} \end{array} \end{aligned}$$

Relation \(R^{*}_{22}\) represents the exact transitive closure since \(R^{+}_{22} = R_{22} \cup (R_{22} \circ R^{+}_{22})\), i.e., \(R^{*}_{22}=(R^{*}_{22})_{exact}\).
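To illustrate the role of the basis matrix \(B\) above, the membership test behind \(R^{*}_{22}\) can be sketched for concrete points. The snippet below is a toy check under our own naming: it ignores the \(dom\)/\(ran\) bound constraints on \(N1\), \(N2\), \(N3\), tests whether \(x'-x=B\lambda \) for some integer \(\lambda \), and interprets \(\succeq 0\) as lexicographic non-negativity:

```python
def lex_nonneg(v):
    """True if v is lexicographically >= 0 (first nonzero entry positive)."""
    for x in v:
        if x != 0:
            return x > 0
    return True  # the zero vector: reflexive pairs belong to R*


def in_star(src, dst, diag=(2, 1, 1)):
    """Check whether dst - src = B @ lam for some integer lam,
    where B = diag(diag) is the diagonal basis matrix from step 3,
    and dst - src is lexicographically non-negative."""
    diff = [d - s for s, d in zip(src, dst)]
    if not lex_nonneg(diff):
        return False
    # B is diagonal, so lam_i = diff_i / b_ii must be an integer
    return all(d % b == 0 for d, b in zip(diff, diag))
```

For instance, `in_star((1, 1, 1), (3, 5, 2))` holds (the first coordinate grows by an even amount), while `in_star((1, 1, 1), (2, 1, 1))` does not, because a step of 1 in the first coordinate is not a multiple of the basis entry 2.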

The final result is the following:

[Figure e: final result]

4 Related work

Numerous algorithms for calculating the transitive closure of affine integer tuple relations have been proposed Kelly et al. (1996); Beletska et al. (2009); Verdoolaege et al. (2011); Ancourt et al. (2010); Boigelot (1998); Bozga et al. (2009); Eve and Kurki-Suonio (1977). However, most of them focus on relations whose domain and range are non-parametric polyhedra Ancourt et al. (2010); Bozga et al. (2009); Eve and Kurki-Suonio (1977). Many known algorithms are further limited by requiring that the input and output tuples of a relation have the same arity (number of tuple elements) Beletska et al. (2009). This is why we restrict the discussion of related work to techniques that deal with parameterized relations whose tuple arities may differ in general and that can describe the dependences present in program loops.

On a different line of work, Bozga et al. Bozga et al. (2009) have studied the computation of transitive closure for the analysis of counter automata (register machines) and have implemented their method in the tool FLATA Bozga et al. (2009). In this context, a relation \(R(x,x')\) is one that can be written as a finite conjunction of terms of the form \(\pm x_{i}\pm x_{j}\leqslant a_{i,j}\), \(\pm x'_{i}\pm x_{j}\leqslant b_{i,j}\), \(\pm x_{i}\pm x'_{j}\leqslant c_{i,j}\), \(\pm x'_{i}\pm x'_{j}\leqslant d_{i,j}\), \(\pm 2x_{i}\leqslant e_{i,j}\) or \(\pm 2x'_{i}\leqslant f_{i,j}\), where \(x\) and \(x'\) describe counter values at the current and the next step, respectively, \(a_{i,j},b_{i,j},c_{i,j},d_{i,j},e_{i,j},f_{i,j}\in \mathbb {Z}\) are integer constants, and \(1\leqslant i,j \leqslant n\), \(i\ne j\). As we can see, this class of relations does not involve parameters, existentially quantified variables, or unions, i.e., it cannot represent dependences in program loops. This is why we do not compare this technique with ours.

To the best of our knowledge, techniques for computing the transitive closure of parameterized affine integer tuple relations with different input and output tuple arities have been investigated in only a few papers Kelly et al. (1996); Verdoolaege et al. (2011); Verdoolaege (2012). Kelly et al. Kelly et al. (1996) proposed a modified Floyd–Warshall algorithm but did not implement it in the Omega library (http://www.cs.umd.edu/projects/omega/). Fourteen years later, Verdoolaege improved the algorithm and implemented his own version of it in the ISL library (http://www.kotnet.org/skimo/isl/), but that algorithm and implementation are not the same as ours.

Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012) treats each input relation \(R_{i\leqslant m}\) as a vertex \(V_{i\leqslant m}\) of a directed graph \(G\), where \(m\) is the total number of input relations. There exists a directed edge \(E_{ij}\) from vertex \(V_{i\leqslant m}\) to vertex \(V_{j\leqslant m}\) (\(R_{j}\) can immediately follow \(R_{i}\)) if the range of \(R_{i}\) intersects the domain of \(R_{j}\), i.e., if

$$\begin{aligned} R_{j}\circ R_{i}\ne \emptyset . \end{aligned}$$
(8)

In order to calculate the transitive closure of a dependence relation \(R\), Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012) constructs up to \(m^{2}\) relations

$$\begin{aligned} R_{ij}=R_{j}\circ R_{i}, \quad 1\leqslant i,j\leqslant m \, \, s.t.\, R_{j}\circ R_{i}\ne \emptyset . \end{aligned}$$
(9)

Then he applies Algorithm 1 and returns the union of all output relations \(R_{ij}\) as the transitive closure. In our implementation, we use information gathered with the Petit dependence analyzer Kelly et al. (1996) to insert a dependence relation describing the dependences between instances of statements \(i\) and \(j\) as element \(i,j\) of array \(D\) (element \(D_{ij}\)). Then we call Algorithm 1 to get the transitive closure. The information provided by Petit permits us to reduce the time complexity of the Floyd–Warshall implementation by skipping the connectivity check between each pair of input dependence relations (see formula 7).
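On finite, explicitly enumerated relations, the structure of this Floyd–Warshall-style computation can be sketched as follows. This is a toy model under our own naming, not the parameterized implementation: relations are plain sets of integer pairs, and \(R^{+}_{kk}\) is obtained by naive iteration instead of basis dependence distance vectors:

```python
def compose(r2, r1):
    """Relational composition r2 ∘ r1: apply r1 first, then r2."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y2 == y}


def plus(r):
    """Naive transitive closure of a finite relation by fixed-point iteration."""
    result = set(r)
    while True:
        grown = result | compose(r, result)
        if grown == result:
            return result
        result = grown


def modified_floyd_warshall(D, universe):
    """D[i][j] holds the dependences from statement i to statement j.
    Returns the union of all updated D[i][j], i.e. the transitive
    closure of the whole dependence graph."""
    m = len(D)
    I = {(x, x) for x in universe}
    for k in range(m):
        star_kk = plus(D[k][k]) | I  # R*_kk = R+_kk ∪ I
        for i in range(m):
            for j in range(m):
                # R_ij := R_ij ∪ (R_kj ∘ R*_kk ∘ R_ik)
                D[i][j] = D[i][j] | compose(D[k][j], compose(star_kk, D[i][k]))
    return {pair for row in D for rel in row for pair in rel}
```

For example, with two statements, dependences \(1\rightarrow 2\) (statement 0 to 1) and \(2\rightarrow 3\) (statement 1 to 0), the algorithm adds the transitive pair \(1\rightarrow 3\) via \(R^{*}_{11}\).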

Because of the differences between our implementation of the Floyd–Warshall algorithm and that of Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012), in this paper we investigate only how different ways of calculating the relation \(R^{*}_{kk}\) impact the time complexity of the Floyd–Warshall algorithm. For this purpose, we have chosen the algorithms for calculating \(R^{*}_{kk}\) implemented in the ISL (http://www.kotnet.org/skimo/isl/) and Omega (http://www.cs.umd.edu/projects/omega/) libraries. Those algorithms are based on computing the parametric power \(R^{k}\) and then projecting out the parameter \(k\) by making it existentially quantified. As a trivial example, consider the relation \(R=\{[x]\rightarrow [x+1]\}\). The \(k\)th power of \(R\) for arbitrary \(k\) is \(R^{k}=\{[x]\rightarrow [x+k] \,|\, k\geqslant 1\}\), and the transitive closure is then \(R^{+}=\{[x]\rightarrow [y]\,|\, \exists k\in \mathbb {Z}_{\geqslant 1}:\,y=x+k\}=\{[x]\rightarrow [y]\,|\,y\geqslant x+1\}\). Both algorithms consider the difference set \(\Delta \,R\) of the relation, but in the ISL library (http://www.kotnet.org/skimo/isl/), if all differences \(\Delta _{i}\) are singleton sets, i.e., \(\Delta _{i}=\{\delta _{i}\}\) with \(\delta _{i}\in \mathbb {Z}^{d}\), then \(R^{+}\) is calculated as follows:

$$\begin{aligned} R^{+}=\{x\rightarrow y \mid \exists k_{i}\in \mathbb {Z}_{\geqslant 0} : y=x+\sum _{i}k_{i}\delta _{i}\,\wedge \,\sum _{i}k_{i}=k>0\}, \end{aligned}$$
(10)

which is essentially the same as the formulation of Beletska et al. Beletska et al. (2009). If some of the \(\Delta _{i}\) are parametric, then each offset \(\Delta _{i}\) is extended with an extra coordinate, \(\Delta _{i}^{'}=\Delta _{i}\times \{1\}\), i.e., a constant coordinate equal to one. Paths constructed by summing such extended offsets have their length \(k\) encoded as the difference of their final coordinates, so \(R^{+}\) can then be decomposed into relations \(R_{i}^{+}\), one for each \(\Delta _{i}\), as follows:

$$\begin{aligned} R^{+}=((R_{m}^{+}\cup I)\circ \cdots \circ (R_{2}^{+}\cup I)\circ (R_{1}^{+}\cup I))\cap \{x'\rightarrow y'\mid \exists k>0: y_{d+1}-x_{d+1}=k\}, \end{aligned}$$
(11)

with

$$\begin{aligned} R_{i}^{+}=s \mapsto \{x'\rightarrow y' \mid \exists k\in \mathbb {Z}_{\geqslant 1},\delta \in k\Delta _{i}^{'}(s): y'=x'+\delta \}. \end{aligned}$$
(12)

Each non-parametric constraint \(A_{1}x+c_{1}\geqslant 0\) of \(\Delta _{i}^{'}(s)\) in (12) is transformed into the form \(A_{1}x+kc_{1}\geqslant 0\), and the remaining constraints are rewritten without any changes. For more details, see Verdoolaege et al. (2011); Verdoolaege (2012).
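For the singleton-offset case of formula (10), reachability over a finite universe can be checked with a small breadth-first search. The sketch below (our own naming, one-dimensional points only) reproduces the trivial example \(R=\{[x]\rightarrow [x+1]\}\), for which \(R^{+}=\{[x]\rightarrow [y]\mid y\geqslant x+1\}\):

```python
from collections import deque


def plus_from_offsets(universe, offsets):
    """R+ for a relation whose difference set consists of the given
    singleton offsets: y = x + sum_i k_i * delta_i with all k_i >= 0
    and sum_i k_i > 0, restricted to points inside a finite universe."""
    universe = set(universe)
    result = set()
    for x in universe:
        seen, queue = set(), deque([x])
        while queue:  # BFS over non-negative offset combinations
            cur = queue.popleft()
            for d in offsets:
                nxt = cur + d
                if nxt in universe and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result |= {(x, y) for y in seen}
    return result
```

Calling `plus_from_offsets(range(5), (1,))` yields exactly the pairs with \(y\geqslant x+1\) inside the universe, matching the closed form above.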

While the algorithms implemented by Verdoolaege Verdoolaege et al. (2011); Verdoolaege (2012) in the ISL library (http://www.kotnet.org/skimo/isl/) are designed to compute over-approximations, Kelly et al. Kelly et al. (1996) propose in the Omega library (http://www.cs.umd.edu/projects/omega/) a heuristic algorithm that computes an under-approximation and does not guarantee calculating the exact transitive closure.

5 Experimental results

The goals of the experiments were to evaluate the effectiveness and time complexity of the proposed approach for calculating relation \(R^{*}_{kk}\) and using it in the modified Floyd–Warshall algorithm for loops provided by the well-known NAS Parallel Benchmarks (NPB) suite from NASA (http://www.nas.nasa.gov), and to compare the obtained results with the effectiveness and time complexity of the techniques implemented in the ISL (http://www.kotnet.org/skimo/isl/) and Omega (http://www.cs.umd.edu/projects/omega/) tools. We have implemented the presented algorithms as an ANSI C++ software module. The source code of the module was compiled with the gcc compiler v4.3.0 and can be downloaded from http://www.sfs.zut.edu.pl/files/mfw-omega.tar.gz. The experiments were conducted on an Intel Core2Duo T7300@2.00GHz machine running the Fedora Linux v12 32-bit operating system.

The implementation calculates the transitive closure according to Algorithm 1 and permits choosing among three options for producing the relation \(R^{*}_{kk}\): (i) Algorithm 2, (ii) Omega, and (iii) ISL. In our experiments, we have examined only those loops provided by NPB that expose dependences. There exist 75 imperfectly nested loops and 58 perfectly nested loops in NPB that expose dependences. The results of our experiments are collected in Table 1, where time is presented in seconds. The columns “Proposed algorithm”, “ISL”, and “Omega” present the time of calculating the transitive closure by means of the Floyd–Warshall algorithm, where the relations \(R^{*}_{kk}\) were calculated by applying Algorithm 2, the ISL tool, and the Omega tool, respectively.

Table 1 The results of the experiments on the proposed approach to computing transitive closure

Analyzing the results presented in Table 1, we can draw the following conclusions. All examined techniques are able to calculate the transitive closure for all NPB loops exposing dependences. The exactness of the presented approach is the same as that of the techniques implemented in Omega and ISL, i.e., all examined techniques produce the exact transitive closure for the same loops. Calculating relation \(R^{*}_{kk}\) by means of Algorithm 2 is less time-consuming than with the techniques implemented in Omega and ISL, which reduces the time of calculating the transitive closure of a relation describing all the dependences in a loop by means of the Floyd–Warshall algorithm. For all loops, we obtained the shortest time of producing the transitive closure.

The explanation is that each relation \(R_{kk}^{*}\) that we compose in Algorithm 1 (line 10) is the union of two relations, \(R_{kk}^{+}\cup I\). If there are \(m\) disjuncts in the input relation \(R_{kk}\), then the direct application of the composition operation, as in formula (11), may result in a relation with \(2^{m}\) disjuncts, which is computationally expensive. In general, applying formula (7) results in a number of disjuncts that is much smaller than \(2^m\). This permits us to conclude that the presented approach is faster than the other well-known approaches.
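The potential blow-up can be illustrated by expanding the composition in formula (11) purely symbolically: each factor \(R_{i}^{+}\cup I\) contributes two alternatives, so \(m\) factors expand into up to \(2^{m}\) disjuncts before simplification (a counting sketch only, no actual relation arithmetic):

```python
from itertools import product


def expanded_terms(m):
    """Symbolically expand (R_m+ ∪ I) ∘ ... ∘ (R_1+ ∪ I): distributing
    composition over union picks, per factor, either its R_i+ part or I."""
    return list(product(("R+", "I"), repeat=m))
```

Already for \(m=10\), `len(expanded_terms(10))` is 1024 candidate disjuncts, whereas the basis-vector formulation of formula (7) avoids this multiplicative growth.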

6 Conclusion

In this paper, we presented a modified Floyd–Warshall algorithm in which the most time-consuming part (the transitive closure describing self-dependences in a program loop) is calculated by means of basis dependence distance vectors. We demonstrated how to calculate basis dependence distance vectors for parameterized program loops and how to apply them to calculate the transitive closure of a dependence relation describing all self-dependences among the instances of a given loop statement.

This solution reduces the time of calculating the transitive closure of parameterized graphs representing dependences in program loops. The reduction is due to using a finite integral linear combination of basis dependence distance vectors to calculate the \(R_{kk}^{*}\) term in the modified Floyd–Warshall algorithm. The reduction in calculation time was demonstrated by means of numerous experiments with the NPB benchmarks. The presented approach can be used for resolving many optimizing compiler problems: redundant synchronization removal (Kelly et al. 1996), testing the legality of iteration reordering transformations (Kelly et al. 1996), iteration space slicing (Beletska et al. 2011), and forming schedules for statement instances of program loops (Bielecki et al. 2012). In our future work, we plan to study the application of the presented approach for extracting both coarse- and fine-grained parallelism for different popular benchmarks.