
1 Introduction

In this work we revisit the work by Barthou and Jeannot [1]. We consider that we have two platforms, each with an unbounded number of processors. We want to execute an application represented as a Directed Acyclic Graph (DAG) using these two platforms. Each task of the application has two possible execution times, depending on the platform it is executed on. Finally, there is a cost to transfer data from one platform to the other between successive tasks.

In their work, Barthou and Jeannot [1] considered that each task could be executed on both platforms and were able to compute an optimal schedule in polynomial time. Here we study the problem where tasks cannot be re-executed, i.e., each task is executed exactly once. While our interest in this problem is primarily theoretical, we can envision several practical settings linked to the usage of parallel machines, in High-Performance Computing or Cloud Computing, where it could be useful.

In High-Performance Computing, one has to deal with simulations using millions of nodes. These simulations run on machines often consisting either of homogeneous nodes, or of two types of nodes (e.g. CPU+GPU). These simulations generate huge volumes of data, saturating access to the Parallel File System. A recent technique to deal with this data is to analyze it in-situ [2], that is, while it is generated. This analysis can be done either on CPUs or on GPUs, with a cost to move data around. It uses many orders of magnitude fewer nodes than the simulation, and the only constraint is not to slow down the main simulation. Hence one can allocate as many nodes as needed to the analysis (hence an almost unbounded number).

Another motivation, in the context of Big-Data analytics, is the concept of geo-distributed data-centers [3]. Information for each job is located in different data-centers, and the main cost is moving data around; the number of nodes in each data-center is less of an issue. Furthermore, in Big-Data analytics, the data dependencies of the graph are often those of Map-Reduce-like applications (Hadoop, Spark, etc.), i.e., bipartite graphs. This setting is a more general version of our problem, with k unbounded resources instead of 2.

Related Work: Recently, the problem of scheduling jobs on hybrid parallel platforms (k types of homogeneous machines) has attracted a lot of attention. Due to lack of space we focus on the works closest to ours. More details are available in the companion report of this work [4].

The most commonly studied problem is the one where \(k=2\) (typically CPU/GPU platforms) with the objective of minimizing the makespan. The problem is NP-hard even when the number of each resource is bounded. In this case, several families of approximation algorithms have been studied, see for example Ait Aba et al. [5] for general graphs, or Kedad-Sidhoum et al. [6] and Marchal et al. [7] for independent tasks.

In the context of an unlimited number of processors, to limit the size of the description of the problem, one needs to consider a limited number of performance profiles (computation/communication costs); otherwise, if the size of the problem description is not bounded, (almost) any algorithm is polynomial in the size of the instance. If there are no communication delays, the problem is trivial: each task is simply assigned to its fastest machine. When all processors have the same processing power and every communication has a cost, the problem remains NP-complete. Darbha and Agrawal [8] provide an optimal solution, TDS (Task Duplication based Scheduling), when the communications are not too large w.r.t. the computation costs. Later, Park and Choe [9] extended this work to the case where the communications are significantly larger than the computations.

The closest to our work is that of Barthou and Jeannot [1], who studied the problem of minimizing the makespan on two unbounded hybrid platforms. They provide a \(\varTheta (4|E| + 2|V|)\) polynomial-time algorithm when duplication of jobs is allowed (namely, each job is executed on both platforms as soon as possible). They further discuss a possible extension of their work to the case where the number of processors of each type is limited, by separating the allocation part (using their algorithm) from the scheduling part. While duplication makes sense to reduce the makespan when the number of processors is unbounded, it may raise other issues, such as additional energy consumption and a significant memory footprint, hence motivating our study without duplication.

Finally, there is a wide range of heuristic solutions to the CPU-GPU scheduling problem. They can be roughly partitioned into two classes: clustering algorithms and list-scheduling algorithms. Clustering algorithms [10] usually provide good solutions for communication-intensive graphs by scheduling heavily communicating tasks onto the same processor. List-scheduling heuristics such as HEFT [11] often have no performance guarantee in the presence of communication costs, but can handle a limited number of processors.

Results: Our main contributions are the following. We formalize the model in Sect. 2 and show that the problem is NP-complete for graphs of depth at least three, but polynomial for graphs of depth at most two. We show that the problem cannot be approximated within a factor smaller than 3/2 unless \(\mathcal{P}=\mathcal{NP}\). Then, we provide polynomial-time algorithms for several classes of graphs. These results are presented in Sect. 3. Finally, in Sect. 4, we provide concluding remarks and future directions.

2 Model

An application is represented by a Directed Acyclic Graph (DAG) \(\mathcal {G}=(V,E)\), such that for all \((v_1,v_2)\in E\), task \(v_2\) cannot start its execution before the end of the execution of \(v_1\). We consider a parallel platform with two types of machines: machines of type \(\mathcal {A}\) and machines of type \(\mathcal {B}\). For each type of machine, we consider that there is an unbounded number of them.

We define two cost functions: \(t_{\mathcal {A}}: V\rightarrow \mathbb {R+}\) (resp. \(t_{\mathcal {B}}: V\rightarrow \mathbb {R+}\)) that define the time to execute a task \(v\in V\) on a machine of type \(\mathcal {A}\) (resp. \(\mathcal {B}\)).

We also define two communication cost functions: \(c_{\mathcal {A} \mathcal {B}}: E \rightarrow \mathbb {R+}\) (resp. \(c_{\mathcal {B} \mathcal {A}}: E \rightarrow \mathbb {R+}\)), such that for all \((v_1,v_2)\in E\), if \(v_1\) is scheduled on a machine of type \(\mathcal {A}\) (resp. \(\mathcal {B}\)) and \(v_2\) is scheduled on a machine of type \(\mathcal {B}\) (resp. \(\mathcal {A}\)), then \(v_2\) needs to wait \(c_{\mathcal {A} \mathcal {B}} (v_1,v_2)\) (resp. \(c_{\mathcal {B} \mathcal {A}} (v_1,v_2)\)) units of time after the end of the execution of \(v_1\) to start its execution. We assume that there is no communication cost within a platform of a given type (\(c_{\mathcal {A} \mathcal {A}} =c_{\mathcal {B} \mathcal {B}} =0\)).

The goal is to find a schedule of the tasks that minimizes the execution time (or makespan). Since there is an unbounded number of processors of each type, this amounts to finding an allocation \(\sigma :V\rightarrow \{\mathcal {A},\mathcal {B} \}\) of all tasks to the two types of processors. For an allocation \(\sigma \) and a path \(p=v_1\rightarrow v_2 \rightarrow \cdots \rightarrow v_k\) of \(\mathcal {G}\), we define the length of the path

$$\begin{aligned} \texttt {len} (p,\sigma )&= t_{\sigma (v_1)} (v_1) +c_{\sigma (v_1)\sigma (v_2)} (v_1,v_2) + t_{\sigma (v_2)} (v_2) +\cdots + t_{\sigma (v_k)} (v_k). \end{aligned}$$

The makespan is then obtained by computing the longest path of the graph \(\mathcal {G}\), including the corresponding durations of the tasks and the communication costs: \(MS(\mathcal {G},\sigma )=\max _{p \in \{\text {paths of }\mathcal {G}\}}\texttt {len} (p,\sigma )\).
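As an illustration, the makespan of a given allocation can be computed by a longest-path traversal of the DAG. The sketch below follows the definitions of \(\texttt{len}\) and \(MS\) above; the dictionary encodings (`succ` for successors, `t` for execution times, `c` for communication costs) are our own convention, not the paper's.

```python
from functools import lru_cache

def makespan(succ, t, c, sigma):
    """MS(G, sigma): longest weighted path in the DAG, where vertex v
    costs t[sigma[v]][v] and arc (u, v) costs c[(u, v)] only when sigma
    changes platform across the arc (intra-platform communication is free)."""
    @lru_cache(maxsize=None)
    def tail(v):
        # Length of the longest path starting at v (including v's own time).
        best = 0
        for w in succ.get(v, []):
            comm = c[(v, w)] if sigma[v] != sigma[w] else 0
            best = max(best, comm + tail(w))
        return t[sigma[v]][v] + best
    return max(tail(v) for v in sigma)
```

Since every vertex is memoized, the traversal runs in \(O(|V|+|E|)\) time for a fixed allocation.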

3 Results

In this section, we start by showing that the problem is strongly NP-complete for graphs of depth 3, before providing algorithms for specific classes of graphs.

3.1 Complexity

Theorem 1

The problem of deciding whether an instance of our main problem has a schedule of length at most 2 is strongly NP-complete, even for graphs of depth 3.

We perform the reduction from the 3-Satisfiability (3-SAT) problem, which is known to be strongly NP-complete [12, 13]: given a set of disjunctive clauses \(C_1,\cdots ,C_m\), where each clause contains exactly three literals over a set of boolean variables \(X=\{x_1, \cdots ,x_n\}\), is there a truth assignment to X such that each clause is satisfied?

In the following, we write each clause \(C_i = \tilde{x}_{i_1} \vee \tilde{x}_{i_2} \vee \tilde{x}_{i_3}\) where \((x_{i_1},x_{i_2},x_{i_3}) \in X^3\), and \(\tilde{x}_k=x_k\) or \(\bar{x}_k\). We are looking for a truth assignment such that \(\bigwedge _{i=1}^m C_i\) is true.

Proof

From an instance \(\mathcal {I}_1\) of 3-SAT: \(C_1,\cdots ,C_m\) over \(\{x_1,\cdots ,x_n\}\), we construct the following instance \(\mathcal {I}_2\) for our problem.

For all \(j\in \{1,\cdots ,n\}\), we define 2 tasks \(v^{0}_{j} \) and \(v^{\infty }_{j} \), and an edge \((v^{0}_{j} ,v^{\infty }_{j} )\). Then for each clause \(C_i = \tilde{x}_{i_1} \vee \tilde{x}_{i_2} \vee \tilde{x}_{i_3}\), 3 tasks \(v^{i}_{i_1},v^{i}_{i_2},v^{i}_{i_3} \) are created, together with the following set of edges: \(\{ (v^{i}_{i_1},v^{i}_{i_2}),(v^{i}_{i_2},v^{i}_{i_3}),(v^{i}_{i_1},v^{\infty }_{i_1} ),(v^{0}_{i_2} ,v^{i}_{i_2}),(v^{0}_{i_3} ,v^{i}_{i_3})\}\). For any \(j\in \{1,\cdots ,n\}\), \(v_j^\star \) denotes the set of all the instantiations of \(x_j\) in \(\mathcal{G}\).

Overall, the graph \(\mathcal {G}=(V,E)\) of depth 3 has \(2n+3m\) vertices and \(n + 5m\) edges.

We then define the execution and communication costs that can be written in unit size: \(\forall j\in \{1, \cdots ,n\},~ t_{\mathcal {A}} (v^{\infty }_{j} )=t_{\mathcal {B}} (v^{\infty }_{j} )=t_{\mathcal {A}} (v^{0}_{j} )=t_{\mathcal {B}} (v^{0}_{j} ) =0\) and \(c_{\mathcal {A} \mathcal {B}} (v^{0}_{j} ,v^{\infty }_{j} ) = c_{\mathcal {B} \mathcal {A}} (v^{0}_{j} ,v^{\infty }_{j} ) =3\). For all edges \((v^{i}_{j},v^{\infty }_{j} ),(v^{0}_{j'} ,v^{i'}_{j'}) \in E\), we add the communication costs \(c_{\mathcal {A} \mathcal {B}} (v^{i}_{j},v^{\infty }_{j} ) = c_{\mathcal {B} \mathcal {A}} (v^{i}_{j},v^{\infty }_{j} )=c_{\mathcal {A} \mathcal {B}} (v^{0}_{j'} ,v^{i'}_{j'})= c_{\mathcal {B} \mathcal {A}} (v^{0}_{j'} ,v^{i'}_{j'}) =3\). Then for \(C_i=\tilde{x}_{i_1} \vee \tilde{x}_{i_2} \vee \tilde{x}_{i_3}\) we define the time costs:

$$\begin{aligned} t_{\mathcal {A}} (v^{i}_{i_j}) = 1- t_{\mathcal {B}} (v^{i}_{i_j}) = {\left\{ \begin{array}{ll} 1 &{} \text {if } \tilde{x}_{i_j} = \bar{x}_{i_j} \\ 0 &{} \text {if } \tilde{x}_{i_j} = x_{i_j} \end{array}\right. } \end{aligned}$$
(1)

and we set \(c_{\mathcal {A} \mathcal {B}} (v^{i}_{i_1},v^{i}_{i_2}) = c_{\mathcal {B} \mathcal {A}} (v^{i}_{i_1},v^{i}_{i_2}) = c_{\mathcal {A} \mathcal {B}} (v^{i}_{i_2},v^{i}_{i_3})= c_{\mathcal {B} \mathcal {A}} (v^{i}_{i_2},v^{i}_{i_3})=0\).

Finally, in the instance \(\mathcal {I}_2\), we want to study whether there exists a schedule \(\sigma \) whose makespan is not greater than 2.
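For illustration, the construction of \(\mathcal {I}_2\) can be sketched as follows. The encoding of vertices as tuples and the name `build_instance` are our own conventions for this sketch, not from the paper.

```python
def build_instance(n, clauses):
    """Build the reduction graph from a 3-SAT instance.
    `clauses` is a list of triples (j, negated) with 1 <= j <= n.
    Vertex ('0', j) encodes v_j^0, ('inf', j) encodes v_j^infinity,
    and (i, j) encodes v_j^i.  Returns (V, E, tA, tB, comm) where
    comm gives the (symmetric) A<->B communication cost of each arc."""
    V, E, tA, tB, comm = set(), [], {}, {}, {}
    for j in range(1, n + 1):
        v0, vinf = ('0', j), ('inf', j)
        V |= {v0, vinf}
        E.append((v0, vinf)); comm[(v0, vinf)] = 3
        tA[v0] = tB[v0] = tA[vinf] = tB[vinf] = 0
    for i, cl in enumerate(clauses, 1):
        (j1, neg1), (j2, neg2), (j3, neg3) = cl
        u1, u2, u3 = (i, j1), (i, j2), (i, j3)
        V |= {u1, u2, u3}
        for u, neg in ((u1, neg1), (u2, neg2), (u3, neg3)):
            tA[u] = 1 if neg else 0      # execution costs of Eq. (1)
            tB[u] = 1 - tA[u]
        # zero-cost arcs along the clause path
        E.append((u1, u2)); comm[(u1, u2)] = 0
        E.append((u2, u3)); comm[(u2, u3)] = 0
        # cost-3 arcs tying the clause vertices to the variable gadgets
        for a in ((u1, ('inf', j1)), (('0', j2), u2), (('0', j3), u3)):
            E.append(a); comm[a] = 3
    return V, E, tA, tB, comm
```

On the formula of Fig. 1 (\(m=3\), \(n=4\)), this sketch produces the announced \(2n+3m=17\) vertices and \(n+5m=19\) edges.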

Fig. 1. Transformation of \((x_{1} \vee \bar{x}_{4} \vee x_{2}) \bigwedge (\bar{x}_{3} \vee \bar{x}_{4} \vee x_{1}) \bigwedge (x_{1} \vee x_{2} \vee x_{3})\) (\(m=3\) clauses, \(n=4\) variables) into the associated graph \(\mathcal {G}=(V,E)\).

We show an example in Fig. 1 of the construction of the graph. Here, the clause \(C_1=x_{1} \vee \bar{x}_{4} \vee x_{2}\) is associated with the vertices \(v_1^1\), \(v_4^1\) and \(v_2^1\) and the arcs set \(\{(v_1^1, v_4^1), (v_4^1, v_2^1), (v_1^1, v_1^\infty ), (v_4^0, v_4^1), (v_2^0,v_2^1)\}\). Moreover, \(t_{\mathcal {A}} (v_1^1)=t_{\mathcal {A}} (v_2^1)=0\), \(t_{\mathcal {A}} (v_4^1)=1\), \(t_{\mathcal {B}} (v_1^1)=t_{\mathcal {B}} (v_2^1)=1\) and \(t_{\mathcal {B}} (v_4^1)=0\). Note that \(v_1^\star =\{v_1^0, v_1^\infty , v_1^1, v_1^2, v_1^3\}\), \(v_2^\star =\{v_2^0, v_2^\infty , v_2^1, v_2^3\}\), \(v_3^\star =\{v_3^0, v_3^\infty , v_3^2, v_3^3\}\) and \(v_4^\star =\{v_4^0, v_4^\infty , v_4^1, v_4^2\}\).

Let \(\mathcal {S}\) be the set of schedules such that, \(\forall \sigma \in \mathcal {S}\), all tasks from \(v^\star _j\) are scheduled on the same type of machines, i.e., for any couple \((v_j^\alpha , v_j^\beta )\in v^\star _j\times v^\star _j\), \(\sigma (v_j^\alpha )=\sigma (v_j^\beta )\). The next lemmas provide dominance properties on feasible schedules of \(\mathcal{I}_2\):

Lemma 1

Any feasible solution \(\sigma \) of \(\mathcal{I}_2\) belongs to \(\mathcal {S}\).

Proof

Let us suppose by contradiction that some feasible solution \(\sigma \) does not belong to \(\mathcal{S}\). Two cases must then be considered:

  • If there exists \(j\in \{1, \cdots ,n\}\) with \(\sigma (v_j^0)\ne \sigma (v_j^\infty )\), then there is a communication delay of 3 between them and \(\texttt {len} (v_j^0 \rightarrow v_j^\infty ,\sigma )=3\).

  • Otherwise, \(\forall j\in \{1, \cdots ,n\}, \sigma (v_j^0)=\sigma (v_j^\infty )\). Thus, there exists a task \(v_j^i\) with \(\sigma (v_j^i)\ne \sigma (v_j^0)\). If \(v_j^i\) is associated with the first term of the clause \(C_i\), then \((v_j^i, v_j^\infty )\in E\) and \(\texttt {len} (v_j^i \rightarrow v_j^\infty ,\sigma )\ge 3\). Otherwise, \((v_j^0, v_j^i)\in E\) and \(\texttt {len} (v_j^0 \rightarrow v_j^i,\sigma )\ge 3\).

The makespan of \(\sigma \) is at least 3 in both cases, a contradiction.

Lemma 2

For any schedule \(\sigma \in \mathcal{S}\), \(MS(\mathcal{G}, \sigma )=\max _{i\in \{1,\cdots , m\}} \texttt {len} (v_{i_1}^i \rightarrow v_{i_2}^i\rightarrow v_{i_3}^i, \sigma )\).

Proof

We examine the lengths of the paths of \(\mathcal{G}\).

  • Let \(j\in \{1,\cdots ,n\}\), \(\texttt {len} (v^{0}_{j} \rightarrow v^{\infty }_{j} ,\sigma )=0\) since \(\sigma (v^0_j)=\sigma (v^{\infty }_{j} )\).

  • Let \(i\in \{1,\cdots ,m\}\) associated with the clause \(C_i = \tilde{x}_{i_1} \vee \tilde{x}_{i_2} \vee \tilde{x}_{i_3}\):

    1.

      Let us consider first the path \(v^{i}_{i_1} \rightarrow v^{\infty }_{i_1} \). By Lemma 1, \(\sigma (v^{i}_{i_1})=\sigma (v^{\infty }_{i_1} )\) and thus \(c_{\sigma (v^{i}_{i_1})\sigma (v^{\infty }_{i_1} )} (v^{i}_{i_1},v^{\infty }_{i_1} )=0\). Since \(\texttt {len} (v^{\infty }_{i_1} ,\sigma )=0\),

      $$\texttt {len} (v^{i}_{i_1} \rightarrow v^{\infty }_{i_1} ,\sigma )= \texttt {len} (v^{i}_{i_1},\sigma )\le \texttt {len} (v^{i}_{i_1} \rightarrow v^{i}_{i_2} \rightarrow v^{i}_{i_3},\sigma ).$$
    2.

      Let us consider now the path \(v^{0}_{i_2} \rightarrow v^{i}_{i_2} \rightarrow v^{i}_{i_3} \). Similarly, \(\sigma (v^{0}_{i_2} )=\sigma (v^{i}_{i_2})\) hence

      $$\texttt {len} (v^{0}_{i_2} \rightarrow v^{i}_{i_2} \rightarrow v^{i}_{i_3},\sigma ) = \texttt {len} (v^{i}_{i_2} \rightarrow v^{i}_{i_3},\sigma )\le \texttt {len} (v^{i}_{i_1} \rightarrow v^{i}_{i_2} \rightarrow v^{i}_{i_3},\sigma ).$$
    3.

      Lastly, for the path \((v^{0}_{i_3} \rightarrow v^{i}_{i_3})\), since \(\sigma (v^{0}_{i_3} )=\sigma (v^{i}_{i_3})\),

      $$\texttt {len} (v^{0}_{i_3} \rightarrow v^{i}_{i_3},\sigma )=\texttt {len} (v^{i}_{i_3},\sigma )\le \texttt {len} (v^{i}_{i_1} \rightarrow v^{i}_{i_2} \rightarrow v^{i}_{i_3},\sigma ),$$

      which concludes the lemma.

Assume that \(\lambda \) is a solution of \(\mathcal {I}_1\). Let us show that the schedule \(\sigma _{\lambda }\) defined as follows, \(\forall j\in \{1,\cdots ,n\}\), \(\forall v_j^\alpha \in v_j^\star \),

$$\sigma _{\lambda }: v_j^\alpha \mapsto {\left\{ \begin{array}{ll} \mathcal {A} &{} \text {if } \lambda (x_{j}) = 1 \\ \mathcal {B} &{} \text {if } \lambda (x_{j}) = 0 \end{array}\right. } $$

has a makespan not greater than 2 and thus is a solution. Following Lemma 2, we must prove that \(\forall i\in \{1, \cdots ,m\}\), \(\texttt {len} (v_{i_1}^i \rightarrow v_{i_2}^i\rightarrow v_{i_3}^i, \sigma _{\lambda })\le 2\).

For any clause \(C_i = \tilde{x}_{i_1} \vee \tilde{x}_{i_2} \vee \tilde{x}_{i_3}\), since \(\lambda (C_i)=1\), there exists \(j\in \{1,2,3\}\) such that \(\lambda (\tilde{x}_{i_j})=1\). Two cases must be considered:

  1.

    If \(\tilde{x}_{i_j}=x_{i_j}\), then by definition \(t_\mathcal {A} (v_{i_j}^{i})=0\). Since \(\lambda (x_{i_j})=1\), \(\sigma _{\lambda }(v_{i_j}^{i})=\mathcal {A} \) and thus \(\texttt {len} (v_{i_j}^{i},\sigma _{\lambda })=t_\mathcal {A} (v_{i_j}^{i})=0\).

  2.

    Otherwise, \(\tilde{x}_{i_j}=\bar{x}_{i_j}\) and \(t_\mathcal {B} (v_{i_j}^{i})=0\). Now, as \(\lambda ({x}_{i_j})=0\), \(\sigma _{\lambda }(v_{i_j}^{i})=\mathcal {B} \) and thus \(\texttt {len} (v_{i_j}^{i},\sigma _{\lambda })=t_\mathcal {B} (v_{i_j}^{i})=0\).

\(\texttt {len} (v_{i_j}^{i},\sigma _{\lambda })=0\) in both cases, so \(\texttt {len} (v_{i_1}^i \rightarrow v_{i_2}^i\rightarrow v_{i_3}^i, \sigma _{\lambda })\le 2\).

Assume now that we have a solution \(\sigma \) of \(\mathcal {I}_2\). Let us show that \(\lambda _{\sigma }(x_j) = [\sigma (v^{\infty }_{j} ) = \mathcal {A} ]\) is a solution to \(\mathcal {I}_1\).

Following Lemma 1, \(\sigma \in \mathcal{S}\). Moreover, for any clause \(C_i = \tilde{x}_{i_1} \vee \tilde{x}_{i_2} \vee \tilde{x}_{i_3}\), the corresponding path of \(\mathcal{G}\) verifies \(\texttt {len} (v_{i_1}^i \rightarrow v_{i_2}^i\rightarrow v_{i_3}^i, \sigma )\le 2\). Thus, there is \(j\in \{1,2,3\}\) with \(\texttt {len} (v_{i_j}^i, \sigma )=0\). Two cases must be considered:

  1.

    If \(\tilde{x}_{i_j}=x_{i_j}\) then by definition \(t_\mathcal {A} (v_{i_j}^i)=0\) and \(t_\mathcal {B} (v_{i_j}^i)=1\). So, \(\sigma (v_{i_j}^i)=\mathcal {A} \) and thus \(\lambda _{\sigma }(x_{i_j})=1\).

  2.

    Else, \(\tilde{x}_{i_j}=\bar{x}_{i_j}\) and thus \(t_\mathcal {A} (v_{i_j}^i)=1\) and \(t_\mathcal {B} (v_{i_j}^i)=0\). So, \(\sigma (v_{i_j}^i)=\mathcal {B} \) and thus \(\lambda _{\sigma }(\bar{x}_{i_j})=1\).

So at least one term of \(C_i\) is true under \(\lambda _{\sigma }\); hence \(\lambda _{\sigma }\) is a solution to \(\mathcal {I}_1\).

This concludes the proof that the problem is strongly NP-complete.

Corollary 1

There is no polynomial-time algorithm for the problem with a performance bound smaller than \(\frac{3}{2}\) unless \(\mathcal {P}= \mathcal {NP}\).

Proof

By contradiction, let us suppose that there exists a polynomial-time algorithm with a performance ratio \(\rho < \frac{3}{2}\). This algorithm could be used to decide the existence of a schedule of length at most 2 for any instance \(\mathcal {I}\): whenever such a schedule exists, the algorithm returns one of length at most \(2\rho < 3\). We deduce that there exists a polynomial-time algorithm to decide the existence of a schedule of length strictly less than 3, which contradicts Theorem 1.

3.2 Polynomial Algorithms

Bi-partite Graphs. We have shown that the problem is NP-hard if the graph has depth 3. The natural question that arises is whether it is already NP-hard for graphs of lower depth. We show that it can be solved in polynomial time for graphs of depth 2 (bipartite graphs).

Theorem 2

\(\textsc {BiPartAlgo}(\mathcal {G})\) described below provides an optimal solution in polynomial time with a complexity of \(\varTheta (n|E|)\) when \(\mathcal {G}\) has depth 2.

Observe that in the case of a bipartite graph \(\mathcal {G}=(V,E)\), the paths are exactly the edges of \(\mathcal {G}\). The intuition of the algorithm is then to first compute the makespan of all possible allocations for all edges, and then to iteratively remove the pairs associated with forbidden allocations.

For any edge \((i,j)\in E\), 4 allocations are possible: \((\sigma (i),\sigma (j)) \in \{\mathcal {A},\mathcal {B} \}^2=\{(\mathcal {A},\mathcal {A}),(\mathcal {A},\mathcal {B}),(\mathcal {B},\mathcal {A}),(\mathcal {B},\mathcal {B})\}\). We define the set of quintuplets of all these allocations:

$$\begin{aligned} \texttt {WgPaths}&= \Big \{(\texttt {len} (i\rightarrow j,\sigma ),i,j,\sigma _i,\sigma _j ) \big | \\&\quad \quad \quad (i,j)\in E, (\sigma (i),\sigma (j)) \in \{\mathcal {A},\mathcal {B} \}^2, \sigma (i)=\sigma _i, \sigma (j)=\sigma _j \Big \}. \end{aligned}$$

This set can be constructed in linear time by a simple iteration through all the edges of the graph, using a procedure that we call MkWgPaths(V, E).

Finally to minimize the makespan, we iteratively remove from \(\texttt {WgPaths}\) the allocations that would maximize the makespan and check that there still exists a possible schedule.

Algorithm 1. \(\textsc {BiPartAlgo}(\mathcal {G})\)
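Since the pseudocode listing of Algorithm 1 is not reproduced here, the following sketch illustrates its structure as described above: build \(\texttt{WgPaths}\), then forbid allocations by decreasing path length as long as a satisfying assignment remains. For clarity the satisfiability test below is a brute-force enumeration; since every formula is a conjunction of 2-clauses, a linear-time 2-SAT solver (implication graph plus strongly connected components) should be used instead to match the stated complexity. All names are our own conventions.

```python
from itertools import product

def bipart_algo(nodes, edges, t, c):
    """Sketch of BiPartAlgo for a bipartite (depth-2) DAG.
    t[x][v]: time of task v on platform x; c[(i, j)]: A<->B transfer cost."""
    # MkWgPaths: every edge under its 4 possible allocations,
    # sorted by decreasing path length.
    wg = sorted(((t[si][i] + (c[(i, j)] if si != sj else 0) + t[sj][j],
                  i, j, si, sj)
                 for (i, j) in edges for si in 'AB' for sj in 'AB'),
                reverse=True)

    def satisfying(clauses):
        # Clause (i, si, j, sj) forbids sigma[i] == si AND sigma[j] == sj.
        # Brute force for clarity; replace by 2-SAT for polynomial time.
        for assign in product('AB', repeat=len(nodes)):
            sigma = dict(zip(nodes, assign))
            if all(sigma[i] != si or sigma[j] != sj
                   for (i, si, j, sj) in clauses):
                return sigma
        return None

    clauses, sigma = [], satisfying([])
    for (_, i, j, si, sj) in wg:      # forbid paths by decreasing length
        cand = satisfying(clauses + [(i, si, j, sj)])
        if cand is None:              # stopping condition: would be unsat
            break
        clauses.append((i, si, j, sj))
        sigma = cand
    ms = max(t[sigma[i]][i] + (c[(i, j)] if sigma[i] != sigma[j] else 0)
             + t[sigma[j]][j] for (i, j) in edges)
    return sigma, ms
```

On a single edge with antagonistic costs, the sketch correctly keeps both endpoints on the same platform rather than paying the transfer cost.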

In the rest, we use the following notation for a schedule \(\sigma \) and a time D:

$$\begin{aligned} \text {WP}(D) =&\big \{ (i,j, \sigma _i,\sigma _j) \text { s.t. }\left( t_{\sigma _i\sigma _j}, i,j, \sigma _i,\sigma _j\right) \in \texttt {WgPaths} \quad \text { and } t_{\sigma _i\sigma _j} > D\big \}\\ P_{D}(\sigma ) =&\bigwedge _{( i,j, \sigma _i,\sigma _j)\in \text {WP}(D)} \left[ (\sigma (i) \ne \sigma _i) \vee (\sigma (j) \ne \sigma _j)\right] \end{aligned}$$

Intuitively, \(\text {WP}(D)\) is the set of paths and allocations of length greater than D.

Lemma 3

Let \(\sigma \) be a schedule of makespan D, then \(P_{D}(\sigma )\) is satisfied.

This result is a direct consequence of the fact that there should be no path of length greater than D. Hence for \(( i,j, \sigma _i,\sigma _j)\in \text {WP}(D)\), we know that we do not have simultaneously in the schedule \((\sigma (i) = \sigma _i)\) and \((\sigma (j) = \sigma _j)\). Hence,

$$\begin{aligned} \lnot \bigvee _{( i,j, \sigma _i,\sigma _j)\in \text {WP}(D)} \left[ (\sigma (i) = \sigma _i) \wedge (\sigma (j) = \sigma _j)\right] \nonumber \qquad \qquad \qquad \qquad \qquad \qquad \\ =\bigwedge _{( i,j, \sigma _i,\sigma _j)\in \text {WP}(D)} \left[ (\sigma (i) \ne \sigma _i) \vee (\sigma (j) \ne \sigma _j)\right] = P_{D}(\sigma ) \end{aligned}$$
(2)

Proof

(Proof of Theorem 2). Consider an instance \(\mathcal {G}\) of the problem. Let \(D_{\text {alg}}\) be the makespan of the schedule returned by \(\textsc {BiPartAlgo}(\mathcal {G})\). Clearly, \(D_{\text {alg}}=\max _{(i,j)\in E} (t_{\sigma (i)}(i) + c_{\sigma (i)\sigma (j)} (i,j) + t_{\sigma (j)}(j))\). Let \(P_{\text {alg}}\) be the set of clauses computed by it (line 9). Let \(W_{\text {alg}} = \{ (i,j,\sigma _i,\sigma _j) | (t_{\sigma _i\sigma _j},i,j,\sigma _i,\sigma _j) \in \texttt {WgPaths} \}\) s.t. \(P_{\text {alg}} = \bigwedge _{( i,j, \sigma _i,\sigma _j)\in W_{\text {alg}}} \left[ (\sigma (i) \ne \sigma _i) \vee (\sigma (j) \ne \sigma _j)\right] \). Then by construction of \(P_{\text {alg}}\), we have the following properties:

  1.

    For all \(\varepsilon >0\), \(\text {WP}(D_{\text {alg}}) \subset W_{\text {alg}} \subset \text {WP}(D_{\text {alg}}-\varepsilon )\), because we add paths by decreasing value of makespan (line 4).

  2.

    There exists \((D_{\text {alg}},i_0,j_0,\sigma _{i_0},\sigma _{j_0})\in \texttt {WgPaths} \) such that \(P_{\text {alg}}\) is satisfiable and \(P_{\text {alg}}\bigwedge \left[ (\sigma (i_0) \ne \sigma _{i_0}) \vee (\sigma (j_0) \ne \sigma _{j_0})\right] \) is not satisfiable. This is the stopping condition on line 6.

We show the optimality of Algorithm 1 by contradiction. If it is not optimal, then \(D_{\text {opt}} < D_{\text {alg}}\), and \(W_{\text {alg}} \cup \{(i_0,j_0,\sigma _{i_0},\sigma _{j_0} )\} \subset \text {WP}(D_{\text {opt}})\). Furthermore, according to Lemma 3, \(P_{D_{\text {opt}}}(\sigma _{\text {opt}})\) is satisfied, hence \(\sigma _{\text {opt}}\) is also a solution to \(P_{\text {alg}}\bigwedge \left[ (\sigma (i_0) \ne \sigma _{i_0}) \vee (\sigma (j_0) \ne \sigma _{j_0})\right] \). This contradicts the fact that this formula admits no solution, hence the result.

Finally, the complexity of MkWgPaths(V, E) is \(\varTheta (|E|)\). In Algorithm 1, the for loop (line 4) is executed at most 4|E| times, and checking whether \(P_{\text {tmp}}\) is satisfiable (line 6) has complexity \(\varTheta (n+k)\), where k is the number of clauses in \(P_{\text {tmp}}\). Since the number of iterations is bounded by \(4 \vert E \vert \), the complexity of Algorithm 1 is \(\mathcal{O}(|E|^2)\).

Out-Tree Graphs. We assume now that the DAG \(\mathcal{G}=(V,E)\) is an out-tree rooted at \(r\in V\). For any task \(u\in V\), the sub-tree rooted at u is the sub-graph \(\mathcal{G}_u\) of \(\mathcal{G}\) whose vertices are u and the descendants of u.

For any task \(u\in V\), let us denote by \(D^\mathcal {A} (u)\) (resp. \(D^\mathcal {B} (u)\)) a lower bound on the minimal makespan of \(\mathcal{G}_u\) assuming that \(\sigma (u)=\mathcal {A} \) (resp. \(\sigma (u)=\mathcal {B} \)). Let us suppose that the arc \((u,v)\in E\). Observe that, if \(D^\mathcal {A} (v)\le c_{\mathcal {A} \mathcal {B}} (u,v)+D^\mathcal {B} (v)\), then \(D^\mathcal {A} (u)\ge t_{\mathcal {A}} (u)+D^\mathcal {A} (v)\). Otherwise, \(D^\mathcal {A} (u)\ge t_{\mathcal {A}} (u)+c_{\mathcal {A} \mathcal {B}} (u,v)+D^\mathcal {B} (v)\). In both cases, \(D^\mathcal {A} (u)\ge t_{\mathcal {A}} (u)+\min (D^\mathcal {A} (v), c_{\mathcal {A} \mathcal {B}} (u,v)+D^\mathcal {B} (v))\). Similarly, \(D^\mathcal {B} (u)\ge t_{\mathcal {B}} (u)+\min (D^\mathcal {B} (v), c_{\mathcal {B} \mathcal {A}} (u,v)+D^\mathcal {A} (v))\).

For any task \(u\in V\), we set \(\varGamma ^+(u)=\{v\in V, (u,v)\in E\}\). For any allocation function \(\sigma \), let \(\bar{\sigma }(u)=\mathcal {A} \) if \(\sigma (u)=\mathcal {B} \), \(\bar{\sigma }(u)=\mathcal {B} \) otherwise. Then, for any task \(u\in V\), we get \(D^{\sigma (u)}(u)=t_{\sigma (u)}(u)+\max _{v\in \varGamma ^+(u)}\min (D^{\sigma (u)}(v),c_{\sigma (u)\bar{\sigma }(u)}(u,v)+D^{\bar{\sigma }(u)}(v))\).

Theorem 3

For an out-tree graph \(\mathcal{G}=(V,E)\) rooted at \(r\in V\), an allocation \(\sigma \) may be built such that the corresponding schedule has length \(D(r)=\min (D^\mathcal {A} (r), D^\mathcal {B} (r))\) and thus is optimal.

Proof

Let us suppose that lower bounds \(D^\mathcal {A} (u)\) and \(D^\mathcal {B} (u)\) for \(u\in V\) are given. Let us define the allocation \(\sigma \) as \(\sigma (r)=\mathcal {A} \) if \(D^\mathcal {A} (r)\le D^\mathcal {B} (r)\), and \(\sigma (r)=\mathcal {B} \) otherwise. For any task \(v\ne r\) with \((u,v)\in E\), we set \(\sigma (v)=\sigma (u)\) if \(D^{\sigma (u)}(v) < D^{{\bar{\sigma }}(u)}(v) +c_{\sigma (u)\bar{\sigma }(u)}(u,v)\), and \(\sigma (v)=\bar{\sigma }(u)\) otherwise.

For any task u, we prove that the length D(u) of the schedule of \(\mathcal{G}_u\) for the allocation \(\sigma \) verifies \(D(u)=D^{\sigma (u)}(u)\). If u is a leaf, \(D(u)=t_{\sigma (u)}(u)=D^{\sigma (u)}(u)\).

Now, let us suppose that \(\varGamma ^+(u)\ne \emptyset \). By definition, for any arc \((u,v)\in E\), if \(\sigma (u)=\sigma (v)\), then \(c_{\sigma (u)\sigma (v)}(u,v)=0\). Then, if we set \(\varDelta ^\sigma (u,v)=D(v)+c_{\sigma (u){\sigma }(v)}(u,v)\), we get by induction \(\varDelta ^\sigma (u,v)=D^{\sigma (v)}(v)+c_{\sigma (u){\sigma }(v)}(u,v)\) and, by definition of \(\sigma \), \(\varDelta ^\sigma (u,v)=\min (D^{\sigma (u)}(v), D^{{\bar{\sigma }}(u)}(v)+c_{\sigma (u)\bar{\sigma }(u)}(u,v))\). Now, \(D(u)=t_{\sigma (u)}(u)+\max _{v\in \varGamma ^+(u)}\varDelta ^\sigma (u,v)\) and thus, by definition of \(D^{\sigma (u)}\), \(D(u)=D^{\sigma (u)}(u)\), which concludes the proof.

A polynomial-time algorithm of time complexity \(\varTheta (n)\) can be deduced by first computing \(D^\mathcal {A} \) and \(D^\mathcal {B} \), and then \(\sigma \).
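A possible implementation of this \(\varTheta (n)\) procedure (bottom-up computation of \(D^\mathcal {A}\) and \(D^\mathcal {B}\), then top-down allocation as in the proof of Theorem 3) is sketched below; the dictionary-based encoding of the tree and costs is an assumption of ours.

```python
def schedule_out_tree(children, root, t, c):
    """Optimal allocation for an out-tree (Theorem 3).
    children[u]: list of children of u; t[x][u]: time of u on platform x;
    c[(x, y)][(u, v)]: transfer cost on arc (u, v) from platform x to y."""
    D = {'A': {}, 'B': {}}
    order = [root]                      # parents listed before children
    for u in order:
        order.extend(children.get(u, []))
    for u in reversed(order):           # bottom-up: D^A(u), D^B(u)
        for x, y in (('A', 'B'), ('B', 'A')):
            D[x][u] = t[x][u] + max(
                (min(D[x][v], c[(x, y)][(u, v)] + D[y][v])
                 for v in children.get(u, [])), default=0)
    sigma = {root: 'A' if D['A'][root] <= D['B'][root] else 'B'}
    for u in order:                     # top-down: fix the allocation
        x = sigma[u]
        y = 'B' if x == 'A' else 'A'
        for v in children.get(u, []):
            sigma[v] = x if D[x][v] < D[y][v] + c[(x, y)][(u, v)] else y
    return sigma, min(D['A'][root], D['B'][root])
```

Each vertex is visited a constant number of times, matching the \(\varTheta (n)\) bound.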

Example 1

Let us consider as an example the out-tree pictured in Fig. 2. Figure 3 shows the lower bounds \(D^\mathcal {A} \) and \(D^\mathcal {B} \) and a corresponding optimal schedule.

Fig. 2. An out-tree \(\mathcal{G}\), duration of tasks and communication costs.

Fig. 3. Lower bounds \(D^\mathcal {A} \) and \(D^\mathcal {B} \). An optimal schedule is presented for the allocation \(\sigma (1)=\mathcal {A} \), \(\sigma (2)=\mathcal {B} \), \(\sigma (3)=\mathcal {A} \), \(\sigma (4)=\mathcal {B} \), \(\sigma (5)=\mathcal {B} \), \(\sigma (6)=\mathcal {B} \), \(\sigma (7)=\mathcal {A} \) and \(\sigma (8)=\mathcal {B} \).

Series-Parallel Graphs. Let us consider two-terminal Series-Parallel digraphs (2SP for short) as defined in [14, 15]. Each element of this class has a unique source s and a unique sink t with \(s\ne t\). The class is defined inductively as follows, where \(\mathcal{G}\) and \(\mathcal{H}\) are two 2SP graphs:

  • A single arc \((s,t)\) belongs to 2SP;

  • The series composition of \(\mathcal{G}\) and \(\mathcal{H}\) is denoted by \(\mathcal{G.H}\) and is built by identifying the sink of \(\mathcal{G}\) with the source of \(\mathcal{H}\);

  • The parallel composition is denoted by \(\mathcal{G+H}\) and identifies respectively the sinks and the sources of the two digraphs.

Figure 4 pictures a 2SP graph and its associated decomposition tree.

Fig. 4. A 2SP graph and its associated decomposition tree. Leaves correspond to arcs, while internal nodes are series or parallel compositions.

For any element \(\mathcal{G}\in 2SP\) with source s and sink t, and for any couple \((\alpha , \beta )\in \{\mathcal {A},\mathcal {B} \}^2\), let us denote by \(D^{\alpha \beta }(\mathcal{G})\) a lower bound, defined as follows, on the minimum length of a schedule of \(\mathcal{G}\) with \(\sigma (s)=\alpha \) and \(\sigma (t)=\beta \). For a graph \(\mathcal{G}\) consisting of a unique arc \(e=(s, t)\), for any couple \((\alpha , \beta )\in \{\mathcal {A},\mathcal {B} \}^2\),

$$D^{\alpha \beta }(\mathcal{G})= \left\{ \begin{array}{ll} t_\alpha (s)+t_\beta (t)+c_{\alpha \beta }(s,t) &{} \text{ if } \alpha \ne \beta \\ t_\alpha (s)+t_\beta (t)&{} \text{ otherwise }. \end{array} \right. $$

Now, if \(\mathcal{G}\) and \(\mathcal{H}\) are two 2SP, then for the series composition, we set \(D^{\alpha \beta }(\mathcal{G}.\mathcal{H})=\min _{\gamma \in \{\mathcal {A}, \mathcal {B} \}} (D^{\alpha \gamma }(\mathcal{G})+D^{\gamma \beta }(\mathcal{H})-t_\gamma (t))\) where t is the sink of \(\mathcal{G}\). Similarly, for the parallel composition, we set \(D^{\alpha \beta }(\mathcal{G}+\mathcal{H})=\max (D^{\alpha \beta }(\mathcal{G}), D^{\alpha \beta }(\mathcal{H}))\).

We define the allocation function \(\sigma \) associated with a 2SP graph \(\mathcal{G}\) and the corresponding length \(D(\mathcal{G})\) as follows. We set \(D(\mathcal{G})=\min _{(\alpha , \beta )\in \{\mathcal {A},\mathcal {B} \}^2}(D^{\alpha \beta }(\mathcal{G}))\). We also set \(\sigma (s)\) and \(\sigma (t)\), the allocations of the source and the sink of \(\mathcal{G}\), such that \(D(\mathcal{G})=D^{\sigma (s)\sigma (t)}(\mathcal{G})\). Now, for any series composition, let us suppose that s and t (resp. \(s'\) and \(t'\)) are the source and the sink of \(\mathcal{G}\) (resp. \(\mathcal{H}\)). We also suppose that \(\sigma (s)\) and \(\sigma (t')\) are fixed. Then, for \(\mathcal{G}.\mathcal{H}\), \(t=s'\) and we set \(\sigma (t)=\gamma \in \{\mathcal {A}, \mathcal {B} \}\) such that \(D(\mathcal{G}.\mathcal{H})=D^{\sigma (s)\sigma (t)}(\mathcal{G})+D^{\sigma (s')\sigma (t')}(\mathcal{H})-t_{\sigma (t)}(t)\).

If \(\mathcal{G}\) is a 2SP graph of source s and sink t, any vertex \(v\in V-\{s, t\}\) is involved in a series composition, and thus \(\sigma \) is completely defined.
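The bounds \(D^{\alpha \beta }\) can be computed by a simple recursion over the decomposition tree. The sketch below assumes a tuple encoding of the tree (`('arc', s, t)`, `('series', G, H)`, `('parallel', G, H)`), which is our convention, not the paper's.

```python
def sp_bounds(node, t, c):
    """Lower bounds D^{alpha beta} over the decomposition tree of a 2SP
    graph.  Returns (D, source, sink), where D[(a, b)] is the bound with
    the source on platform a and the sink on platform b."""
    kind = node[0]
    if kind == 'arc':
        _, s, u = node
        D = {(a, b): t[a][s] + (c[(s, u)] if a != b else 0) + t[b][u]
             for a in 'AB' for b in 'AB'}
        return D, s, u
    _, G, H = node
    DG, sG, tG = sp_bounds(G, t, c)
    DH, sH, tH = sp_bounds(H, t, c)
    if kind == 'series':   # sink of G identified with source of H
        D = {(a, b): min(DG[(a, g)] + DH[(g, b)] - t[g][tG] for g in 'AB')
             for a in 'AB' for b in 'AB'}
        return D, sG, tH
    # parallel composition: sources and sinks identified
    D = {(a, b): max(DG[(a, b)], DH[(a, b)]) for a in 'AB' for b in 'AB'}
    return D, sG, tG
```

The optimal makespan is then \(\min_{(\alpha,\beta)} D^{\alpha\beta}(\mathcal{G})\), and the allocation of the internal vertices can be recovered by re-descending the tree as described above.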

Theorem 4

For any 2SP graph \(\mathcal{G}\) of source s and sink t, \(D(\mathcal{G})=D^{\sigma (s)\sigma (t)}(\mathcal{G})\).

Proof

The equality is clearly true if \(\mathcal{G}\) is an arc (st). Indeed, we get in this case \(D(\mathcal{G})=\min _{(\alpha , \beta )\in \{\mathcal {A},\mathcal {B} \}^2}(D^{\alpha \beta }(\mathcal{G}))=D^{\sigma (s)\sigma (t)}(\mathcal{G})\).

Now, let us suppose that s and t (resp. \(s'\) and \(t'\)) are the source and the sink of \(\mathcal{G}\) (resp. \(\mathcal{H}\)) and that \(D(\mathcal{G})=D^{\sigma (s)\sigma (t)}(\mathcal{G})\) and \(D(\mathcal{H})=D^{\sigma (s')\sigma (t')}(\mathcal{H})\). For a parallel composition, \(D(\mathcal{G}+\mathcal{H})= \max (D^{\sigma (s)\sigma (t)}(\mathcal{G}),D^{\sigma (s')\sigma (t')}(\mathcal{H}))= D^{\sigma (s)\sigma (t)}(\mathcal{G}+\mathcal{H}) \) as \(s=s'\) and \(t=t'\).

For the series composition, \(D(\mathcal{G}.\mathcal{H})=D(\mathcal{G})+D(\mathcal{H})-t_{\sigma (t)}(t)=D^{\sigma (s)\sigma (t)}(\mathcal{G}.\mathcal{H}),\) since \(t=s'\), which concludes the proof.

Corollary 2

A polynomial-time algorithm of time complexity \(\varTheta (\vert E \vert )\) can be deduced by computing the lower bounds \(D^{\alpha \beta }\), \((\alpha , \beta )\in \{\mathcal {A},\mathcal {B} \}^2\), for each graph issued from the decomposition of \(\mathcal{G}\), together with a corresponding allocation \(\sigma \).

4 Future Directions

With this work we have studied the problem of scheduling a Directed Acyclic Graph on an unbounded hybrid platform. Specifically, our platform consists of two machines, each with an unbounded number of resources, and moving data from one machine to the other incurs a communication cost. We have shown the intractability of the problem via a reduction from the 3-satisfiability problem, and that no polynomial-time approximation algorithm with a ratio smaller than 3/2 exists unless P=NP. We have further provided polynomial-time algorithms for several special classes of graphs. While this model seems very theoretical, we can see several applications both in High-Performance Computing (in-situ analysis) and in Big-Data analytics in the cloud (geo-distributed data-centers).

There are several extensions that we can see to this work. In the context of two unbounded platforms, it would be interesting to find polynomial-time algorithms with proven bounds w.r.t. the optimal. We do not expect to find one in the general case, but we hope that under some constraints between the communication costs and the computation costs (as is often done in the context of scheduling DAGs with communications), one may be able to find such algorithms. We then plan to evaluate these algorithms with in-situ frameworks. Finally, another direction we are interested in is a version of this problem where only one machine has an unbounded number of resources, and where the data is located on the other one. For example, in the context of smartphone applications, this models the frontend/backend setting, where the phone (Machine 1) has a limited number of available processors but can offload some of the computation to a cloud-based backend (Machine 2) with an unbounded number of processors. As in the present work, the problem is a data and communication problem: given the cost of transferring data from one machine to the other, what is the most efficient strategy?